1. Furuya S, Liu J, Sun Z, Lu Q, Fletcher JM. Understanding Internal Migration: A Research Note Providing an Assessment of Migration Selection With Genetic Data. Demography 2023; 60:1631-1648. [PMID: 37937916 DOI: 10.1215/00703370-11053145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2023]
Abstract
Migration is selective, resulting in inequalities between migrants and nonmigrants. However, investigating migration selection is empirically challenging because combined pre- and post-migration data are rarely available. We propose an alternative approach to assessing internal migration selection by integrating genetic data, enabling an investigation of migration selection with cross-sectional data collected post-migration. Using data from the UK Biobank, we utilized standard tools from statistical genetics to conduct a genome-wide association study (GWAS) for migration distance. We then calculated genetic correlations to compare GWAS results for migration with those for other characteristics. Given that individual genetics are determined at conception, these analyses allow a unique exploration of the association between pre-migration characteristics and migration. Results are generally consistent with the healthy migrant literature: genetics correlated with longer migration distance are associated with higher socioeconomic status and better health. We also extended the analysis to 53 traits and found novel correlations between migration and several physical health, mental health, personality, and sociodemographic traits.
Affiliation(s)
- Shiro Furuya
  - Department of Sociology, Center for Demography of Health and Aging, and Center for Demography and Ecology, University of Wisconsin-Madison, Madison, WI, USA
- Jihua Liu
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Zhongxuan Sun
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Qiongshi Lu
  - Center for Demography of Health and Aging, Department of Biostatistics and Medical Informatics, and Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Jason M Fletcher
  - Center for Demography of Health and Aging, Center for Demography and Ecology, La Follette School of Public Affairs, Department of Population Health Science, and Department of Agricultural and Applied Economics, University of Wisconsin-Madison, Madison, WI, USA

2. Ponsero AJ, Miller M, Hurwitz BL. Comparison of k-mer-based de novo comparative metagenomic tools and approaches. Microbiome Research Reports 2023; 2:27. [PMID: 38058765 PMCID: PMC10696585 DOI: 10.20517/mrr.2023.26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 06/28/2023] [Accepted: 07/12/2023] [Indexed: 12/08/2023]
Abstract
Aim: Comparative metagenomic analysis requires measuring a pairwise similarity between metagenomes in the dataset. Reference-based methods that compute a beta-diversity distance between two metagenomes are highly dependent on the quality and completeness of the reference database, and their application on less studied microbiota can be challenging. On the other hand, de-novo comparative metagenomic methods only rely on the sequence composition of metagenomes to compare datasets. While each one of these approaches has its strengths and limitations, their comparison is currently limited. Methods: We developed sets of simulated short-reads metagenomes to (1) compare k-mer-based and taxonomy-based distances and evaluate the impact of technical and biological variables on these metrics and (2) evaluate the effect of k-mer sketching and filtering. We used a real-world metagenomic dataset to provide an overview of the currently available tools for de novo metagenomic comparative analysis. Results: Using simulated metagenomes of known composition and controlled error rate, we showed that k-mer-based distance metrics were well correlated to the taxonomic distance metric for quantitative Beta-diversity metrics, but the correlation was low for presence/absence distances. The community complexity in terms of taxa richness and the sequencing depth significantly affected the quality of the k-mer-based distances, while the impact of low amounts of sequence contamination and sequencing error was limited. Finally, we benchmarked currently available de-novo comparative metagenomic tools and compared their output on two datasets of fecal metagenomes and showed that most k-mer-based tools were able to recapitulate the data structure observed using taxonomic approaches. 
Conclusion: This study expands our understanding of the strength and limitations of k-mer-based de novo comparative metagenomic approaches and aims to provide concrete guidelines for researchers interested in applying these approaches to their metagenomic datasets.
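The contrast the abstract draws between quantitative and presence/absence k-mer distances can be illustrated with a small sketch. This is not the benchmark code from the study: the function names are hypothetical, and Bray-Curtis and Jaccard are assumed here only as common representatives of the two families of distances. Both samples are assumed to contain at least one k-mer.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis(a, b):
    """Quantitative distance: depends on how often each k-mer occurs."""
    keys = set(a) | set(b)
    shared = sum(min(a[x], b[x]) for x in keys)  # Counter yields 0 for absent k-mers
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total


def jaccard_distance(a, b):
    """Presence/absence distance: depends only on which k-mers occur."""
    keys_a, keys_b = set(a), set(b)
    return 1.0 - len(keys_a & keys_b) / len(keys_a | keys_b)


# Toy samples (illustrative reads, not real data).
sample_a = kmer_counts(["ACGTACGT"], k=4)
sample_b = kmer_counts(["ACGTTTTT"], k=4)
print(bray_curtis(sample_a, sample_b), jaccard_distance(sample_a, sample_b))
```

Real tools typically add canonical k-mers, sketching, and abundance filtering on top of this, which is what the benchmarking in the abstract evaluates.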
Affiliation(s)
- Alise Jany Ponsero
  - Human Microbiome Research Program, University of Helsinki, Helsinki 00290, Finland
  - Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA
  - BIO5 Institute, The University of Arizona, Tucson, AZ 85721, USA
- Matthew Miller
  - Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA
- Bonnie Louise Hurwitz
  - Department of Biosystems Engineering, The University of Arizona, Tucson, AZ 85721, USA
  - BIO5 Institute, The University of Arizona, Tucson, AZ 85721, USA

3. Zhai H, Fukuyama J. A convenient correspondence between k-mer-based metagenomic distances and phylogenetically-informed β-diversity measures. PLoS Comput Biol 2023; 19:e1010821. [PMID: 36608056 PMCID: PMC9879504 DOI: 10.1371/journal.pcbi.1010821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Revised: 01/26/2023] [Accepted: 12/16/2022] [Indexed: 01/07/2023] Open
Abstract
k-mer-based distances are often used to describe the differences between communities in metagenome sequencing studies because of their computational convenience and history of effectiveness. Although k-mer-based distances do not use information about taxon abundances, we show that one class of k-mer distances between metagenomes (the Euclidean distance between k-mer spectra, or EKS distances) are very closely related to a class of phylogenetically-informed β-diversity measures that do explicitly use both the taxon abundances and information about the phylogenetic relationships among the taxa. Furthermore, we show that both of these distances can be interpreted as using certain features of the taxon abundances that are related to the phylogenetic tree. Our results allow practitioners to perform phylogenetically-informed analyses when they only have k-mer data available and provide a theoretical basis for using k-mer spectra with relatively small values of k (on the order of 4-5). They are also useful for analysts who wish to know more of the properties of any method based on k-mer spectra and provide insight into one class of phylogenetically-informed β-diversity measures.
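As a rough illustration of the EKS distance named in the abstract (the Euclidean distance between k-mer spectra), the sketch below builds a spectrum over all 4^k DNA k-mers and compares two samples. Normalizing counts to relative frequencies and skipping k-mers with non-ACGT characters are assumptions of this sketch, not details taken from the paper, and the function names are hypothetical.

```python
import math
from itertools import product


def kmer_spectrum(reads, k):
    """Relative frequency of each of the 4**k DNA k-mers, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:  # assumption: ignore k-mers containing N etc.
                counts[kmer] += 1
    total = sum(counts.values())  # assumed to be non-zero
    return [counts[m] / total for m in kmers]


def eks_distance(spec_a, spec_b):
    """Euclidean distance between two k-mer spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))


# Toy example with k = 2 (the abstract discusses k on the order of 4-5).
spec_a = kmer_spectrum(["AAAA"], k=2)
spec_b = kmer_spectrum(["CCCC"], k=2)
print(eks_distance(spec_a, spec_b))
```

With small k the spectrum has only 4^k entries (256 for k = 4), which is part of why such distances are computationally convenient.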
Affiliation(s)
- Hongxuan Zhai
  - Department of Statistics, Indiana University Bloomington, Bloomington, Indiana, United States of America
- Julia Fukuyama
  - Department of Statistics, Indiana University Bloomington, Bloomington, Indiana, United States of America

4. Ahrens KF, Neumann RJ, von Werthern NM, Kranz TM, Kollmann B, Mattes B, Puhlmann LMC, Weichert D, Lutz B, Basten U, Fiebach CJ, Wessa M, Kalisch R, Lieb K, Chiocchetti AG, Tüscher O, Reif A, Plichta MM. Association of polygenic risk scores and hair cortisol with mental health trajectories during COVID lockdown. Transl Psychiatry 2022; 12:396. [PMID: 36130942 PMCID: PMC9490720 DOI: 10.1038/s41398-022-02165-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/01/2022] [Revised: 09/06/2022] [Accepted: 09/08/2022] [Indexed: 11/25/2022] Open
Abstract
The COVID-19 pandemic is a global stressor with inter-individually differing influences on mental health trajectories. Polygenic Risk Scores (PRSs) for psychiatric phenotypes are associated with individual mental health predispositions. Elevated hair cortisol concentrations (HCC) and high PRSs are related to negative mental health outcomes. We analyzed whether PRSs and HCC are related to different mental health trajectories during the first COVID lockdown in Germany. Among 523 participants selected from the longitudinal resilience assessment study (LORA), we previously reported three subgroups (acute dysfunction, delayed dysfunction, resilient) based on weekly mental health (GHQ-28) assessment during COVID lockdown. DNA from blood was collected at the baseline of the original LORA study (n = 364) and used to calculate the PRSs of 12 different psychopathological phenotypes. An explorative bifactor model with Schmid-Leiman transformation was calculated to extract a general genetic factor for psychiatric disorders. Hair samples were collected quarterly prior to the pandemic for determining HCC (n = 192). Bivariate logistic regressions were performed to test the associations of HCC and the PRS factors with the reported trajectories. The bifactor model revealed 1 general factor and 4 sub-factors. Results indicate a significant association between increased values on the general risk factor and the allocation to the acute dysfunction class. The same was found for elevated HCC and the exploratorily tested sub-factor "childhood-onset neurodevelopmental disorders". Genetic risk and long-term cortisol secretion as a potential indicator of stress, indicated by PRSs and HCC, respectively, predicted different mental health trajectories. Results indicate a potential for future studies on risk prediction.
Affiliation(s)
- Kira F. Ahrens
  - Goethe University Frankfurt, University Hospital, Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Frankfurt, Germany
- Rebecca J. Neumann
  - Goethe University Frankfurt, University Hospital, Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Frankfurt, Germany
- Nina M. von Werthern
  - Goethe University Frankfurt, University Hospital, Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Frankfurt, Germany
- Thorsten M. Kranz
  - Goethe University Frankfurt, University Hospital, Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Frankfurt, Germany
- Bianca Kollmann
  - Department of Psychiatry and Psychotherapy, University Medical Center Mainz, Mainz, Germany
  - Leibniz Institute for Resilience Research (LIR), Mainz, Germany
- Björn Mattes
  - Institute of Psychology, Technical University of Darmstadt, Darmstadt, Germany
- Lara M. C. Puhlmann
  - Leibniz Institute for Resilience Research (LIR), Mainz, Germany
- Danuta Weichert
  - Department of Psychiatry and Psychotherapy, University Medical Center Mainz, Mainz, Germany
- Beat Lutz
  - Leibniz Institute for Resilience Research (LIR), Mainz, Germany
  - Institute of Physiological Chemistry, University Medical Center Mainz, Mainz, Germany
- Ulrike Basten
  - Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
  - Brain Imaging Center, Goethe University, Frankfurt, Germany
- Christian J. Fiebach
  - Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany
  - Brain Imaging Center, Goethe University, Frankfurt, Germany
- Michèle Wessa
  - Leibniz Institute for Resilience Research (LIR), Mainz, Germany
  - Department of Clinical Psychology and Neuropsychology, Institute for Psychology, Johannes Gutenberg University Mainz, Mainz, Germany
- Raffael Kalisch
  - Leibniz Institute for Resilience Research (LIR), Mainz, Germany
  - Neuroimaging Center (NIC), Focus Program Translational Neuroscience (FTN), Johannes Gutenberg University Medical Center Mainz, Mainz, Germany
- Klaus Lieb
  - Department of Psychiatry and Psychotherapy, University Medical Center Mainz, Mainz, Germany
  - Leibniz Institute for Resilience Research (LIR), Mainz, Germany
- Andreas G. Chiocchetti
  - Department of Child and Adolescent Psychiatry, Psychosomatics and Psychotherapy, Goethe University Frankfurt, Frankfurt, Germany
- Oliver Tüscher
  - Department of Psychiatry and Psychotherapy, University Medical Center Mainz, Mainz, Germany
  - Leibniz Institute for Resilience Research (LIR), Mainz, Germany
- Andreas Reif
  - Goethe University Frankfurt, University Hospital, Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Frankfurt, Germany
- Michael M. Plichta
  - Goethe University Frankfurt, University Hospital, Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, Frankfurt, Germany

5. Tissink E, de Lange SC, Savage JE, Wightman DP, de Leeuw CA, Kelly KM, Nagel M, van den Heuvel MP, Posthuma D. Genome-wide association study of cerebellar volume provides insights into heritable mechanisms underlying brain development and mental health. Commun Biol 2022; 5:710. [PMID: 35842455 PMCID: PMC9288439 DOI: 10.1038/s42003-022-03672-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Accepted: 07/05/2022] [Indexed: 12/24/2022] Open
Abstract
Cerebellar volume is highly heritable and associated with neurodevelopmental and neurodegenerative disorders. Understanding the genetic architecture of cerebellar volume may improve our insight into these disorders. This study aims to investigate the convergence of cerebellar volume genetic associations in close detail. A genome-wide association study for cerebellar volume was performed in a discovery sample of 27,486 individuals from UK Biobank, resulting in 30 genome-wide significant loci and a SNP heritability of 39.82%. We pinpoint the likely causal variants and those that have effects on amino acid sequence or cerebellar gene expression. Additionally, 85 genome-wide significant genes were detected and tested for convergence onto biological pathways, cerebellar cell types, human evolutionary genes or developmental stages. Local genetic correlations between cerebellar volume and neurodevelopmental and neurodegenerative disorders reveal shared loci with Parkinson's disease, Alzheimer's disease and schizophrenia. These results provide insights into the heritable mechanisms that contribute to developing a brain structure important for cognitive functioning and mental health.
Affiliation(s)
- Elleke Tissink
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- Siemon C de Lange
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
  - Department of Sleep and Cognition, Netherlands Institute for Neuroscience, an institute of the Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands
- Jeanne E Savage
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- Douglas P Wightman
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- Christiaan A de Leeuw
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- Kristen M Kelly
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Mats Nagel
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
- Martijn P van den Heuvel
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
  - Department of Child and Adolescent Psychiatry, Section Complex Trait Genetics, Amsterdam Neuroscience, Vrije Universiteit Medical Center, Amsterdam UMC, Amsterdam, The Netherlands
- Danielle Posthuma
  - Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam Neuroscience, 1081 HV, Amsterdam, The Netherlands
  - Department of Child and Adolescent Psychiatry, Section Complex Trait Genetics, Amsterdam Neuroscience, Vrije Universiteit Medical Center, Amsterdam UMC, Amsterdam, The Netherlands

6. Sauce B, Liebherr M, Judd N, Klingberg T. The impact of digital media on children's intelligence while controlling for genetic differences in cognition and socioeconomic background. Sci Rep 2022; 12:7720. [PMID: 35545630 PMCID: PMC9095723 DOI: 10.1038/s41598-022-11341-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/30/2021] [Accepted: 04/12/2022] [Indexed: 12/17/2022] Open
Abstract
Digital media defines modern childhood, but its cognitive effects are unclear and hotly debated. We believe that studies with genetic data could clarify causal claims and correct for the typically unaccounted role of genetic predispositions. Here, we estimated the impact of different types of screen time (watching, socializing, or gaming) on children's intelligence while controlling for the confounding effects of genetic differences in cognition and socioeconomic status. We analyzed 9855 children from the USA who were part of the ABCD dataset with measures of intelligence at baseline (ages 9–10) and after two years. At baseline, time watching (r = −0.12) and socializing (r = −0.10) were negatively correlated with intelligence, while gaming did not correlate. After two years, gaming positively impacted intelligence (standardized β = +0.17), but socializing had no effect. This is consistent with cognitive benefits documented in experimental studies on video gaming. Unexpectedly, watching videos also benefited intelligence (standardized β = +0.12), contrary to prior research on the effect of watching TV. However, in a post hoc analysis, this effect was not significant when parental education (instead of SES) was controlled for. Broadly, our results are in line with research on the malleability of cognitive abilities from environmental factors, such as cognitive training and the Flynn effect.
Affiliation(s)
- Bruno Sauce
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Magnus Liebherr
  - Department of General Psychology: Cognition, University Duisburg-Essen, Duisburg, Germany
- Nicholas Judd
  - Department of Neuroscience, Karolinska Institutet, Solna, Sweden
- Torkel Klingberg
  - Department of Neuroscience, Karolinska Institutet, Solna, Sweden

7. Tung LH, Kingsford C. Practical selection of representative sets of RNA-seq samples using a hierarchical approach. Bioinformatics 2021; 37:i334-i341. [PMID: 34252927 PMCID: PMC8275344 DOI: 10.1093/bioinformatics/btab315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/30/2021] [Indexed: 11/26/2022] Open
Abstract
MOTIVATION: Despite numerous RNA-seq samples available at large databases, most RNA-seq analysis tools are evaluated on a limited number of RNA-seq samples. This drives a need for methods to select a representative subset from all available RNA-seq samples to facilitate comprehensive, unbiased evaluation of bioinformatics tools. In sequence-based approaches for representative set selection (e.g. a k-mer counting approach that selects a subset based on k-mer similarities between RNA-seq samples), because of the large numbers of available RNA-seq samples and of k-mers/sequences in each sample, computing the full similarity matrix using k-mers/sequences for the entire set of RNA-seq samples in a large database (e.g. the SRA) has memory and runtime challenges; this makes direct representative set selection infeasible with limited computing resources.
RESULTS: We developed a novel computational method called 'hierarchical representative set selection' to handle this challenge. Hierarchical representative set selection is a divide-and-conquer-like algorithm that breaks representative set selection into sub-selections and hierarchically selects representative samples through multiple levels. We demonstrate that hierarchical representative set selection can achieve summarization quality close to that of direct representative set selection, while largely reducing runtime and memory requirements of computing the full similarity matrix (up to 8.4× runtime reduction and 5.35× memory reduction for 10 000 and 12 000 samples respectively that could be practically run with direct subset selection). We show that hierarchical representative set selection substantially outperforms random sampling on the entire SRA set of RNA-seq samples, making it a practical solution to representative set selection on large databases like the SRA.
AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/Kingsford-Group/hierrepsetselection and https://github.com/Kingsford-Group/jellyfishsim.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
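The divide-and-conquer idea described in the abstract can be sketched in a few lines. This is not the code from the repositories above: `greedy_select` uses farthest-first traversal purely as a stand-in for the selection step, and the function names, the `chunk_size` parameter, and the toy distance are all assumptions made for illustration. The point is only that no level ever needs distances across more than `chunk_size` samples at once.

```python
def greedy_select(items, n, dist):
    """Stand-in selector: farthest-first traversal picking n spread-out items."""
    if n >= len(items):
        return list(items)
    chosen = [0]  # start from the first item (arbitrary choice)
    remaining = set(range(1, len(items)))
    while len(chosen) < n:
        # Pick the item whose nearest already-chosen item is farthest away.
        best = max(
            sorted(remaining),
            key=lambda i: min(dist(items[i], items[c]) for c in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return [items[i] for i in chosen]


def hierarchical_select(items, n, dist, chunk_size):
    """Select n representatives by repeated per-chunk selection."""
    if not 1 <= n < chunk_size:
        raise ValueError("need 1 <= n < chunk_size so each level shrinks")
    level = list(items)
    while len(level) > chunk_size:
        chunks = [level[i:i + chunk_size] for i in range(0, len(level), chunk_size)]
        # Each chunk is summarized independently; survivors form the next level.
        level = [rep for chunk in chunks for rep in greedy_select(chunk, n, dist)]
    return greedy_select(level, n, dist)


# Toy usage: 1-D points stand in for samples, absolute difference for distance.
points = list(range(100))
reps = hierarchical_select(points, n=3, dist=lambda a, b: abs(a - b), chunk_size=10)
print(reps)
```

In a realistic setting the distance would come from k-mer similarities between samples, and chunks could be processed in parallel.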
Affiliation(s)
- Laura H Tung
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Carl Kingsford
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

8. Gialluisi A, Andlauer TFM, Mirza-Schreiber N, Moll K, Becker J, Hoffmann P, Ludwig KU, Czamara D, Pourcain BS, Honbolygó F, Tóth D, Csépe V, Huguet G, Chaix Y, Iannuzzi S, Demonet JF, Morris AP, Hulslander J, Willcutt EG, DeFries JC, Olson RK, Smith SD, Pennington BF, Vaessen A, Maurer U, Lyytinen H, Peyrard-Janvid M, Leppänen PHT, Brandeis D, Bonte M, Stein JF, Talcott JB, Fauchereau F, Wilcke A, Kirsten H, Müller B, Francks C, Bourgeron T, Monaco AP, Ramus F, Landerl K, Kere J, Scerri TS, Paracchini S, Fisher SE, Schumacher J, Nöthen MM, Müller-Myhsok B, Schulte-Körne G. Genome-wide association study reveals new insights into the heritability and genetic correlates of developmental dyslexia. Mol Psychiatry 2021; 26:3004-3017. [PMID: 33057169 PMCID: PMC8505236 DOI: 10.1038/s41380-020-00898-x] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 11/29/2019] [Revised: 07/26/2020] [Accepted: 09/18/2020] [Indexed: 02/06/2023]
Abstract
Developmental dyslexia (DD) is a learning disorder affecting the ability to read, with a heritability of 40-60%. A notable part of this heritability remains unexplained, and large genetic studies are warranted to identify new susceptibility genes and clarify the genetic bases of dyslexia. We carried out a genome-wide association study (GWAS) on 2274 dyslexia cases and 6272 controls, testing associations at the single variant, gene, and pathway level, and estimating heritability using single-nucleotide polymorphism (SNP) data. We also calculated polygenic scores (PGSs) based on large-scale GWAS data for different neuropsychiatric disorders and cortical brain measures, educational attainment, and fluid intelligence, testing them for association with dyslexia status in our sample. We observed statistically significant (p < 2.8 × 10⁻⁶) enrichment of associations at the gene level, for LOC388780 (20p13; uncharacterized gene), and for VEPH1 (3q25), a gene implicated in brain development. We estimated an SNP-based heritability of 20-25% for DD, and observed significant associations of dyslexia risk with PGSs for attention deficit hyperactivity disorder (at pT = 0.05 in the training GWAS: OR = 1.23[1.16; 1.30] per standard deviation increase; p = 8 × 10⁻¹³), bipolar disorder (1.53[1.44; 1.63]; p = 1 × 10⁻⁴³), schizophrenia (1.36[1.28; 1.45]; p = 4 × 10⁻²²), psychiatric cross-disorder susceptibility (1.23[1.16; 1.30]; p = 3 × 10⁻¹²), cortical thickness of the transverse temporal gyrus (0.90[0.86; 0.96]; p = 5 × 10⁻⁴), educational attainment (0.86[0.82; 0.91]; p = 2 × 10⁻⁷), and intelligence (0.72[0.68; 0.76]; p = 9 × 10⁻²⁹). This study suggests an important contribution of common genetic variants to dyslexia risk, and novel genomic overlaps with psychiatric conditions like bipolar disorder, schizophrenia, and cross-disorder susceptibility. Moreover, it revealed the presence of shared genetic foundations with a neural correlate previously implicated in dyslexia by neuroimaging evidence.
Affiliation(s)
- Alessandro Gialluisi
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
  - Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
  - Department of Epidemiology and Prevention, IRCCS Istituto Neurologico Mediterraneo Neuromed, Pozzilli, Italy
- Till F M Andlauer
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
  - Munich Cluster for Systems Neurology (SyNergy), Munich, Germany
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- Nazanin Mirza-Schreiber
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
  - Institute of Neurogenomics, Helmholtz Zentrum München, Neuherberg, Germany
- Kristina Moll
  - Department of Child and Adolescent Psychiatry, Psychosomatic, and Psychotherapy, Ludwig-Maximilians University, Munich, Germany
- Jessica Becker
  - Department of Genomics, Life and Brain Center, Institute of Human Genetics, University of Bonn, Bonn, Germany
- Per Hoffmann
  - Department of Genomics, Life and Brain Center, Institute of Human Genetics, University of Bonn, Bonn, Germany
- Kerstin U Ludwig
  - Department of Genomics, Life and Brain Center, Institute of Human Genetics, University of Bonn, Bonn, Germany
- Darina Czamara
  - Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Beate St Pourcain
  - Language and Genetics Department, Max Planck Institute for Psycholinguistics and Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Ferenc Honbolygó
  - Brain Imaging Centre, Research Centre of Natural Sciences of the Hungarian Academy of Sciences, Budapest, Hungary
- Dénes Tóth
  - Brain Imaging Centre, Research Centre of Natural Sciences of the Hungarian Academy of Sciences, Budapest, Hungary
- Valéria Csépe
  - Brain Imaging Centre, Research Centre of Natural Sciences of the Hungarian Academy of Sciences, Budapest, Hungary
- Guillaume Huguet
  - Human Genetics and Cognitive Functions Unit, Institut Pasteur and University Paris Diderot, Sorbonne Paris Cité, Paris, France
- Yves Chaix
  - ToNIC, Toulouse NeuroImaging Center, Université de Toulouse, Inserm, UPS, Toulouse, France
  - Children's Hospital, Purpan University Hospital, Toulouse, France
- Jean-Francois Demonet
  - Leenaards Memory Centre, Department of Clinical Neurosciences Lausanne University Hospital (CHUV), University of Lausanne, Lausanne, Switzerland
- Andrew P Morris
  - Department of Biostatistics, University of Liverpool, Liverpool, UK
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Jacqueline Hulslander
  - Institute for Behavioral Genetics and Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
- Erik G Willcutt
  - Institute for Behavioral Genetics and Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
- John C DeFries
  - Institute for Behavioral Genetics and Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
- Richard K Olson
  - Institute for Behavioral Genetics and Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
- Shelley D Smith
  - Department of Neurological Sciences, University of Nebraska Medical Center, Omaha, NE, USA
- Bruce F Pennington
  - Developmental Neuropsychology Lab and Clinic, Department of Psychology, University of Denver, Denver, CO, USA
- Anniek Vaessen
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience and Maastricht Brain Imaging Center (M-BIC), Maastricht University, Maastricht, The Netherlands
- Urs Maurer
  - Department of Psychology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
- Heikki Lyytinen
  - Centre for Research on Learning and Teaching, Department of Psychology, University of Jyväskylä, Jyväskylä, Finland
- Paavo H T Leppänen
  - Centre for Research on Learning and Teaching, Department of Psychology, University of Jyväskylä, Jyväskylä, Finland
- Daniel Brandeis
  - Department of Child and Adolescent Psychiatry and Psychotherapy, Psychiatric Hospital, University of Zurich, Zurich, Switzerland
  - Zurich Center for Integrative Human Physiology (ZIHP), University of Zurich and ETH Zurich, Zurich, Switzerland
  - Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
  - Department of Child and Adolescent Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Milene Bonte
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience and Maastricht Brain Imaging Center (M-BIC), Maastricht University, Maastricht, The Netherlands
- John F Stein
  - Department of Physiology, University of Oxford, Oxford, UK
- Joel B Talcott
  - School of Life and Health Sciences, Aston University, Birmingham, UK
- Fabien Fauchereau
  - Human Genetics and Cognitive Functions Unit, Institut Pasteur and University Paris Diderot, Sorbonne Paris Cité, Paris, France
- Arndt Wilcke
  - Cognitive Genetics Unit, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
- Holger Kirsten
  - Cognitive Genetics Unit, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
  - Institute for Medical Informatics, Statistics and Epidemiology and LIFE-Leipzig Research Center for Civilization Diseases, University of Leipzig, Leipzig, Germany
- Bent Müller
  - Cognitive Genetics Unit, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
- Clyde Francks
  - Language and Genetics Department, Max Planck Institute for Psycholinguistics and Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Thomas Bourgeron
  - Human Genetics and Cognitive Functions Unit, Institut Pasteur and University Paris Diderot, Sorbonne Paris Cité, Paris, France
- Anthony P Monaco
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Tufts University, Medford, MA, USA
- Franck Ramus
  - Laboratoire de Sciences Cognitives et Psycholinguistique, Ecole Normale Supérieure, CNRS, EHESS, PSL University, Paris, France
- Karin Landerl
  - Institute of Psychology, University of Graz and BioTechMed, Graz, Austria
- Juha Kere
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
  - Stem Cells and Metabolism Research Program, Biomedicum, Folkhälsan Institute of Genetics, University of Helsinki, Helsinki, Finland
- Thomas S Scerri
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - The Walter and Eliza Hall Institute of Medical Research, Melbourne University, Melbourne, VIC, Australia
- Simon E Fisher
  - Language and Genetics Department, Max Planck Institute for Psycholinguistics and Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Johannes Schumacher
  - Department of Genomics, Life and Brain Center, Institute of Human Genetics, University of Bonn, Bonn, Germany
| | - Markus M Nöthen
- Department of Genomics, Life and Brain Center, Institute of Human Genetics, University of Bonn, Bonn, Germany
| | - Bertram Müller-Myhsok
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany.
- Munich Cluster for Systems Neurology (SyNergy), Munich, Germany.
- Institute of Translational Medicine, University of Liverpool, Liverpool, UK.
| | - Gerd Schulte-Körne
- Department of Child and Adolescent Psychiatry, Psychosomatic, and Psychotherapy, Ludwig-Maximilians University, Munich, Germany.
| |
|
9
|
Sauce B, Wiedenhoeft J, Judd N, Klingberg T. Change by challenge: A common genetic basis behind childhood cognitive development and cognitive training. NPJ SCIENCE OF LEARNING 2021; 6:16. [PMID: 34078902 PMCID: PMC8172838 DOI: 10.1038/s41539-021-00096-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/27/2020] [Accepted: 03/12/2021] [Indexed: 06/02/2023]
Abstract
The interplay of genetic and environmental factors behind cognitive development has preoccupied multiple fields of science and sparked heated debates over the decades. Here we tested the hypothesis that developmental genes rely heavily on cognitive challenges, as opposed to natural maturation. Starting with a polygenic score (cogPGS) that previously explained variation in cognitive performance in adults, we estimated its effect in 344 children and adolescents (mean age of 12 years, ranging from 6 to 25) who showed changes in working memory (WM) in two distinct samples: (1) a developmental sample showing significant WM gains after 2 years of typical, age-related development, and (2) a training sample showing significant, experimentally induced WM gains after 25 days of intense WM training. We found that the same genetic factor, cogPGS, significantly explained the amount of WM gain in both samples. There was no interaction of cogPGS with sample, suggesting that those genetic factors are neutral to whether the WM gains came from development or training. These results represent evidence that cognitive challenges are a central piece in the gene-environment interplay during cognitive development. We believe our study sheds new light on previous findings of interindividual differences in education (rich-get-richer and compensation effects), brain plasticity in children, and the heritability increase of intelligence across the lifespan.
Affiliation(s)
- Bruno Sauce
- Department of Neuroscience, Karolinska Institute, Stockholm, Sweden
| | - John Wiedenhoeft
- Core Facility Medical Biometry and Statistical Bioinformatics, University Medical Center Göttingen, Göttingen, Germany
| | - Nicholas Judd
- Department of Neuroscience, Karolinska Institute, Stockholm, Sweden
| | - Torkel Klingberg
- Department of Neuroscience, Karolinska Institute, Stockholm, Sweden.
| |
|
10
|
Callanan J, Stockdale SR, Shkoporov A, Draper LA, Ross RP, Hill C. Biases in Viral Metagenomics-Based Detection, Cataloguing and Quantification of Bacteriophage Genomes in Human Faeces, a Review. Microorganisms 2021; 9:524. [PMID: 33806607 PMCID: PMC8000950 DOI: 10.3390/microorganisms9030524] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/29/2021] [Revised: 02/17/2021] [Accepted: 03/02/2021] [Indexed: 12/21/2022] Open
Abstract
The human gut is colonised by a vast array of microbes that include bacteria, viruses, fungi, and archaea. While interest in these microbial entities has largely focused on the bacterial constituents, recently the viral component has attracted more attention. Metagenomic advances, compared to classical isolation procedures, have greatly enhanced our understanding of the composition, diversity, and function of viruses in the human microbiome (virome). We highlight that viral extraction methodologies are crucial in terms of identifying and characterising communities of viruses infecting eukaryotes and bacteria. Different viral extraction protocols, including those used in some of the most significant human virome publications to date, have introduced biases affecting their overall conclusions. It is important that protocol variations be clearly highlighted across studies, with the ultimate goal of identifying and acknowledging biases associated with different protocols and, perhaps, the generation of an unbiased and standardised method for examining this portion of the human microbiome.
Affiliation(s)
| | | | | | | | | | - Colin Hill
- APC Microbiome Ireland and School of Microbiology, University College Cork, T12 YT20 Cork, Ireland; (J.C.); (S.R.S.); (A.S.); (L.A.D.); (R.P.R.)
| |
|
11
|
Ponsero AJ, Bomhoff M, Blumberg K, Youens-Clark K, Herz NM, Wood-Charlson EM, Delong EF, Hurwitz BL. Planet Microbe: a platform for marine microbiology to discover and analyze interconnected 'omics and environmental data. Nucleic Acids Res 2021; 49:D792-D802. [PMID: 32735679 PMCID: PMC7778950 DOI: 10.1093/nar/gkaa637] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/01/2020] [Revised: 06/23/2020] [Accepted: 07/20/2020] [Indexed: 12/27/2022] Open
Abstract
In recent years, large-scale oceanic sequencing efforts have provided a deeper understanding of marine microbial communities and their dynamics. These research endeavors require the acquisition of complex and varied datasets through large, interdisciplinary and collaborative efforts. However, no unifying framework currently exists for the marine science community to integrate sequencing data with physical, geological, and geochemical datasets. Planet Microbe is a web-based platform that enables data discovery from curated historical and ongoing oceanographic sequencing efforts. In Planet Microbe, each 'omics sample is linked with other biological and physicochemical measurements collected for the same water samples or during the same sample collection event, to provide a broader environmental context. This work highlights the need for curated aggregation efforts that can enable new insights into high-quality metagenomic datasets. Planet Microbe is freely accessible from https://www.planetmicrobe.org/.
Affiliation(s)
- Alise J Ponsero
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, USA; BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - Matthew Bomhoff
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, USA; BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - Kai Blumberg
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, USA; BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - Ken Youens-Clark
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, USA; BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - Nina M Herz
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, USA
| | - Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Edward F Delong
- Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii, Manoa, Honolulu, HI 96822, USA
| | - Bonnie L Hurwitz
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, USA; BIO5 Institute, University of Arizona, Tucson, AZ, USA
| |
|
12
|
Dennis JK, Sealock JM, Straub P, Lee YH, Hucks D, Actkins K, Faucon A, Feng YCA, Ge T, Goleva SB, Niarchou M, Singh K, Morley T, Smoller JW, Ruderfer DM, Mosley JD, Chen G, Davis LK. Clinical laboratory test-wide association scan of polygenic scores identifies biomarkers of complex disease. Genome Med 2021; 13:6. [PMID: 33441150 PMCID: PMC7807864 DOI: 10.1186/s13073-020-00820-8] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 02/25/2020] [Accepted: 12/08/2020] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Clinical laboratory (lab) tests are used in clinical practice to diagnose, treat, and monitor disease conditions. Test results are stored in electronic health records (EHRs), and a growing number of EHRs are linked to patient DNA, offering unprecedented opportunities to query relationships between genetic risk for complex disease and quantitative physiological measurements collected on large populations. METHODS A total of 3075 quantitative lab tests were extracted from Vanderbilt University Medical Center's (VUMC) EHR system and cleaned for population-level analysis according to our QualityLab protocol. Lab values extracted from BioVU were compared with previous population studies using heritability and genetic correlation analyses. We then tested the hypothesis that polygenic risk scores for biomarkers and complex disease are associated with biomarkers of disease extracted from the EHR. In a proof-of-concept analysis, we focused on lipids and coronary artery disease (CAD). We cleaned lab traits extracted from the EHR, performed lab-wide association scans (LabWAS) of the lipid and CAD polygenic risk scores across 315 heritable lab tests, and then replicated the pipeline and analyses in the Massachusetts General Brigham Biobank (MGBB). RESULTS Heritability estimates of lipid values (after cleaning with QualityLab) were comparable to previous reports, and polygenic scores for lipids were strongly associated with their referent lipid in a LabWAS. LabWAS of the polygenic score for CAD recapitulated canonical heart disease biomarker profiles, including decreased HDL and increased pre-medication LDL, triglycerides, blood glucose, and glycated hemoglobin (HgbA1C), in European and African descent populations. Notably, many of these associations remained even after adjusting for the presence of cardiovascular disease and were replicated in the MGBB.
CONCLUSIONS Polygenic risk scores can be used to identify biomarkers of complex disease in large-scale EHR-based genomic analyses, providing new avenues for discovery of novel biomarkers and deeper understanding of disease trajectories in pre-symptomatic individuals. We present two methods and associated software, QualityLab and LabWAS, to clean and analyze EHR labs at scale and perform a Lab-Wide Association Scan.
Affiliation(s)
- Jessica K Dennis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
| | - Julia M Sealock
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Peter Straub
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Younga H Lee
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Donald Hucks
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Ky'Era Actkins
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, TN, 37232, USA
| | - Annika Faucon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Yen-Chen Anne Feng
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Tian Ge
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Slavina B Goleva
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Maria Niarchou
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Kritika Singh
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Theodore Morley
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jordan W Smoller
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Douglas M Ruderfer
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jonathan D Mosley
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lea K Davis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University, 511-A Light Hall, 2215 Garland Ave, Nashville, TN, 37232, USA.
| |
|
13
|
Chen JCY, Tyler AD. Systematic evaluation of supervised machine learning for sample origin prediction using metagenomic sequencing data. Biol Direct 2020; 15:29. [PMID: 33302990 PMCID: PMC7731568 DOI: 10.1186/s13062-020-00287-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/17/2020] [Accepted: 12/01/2020] [Indexed: 02/07/2023] Open
Abstract
Background The advent of metagenomic sequencing provides microbial abundance patterns that can be leveraged for sample origin prediction. Supervised machine learning classification approaches have been reported to predict sample origin accurately when the origin has been previously sampled. Using metagenomic datasets provided by the 2019 CAMDA challenge, we evaluated the influence of variable technical, analytical and machine learning approaches for result interpretation and novel source prediction. Results Comparison between 16S rRNA amplicon and shotgun sequencing approaches as well as metagenomic analytical tools showed differences in normalized microbial abundance, especially for organisms present at low abundance. Shotgun sequence data analyzed using Kraken2 and Bracken, for taxonomic annotation, had higher detection sensitivity. As classification models are limited to labeling pre-trained origins, we took an alternative approach using Lasso-regularized multivariate regression to predict geographic coordinates for comparison. In both models, the prediction errors were much higher in Leave-1-city-out than in 10-fold cross validation; the former more realistically reflected the increased difficulty of accurately predicting samples from new origins. This challenge was further confirmed when applying the model to a set of samples obtained from new origins. Overall, the prediction performance of the regression and classification models, as measured by mean squared error, was comparable on mystery samples. Due to higher prediction error rates for samples from new origins, we provided an additional strategy based on prediction ambiguity to infer whether a sample is from a new origin. Lastly, we report increased prediction error when data from different sequencing protocols were included as training data.
Conclusions Herein, we highlight the capacity of predicting sample origin accurately with pre-trained origins and the challenge of predicting new origins through both regression and classification models. Overall, this work provides a summary of the impact of sequencing technique, protocol, taxonomic analytical approaches, and machine learning approaches on the use of metagenomics for prediction of sample origin. Supplementary Information The online version contains supplementary material available at 10.1186/s13062-020-00287-y.
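The contrast between leave-one-city-out and k-fold cross-validation described in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy city labels, and the unshuffled interleaved folds are assumptions made for illustration only. The point it shows is that a grouped split never lets the held-out city appear in training, whereas ordinary folds usually do.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one split per group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def k_fold(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds, ignoring groups."""
    for fold in range(k):
        test = list(range(fold, n_samples, k))
        held = set(test)
        train = [i for i in range(n_samples) if i not in held]
        yield train, test


# Hypothetical toy labels: one city name per sample.
cities = ["A", "A", "B", "B", "C", "C"]
logo_splits = list(leave_one_group_out(cities))
kfold_splits = list(k_fold(len(cities), 3))

for held_out, train, test in logo_splits:
    # The held-out city is absent from training, so the model must extrapolate.
    print(held_out, "train cities:", sorted({cities[i] for i in train}))
```

With the grouped split, every test sample comes from a city the model has never seen, which is why errors under that scheme are a more honest estimate for new origins.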
Affiliation(s)
- Julie Chih-Yu Chen
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, R3E 3R2, Canada.
| | - Andrea D Tyler
- National Microbiology Laboratory, Public Health Agency of Canada, 1015 Arlington Street, Winnipeg, Manitoba, R3E 3R2, Canada
| |
|
14
|
Wu X, Shang Y, Wei Q, Chen J, Zhang H, Chen Y, Gao X, Wang Z, Zhang H. Gut Microbiota in Dholes During Estrus. Front Microbiol 2020; 11:575731. [PMID: 33329438 PMCID: PMC7734286 DOI: 10.3389/fmicb.2020.575731] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/24/2020] [Accepted: 11/09/2020] [Indexed: 12/11/2022] Open
Abstract
The co-evolution of gut microbes and the host plays a vital role in the survival and reproduction of the host. The dhole (Cuon alpinus) has been listed as an endangered species by the International Union for Conservation of Nature; therefore, conservation and effective breeding of dholes are essential. Effective estrus can promote reproduction. However, little is known about the relative contribution of estrus in shaping the structure and the functions of fecal microbiota. Here, we investigated the potential association between estrus and the fecal microbiota in dholes using shotgun metagenomic sequencing. We found that the estrus stages in dholes vary significantly in terms of gut bacterial composition and microbiome metabolism and function. Compared with that of non-estrus adult dholes, the microbiome of estrus adult dholes had a significantly higher abundance of Bacillus faecalis and Veillonella, which play a key role in the synthesis of sex hormones and nucleic acids, energy production, and reproductive cell division. The insulin and energy metabolism-related pathways are significantly enhanced in the gut microbes, and the related gluconeogenic enzymes are significantly enriched during estrus. These findings suggest that the structure and metagenome of the fecal microbiome during estrus have a significant effect in promoting estrus in dholes, thus providing a new perspective for dhole conservation.
Affiliation(s)
- Xiaoyang Wu
- College of Life Sciences, Qufu Normal University, Qufu, China
| | - Yongquan Shang
- College of Life Sciences, Qufu Normal University, Qufu, China
| | - Qinguo Wei
- College of Life Sciences, Qufu Normal University, Qufu, China
| | - Jun Chen
- College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Huanxin Zhang
- College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Yao Chen
- College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Xiaodong Gao
- College of Life Sciences, Qufu Normal University, Qufu, China
| | - Zhiyong Wang
- Shijiazhuang Wildlife Conservation Center, Shijiazhuang, China
| | - Honghai Zhang
- College of Life Sciences, Qufu Normal University, Qufu, China
| |
|
15
|
Guerrini V, Louza FA, Rosone G. Metagenomic analysis through the extended Burrows-Wheeler transform. BMC Bioinformatics 2020; 21:299. [PMID: 32938362 PMCID: PMC7493373 DOI: 10.1186/s12859-020-03628-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/16/2020] [Accepted: 06/22/2020] [Indexed: 11/10/2022] Open
Abstract
Background The development of Next Generation Sequencing (NGS) has had a major impact on the study of genetic sequences. Among problems that researchers in the field have to face, one of the most challenging is the taxonomic classification of metagenomic reads, i.e., identifying the microorganisms that are present in a sample collected directly from the environment. The analysis of environmental samples (metagenomes) is particularly important for determining the microbial composition of different ecosystems, and it is used in a wide variety of fields: for instance, metagenomic studies in agriculture can help in understanding the interactions between plants and microbes, and in ecology they can provide valuable insights into the functions of environmental communities. Results In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we sequentially scan the required data structures, so that we can analyze unknown sequences of large collections using little internal memory. The tool LiME (Lightweight Metagenomics via eBWT) is available at https://github.com/veronicaguerrini/LiME. Conclusions In order to assess the reliability of our approach, we ran several experiments on NGS data from two simulated metagenomes among those provided in benchmarking analysis and on a real metagenome from the Human Microbiome Project. The experimental results on the simulated data show that LiME is competitive with the widely used taxonomic classifiers. It achieves high levels of precision and specificity – e.g. 99.9% of the positive control reads are correctly assigned and the percentage of classified reads of the negative control is less than 0.01% – while keeping a high sensitivity.
On the real metagenome, we show that LiME is able to deliver classification results comparable to those of MagicBlast. Overall, the experiments confirm the effectiveness of our method and its high accuracy even in negative control samples.
Affiliation(s)
- Veronica Guerrini
- Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo, 3, Pisa, Italy
| | - Felipe A Louza
- Faculty of Electrical Engineering, Federal University of Uberlândia, Uberlândia, Brazil
| | - Giovanna Rosone
- Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo, 3, Pisa, Italy.
| |
|
16
|
Comin M, Di Camillo B, Pizzi C, Vandin F. Comparison of microbiome samples: methods and computational challenges. Brief Bioinform 2020; 22:88-95. [PMID: 32577746 DOI: 10.1093/bib/bbaa121] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/22/2019] [Revised: 05/09/2020] [Accepted: 05/18/2020] [Indexed: 12/14/2022] Open
Abstract
The study of microbial communities crucially relies on the comparison of metagenomic next-generation sequencing data sets, for which several methods have been designed in recent years. Here, we review three key challenges in the comparison of such data sets: species identification and quantification, the efficient computation of distances between metagenomic samples and the identification of metagenomic features associated with a phenotype such as disease status. We present current solutions for such challenges, considering both reference-based methods relying on a database of reference genomes and reference-free methods working directly on all sequencing reads from the samples.
|
17
|
Abstract
The prevalence and clinical characteristics of depressive disorders differ between women and men; however, the genetic contribution to sex differences in depressive disorders has not been elucidated. To evaluate sex-specific differences in the genetic architecture of depression, whole exome sequencing of samples from 1000 patients (70.7% female) with depressive disorder was conducted. Control data from healthy individuals with no psychiatric disorder (n = 72, 26.4% female) and from the East Asian subpopulation of the 1000 Genomes Project (n = 207, 50.7% female) were included. The genetic variation between men and women was directly compared using both qualitative and quantitative research designs. Qualitative analysis identified five genetic markers potentially associated with increased risk of depressive disorder in females, including three variants (rs201432982 within PDE4A, and rs62640397 and rs79442975 within FDX1L) mapping to chromosome 19p13.2 and two novel variants (rs820182 and rs820148) within MYO15B at the chromosome 17p25.1 locus. Depressed patients homozygous for these variants showed more severe depressive symptoms and higher suicidality than those who were not homozygotes (i.e., heterozygotes and homozygotes for the non-associated allele). Quantitative analysis demonstrated that the genetic burden of protein-truncating and deleterious variants was higher in males than females, even after permutation testing. Our study provides novel genetic evidence that the higher prevalence of depressive disorders in women may be attributable to inherited variants.
|
18
|
Dong J, Liu S, Zhang Y, Dai Y, Wu Q. A New Alignment-Free Whole Metagenome Comparison Tool and Its Application on Gut Microbiomes of Wild Giant Pandas. Front Microbiol 2020; 11:1061. [PMID: 32612579 PMCID: PMC7309450 DOI: 10.3389/fmicb.2020.01061] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/26/2020] [Accepted: 04/29/2020] [Indexed: 11/13/2022] Open
Abstract
The comparison of metagenomes is crucial for studying the relationship between microbial communities and environmental factors. One recently published alignment-free whole metagenome comparison method based on k-mer frequencies, Libra, showed higher resolution than the currently fastest method, Mash, on whole metagenomic sequencing reads, but it did not perform as well on assembled contigs. Here, we developed a new alignment-free tool, KmerFreqCalc, for the comparison of whole metagenomic data, which first calculates the frequencies of both forward and reverse complementary sequences of k-mers, like Mash, and then computes the cosine distance between samples based on k-mer frequency vectors, like Libra. We applied KmerFreqCalc to the assembled contigs of the gut microbiomes of wild giant pandas and compared the results to those of Libra and Mash. The results indicated that KmerFreqCalc was able to detect the subtle difference between giant panda samples caused by seasonal diet change, showing better clustering than Libra and Mash. Therefore, KmerFreqCalc has high resolution and accuracy for assembled contigs and is well suited to comparing samples with low dissimilarity.
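The two steps named in this abstract (k-mer frequencies over both strands, then cosine distance between frequency vectors) can be sketched as follows. This is an illustrative toy, not the KmerFreqCalc implementation: the function names, the handling of ambiguous bases, and the plain-dictionary vectors are all assumptions.

```python
from collections import Counter
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_frequencies(sequences, k):
    """Relative k-mer frequencies counted on the forward and reverse strands."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                kmer = strand[i:i + k]
                # Assumption: skip k-mers containing ambiguous bases such as N.
                if set(kmer) <= _BASES:
                    counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine_distance(p, q):
    """1 - cosine similarity between two sparse frequency vectors."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # assumption: an empty profile is maximally distant
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical toy "samples", each a list of contigs.
sample_a = kmer_frequencies(["ACGTTGCA", "GGATCC"], k=3)
sample_b = kmer_frequencies(["AAAAAAAA"], k=3)
print(cosine_distance(sample_a, sample_b))
```

Counting both strands makes the profile independent of contig orientation, so a sample and its reverse complement have distance zero.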
Affiliation(s)
- Jiuhong Dong
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Shuai Liu
- Institute of Physical Science and Information Technology, Anhui University, Hefei, China; Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Yaran Zhang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
| | - Yi Dai
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
| | - Qi Wu
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
|
19
|
Abstract
High-latitude, perennially stratified (meromictic) lakes are likely to be especially vulnerable to climate warming because of the importance of ice in maintaining their water column structure and associated distribution of microbial communities. This study aimed to characterize viral abundance, diversity, and distribution in a meromictic lake of marine origin on the far northern coast of Ellesmere Island, in the Canadian High Arctic. We collected triplicate samples for double-stranded DNA (dsDNA) viromics from five depths that encompassed the major features of the lake, as determined by limnological profiling of the water column. Viral abundance and virus-to-prokaryote ratios were highest at greater depths, while bacterial and cyanobacterial counts were greatest in the surface waters.
The viral communities from each zone of the lake defined by salinity, temperature, and dissolved oxygen concentrations were markedly distinct, suggesting that there was little exchange of viral types among lake strata. Ten viral assembled genomes were obtained from our libraries, and these also segregated with depth. This well-defined structure of viral communities was consistent with that of potential hosts. Viruses from the monimolimnion, a deep layer of ancient Arctic Ocean seawater, were more diverse and relatively abundant, with few similarities to available viral sequences. The Lake A viral communities also differed from published records from the Arctic Ocean and meromictic Ace Lake in Antarctica. This first characterization of viral diversity from this sentinel environment underscores the microbial richness and complexity of an ecosystem type that is increasingly exposed to major perturbations in the fast-changing Arctic. IMPORTANCE The Arctic is warming at an accelerating pace, and the rise in temperature has increasing impacts on the Arctic biome. Lakes are integrators of their surroundings and thus excellent sentinels of environmental change. Despite their importance in the regulation of key microbial processes, viruses remain largely uncharacterized in Arctic lacustrine environments. We sampled a highly stratified meromictic lake near the northern limit of the Canadian High Arctic, a region in rapid transition due to climate change. We found that the different layers of the lake harbored viral communities that were strikingly dissimilar and highly divergent from known viruses. Viruses were more abundant in the deepest part of the lake containing ancient Arctic Ocean seawater that was trapped during glacial retreat and were genomically unlike any viruses previously described. This research demonstrates the complexity and novelty of viral communities in an environment that is vulnerable to ongoing perturbation.
|
20
|
Ruisch IH, Dietrich A, Klein M, Faraone SV, Oosterlaan J, Buitelaar JK, Hoekstra PJ. Aggression based genome-wide, glutamatergic, dopaminergic and neuroendocrine polygenic risk scores predict callous-unemotional traits. Neuropsychopharmacology 2020; 45:761-769. [PMID: 31918432 PMCID: PMC7075955 DOI: 10.1038/s41386-020-0608-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/19/2019] [Revised: 12/03/2019] [Accepted: 12/30/2019] [Indexed: 12/17/2022]
Abstract
Aggression and callous, uncaring, and unemotional (CU) traits are clinically related behavioral constructs caused by genetic and environmental factors. We performed polygenic risk score (PRS) analyses to investigate shared genetic etiology between aggression and these three CU-traits. Furthermore, we studied interactions of PRS with smoking during pregnancy and childhood life events in relation to CU-traits. Summary statistics for the base phenotype were derived from the EAGLE-consortium genome-wide association study of children's aggressive behavior and were used to calculate individual-level genome-wide and gene-set PRS in the NeuroIMAGE target-sample. Target phenotypes were 'callousness', 'uncaring', and 'unemotional' sumscores of the Inventory of Callous-Unemotional traits. A total of 779 subjects and 1,192,414 single-nucleotide polymorphisms were available for PRS-analyses. Gene-sets comprised serotonergic, dopaminergic, glutamatergic, and neuroendocrine signaling pathways. Genome-wide PRS showed evidence of association with uncaring scores (explaining up to 1.59% of variance; self-contained Q = 0.0306, competitive-P = 0.0015). Dopaminergic, glutamatergic, and neuroendocrine PRS showed evidence of association with unemotional scores (explaining up to 1.33, 2.00, and 1.20% of variance respectively; self-contained Q-values 0.037, 0.0115, and 0.0473 respectively, competitive-P-values 0.0029, 0.0002, and 0.0045 respectively). Smoking during pregnancy related to callousness scores while childhood life events related to both callousness and unemotionality. Moreover, dopaminergic PRS appeared to interact with childhood life events in relation to unemotional scores. Our study provides evidence suggesting shared genetic etiology between aggressive behavior and uncaring, and unemotional CU-traits in children. Gene-set PRS confirmed involvement of shared glutamatergic, dopaminergic, and neuroendocrine genetic variation in aggression and CU-traits. 
Replication of current findings is needed.
Affiliation(s)
- I Hyun Ruisch
- Department of Child and Adolescent Psychiatry, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9713GZ, Groningen, The Netherlands.
| | - Andrea Dietrich
- Department of Child and Adolescent Psychiatry, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9713GZ, Groningen, The Netherlands
| | - Marieke Klein
- Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525GA, Nijmegen, The Netherlands
- Department of Psychiatry, University Medical Center Utrecht, Brain Center Rudolf Magnus, Utrecht, The Netherlands
| | - Stephen V Faraone
- Department of Psychiatry and of Neuroscience and Physiology, State University of New York (SUNY) Upstate Medical University, Syracuse, NY, United States
- K.G. Jebsen Centre for Research on Neuropsychiatric Disorders, University of Bergen, Bergen, Norway
| | - Jaap Oosterlaan
- Department of Clinical Neuropsychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - Jan K Buitelaar
- Department of Cognitive Neuroscience, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525GA, Nijmegen, The Netherlands
- Karakter Child and Adolescent Psychiatry University Centre, Reinier Postlaan 12, 6525GC, Nijmegen, The Netherlands
| | - Pieter J Hoekstra
- Department of Child and Adolescent Psychiatry, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9713GZ, Groningen, The Netherlands
|
21
|
Watts GS, Thornton JE, Youens-Clark K, Ponsero AJ, Slepian MJ, Menashi E, Hu C, Deng W, Armstrong DG, Reed S, Cranmer LD, Hurwitz BL. Identification and quantitation of clinically relevant microbes in patient samples: Comparison of three k-mer based classifiers for speed, accuracy, and sensitivity. PLoS Comput Biol 2019; 15:e1006863. [PMID: 31756192 PMCID: PMC6897419 DOI: 10.1371/journal.pcbi.1006863] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/07/2019] [Revised: 12/06/2019] [Accepted: 10/10/2019] [Indexed: 12/15/2022] Open
Abstract
Infections are a serious health concern worldwide, particularly in vulnerable populations such as the immunocompromised, elderly, and young. Advances in metagenomic sequencing availability, speed, and decreased cost offer the opportunity to supplement or even replace culture-based identification of pathogens with DNA sequence-based diagnostics. Adopting metagenomic analysis for clinical use requires that all aspects of the workflow are optimized and tested, including data analysis and computational time and resources. We tested the accuracy, sensitivity, and resource requirements of three top metagenomic taxonomic classifiers that use fast k-mer based algorithms: Centrifuge, CLARK, and KrakenUniq. Binary mixtures of bacteria showed all three reliably identified organisms down to 1% relative abundance, while only the relative abundance estimates of Centrifuge and CLARK were accurate. All three classifiers identified the organisms present in their default databases from a mock bacterial community of 20 organisms, but only Centrifuge had no false positives. In addition, Centrifuge required far fewer computational resources and less time for analysis. Centrifuge analysis of metagenomes obtained from samples of ventilator-associated pneumonia (VAP), infected diabetic foot ulcers (DFUs), and febrile neutropenia (FN) identified pathogenic bacteria and one virus that were corroborated by culture or a clinical PCR assay. Importantly, in both diabetic foot ulcer patients, metagenomic sequencing identified pathogens 4-6 weeks before culture. Finally, we show that Centrifuge results were minimally affected by elimination of time-consuming read quality control and host screening steps.
Affiliation(s)
- George S. Watts
- University of Arizona Cancer Center and Department of Pharmacology, University of Arizona, Tucson, Arizona, United States of America
| | - James E. Thornton
- Department of Biosystems Engineering, University of Arizona, Tucson, Arizona, United States of America
| | - Ken Youens-Clark
- Department of Biosystems Engineering, University of Arizona, Tucson, Arizona, United States of America
| | - Alise J. Ponsero
- Department of Biosystems Engineering, University of Arizona, Tucson, Arizona, United States of America
| | - Marvin J. Slepian
- Department of Medicine, University of Arizona, Tucson, Arizona, United States of America
- Department of Biomedical Engineering, University of Arizona, Tucson, Arizona, United States of America
- Arizona Center for Accelerated Biomedical Innovation, University of Arizona, Tucson, Arizona, United States of America
| | - Emmanuel Menashi
- Honor Health Hospital, Scottsdale, Arizona, United States of America
| | - Charles Hu
- Dignity Health Chandler Regional Medical Center, Chandler, Arizona, United States of America
| | - Wuquan Deng
- Department of Endocrinology, Multidisciplinary Diabetic Foot Medical Center, Affiliated Central Hospital of Chongqing University, Chongqing, China
| | - David G. Armstrong
- Southwestern Academic Limb Salvage Alliance (SALSA), Department of Surgery, Keck School of Medicine of University of Southern California, Los Angeles, California, United States of America
| | - Spenser Reed
- University of Arizona Department of Family and Community Medicine, Tucson, Arizona, United States of America
| | - Lee D. Cranmer
- Department of Medicine, University of Washington and Fred Hutchinson Cancer Research Center, and Seattle Cancer Care Alliance, Seattle, Washington, United States of America
| | - Bonnie L. Hurwitz
- Department of Biosystems Engineering, University of Arizona, Tucson, Arizona, United States of America
- BIO5 Institute, University of Arizona, Tucson, Arizona, United States of America
|
22
|
Connor R, Brister R, Buchmann JP, Deboutte W, Edwards R, Martí-Carreras J, Tisza M, Zalunin V, Andrade-Martínez J, Cantu A, D'Amour M, Efremov A, Fleischmann L, Forero-Junco L, Garmaeva S, Giluso M, Glickman C, Henderson M, Kellman B, Kristensen D, Leubsdorf C, Levi K, Levi S, Pakala S, Peddu V, Ponsero A, Ribeiro E, Roy F, Rutter L, Saha S, Shakya M, Shean R, Miller M, Tully B, Turkington C, Youens-Clark K, Vanmechelen B, Busby B. NCBI's Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements. Genes (Basel) 2019; 10:E714. [PMID: 31527408 PMCID: PMC6771016 DOI: 10.3390/genes10090714] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/26/2019] [Revised: 09/05/2019] [Accepted: 09/05/2019] [Indexed: 01/26/2023] Open
Abstract
A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected and further filtered for a minimal length of 1 kb. This resulted in 4.2 million contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered, and assigned metadata. Out of the 4.2 million contigs, 360,000 contigs were labeled with domains, and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. Mainly: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure facilitates their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
Affiliation(s)
- Ryan Connor
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
| | - Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
| | - Jan P Buchmann
- Charles Perkins Centre, School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia.
| | - Ward Deboutte
- KU Leuven, Department of Microbiology & Immunology, Rega Institute, Leuven BE3000, Belgium.
| | - Rob Edwards
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
| | - Joan Martí-Carreras
- KU Leuven, Department of Microbiology & Immunology, Rega Institute, Leuven BE3000, Belgium.
| | - Mike Tisza
- Lab of Cellular Oncology, NCI, NIH, Bethesda, MD 20892-4263, USA.
| | - Vadim Zalunin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
| | - Juan Andrade-Martínez
- Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia. Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá 111711, Colombia.
| | - Adrian Cantu
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
| | - Michael D'Amour
- D'Amour & Associates, 11839 Hilltop Drive, Los Altos Hills, CA 94024, USA.
| | - Alexandre Efremov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
| | - Lydia Fleischmann
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
| | - Laura Forero-Junco
- Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia. Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá 111711, Colombia.
| | - Sanzhima Garmaeva
- Department of Genetics, University Medical Center Groningen, Groningen 9713AV, The Netherlands.
| | - Melissa Giluso
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
| | - Cody Glickman
- Computational Bioscience Program, University of Colorado Anschutz, Aurora, CO 80045, USA.
| | - Margaret Henderson
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
| | - Benjamin Kellman
- Bioinformatics and Systems Biology Program, University of California at San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA.
| | - David Kristensen
- Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA.
| | - Carl Leubsdorf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
| | - Kyle Levi
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
| | - Shane Levi
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
| | - Suman Pakala
- Division of Infectious Diseases, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
| | - Vikas Peddu
- Department of Laboratory Medicine, University of Washington Virology, 1616 Eastlake Ave E, Seattle, WA 98102, USA.
| | - Alise Ponsero
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85716, USA.
| | - Eldred Ribeiro
- MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102-7539, USA.
| | - Farrah Roy
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
| | | | - Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14853, USA.
| | - Migun Shakya
- Bioscience Division, Los Alamos National Lab, Los Alamos, NM 87545, USA.
| | - Ryan Shean
- Department of Laboratory Medicine, University of Washington Virology, 1616 Eastlake Ave E, Seattle, WA 98102, USA.
| | - Matthew Miller
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85716, USA.
| | - Benjamin Tully
- Center for Dark Energy Biosphere Investigations, University of Southern California, Los Angeles, CA 90089, USA.
| | | | - Ken Youens-Clark
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85716, USA.
| | - Bert Vanmechelen
- KU Leuven, Department of Microbiology & Immunology, Rega Institute, Leuven BE3000, Belgium.
| | - Ben Busby
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
|
23
|
Youens-Clark K, Bomhoff M, Ponsero AJ, Wood-Charlson EM, Lynch J, Choi I, Hartman JH, Hurwitz BL. iMicrobe: Tools and data-driven discovery platform for the microbiome sciences. Gigascience 2019; 8:giz083. [PMID: 31289831 PMCID: PMC6615980 DOI: 10.1093/gigascience/giz083] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/03/2019] [Revised: 05/30/2019] [Accepted: 06/18/2019] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community. FINDINGS The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation-supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP). CONCLUSIONS iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
Affiliation(s)
- Ken Youens-Clark
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th St, Shantz Building, Room 403, Tucson, AZ, USA 85721-0038
| | - Matt Bomhoff
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th St, Shantz Building, Room 403, Tucson, AZ, USA 85721-0038
| | - Alise J Ponsero
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th St, Shantz Building, Room 403, Tucson, AZ, USA 85721-0038
| | - Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Joshua Lynch
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th St, Shantz Building, Room 403, Tucson, AZ, USA 85721-0038
| | - Illyoung Choi
- Department of Computer Science, University of Arizona, Tucson, AZ, USA
| | - John H Hartman
- Department of Computer Science, University of Arizona, Tucson, AZ, USA
| | - Bonnie L Hurwitz
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th St, Shantz Building, Room 403, Tucson, AZ, USA 85721-0038
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
|
24
|
Choi I, Ponsero AJ, Bomhoff M, Youens-Clark K, Hartman JH, Hurwitz BL. Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons. Gigascience 2019; 8:5266304. [PMID: 30597002 PMCID: PMC6354030 DOI: 10.1093/gigascience/giy165] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/24/2018] [Accepted: 12/17/2018] [Indexed: 11/23/2022] Open
Abstract
Background Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes.
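The claim that cosine similarity normalizes for sequencing depth can be illustrated with a short derivation. The notation below is ours, not the paper's: x and y are hypothetical k-mer abundance vectors for two samples, and c > 0 is a uniform depth factor. Any weighting that Libra applies to k-mer counts is not modeled here.

```latex
\[
\cos(\mathbf{x}, \mathbf{y})
  = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert},
\qquad
\cos(c\,\mathbf{x}, \mathbf{y})
  = \frac{c \, (\mathbf{x} \cdot \mathbf{y})}{c \, \lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert}
  = \cos(\mathbf{x}, \mathbf{y})
\quad \text{for any } c > 0 .
\]
```

Under this idealization, sequencing one sample c times more deeply scales every k-mer count by c and leaves the similarity unchanged, so the comparison depends on relative k-mer composition and abundance and not on total read count.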
Affiliation(s)
- Illyoung Choi
- Department of Computer Science, University of Arizona, 1040 E. 4th Street, Tucson, Arizona, 85721, USA
| | - Alise J Ponsero
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th Street, Tucson, Arizona, 85721, USA
| | - Matthew Bomhoff
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th Street, Tucson, Arizona, 85721, USA
| | - Ken Youens-Clark
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th Street, Tucson, Arizona, 85721, USA
| | - John H Hartman
- Department of Computer Science, University of Arizona, 1040 E. 4th Street, Tucson, Arizona, 85721, USA
| | - Bonnie L Hurwitz
- Department of Biosystems Engineering, University of Arizona, 1177 E. 4th Street, Tucson, Arizona, 85721, USA; BIO5 Institute, University of Arizona, 1657 E. Helen Street, Tucson, Arizona, 85719, USA
|