1
Chen PB, Chen R, LaPierre N, Chen Z, Mefford J, Marcus E, Heffel MG, Soto DC, Ernst J, Luo C, Flint J. Complementation testing identifies genes mediating effects at quantitative trait loci underlying fear-related behavior. Cell Genom 2024; 4:100545. [PMID: 38697120 DOI: 10.1016/j.xgen.2024.100545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 02/23/2024] [Accepted: 04/04/2024] [Indexed: 05/04/2024]
Abstract
Knowing the genes involved in quantitative traits provides an entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six quantitative trait loci (QTLs) by quantitative complementation, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2, and Sh3gl, have known roles in synapse function; the fifth, Psip1, was not previously implicated in behavior; and the sixth is a long non-coding RNA, 4933413L06Rik, of unknown function. Variation in transcriptome and epigenetic modalities occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results relieve a bottleneck in using genetic mapping of QTLs to uncover biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.
Affiliation(s)
- Patrick B Chen
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Rachel Chen
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Nathan LaPierre
- Department of Computer Science, Samueli School of Engineering, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Zeyuan Chen
- Department of Computer Science, Samueli School of Engineering, University of California, Los Angeles, Los Angeles, CA, USA
- Joel Mefford
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Emilie Marcus
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, USA
- Matthew G Heffel
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Daniela C Soto
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jason Ernst
- Department of Computer Science, Samueli School of Engineering, University of California, Los Angeles, Los Angeles, CA, USA; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, USA
- Chongyuan Luo
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jonathan Flint
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
2
Alser M, Lawlor B, Abdill RJ, Waymost S, Ayyala R, Rajkumar N, LaPierre N, Brito J, Ribeiro-Dos-Santos AM, Almadhoun N, Sarwal V, Firtina C, Osinski T, Eskin E, Hu Q, Strong D, Kim BDBD, Abedalthagafi MS, Mutlu O, Mangul S. Packaging and containerization of computational methods. Nat Protoc 2024:10.1038/s41596-024-00986-0. [PMID: 38565959 DOI: 10.1038/s41596-024-00986-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 02/12/2024] [Indexed: 04/04/2024]
Abstract
Methods for analyzing the full complement of a biomolecule type, e.g., proteomics or metabolomics, generate large amounts of complex data. The software tools used to analyze omics data have reshaped the landscape of modern biology and become an essential component of biomedical research. These tools are themselves quite complex and often require the installation of other supporting software, libraries and/or databases. A researcher may also be using multiple different tools that require different versions of the same supporting materials. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging and containerization are different approaches to satisfy this need by delivering omics tools already wrapped in additional software that makes the tools easier to install and use. In this systematic review, we describe and compare the features of prominent packaging and containerization platforms. We outline the challenges, advantages and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers and system administrators. We also propose principles to make the distribution of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.
Affiliation(s)
- Mohammed Alser
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Brendan Lawlor
- Department of Computer Science, Munster Technological University, Cork, Ireland
- Department of Biological Sciences, Munster Technological University, Cork, Ireland
- Richard J Abdill
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Sharon Waymost
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Ram Ayyala
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Neha Rajkumar
- Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, USA
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jaqueline Brito
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Nour Almadhoun
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Varuni Sarwal
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Can Firtina
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Tomasz Osinski
- Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Qiyang Hu
- Office of Advanced Research Computing, University of California, Los Angeles, CA, USA
- Derek Strong
- Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Byoung-Do B D Kim
- Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Malak S Abedalthagafi
- Department of Pathology & Laboratory Medicine, Emory University Hospital, Atlanta, GA, USA
- King Salman Center for Disability Research, Riyadh, Saudi Arabia
- Onur Mutlu
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Serghei Mangul
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA.
3
LaPierre N, Pimentel H. Accounting for isoform expression increases power to identify genetic regulation of gene expression. PLoS Comput Biol 2024; 20:e1011857. [PMID: 38346082 PMCID: PMC10890775 DOI: 10.1371/journal.pcbi.1011857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 02/23/2024] [Accepted: 01/23/2024] [Indexed: 02/25/2024] Open
Abstract
A core problem in genetics is molecular quantitative trait locus (QTL) mapping, in which genetic variants associated with changes in the molecular phenotypes are identified. One of the most-studied molecular QTL mapping problems is expression QTL (eQTL) mapping, in which the molecular phenotype is gene expression. It is common in eQTL mapping to compute gene expression by aggregating the expression levels of individual isoforms from the same gene and then performing linear regression between SNPs and this aggregated gene expression level. However, SNPs may regulate isoforms from the same gene in different directions due to alternative splicing, or only regulate the expression level of one isoform, causing this approach to lose power. Here, we examine a broader question: which genes have at least one isoform whose expression level is regulated by genetic variants? In this study, we propose and evaluate several approaches to answering this question, demonstrating that "isoform-aware" methods (those that account for the expression levels of individual isoforms) have substantially greater power to answer this question than standard "gene-level" eQTL mapping methods. We identify settings in which different approaches yield an inflated number of false discoveries or lose power. In particular, we show that calling an eGene if there is a significant association between a SNP and any isoform fails to control False Discovery Rate, even when applying standard False Discovery Rate correction. We show that similar trends are observed in real data from the GEUVADIS and GTEx studies, suggesting the possibility that similar effects are present in these consortia.
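The loss of power described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' method or data: the effect size, allele frequency, noise level, sample size, and the names `ols_slope` and `simulate` are all assumptions chosen for illustration. One SNP raises isoform A and lowers isoform B by the same amount, so the per-isoform regression slopes are clearly nonzero while the slope for the summed gene-level expression is close to zero.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

def simulate(n=1000, effect=1.0, maf=0.3, noise_sd=0.5, seed=0):
    """Simulate one SNP with opposite effects on two isoforms (assumed values)."""
    rng = random.Random(seed)
    # Genotype = number of minor alleles (0, 1, or 2) per individual.
    genotype = [sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
    iso_a = [effect * g + rng.gauss(0, noise_sd) for g in genotype]
    iso_b = [-effect * g + rng.gauss(0, noise_sd) for g in genotype]
    # Gene-level expression aggregates the two isoforms by summing them.
    gene = [a + b for a, b in zip(iso_a, iso_b)]
    return genotype, iso_a, iso_b, gene

genotype, iso_a, iso_b, gene = simulate()
slope_a = ols_slope(genotype, iso_a)
slope_b = ols_slope(genotype, iso_b)
slope_gene = ols_slope(genotype, gene)
print(f"isoform A slope: {slope_a:.2f}")
print(f"isoform B slope: {slope_b:.2f}")
print(f"gene-level slope: {slope_gene:.2f}")
```

Because regression is linear in the outcome, the gene-level slope is exactly the sum of the two isoform slopes, which is why opposite-direction effects cancel after aggregation.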
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, University of Chicago, Illinois, United States of America
- Harold Pimentel
- Department of Human Genetics, University of California, Los Angeles, California, United States of America
- Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
- Department of Computational Medicine, University of California, Los Angeles, California, United States of America
4
Chen PB, Chen R, LaPierre N, Chen Z, Mefford J, Marcus E, Heffel MG, Soto DC, Ernst J, Luo C, Flint J. Complementation testing identifies causal genes at quantitative trait loci underlying fear related behavior. bioRxiv 2024:2024.01.03.574060. [PMID: 38260483 PMCID: PMC10802323 DOI: 10.1101/2024.01.03.574060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Knowing the genes involved in quantitative traits provides a critical entry point to understanding the biological bases of behavior, but there are very few examples where the pathway from genetic locus to behavioral change is known. Here we address a key step towards that goal by deploying a test that directly queries whether a gene mediates the effect of a quantitative trait locus (QTL). To explore the role of specific genes in fear behavior, we mapped three fear-related traits, tested fourteen genes at six QTLs, and identified six genes. Four genes, Lsamp, Ptprd, Nptx2 and Sh3gl, have known roles in synapse function; the fifth gene, Psip1, is a transcriptional co-activator not previously implicated in behavior; the sixth is a long non-coding RNA 4933413L06Rik with no known function. Single nucleus transcriptomic and epigenetic analyses implicated excitatory neurons as likely mediating the genetic effects. Surprisingly, variation in transcriptome and epigenetic modalities between inbred strains occurred preferentially in excitatory neurons, suggesting that genetic variation is more permissible in excitatory than inhibitory neuronal circuits. Our results open a bottleneck in using genetic mapping of QTLs to find novel biology underlying behavior and prompt a reconsideration of expected relationships between genetic and functional variation.
5
Karlin J, Gai L, LaPierre N, Danesh K, Farajzadeh J, Palileo B, Taraszka K, Zheng J, Wang W, Eskin E, Rootman D. Ensemble neural network model for detecting thyroid eye disease using external photographs. Br J Ophthalmol 2023; 107:1722-1729. [PMID: 36126104 DOI: 10.1136/bjo-2022-321833] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/31/2022] [Accepted: 08/22/2022] [Indexed: 11/03/2022]
Abstract
PURPOSE To describe an artificial intelligence platform that detects thyroid eye disease (TED). DESIGN Development of a deep learning model. METHODS 1944 photographs from a clinical database were used to train a deep learning model. 344 additional images ('test set') were used to calculate performance metrics. Receiver operating characteristic, precision-recall curves and heatmaps were generated. From the test set, 50 images were randomly selected ('survey set') and used to compare model performance with ophthalmologist performance. 222 images obtained from a separate clinical database were used to assess model recall and to quantitate model performance with respect to disease stage and grade. RESULTS The model achieved test set accuracy of 89.2%, specificity 86.9%, recall 93.4%, precision 79.7% and an F1 score of 86.0%. Heatmaps demonstrated that the model identified pixels corresponding to clinical features of TED. On the survey set, the ensemble model achieved accuracy, specificity, recall, precision and F1 score of 86%, 84%, 89%, 77% and 82%, respectively. 27 ophthalmologists achieved mean performance of 75%, 82%, 63%, 72% and 66%, respectively. On the second test set, the model achieved recall of 91.9%, with higher recall for moderate to severe (98.2%, n=55) and active disease (98.3%, n=60), as compared with mild (86.8%, n=68) or stable disease (85.7%, n=63). CONCLUSIONS The deep learning classifier is a novel approach to identify TED and is a first step in the development of tools to improve diagnostic accuracy and lower barriers to specialist evaluation.
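As a quick consistency check on the reported test-set metrics, the F1 score is the harmonic mean of precision and recall. The sketch below is not code from the study; the function name `f1_score` is an illustrative assumption. It recomputes F1 from the stated precision (79.7%) and recall (93.4%).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Test-set values as reported in the abstract.
test_set_f1 = f1_score(0.797, 0.934)
print(f"{test_set_f1:.3f}")  # prints 0.860
```

The result agrees with the reported test-set F1 score of 86.0%.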
Affiliation(s)
- Justin Karlin
- Division of Orbital and Ophthalmic Plastic Surgery, Stein and Doheny Eye Institutes, University of California, Los Angeles, CA, USA
- Lisa Gai
- Department of Computer Science, University of California, Los Angeles, California, USA
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, California, USA
- Kayla Danesh
- Division of Orbital and Ophthalmic Plastic Surgery, Stein and Doheny Eye Institutes, University of California, Los Angeles, CA, USA
- Justin Farajzadeh
- Division of Orbital and Ophthalmic Plastic Surgery, Stein and Doheny Eye Institutes, University of California, Los Angeles, CA, USA
- Bea Palileo
- Division of Orbital and Ophthalmic Plastic Surgery, Stein and Doheny Eye Institutes, University of California, Los Angeles, CA, USA
- Kodi Taraszka
- Department of Computer Science, University of California, Los Angeles, California, USA
- Jie Zheng
- Department of Computer Science, University of California, Los Angeles, California, USA
- Wei Wang
- Department of Computer Science, University of California, Los Angeles, California, USA
- Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, California, USA
- Department of Human Genetics, University of California, Los Angeles, California, USA
- Daniel Rootman
- Division of Orbital and Ophthalmic Plastic Surgery, Stein and Doheny Eye Institutes, University of California, Los Angeles, CA, USA
6
Khan AH, Bagley JR, LaPierre N, Gonzalez-Figueroa C, Spencer TC, Choudhury M, Xiao X, Eskin E, Jentsch JD, Smith DJ. Genetic pathways regulating the longitudinal acquisition of cocaine self-administration in a panel of inbred and recombinant inbred mice. Cell Rep 2023; 42:112856. [PMID: 37481717 PMCID: PMC10530068 DOI: 10.1016/j.celrep.2023.112856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 06/06/2023] [Accepted: 07/10/2023] [Indexed: 07/25/2023] Open
Abstract
To identify addiction genes, we evaluate intravenous self-administration of cocaine or saline in 84 inbred and recombinant inbred mouse strains over 10 days. We integrate the behavior data with brain RNA-seq data from 41 strains. The self-administration of cocaine and that of saline are genetically distinct. We maximize power to map loci for cocaine intake by using a linear mixed model to account for this longitudinal phenotype while correcting for population structure. A total of 15 unique significant loci are identified in the genome-wide association study. A transcriptome-wide association study highlights the Trpv2 ion channel as a key locus for cocaine self-administration as well as identifying 17 additional genes, including Arhgef26, Slc18b1, and Slco5a1. We find numerous instances where alternate splice site selection or RNA editing altered transcript abundance. Our work emphasizes the importance of Trpv2, an ionotropic cannabinoid receptor, for the response to cocaine.
Affiliation(s)
- Arshad H Khan
- Department of Molecular and Medical Pharmacology, Geffen School of Medicine, UCLA, Los Angeles, CA 90095, USA
- Jared R Bagley
- Department of Psychology, Binghamton University, Binghamton, NY, USA
- Nathan LaPierre
- Department of Computer Science, UCLA, Los Angeles, CA 90095, USA
- Tadeo C Spencer
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA 90095, USA
- Mudra Choudhury
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA 90095, USA
- Xinshu Xiao
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA 90095, USA
- Eleazar Eskin
- Department of Computational Medicine, UCLA, Los Angeles, CA 90095, USA
- James D Jentsch
- Department of Psychology, Binghamton University, Binghamton, NY, USA
- Desmond J Smith
- Department of Molecular and Medical Pharmacology, Geffen School of Medicine, UCLA, Los Angeles, CA 90095, USA.
7
LaPierre N, Fu B, Turnbull S, Eskin E, Sankararaman S. Leveraging family data to design Mendelian randomization that is provably robust to population stratification. Genome Res 2023; 33:1032-1041. [PMID: 37197991 PMCID: PMC10538495 DOI: 10.1101/gr.277664.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/16/2023] [Indexed: 05/19/2023]
Abstract
Mendelian randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases owing to weak instruments, as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We show in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, whereas standard MR methods yield inflated false positive rates. We then conduct an exploratory analysis of MR-Twin and other MR methods applied to 121 trait pairs in the UK Biobank data set. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, whereas MR-Twin is immune to this type of confounding, and that MR-Twin can help assess whether traditional approaches may be inflated owing to confounding from population stratification.
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Boyang Fu
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Steven Turnbull
- Department of Statistics, University of California Los Angeles, Los Angeles, California 90095, USA
- Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Sriram Sankararaman
- Department of Computer Science, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Computational Medicine, University of California Los Angeles, Los Angeles, California 90095, USA
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
8
LaPierre N, Fu B, Turnbull S, Eskin E, Sankararaman S. Leveraging family data to design Mendelian Randomization that is provably robust to population stratification. bioRxiv 2023:2023.01.05.522936. [PMID: 36711635 PMCID: PMC9881984 DOI: 10.1101/2023.01.05.522936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Mendelian Randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases due to weak instruments as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We demonstrate in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, while standard MR methods yield inflated false positive rates. We applied MR-Twin to 121 trait pairs in the UK Biobank dataset and found that MR-Twin identifies likely causal trait pairs and does not identify trait pairs that are unlikely to be causal. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, while MR-Twin is immune to this type of confounding.
Affiliation(s)
- Boyang Fu
- Department of Computer Science, UCLA, Los Angeles CA
- Eleazar Eskin
- Department of Computer Science, UCLA, Los Angeles CA
- Department of Computational Medicine, UCLA, Los Angeles CA
- Department of Human Genetics, UCLA, Los Angeles CA
- Sriram Sankararaman
- Department of Computer Science, UCLA, Los Angeles CA
- Department of Computational Medicine, UCLA, Los Angeles CA
- Department of Human Genetics, UCLA, Los Angeles CA
9
Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 2022; 19:429-440. [PMID: 35396482 PMCID: PMC9007738 DOI: 10.1038/s41592-022-01431-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 44.5] [Received: 07/22/2021] [Accepted: 02/14/2022] [Indexed: 12/20/2022]
Abstract
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), which is a community-driven effort for comprehensively benchmarking tools for metagenomics data analysis.
Affiliation(s)
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany; German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Zhi-Luo Deng
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany; Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- Till Robin Lesker
- German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany; Helmholtz Centre for Infection Research, Braunschweig, Germany
- Gary Robertson
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Mohammed Alser
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Dmitry Antipov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Jan Buchmann
- Institute for Biological Data Science, Heinrich-Heine-University, Düsseldorf, Germany
- Aydin Buluç
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Bo Chen
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Philip T L C Clausen
- National Food Institute, Division of Global Surveillance, Technical University of Denmark, Lyngby, Denmark
- Alexandru Cristian
- Drexel University, Philadelphia, PA, USA; Google Inc., Philadelphia, PA, USA
- Piotr Wojciech Dabrowski
- Robert Koch-Institut, Berlin, Germany; Hochschule für Technik und Wirtschaft Berlin, Berlin, Germany
- Rob Egan
- DOE Joint Genome Institute, Berkeley, CA, USA; Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Eleazar Eskin
- University of California, Los Angeles, Los Angeles, CA, USA
- Eugene Goltsman
- DOE Joint Genome Institute, Berkeley, CA, USA; Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Melissa A Gray
- Drexel University, Philadelphia, PA, USA; Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
- Lars Hestbjerg Hansen
- University of Copenhagen, Department of Plant and Environmental Science, Frederiksberg, Denmark
- Steven Hofmeyr
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA; University of California, Berkeley, Berkeley, CA, USA
- Pingqin Huang
- School of Computer Science, Fudan University, Shanghai, China
- Luiz Irber
- University of California, Davis, Davis, CA, USA
- Huijue Jia
- BGI-Shenzhen, Shenzhen, China; Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Tue Sparholt Jørgensen
- Technical University of Denmark, Novo Nordisk Foundation Center for Biosustainability, Lyngby, Denmark; Aarhus University, Department of Environmental Science, Roskilde, Denmark
- Silas D Kieser
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
- Axel Kola
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Anton Korobeynikov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia.,Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
| | - Jason Kwan
- University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Chenhao Li
- Genome Institute of Singapore, Singapore, Singapore
| | | | - Fabio Malcher-Miranda
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | | | - Vanessa R Marcelino
- Sydney Medical School, The University of Sydney, Sydney, Australia.,Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, Australia
| | | | - Pierre Marijon
- Department of Computer Science, Inria, University of Lille, CNRS, Lille, France
| | - Dmitry Meleshko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
| | - Daniel R Mende
- Amsterdam University Medical Center, Amsterdam, the Netherlands
| | - Alessio Milanese
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland.,Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
| | - Niranjan Nagarajan
- Genome Institute of Singapore, A*STAR, Singapore, Singapore.,National University of Singapore, Singapore, Singapore
| | | | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Leonid Oliker
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Berkeley, Berkeley, CA, USA
| | - Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | | | - Vitor C Piro
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | | | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Evan R Rees
- University of Wisconsin-Madison, Madison, WI, USA
| | - Knut Reinert
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
| | - Bernhard Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany.,Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
| | | | - Gail L Rosen
- Drexel University, Philadelphia, PA, USA.,Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA.,Center for Biological Discovery from Big Data, Philadelphia, PA, USA
| | - Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | - Varuni Sarwal
- University of California, Los Angeles, Los Angeles, CA, USA
| | - Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
| | - Enrico Seiler
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
| | - Lizhen Shi
- Florida Polytechnic University, Lakeland, FL, USA
| | - Fengzhu Sun
- Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, USA
| | - Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | | | - Ashleigh Thomas
- DOE Joint Genome Institute, Berkeley, CA, USA.,University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Mirko Trajkovski
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland.,Diabetes Center, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Julien Tremblay
- Energy, Mining and Environment, National Research Council Canada, Montreal, Quebec, Canada
| | | | | | - Zhengyang Wang
- School of Computer Science, Fudan University, Shanghai, China
| | - Ziye Wang
- School of Mathematical Sciences, Fudan University, Shanghai, China
| | - Zhong Wang
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,School of Natural Sciences, University of California at Merced, Merced, CA, USA
| | | | | | - Katherine Yelick
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Berkeley, Berkeley, CA, USA
| | - Ronghui You
- School of Computer Science, Fudan University, Shanghai, China
| | - Georg Zeller
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
| | | | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Jie Zhu
- BGI-Shenzhen, Shenzhen, China.,Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
| | | | | | | | - Susanne Häußler
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Ariane Khaledi
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Fantin Mesny
- Max Planck Institute for Plant Breeding Research, Köln, Germany
| | | | | | - Nathiana Smit
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Till Strowig
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
| | - Alexander Sczyrba
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
| | - Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.,German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany.,Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany.
10
|
Cinelli C, LaPierre N, Hill BL, Sankararaman S, Eskin E. Robust Mendelian randomization in the presence of residual population stratification, batch effects and horizontal pleiotropy. Nat Commun 2022; 13:1093. [PMID: 35232963 DOI: 10.1101/2020.10.21.347773] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/29/2021] [Accepted: 01/14/2022] [Indexed: 05/25/2023]
Abstract
Mendelian Randomization (MR) studies are threatened by population stratification, batch effects, and horizontal pleiotropy. Although a variety of methods have been proposed to mitigate those problems, residual biases may still remain, leading to highly statistically significant false positives in large databases. Here we describe a suite of sensitivity analysis tools that enables investigators to quantify the robustness of their findings against such validity threats. Specifically, we propose the routine reporting of sensitivity statistics that reveal the minimal strength of violations necessary to explain away the MR results. We further provide intuitive displays of the robustness of the MR estimate to any degree of violation, and formal bounds on the worst-case bias caused by violations multiple times stronger than observed variables. We demonstrate how these tools can aid researchers in distinguishing robust from fragile findings by examining the effect of body mass index on diastolic blood pressure and Townsend deprivation index.
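To make the idea of a "minimal strength of violations necessary to explain away" a result concrete, here is a minimal sketch of one statistic of this kind from the general sensitivity-analysis literature: the robustness value of Cinelli and Hazlett for a regression coefficient. Whether the MR-specific statistics in this paper take exactly this form is an assumption; the function name and arguments are illustrative only.

```python
import math


def robustness_value(t_stat: float, dof: int, q: float = 1.0) -> float:
    """Robustness value for a regression estimate (point-estimate version).

    It is the partial R^2 that an unobserved confounder would need with both
    the exposure and the outcome to reduce the estimate by a fraction q.
    Assumed form: RV_q = 0.5 * (sqrt(f_q^4 + 4 f_q^2) - f_q^2),
    where f_q = q * |t| / sqrt(dof).
    """
    f_q = q * abs(t_stat) / math.sqrt(dof)
    return 0.5 * (math.sqrt(f_q**4 + 4 * f_q**2) - f_q**2)


if __name__ == "__main__":
    # A large t-statistic on few degrees of freedom is harder to explain away
    # than the same t-statistic on a very large sample.
    print(robustness_value(t_stat=10.0, dof=100))
    print(robustness_value(t_stat=10.0, dof=100_000))
```

The second call illustrates the abstract's warning about large databases: a result can be highly significant yet have a small robustness value, so a weak residual bias would suffice to explain it away.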
Affiliation(s)
- Carlos Cinelli
- Department of Statistics, University of Washington, Seattle, WA, USA.
| | - Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, CA, USA
| | - Brian L Hill
- Department of Computer Science, University of California, Los Angeles, CA, USA
| | - Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, USA
| | - Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, USA
11
|
LaPierre N, Taraszka K, Huang H, He R, Hormozdiari F, Eskin E. Identifying causal variants by fine mapping across multiple studies. PLoS Genet 2021; 17:e1009733. [PMID: 34543273 PMCID: PMC8491908 DOI: 10.1371/journal.pgen.1009733] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 08/31/2020] [Revised: 10/05/2021] [Accepted: 07/21/2021] [Indexed: 11/18/2022]
Abstract
Increasingly large Genome-Wide Association Studies (GWAS) have yielded numerous variants associated with many complex traits, motivating the development of "fine mapping" methods to identify which of the associated variants are causal. Additionally, GWAS of the same trait for different populations are increasingly available, raising the possibility of refining fine mapping results further by leveraging different linkage disequilibrium (LD) structures across studies. Here, we introduce multiple study causal variants identification in associated regions (MsCAVIAR), a method that extends the popular CAVIAR fine mapping framework to a multiple study setting using a random effects model. MsCAVIAR only requires summary statistics and LD as input, accounts for uncertainty in association statistics using a multivariate normal model, allows for multiple causal variants at a locus, and explicitly models the possibility of different SNP effect sizes in different populations. We demonstrate the efficacy of MsCAVIAR in both a simulation study and a trans-ethnic, trans-biobank fine mapping analysis of High Density Lipoprotein (HDL).
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, California, United States
| | - Kodi Taraszka
- Department of Computer Science, University of California, Los Angeles, California, United States
| | - Helen Huang
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, California, United States
| | - Rosemary He
- Department of Mathematics, University of California, Los Angeles, California, United States
| | - Farhad Hormozdiari
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States
| | - Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, California, United States
- Department of Human Genetics, University of California, Los Angeles, California, United States
- Department of Computational Medicine, University of California, Los Angeles, California, United States
12
|
Bloom JS, Sathe L, Munugala C, Jones EM, Gasperini M, Lubock NB, Yarza F, Thompson EM, Kovary KM, Park J, Marquette D, Kay S, Lucas M, Love T, Sina Booeshaghi A, Brandenberg OF, Guo L, Boocock J, Hochman M, Simpkins SW, Lin I, LaPierre N, Hong D, Zhang Y, Oland G, Choe BJ, Chandrasekaran S, Hilt EE, Butte MJ, Damoiseaux R, Kravit C, Cooper AR, Yin Y, Pachter L, Garner OB, Flint J, Eskin E, Luo C, Kosuri S, Kruglyak L, Arboleda VA. Massively scaled-up testing for SARS-CoV-2 RNA via next-generation sequencing of pooled and barcoded nasal and saliva samples. Nat Biomed Eng 2021; 5:657-665. [PMID: 34211145 PMCID: PMC10810734 DOI: 10.1038/s41551-021-00754-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 11/20/2020] [Accepted: 05/20/2021] [Indexed: 02/02/2023]
Abstract
Frequent and widespread testing of members of the population who are asymptomatic for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is essential for the mitigation of the transmission of the virus. Despite the recent increases in testing capacity, tests based on quantitative polymerase chain reaction (qPCR) assays cannot be easily deployed at the scale required for population-wide screening. Here, we show that next-generation sequencing of pooled samples tagged with sample-specific molecular barcodes enables the testing of thousands of nasal or saliva samples for SARS-CoV-2 RNA in a single run without the need for RNA extraction. The assay, which we named SwabSeq, incorporates a synthetic RNA standard that facilitates end-point quantification and the calling of true negatives, and that reduces the requirements for automation, purification and sample-to-sample normalization. We used SwabSeq to perform 80,000 tests, with an analytical sensitivity and specificity comparable to or better than traditional qPCR tests, in less than two months with turnaround times of less than 24 h. SwabSeq could be rapidly adapted for the detection of other pathogens.
Affiliation(s)
- Joshua S Bloom
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
- Octant Inc., Emeryville, CA, USA.
| | - Laila Sathe
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Chetan Munugala
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | | | | | | | | | | | | | - Dawn Marquette
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Stephania Kay
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Mark Lucas
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - TreQuan Love
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | | | - Oliver F Brandenberg
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Biological Chemistry, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Longhua Guo
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Biological Chemistry, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - James Boocock
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Biological Chemistry, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | | | | | - Isabella Lin
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Nathan LaPierre
- Department of Computer Science, Samueli School of Engineering, UCLA, Los Angeles, CA, USA
| | - Duke Hong
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Yi Zhang
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Gabriel Oland
- Department of Surgery, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Bianca Judy Choe
- Department of Emergency Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Sukantha Chandrasekaran
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Evann E Hilt
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Manish J Butte
- Department of Pediatrics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Microbiology, Immunology & Molecular Genetics, UCLA, Los Angeles, CA, USA
| | - Robert Damoiseaux
- California NanoSystems Institute, UCLA, Los Angeles, CA, USA
- Department of Bioengineering, Samueli School of Engineering, UCLA, Los Angeles, CA, USA
- Department of Medical and Molecular Pharmacology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Clifford Kravit
- Department of Digital Technology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | | | - Yi Yin
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Lior Pachter
- Division of Biology and Bioengineering, Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA
| | - Omai B Garner
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Jonathan Flint
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Eleazar Eskin
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
- Department of Computer Science, Samueli School of Engineering, UCLA, Los Angeles, CA, USA
| | - Chongyuan Luo
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Sriram Kosuri
- Octant Inc., Emeryville, CA, USA.
- Department of Chemistry and Biochemistry, UCLA, Los Angeles, CA, USA.
| | - Leonid Kruglyak
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
- Department of Biological Chemistry, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
| | - Valerie A Arboleda
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
13
|
Bloom JS, Sathe L, Munugala C, Jones EM, Gasperini M, Lubock NB, Yarza F, Thompson EM, Kovary KM, Park J, Marquette D, Kay S, Lucas M, Love T, Booeshaghi AS, Brandenberg OF, Guo L, Boocock J, Hochman M, Simpkins SW, Lin I, LaPierre N, Hong D, Zhang Y, Oland G, Choe BJ, Chandrasekaran S, Hilt EE, Butte MJ, Damoiseaux R, Kravit C, Cooper AR, Yin Y, Pachter L, Garner OB, Flint J, Eskin E, Luo C, Kosuri S, Kruglyak L, Arboleda VA. Swab-Seq: A high-throughput platform for massively scaled up SARS-CoV-2 testing. medRxiv 2021. [PMID: 32909008 PMCID: PMC7480060 DOI: 10.1101/2020.08.04.20167874] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 12/23/2022]
Abstract
The rapid spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is due to the high rates of transmission by individuals who are asymptomatic at the time of transmission [1,2]. Frequent, widespread testing of the asymptomatic population for SARS-CoV-2 is essential to suppress viral transmission. Despite increases in testing capacity, multiple challenges remain in deploying traditional reverse transcription and quantitative PCR (RT-qPCR) tests at the scale required for population screening of asymptomatic individuals. We have developed SwabSeq, a high-throughput testing platform for SARS-CoV-2 that uses next-generation sequencing as a readout. SwabSeq employs sample-specific molecular barcodes to enable thousands of samples to be combined and simultaneously analyzed for the presence or absence of SARS-CoV-2 in a single run. Importantly, SwabSeq incorporates an in vitro RNA standard that mimics the viral amplicon, but can be distinguished by sequencing. This standard allows for end-point rather than quantitative PCR, improves quantitation, reduces requirements for automation and sample-to-sample normalization, enables purification-free detection, and gives better ability to call true negatives. After setting up SwabSeq in a high-complexity CLIA laboratory, we performed more than 80,000 tests for COVID-19 in less than two months, confirming in a real world setting that SwabSeq inexpensively delivers highly sensitive and specific results at scale, with a turn-around of less than 24 hours. Our clinical laboratory uses SwabSeq to test both nasal and saliva samples without RNA extraction, while maintaining analytical sensitivity comparable to or better than traditional RT-qPCR tests. Moving forward, SwabSeq can rapidly scale up testing to mitigate devastating spread of novel pathogens.
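The role of the RNA standard in calling results can be illustrated with a small sketch. It assumes that, after demultiplexing by sample barcode, each sample has a count of reads matching the viral amplicon and a count matching the spiked-in standard. The function name, the pseudocount, and both thresholds are hypothetical values chosen for illustration, not the assay's actual parameters.

```python
def call_sample(viral_reads: int, standard_reads: int,
                min_total_reads: int = 100,
                ratio_cutoff: float = 0.05) -> str:
    """Classify one barcoded sample from end-point read counts.

    Hypothetical rule: the ratio of viral reads to standard reads decides
    positive vs. negative, and a sample in which neither amplicon was
    recovered is reported as inconclusive rather than negative.
    """
    if viral_reads + standard_reads < min_total_reads:
        # Neither virus nor standard amplified: the reaction likely failed,
        # so absence of viral reads is not evidence of a true negative.
        return "inconclusive"
    # Pseudocount of 1 avoids division by zero (assumption).
    ratio = (viral_reads + 1) / (standard_reads + 1)
    return "positive" if ratio >= ratio_cutoff else "negative"


if __name__ == "__main__":
    print(call_sample(viral_reads=500, standard_reads=100))
    print(call_sample(viral_reads=0, standard_reads=1000))
    print(call_sample(viral_reads=0, standard_reads=2))
```

Because the standard is amplified in every well, a sample with many standard reads and no viral reads can be called negative with more confidence than one in which nothing amplified at all.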
Affiliation(s)
- Joshua S Bloom
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Howard Hughes Medical Institute, HHMI.,Octant, Inc
| | - Laila Sathe
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA
| | - Chetan Munugala
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Howard Hughes Medical Institute, HHMI
| | | | | | | | | | | | | | | | - Dawn Marquette
- Department of Computational Medicine, David Geffen School of Medicine, UCLA
| | - Stephania Kay
- Department of Computational Medicine, David Geffen School of Medicine, UCLA
| | - Mark Lucas
- Department of Computational Medicine, David Geffen School of Medicine, UCLA
| | - TreQuan Love
- Department of Computational Medicine, David Geffen School of Medicine, UCLA
| | | | - Oliver F Brandenberg
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Howard Hughes Medical Institute, HHMI.,Department of Biological Chemistry, David Geffen School of Medicine, UCLA
| | - Longhua Guo
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Howard Hughes Medical Institute, HHMI.,Department of Biological Chemistry, David Geffen School of Medicine, UCLA
| | - James Boocock
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Howard Hughes Medical Institute, HHMI.,Department of Biological Chemistry, David Geffen School of Medicine, UCLA
| | | | | | - Isabella Lin
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA
| | - Nathan LaPierre
- Department of Computer Science, Samueli School of Engineering, UCLA
| | - Duke Hong
- Department of Computational Medicine, David Geffen School of Medicine, UCLA
| | - Yi Zhang
- Department of Human Genetics, David Geffen School of Medicine, UCLA
| | - Gabriel Oland
- Department of Surgery, David Geffen School of Medicine, UCLA
| | - Bianca Judy Choe
- Department of Emergency Medicine, David Geffen School of Medicine, UCLA
| | | | - Evann E Hilt
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA
| | - Manish J Butte
- Department of Pediatrics, David Geffen School of Medicine, UCLA.,Department of Microbiology, Immunology & Molecular Genetics, David Geffen School of Medicine, UCLA
| | - Robert Damoiseaux
- California NanoSystems Institute, UCLA.,Department of Bioengineering, Samueli School of Engineering, UCLA.,David Geffen School of Medicine, Research Information Technology
| | - Clifford Kravit
- David Geffen School of Medicine, Research Information Technology
| | | | - Yi Yin
- Department of Human Genetics, David Geffen School of Medicine, UCLA
| | - Lior Pachter
- Division of Biology and Bioengineering & Department of Computing and Mathematical Sciences, Caltech
| | - Omai B Garner
- Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA
| | - Jonathan Flint
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, UCLA
| | - Eleazar Eskin
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Department of Computer Science, Samueli School of Engineering, UCLA.,Department of Computational Medicine, David Geffen School of Medicine, UCLA
| | - Chongyuan Luo
- Department of Human Genetics, David Geffen School of Medicine, UCLA
| | - Sriram Kosuri
- Octant, Inc.,Department of Chemistry and Biochemistry, UCLA
| | - Leonid Kruglyak
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Howard Hughes Medical Institute, HHMI.,Department of Biological Chemistry, David Geffen School of Medicine, UCLA
| | - Valerie A Arboleda
- Department of Human Genetics, David Geffen School of Medicine, UCLA.,Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, UCLA
14
|
LaPierre N, Alser M, Eskin E, Koslicki D, Mangul S. Metalign: efficient alignment-based metagenomic profiling via containment min hash. Genome Biol 2020; 21:242. [PMID: 32912225 PMCID: PMC7488264 DOI: 10.1186/s13059-020-02159-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/17/2019] [Accepted: 08/26/2020] [Indexed: 12/31/2022]
Abstract
Metagenomic profiling, predicting the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a novel containment min hash approach to pre-filter the reference database prior to alignment and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
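The pre-filtering idea can be sketched as follows. This is a simplified illustration of containment estimation with a bottom-s MinHash sketch, not Metalign's implementation: it stores the sample's k-mers in an exact Python set where a practical tool would typically use a compact probabilistic structure, and the function names, k-mer size, sketch size, and cutoff are all assumptions.

```python
import hashlib


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(
        hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def containment_estimate(reference: str, sample_kmers: set,
                         k: int, sketch_size: int) -> float:
    """Estimate |K(ref) & K(sample)| / |K(ref)| from a sketch of the reference.

    The sketch_size k-mers with the smallest hash values act as a random
    sample of the reference's k-mers; the fraction of them found in the
    sample estimates how much of the reference the sample contains.
    """
    sketch = sorted(kmers(reference, k), key=stable_hash)[:sketch_size]
    if not sketch:
        return 0.0
    hits = sum(1 for km in sketch if km in sample_kmers)
    return hits / len(sketch)


def prefilter(references: dict, reads: list, k: int = 5,
              sketch_size: int = 50, cutoff: float = 0.5) -> list:
    """Keep only references whose estimated containment reaches the cutoff."""
    sample_kmers = set()
    for read in reads:
        sample_kmers |= kmers(read, k)
    return [name for name, seq in references.items()
            if containment_estimate(seq, sample_kmers, k, sketch_size) >= cutoff]


if __name__ == "__main__":
    refs = {"present": "ACGTACGTTGCAAGCT", "absent": "GGGGGGGGCCCCCCCC"}
    reads = ["ACGTACGTTGCA", "GTTGCAAGCT"]
    print(prefilter(refs, reads))
```

Only the references that survive this filter would then be passed to the aligner, which is what keeps the alignment step tractable on a large database.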
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, CA, 90095, USA.
| | - Mohammed Alser
- Department of Computer Science, ETH Zurich, Rämistrasse 101, CH-8092, Zurich, Switzerland
| | - Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, CA, 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Human Genetics, University of California, Los Angeles, CA, 90095, USA
| | - David Koslicki
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.
- Department of Biology, The Pennsylvania State University, University Park, PA, USA.
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA.
| | - Serghei Mangul
- Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA.
15
|
Rahman MA, LaPierre N, Rangwala H. Phenotype Prediction from Metagenomic Data Using Clustering and Assembly with Multiple Instance Learning (CAMIL). IEEE/ACM Trans Comput Biol Bioinform 2020; 17:828-840. [PMID: 28981422 DOI: 10.1109/tcbb.2017.2758782] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/07/2023]
Abstract
The recent advent of Metagenome Wide Association Studies (MGWAS) provides insight into the role of microbes on human health and disease. However, the studies present several computational challenges. In this paper, we demonstrate a novel, efficient, and effective Multiple Instance Learning (MIL) based computational pipeline to predict patient phenotype from metagenomic data. MIL methods have the advantage that besides predicting the clinical phenotype, we can infer the instance level label or role of microbial sequence reads in the specific disease. Specifically, we use a Bag of Words method, which has been shown to be one of the most effective and efficient MIL methods. This involves assembly of the metagenomic sequence data, clustering of the assembled contigs, extracting features from the contigs, and using an SVM classifier to predict patient labels and identify the most relevant sequence clusters. With the exception of the given labels for the patients, this entire process is de novo (unsupervised). We call our pipeline "CAMIL", which stands for Clustering and Assembly with Multiple Instance Learning. We use multiple state-of-the-art clustering methods for feature extraction, evaluation, and comparison of the performance of our proposed approach for each of these clustering methods. We also present a fast and scalable pre-clustering algorithm as a preprocessing step for our proposed pipeline. Our approach achieves efficiency by partitioning the large number of sequence reads into groups (called canopies) using locality sensitive hashing (LSH). These canopies are then refined by using state-of-the-art sequence clustering algorithms. We use data from a well-known MGWAS study of patients with Type-2 Diabetes and show that our pipeline significantly outperforms the classifier used in that paper, as well as other common MIL methods.
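The pre-clustering step can be illustrated with a minimal sketch of LSH-based canopy formation. It assumes a MinHash-style locality sensitive hash over k-mers, in which reads that share the same tuple of minimum hash values fall into the same canopy. The function names and parameter values are hypothetical, and the refinement of each canopy by a full sequence clustering algorithm is omitted.

```python
import hashlib
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer: str, seed: int) -> int:
    """Deterministic 64-bit hash; each seed acts as a separate hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def lsh_key(read: str, k: int, num_hashes: int) -> tuple:
    """MinHash signature of the read's k-mer set, used as the bucket key.

    Two reads agree on one signature entry with probability equal to the
    Jaccard similarity of their k-mer sets, so similar reads tend to collide.
    """
    ks = kmers(read, k)
    return tuple(min(seeded_hash(km, seed) for km in ks)
                 for seed in range(num_hashes))


def build_canopies(reads: list, k: int = 4, num_hashes: int = 2) -> list:
    """Group read indices into canopies by LSH key (reads shorter than k are skipped)."""
    canopies = defaultdict(list)
    for idx, read in enumerate(reads):
        if len(read) < k:
            continue
        canopies[lsh_key(read, k, num_hashes)].append(idx)
    return list(canopies.values())


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"]
    print(build_canopies(reads))
```

Each canopy is small compared with the full read set, so the more expensive clustering algorithm only has to compare reads within a canopy rather than all pairs.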
16
|
Bondar G, Bao T, Kurani M, Oh E, Patel K, Shah K, Nelson S, Savvidou S, Kupiec-Weglinsky S, Fadly G, Higuchi E, Silacheva I, LaPierre N, Li Z, Genewick K, Yu S, Grogan T, Elashoff D, Wang W, Ping P, Rossetti M, Reed E, Li X, Deng M. Exercise-Induced Genomic and Transcriptomic Changes in Heart Failure. J Heart Lung Transplant 2020. [DOI: 10.1016/j.healun.2020.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
17
|
Abstract
Background: Long read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area.
Results: Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for identification and subsequent “scrubbing” (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via MiniMap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real world control datasets under several different parameters, we show that it robustly improves read quality, and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors.
Conclusions: MiniScrub is able to robustly improve read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Rob Egan
- Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
- Wei Wang
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Zhong Wang
- Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA; EGSB Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; School of Natural Sciences, University of California at Merced, Merced, CA, 95343, USA
|
18
|
LaPierre N, Ju CJT, Zhou G, Wang W. MetaPheno: A critical evaluation of deep learning and machine learning in metagenome-based disease prediction. Methods 2019; 166:74-82. [PMID: 30885720 PMCID: PMC6708502 DOI: 10.1016/j.ymeth.2019.03.003] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 12/01/2018] [Revised: 02/14/2019] [Accepted: 03/04/2019] [Indexed: 01/21/2023]
Abstract
The human microbiome plays a number of critical roles, impacting almost every aspect of human health and well-being. Conditions in the microbiome have been linked to a number of significant diseases. Additionally, revolutions in sequencing technology have led to a rapid increase in publicly available sequencing data. Consequently, there have been growing efforts to predict disease status from metagenomic sequencing data, with a proliferation of new approaches in the last few years. Some of these efforts have explored utilizing a powerful form of machine learning called deep learning, which has been applied successfully in several biological domains. Here, we review some of these methods and the algorithms that they are based on, with a particular focus on deep learning methods. We also perform a deeper analysis of Type 2 Diabetes and obesity datasets that have eluded improved results, using a variety of machine learning and feature extraction methods. We conclude by offering perspectives on study design considerations that may impact results and future directions the field can take to improve results and offer more valuable conclusions. The scripts and extracted features for the analyses conducted in this paper are available via GitHub: https://github.com/nlapier2/metapheno.
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Chelsea J-T Ju
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Guangyu Zhou
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Wei Wang
- Department of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095, USA
|
19
|
LaPierre N, Mangul S, Alser M, Mandric I, Wu NC, Koslicki D, Eskin E. MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples. BMC Genomics 2019; 20:423. [PMID: 31167634 PMCID: PMC6551237 DOI: 10.1186/s12864-019-5699-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/05/2023]
Abstract
Background: High-throughput sequencing has spurred the development of metagenomics, which involves the direct analysis of microbial communities in various environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting to study other key microbes such as viruses and eukaryotes.
Results: Here we present a method, MiCoP (Microbiome Community Profiling), that uses fast-mapping of reads to build a comprehensive reference database of full genomes from viruses and eukaryotes to achieve maximum read usage and enable the analysis of the virome and eukaryome in each sample. We demonstrate that mapping of metagenomic reads is feasible for the smaller viral and eukaryotic reference databases. We show that our method is accurate on simulated and mock community data and identifies many more viral and fungal species than previously reported results on real data from the Human Microbiome Project.
Conclusions: MiCoP is a mapping-based method that proves more effective than existing methods at abundance profiling of viruses and eukaryotes in metagenomic samples. MiCoP can be used to detect the full diversity of these communities. The code, data, and documentation are publicly available on GitHub at: https://github.com/smangul1/MiCoP.
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, 90095, CA, USA
- Serghei Mangul
- Department of Computer Science, University of California, Los Angeles, 90095, CA, USA
- Mohammed Alser
- Department of Computer Science, ETH Zürich, Zürich, 8092, Switzerland
- Igor Mandric
- Department of Computer Science, University of California, Los Angeles, 90095, CA, USA
- Nicholas C Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- David Koslicki
- Department of Mathematics, Oregon State University, Corvallis, 97331, OR, USA
- Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, 90095, CA, USA; Department of Human Genetics, University of California, Los Angeles, 90095, CA, USA
|
20
|
Abstract
Metagenomics is the collective sequencing of co-existing microbial communities, which are ubiquitous across various clinical and ecological environments. Due to the large volume and random short sequences (reads) obtained from community sequences, analyzing the diversity, abundance, and functions of different organisms within these communities is a challenging task. We present a fast and scalable clustering algorithm for analyzing large-scale metagenome sequence data. Our approach achieves efficiency by partitioning the large number of sequence reads into groups (called canopies) using hashing. These canopies are then refined by using state-of-the-art sequence clustering algorithms. This canopy-clustering (CC) algorithm can be used as a pre-processing phase for computationally expensive clustering algorithms. We use and compare three hashing schemes for canopy construction with five popular and state-of-the-art sequence clustering methods. We evaluate our clustering algorithm on synthetic and real-world 16S and whole metagenome benchmarks. We demonstrate the ability of our proposed approach to determine meaningful Operational Taxonomic Units (OTUs) and observe significant speedup with regard to run time when compared to different clustering algorithms. We also make our source code publicly available on GitHub.
Affiliation(s)
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, California, USA
- Huzefa Rangwala
- Department of Computer Science, George Mason University, Fairfax, Virginia, USA
- Daniel Barbara
- Department of Computer Science, George Mason University, Fairfax, Virginia, USA
|
21
|
Abstract
Norepinephrine (NE) overflow from field-stimulated rat vas deferens preparations was quantified directly by electrochemical detection using high performance liquid chromatography. The effects of an agonist (BHT 920) and an antagonist (rauwolscine) of prejunctional alpha 2-adrenoceptors on NE overflow were assessed and compared with their effects on the smooth muscle mechanical response to field stimulation. Increasing the stimulation frequency from 2 to 30 Hz resulted in an increase in muscle tension together with an increase in NE overflow. Addition of 1 microM rauwolscine to the medium resulted in a significant increase in muscle contraction to field stimulation, which reached a maximum at 5 Hz. On the other hand, NE overflow increased linearly with the frequency of stimulation within the range studied. Addition of 0.1 microM BHT 920 to the medium significantly decreased the amplitude of contractions at lower stimulation frequencies (2 to 10 Hz) but elicited no significant changes at high frequencies. BHT 920 did not significantly affect NE overflow at any stimulation frequency in the range studied. The simultaneous recording of field-stimulation-induced contractions and NE overflow indicates that in the rat vas deferens, rauwolscine acts like a pure alpha 2-adrenoceptor antagonist at a prejunctional level. BHT 920 did not appear to act selectively on prejunctional alpha 2-adrenoceptors and may also activate postjunctional alpha 1-adrenoceptors.
Affiliation(s)
- N LaPierre
- Department of Biomedical Sciences, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
|