1
El Nahhas OSM, Loeffler CML, Carrero ZI, van Treeck M, Kolbinger FR, Hewitt KJ, Muti HS, Graziani M, Zeng Q, Calderaro J, Ortiz-Brüchle N, Yuan T, Hoffmeister M, Brenner H, Brobeil A, Reis-Filho JS, Kather JN. Regression-based Deep-Learning predicts molecular biomarkers from pathology slides. Nat Commun 2024; 15:1253. [PMID: 38341402 PMCID: PMC10858881 DOI: 10.1038/s41467-024-45589-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 01/29/2024] [Indexed: 02/12/2024] Open
Abstract
Deep Learning (DL) can predict biomarkers from cancer histopathology. Several clinically approved applications use this technology. Most approaches, however, predict categorical labels, whereas biomarkers are often continuous measurements. We hypothesize that regression-based DL outperforms classification-based DL. Therefore, we develop and evaluate a self-supervised attention-based weakly supervised regression method that predicts continuous biomarkers directly from 11,671 images of patients across nine cancer types. We test our method for multiple clinically and biologically relevant biomarkers: homologous recombination deficiency score, a clinically used pan-cancer biomarker, as well as markers of key biological processes in the tumor microenvironment. Using regression significantly enhances the accuracy of biomarker prediction, while also improving the predictions' correspondence to regions of known clinical relevance over classification. In a large cohort of colorectal cancer patients, regression-based prediction scores provide a higher prognostic value than classification-based scores. Our open-source regression approach offers a promising alternative for continuous biomarker analysis in computational pathology.
Grants
- P30 CA008748 NCI NIH HHS
- JNK is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111) and the Max-Eder-Programme of the German Cancer Aid (grant #70113864), the German Federal Ministry of Education and Research (PEARL, 01KD2104C; CAMINO, 01EO2101; SWAG, 01KD2215A; TRANSFORM LIVER, 031L0312A), the German Academic Exchange Service (SECAI, 57616814), the German Federal Joint Committee (Transplant.KI, 01VSF21048), the European Union (ODELIA, 101057091; GENIAL, 101096312) and the National Institute for Health and Care Research (NIHR, NIHR213331) Leeds Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Affiliation(s)
- Omar S M El Nahhas
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Chiara M L Loeffler
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - Department of Medicine 1, University Hospital and Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Zunamys I Carrero
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Marko van Treeck
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Fiona R Kolbinger
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Katherine J Hewitt
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Hannah S Muti
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - Department of Visceral, Thoracic and Vascular Surgery, University Hospital and Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Mara Graziani
  - University of Applied Sciences of Western Switzerland (HES-SO Valais), Rue du Technopole 3, 3960, Sierre, Valais, Switzerland
- Qinghe Zeng
  - Centre d'Histologie, d'Imagerie et de Cytométrie (CHIC), Centre de Recherche des Cordeliers, INSERM, Sorbonne Université, Université Paris Cité, Paris, France
- Julien Calderaro
  - Assistance Publique-Hôpitaux de Paris, Département de Pathologie, CHU Henri Mondor, F-94000, Créteil, France
- Nadina Ortiz-Brüchle
  - Institute of Pathology, University Hospital RWTH Aachen, Aachen, Germany
  - Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf (CIO ABCD), Cologne, Germany
- Tanwei Yuan
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Michael Hoffmeister
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Hermann Brenner
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Alexander Brobeil
  - Institute of Pathology, University Hospital Heidelberg, 69120, Heidelberg, Germany
  - Tissue Bank, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, 69120, Heidelberg, Germany
- Jorge S Reis-Filho
  - Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Jakob Nikolas Kather
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - Department of Medicine 1, University Hospital and Faculty of Medicine Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - Pathology & Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, United Kingdom
  - Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
2
Becker D, Champredon D, Chato C, Gugan G, Poon A. SUP: a probabilistic framework to propagate genome sequence uncertainty, with applications. NAR Genom Bioinform 2023; 5:lqad038. [PMID: 37101658 PMCID: PMC10124968 DOI: 10.1093/nargab/lqad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 02/15/2023] [Accepted: 04/06/2023] [Indexed: 04/28/2023] Open
Abstract
Genetic sequencing is subject to many different types of errors, but most analyses treat the resultant sequences as if they are known without error. Next-generation sequencing methods rely on significantly larger numbers of reads than previous sequencing methods in exchange for a loss of accuracy in each individual read. Still, the coverage of such machines is imperfect and leaves uncertainty in many of the base calls. In this work, we demonstrate that the uncertainty in sequencing techniques will affect downstream analysis and propose a straightforward method to propagate the uncertainty. Our method (which we have dubbed Sequence Uncertainty Propagation, or SUP) uses a probabilistic matrix representation of individual sequences that incorporates base quality scores as a measure of uncertainty, which naturally leads to resampling and replication as a framework for uncertainty propagation. With the matrix representation, resampling possible base calls according to quality scores provides a bootstrap- or prior distribution-like first step towards genetic analysis. Analyses based on these resampled sequences will include a more complete evaluation of the error involved in such analyses. We demonstrate our resampling method on SARS-CoV-2 data. The resampling procedures add a linear computational cost to the analyses, but the large impact on the variance in downstream estimates makes it clear that ignoring this uncertainty may lead to overly confident conclusions. We show that SARS-CoV-2 lineage designations via Pangolin are much less certain than the bootstrap support reported by Pangolin would imply and the clock rate estimates for SARS-CoV-2 are much more variable than reported.
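The resampling idea in this abstract can be illustrated with a minimal sketch. It is not the SUP implementation: the function names are hypothetical, and it assumes that a Phred score Q gives an error probability of 10^(-Q/10) and that the error mass is split equally among the three non-called bases.

```python
import random

BASES = "ACGT"

def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)

def base_probabilities(call, q):
    """One row of a per-position probability matrix.

    Assumption for illustration: the called base gets 1 - e and the
    other three bases share the error probability e equally.
    """
    e = phred_to_error(q)
    return {b: (1 - e if b == call else e / 3) for b in BASES}

def resample_sequence(seq, quals, rng):
    """Draw one plausible sequence, position by position."""
    out = []
    for call, q in zip(seq, quals):
        probs = base_probabilities(call, q)
        weights = [probs[b] for b in BASES]
        out.append(rng.choices(BASES, weights=weights)[0])
    return "".join(out)

def resample_many(seq, quals, n, seed=0):
    """Draw n replicate sequences; the cost grows linearly with n."""
    rng = random.Random(seed)
    return [resample_sequence(seq, quals, rng) for _ in range(n)]
```

Running a downstream analysis on each replicate from `resample_many` and examining the spread of the results is one way to see how base-call uncertainty carries through to the final estimates.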
Affiliation(s)
- Devan Becker
  - To whom correspondence should be addressed. Tel: +1 519 884 1970 (Ext 2464);
- Connor Chato
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Gopi Gugan
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Art Poon
  - Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
3
Severson AL, Korneliussen TS, Moltke I. LocalNgsRelate: a software tool for inferring IBD sharing along the genome between pairs of individuals from low-depth NGS data. Bioinformatics 2022; 38:1159-1161. [PMID: 34718411 PMCID: PMC8796377 DOI: 10.1093/bioinformatics/btab732] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/22/2021] [Revised: 09/28/2021] [Accepted: 10/24/2021] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: Inference of identity-by-descent (IBD) sharing along the genome between pairs of individuals has important uses. But all existing inference methods are based on genotypes, which is not ideal for low-depth Next Generation Sequencing (NGS) data from which genotypes can only be called with high uncertainty.
RESULTS: We present a new probabilistic software tool, LocalNgsRelate, for inferring IBD sharing along the genome between pairs of individuals from low-depth NGS data. Its inference is based on genotype likelihoods instead of genotypes, and thereby it takes the uncertainty of the genotype calling into account. Using real data from the 1000 Genomes project, we show that LocalNgsRelate provides more accurate IBD inference for low-depth NGS data than two state-of-the-art genotype-based methods, Albrechtsen et al. (2009) and hap-IBD. We also show that the method works well for NGS data down to a depth of 2×.
AVAILABILITY AND IMPLEMENTATION: LocalNgsRelate is freely available at https://github.com/idamoltke/LocalNgsRelate.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alissa L Severson
  - Department of Genetics, Stanford University, Stanford, CA 94305-5020, USA
- Ida Moltke
  - Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark
4
Zheng H, Zhao X, Wang H, Ding Y, Lu X, Zhang G, Yang J, Wang L, Zhang H, Bai Y, Li J, Wu J, Jiang Y, Xu L. Location deviations of DNA functional elements affected SNP mapping in the published databases and references. Brief Bioinform 2021; 21:1293-1301. [PMID: 31392334 DOI: 10.1093/bib/bbz073] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/04/2019] [Revised: 05/24/2019] [Accepted: 05/27/2019] [Indexed: 12/20/2022] Open
Abstract
The recent extensive application of next-generation sequencing has led to the rapid accumulation of multiple types of data for functional DNA elements. With the advent of precision medicine, the fine-mapping of risk loci based on these elements has become of paramount importance. In this study, we obtained the human reference genome (GRCh38) and the main DNA sequence elements, including protein-coding genes, miRNAs, lncRNAs and single nucleotide polymorphism flanking sequences, from different repositories. We then realigned these elements to identify their exact locations on the genome. Overall, 5%-20% of all sequence element locations deviated among databases, on the scale of kilobase-pair to megabase-pair. These deviations even affected the selection of genome-wide association study risk-associated genes. Our results implied that the location information for functional DNA elements may deviate among public databases. Researchers should take care when using cross-database sources and should perform pilot sequence alignments before element location-based studies.
Affiliation(s)
- Hewei Zheng
  - Harbin Medical University and Wenzhou Medical University
- Xueying Zhao
  - Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences
- Hong Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, P. R. China
- Yu Ding
  - Harbin Medical University and Wenzhou Medical University
- Xiaoyan Lu
  - Harbin Medical University and Wenzhou Medical University
- Guosi Zhang
  - Harbin Medical University and Wenzhou Medical University
- Jiaxin Yang
  - Harbin Medical University and Wenzhou Medical University
- Lianzong Wang
  - Harbin Medical University and Wenzhou Medical University
- Haotian Zhang
  - Harbin Medical University and Wenzhou Medical University
- Yu Bai
  - Harbin Medical University and Wenzhou Medical University
- Jing Li
  - Harbin Medical University and Wenzhou Medical University
- Jingqi Wu
  - Harbin Medical University and Wenzhou Medical University
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, P. R. China
- Liangde Xu
  - School of Ophthalmology & Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325027, and Training Center for Students Innovation and Entrepreneurship Education, Harbin Medical University, Harbin 150081, P. R. China
5
Jiao F, Guo R, Beckmann JS, Yan Z, Yang Y, Hu J, Wang X, Xie S. Great future or greedy venture: Precision medicine needs philosophy. Health Sci Rep 2021; 4:e376. [PMID: 34541334 PMCID: PMC8439431 DOI: 10.1002/hsr2.376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2021] [Revised: 08/06/2021] [Accepted: 08/16/2021] [Indexed: 11/07/2022] Open
Abstract
INTRODUCTION: Over the past decade, we have witnessed the initiation and implementation of precision medicine (PM), a discipline that promises to individualize and personalize medical management and treatment, rendering them ultimately more precise and effective. Despite continuing advances and numerous clinical applications, the potential of PM remains highly controversial, sparking heated debates about its future.
METHOD: The present article reviews the philosophical issues and practical challenges that are critical to the feasibility and implementation of PM.
OUTCOME: The explanation and argument about the relations between PM and computability, uncertainty, and complexity show that key foundational assumptions of PM might not be fully validated.
CONCLUSION: The present analysis suggests that our current understanding of PM is probably oversimplified and too superficial. More effort is needed to realize the hope that PM has elicited, rather than letting the term become mere hype.
Affiliation(s)
- Fei Jiao
  - Department of Biochemistry and Molecular Biology, Binzhou Medical University, Yantai, China
- Ruoyu Guo
  - Department of Biochemistry and Molecular Biology, Binzhou Medical University, Yantai, China
- Zhonghai Yan
  - Department of Medicine, College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Yun Yang
  - Department of Biochemistry and Molecular Biology, Binzhou Medical University, Yantai, China
- Jinxia Hu
  - Department of Biochemistry and Molecular Biology, Binzhou Medical University, Yantai, China
- Xin Wang
  - Department of Clinical Laboratory & Center of Health Service Training, 970 Hospital of the PLA Joint Logistic Support Force, Yantai, China
- Shuyang Xie
  - Department of Biochemistry and Molecular Biology, Binzhou Medical University, Yantai, China
6
Toquenaga Y, Gagné T. The Evidential Statistics of Genetic Assembly: Bootstrapping a Reference Sequence. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.614374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
Reference sequences play an essential role in genome assembly, like type specimens in taxonomy. Those references are also samples obtained at some time and location with a specific method. How can we evaluate or discriminate uncertainties of the reference itself and assembly methods? Here we bootstrapped 50 random read data sets from a small circular genome of an Escherichia coli bacteriophage, phiX174, and tried to reconstruct the reference with 14 free assembly programs. Nine out of 14 assembly programs were capable of circular genome reconstruction. Unicycler correctly reconstructed the reference for 44 out of 50 data sets, but each reconstructed contig of the failed six data sets had minor defects. The other assembly software could reconstruct the reference with minor defects. The defect regions differed among the assembly programs, and the defect locations were far from randomly distributed in the reference genome. All contigs of Trinity included one, but Minia had two perfect copies other than an imperfect reference copy. The centroid of contigs for assembly programs except Unicycler differed from the reference by at most 75 bases. Nonmetric multidimensional scaling (NMDS) plots of the centroids indicated that even the reference sequence was located slightly off from the estimated location of the true reference. We propose that the combination of bootstrapping a reference, making consensus contigs as centroids in an edit distance, and NMDS plotting will provide an evidential-statistics approach to genetic assembly for non-fragmented base sequences.
7
Colella JP, Tigano A, Dudchenko O, Omer AD, Khan R, Bochkov ID, Aiden EL, MacManes MD. Limited Evidence for Parallel Evolution Among Desert-Adapted Peromyscus Deer Mice. J Hered 2021; 112:286-302. [PMID: 33686424 PMCID: PMC8141686 DOI: 10.1093/jhered/esab009] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/25/2020] [Accepted: 02/27/2021] [Indexed: 01/14/2023] Open
Abstract
Warming climate and increasing desertification urge the identification of genes involved in heat and dehydration tolerance to better inform and target biodiversity conservation efforts. Comparisons among extant desert-adapted species can highlight parallel or convergent patterns of genome evolution through the identification of shared signatures of selection. We generate a chromosome-level genome assembly for the canyon mouse (Peromyscus crinitus) and test for a signature of parallel evolution by comparing signatures of selective sweeps across population-level genomic resequencing data from another congeneric desert specialist (Peromyscus eremicus) and a widely distributed habitat generalist (Peromyscus maniculatus), that may be locally adapted to arid conditions. We identify few shared candidate loci involved in desert adaptation and do not find support for a shared pattern of parallel evolution. Instead, we hypothesize divergent molecular mechanisms of desert adaptation among deer mice, potentially tied to species-specific historical demography, which may limit or enhance adaptation. We identify a number of candidate loci experiencing selective sweeps in the P. crinitus genome that are implicated in osmoregulation (Trypsin, Prostasin) and metabolic tuning (Kallikrein, eIF2-alpha kinase GCN2, APPL1/2), which may be important for accommodating hot and dry environmental conditions.
Affiliation(s)
- Jocelyn P Colella
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH; Hubbard Genome Center, University of New Hampshire, Durham, NH; Biodiversity Institute, University of Kansas, Lawrence, KS
- Anna Tigano
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH; Hubbard Genome Center, University of New Hampshire, Durham, NH
- Olga Dudchenko
  - Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Center for Theoretical and Biological Physics, Rice University, Houston, TX; Department of Computer Science, Department of Computational and Applied Mathematics, Rice University, Houston, TX
- Arina D Omer
  - Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
- Ruqayya Khan
  - Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Department of Computer Science, Department of Computational and Applied Mathematics, Rice University, Houston, TX
- Ivan D Bochkov
  - Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Department of Computer Science, Department of Computational and Applied Mathematics, Rice University, Houston, TX
- Erez L Aiden
  - Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Center for Theoretical and Biological Physics, Rice University, Houston, TX; Department of Computer Science, Department of Computational and Applied Mathematics, Rice University, Houston, TX; Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, Shanghai 201210, China; School of Agriculture and Environment, University of Western Australia, Perth, WA, Australia
- Matthew D MacManes
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH; Hubbard Genome Center, University of New Hampshire, Durham, NH
8
High-Throughput Genotyping Technologies in Plant Taxonomy. Methods Mol Biol 2021; 2222:149-166. [PMID: 33301093 DOI: 10.1007/978-1-0716-0997-2_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Molecular markers provide researchers with a powerful tool for variation analysis between plant genomes. They are heritable and widely distributed across the genome and for this reason have many applications in plant taxonomy and genotyping. Over the last decade, molecular marker technology has developed rapidly and is now a crucial component for genetic linkage analysis, trait mapping, diversity analysis, and association studies. This chapter focuses on molecular marker discovery, its application, and future perspectives for plant genotyping through pangenome assemblies. Included are descriptions of automated methods for genome and sequence distance estimation, genome contaminant analysis in sequence reads, genome structural variation, and SNP discovery methods.
9
Patrone PN, Kearsley AJ, Majikes JM, Liddle JA. Analysis and uncertainty quantification of DNA fluorescence melt data: Applications of affine transformations. Anal Biochem 2020; 607:113773. [PMID: 32526200 DOI: 10.1016/j.ab.2020.113773] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/30/2019] [Revised: 04/22/2020] [Accepted: 05/10/2020] [Indexed: 12/17/2022]
Abstract
Fluorescence-based measurements are a standard tool for characterizing the thermodynamic properties of DNA systems. Nonetheless, experimental melt data obtained from polymerase chain-reaction (PCR) machines (for example) often leads to signals that vary significantly between datasets. In many cases, this lack of reproducibility has led to difficulties in analyzing results and computing reasonable uncertainty estimates. To address this problem, we propose a data analysis procedure based on constrained, convex optimization of affine transformations, which can determine when and how melt curves collapse onto one another. A key aspect of this approach is its ability to provide a reproducible and more objective measure of whether a collection of datasets yields a consistent "universal" signal according to an appropriate model of the raw signals. Importantly, integrating this validation step into the analysis hardens the measurement protocol by allowing one to identify experimental conditions and/or modeling assumptions that may corrupt a measurement. Moreover, this robustness facilitates extraction of thermodynamic information at no additional cost in experimental time. We illustrate and test our approach on experiments of Förster resonance energy transfer (FRET) pairs used to study the thermodynamics of DNA loops.
10
Mistry HB, Orrell D. Small Models for Big Data. Clin Pharmacol Ther 2020; 107:710-711. [PMID: 31994177 DOI: 10.1002/cpt.1770] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/23/2019] [Accepted: 12/17/2019] [Indexed: 11/05/2022]
Affiliation(s)
- Hitesh B Mistry
  - Division of Pharmacy, University of Manchester, Manchester, UK
11
Ribeiro AH, Soler JMP, Hirata R. Variance-Preserving Estimation of Intensity Values Obtained From Omics Experiments. Front Genet 2019; 10:855. [PMID: 31616468 PMCID: PMC6764481 DOI: 10.3389/fgene.2019.00855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2019] [Accepted: 08/16/2019] [Indexed: 11/29/2022] Open
Abstract
Faced with the lack of reliability and reproducibility in omics studies, more careful and robust methods are needed to overcome the existing challenges in multi-omics analysis. In conventional omics data analysis, signal intensity values (denoted by M and A values) are estimated neglecting pixel-level uncertainties, which may reflect noise and systematic artifacts. For example, intensity values from two-color microarray data are estimated by taking the mean or median of the pixel intensities within the spot and then subjected to a within-slide normalization by LOWESS. Thus, focusing on estimation and normalization of gene expression profiles, we propose a spot quantification method that takes into account pixel-level variability. Also, to preserve relevant variation that may be removed in LOWESS normalization with poorly chosen parameters, we propose a parameter selection method that is parsimonious and considers intrinsic characteristics of microarray data, such as heteroskedasticity. The usefulness of the proposed methods is illustrated by an application to real intestinal metaplasia data. Compared with the conventional approaches, the analysis is more robust and conservative, identifying fewer but more reliable differentially expressed genes. Also, the variability preservation allowed the identification of new differentially expressed genes. Using the proposed approach, we have identified differentially expressed genes involved in pathways in cancer and confirmed some molecular markers already reported in the literature.
Affiliation(s)
- Adèle H. Ribeiro
  - Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
  - *Correspondence: Adèle H. Ribeiro,
- Julia Maria Pavan Soler
  - Department of Statistics, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
- Roberto Hirata
  - Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
12
Scott R, Zhan A, Brown EA, Chain FJJ, Cristescu ME, Gras R, MacIsaac HJ. Optimization and performance testing of a sequence processing pipeline applied to detection of nonindigenous species. Evol Appl 2018; 11:891-905. [PMID: 29928298 PMCID: PMC5999198 DOI: 10.1111/eva.12604] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/13/2017] [Accepted: 01/20/2018] [Indexed: 01/10/2023] Open
Abstract
Genetic taxonomic assignment can be more sensitive than morphological taxonomic assignment, particularly for small, cryptic or rare species. Sequence processing is essential to taxonomic assignment, but can also produce errors because optimal parameters are not known a priori. Here, we explored how sequence processing parameters influence taxonomic assignment of 18S sequences from bulk zooplankton samples produced by 454 pyrosequencing. We optimized a sequence processing pipeline for two common research goals, estimation of species richness and early detection of aquatic invasive species (AIS), and then tested the performance of the optimal models through simulations. We tested 1,050 parameter sets on 18S sequences from 20 AIS to determine optimal parameters for each research goal. We tested the optimized pipelines' performance (detectability and sensitivity) by computationally inoculating sequences of 20 AIS into ten bulk zooplankton samples from ports across Canada. We found that optimal parameter selection generally depends on the research goal. However, regardless of research goal, we found that metazoan 18S sequences produced by 454 pyrosequencing should be trimmed to 375-400 bp and sequence quality filtering should be relaxed (1.5 ≤ maximum expected error ≤ 3.0, Phred score = 10). Clustering and denoising were only viable for estimating species richness, because these processing steps made some species undetectable at low sequence abundances, which would not be useful for early detection of AIS. With parameter sets optimized for early detection of AIS, 90% of AIS were detected with fewer than 11 target sequences, regardless of whether clustering or denoising was used. Despite developments in next-generation sequencing, sequence processing remains an important issue owing to difficulties in balancing false-positive and false-negative errors in metabarcoding data.
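The trimming and expected-error thresholds quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, expected error is taken to be the sum of per-base error probabilities 10^(-Q/10), and discarding reads shorter than the trim length is an assumption made here for simplicity.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities implied by Phred scores."""
    return sum(10 ** (-q / 10) for q in quals)

def trim_and_filter(reads, trim_len=400, max_ee=3.0):
    """Trim reads to a fixed length, then keep those under max_ee.

    `reads` is a list of (sequence, phred_scores) pairs. Reads shorter
    than trim_len are dropped (an assumption for this sketch).
    """
    kept = []
    for seq, quals in reads:
        if len(seq) < trim_len:
            continue
        seq, quals = seq[:trim_len], quals[:trim_len]
        if expected_errors(quals) <= max_ee:
            kept.append((seq, quals))
    return kept
```

Raising `max_ee` from 1.5 to 3.0 retains more reads at the price of more tolerated errors per read, which is the trade-off between false negatives and false positives that the abstract describes.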
Affiliation(s)
- Ryan Scott
  - School of Computer Science, University of Windsor, Windsor, ON, Canada
- Aibin Zhan
  - Research Centre for Eco‐Environmental Sciences, Chinese Academy of Sciences, Haidian District, Beijing, China
- Frédéric J. J. Chain
  - Department of Biology, McGill University, Montreal, QC, Canada
  - Present address: Frédéric J. J. Chain, Department of Biological Sciences, University of Massachusetts Lowell, Lowell, MA, USA
- Robin Gras
  - School of Computer Science, University of Windsor, Windsor, ON, Canada
  - Great Lakes Institute for Environmental Research, University of Windsor, Windsor, ON, Canada
- Hugh J. MacIsaac
  - Great Lakes Institute for Environmental Research, University of Windsor, Windsor, ON, Canada
Collapse
|
13
|
Bengtsson-Palme J, Larsson DGJ, Kristiansson E. Using metagenomics to investigate human and environmental resistomes. J Antimicrob Chemother 2018; 72:2690-2703. [PMID: 28673041 DOI: 10.1093/jac/dkx199] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Indexed: 12/12/2022]
Abstract
Antibiotic resistance is a global health concern declared by the WHO as one of the largest threats to modern healthcare. In recent years, metagenomic DNA sequencing has started to be applied as a tool to study antibiotic resistance in different environments, including the human microbiota. However, a multitude of methods exist for metagenomic data analysis, and not all methods are suitable for the investigation of resistance genes, particularly if the desired outcome is an assessment of risks to human health. In this review, we outline the current state of methods for sequence handling, mapping to databases of resistance genes, statistical analysis and metagenomic assembly. In addition, we provide an overview of important considerations related to the analysis of resistance genes, and recommend some of the currently used tools and methods that are best equipped to inform research and clinical practice related to antibiotic resistance.
Affiliation(s)
- Johan Bengtsson-Palme
  - Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10, SE-41346, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
- D G Joakim Larsson
  - Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Guldhedsgatan 10, SE-41346, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
- Erik Kristiansson
  - Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Box 440, SE-40530, Gothenburg, Sweden
  - Department of Mathematical Sciences, Chalmers University of Technology, SE-41296, Gothenburg, Sweden
|
14
|
Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes. ENTROPY 2018; 20:e20060393. [PMID: 33265483 PMCID: PMC7512912 DOI: 10.3390/e20060393] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/03/2018] [Revised: 05/16/2018] [Accepted: 05/21/2018] [Indexed: 11/26/2022]
Abstract
An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we directly compare two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions: the NCD measures how similar both strings are (in terms of information content), and the NRC (which, in general, is nonsymmetric) indicates the fraction of one of them that cannot be constructed using information from the other one. This leads to the problem of finding out which measure (or question) is more suitable for the answer we need. For computing both, we use a state-of-the-art DNA sequence compressor that we benchmark against some top compressors in different compression modes. Then, we apply the compressor to DNA sequences with different scales and natures, first using synthetic sequences and then real DNA sequences. The latter include mitochondrial DNA (mtDNA), messenger RNA (mRNA) and genomic DNA (gDNA) of seven primates. We provide several insights into evolutionary acceleration rates at different scales, namely, the observation and confirmation across the whole genomes of a higher variation rate of the mtDNA relative to the gDNA. We also show the importance of relative compression for localizing similar information regions using mtDNA.
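To make the NCD mentioned in this abstract concrete, here is a minimal sketch using its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. It assumes `zlib` as a stand-in general-purpose compressor, which is not the specialized DNA compressor the paper uses, and the toy sequences are synthetic. The NRC is not covered, because it needs a compressor that can encode one sequence exclusively from a model of the other.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data (assumption): two unrelated pseudo-random DNA strings, fixed seed.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000)).encode()
seq_b = "".join(rng.choice("ACGT") for _ in range(2000)).encode()

print(f"NCD(a, a) = {ncd(seq_a, seq_a):.3f}")  # identical strings: low
print(f"NCD(a, b) = {ncd(seq_a, seq_b):.3f}")  # unrelated strings: high
```

With a general-purpose compressor the values are only rough approximations; sequences much longer than zlib's 32 KB window would need a compressor with a larger context for the concatenation term to capture shared content.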
|
15
|
Kleyner R, Malcolmson J, Tegay D, Ward K, Maughan A, Maughan G, Nelson L, Wang K, Robison R, Lyon GJ. KBG syndrome involving a single-nucleotide duplication in ANKRD11. Cold Spring Harb Mol Case Stud 2017; 2:a001131. [PMID: 27900361 PMCID: PMC5111005 DOI: 10.1101/mcs.a001131] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Abstract
KBG syndrome is a rare autosomal dominant genetic condition characterized by neurological involvement and distinct facial, hand, and skeletal features. More than 70 cases have been reported; however, it is likely that KBG syndrome is underdiagnosed because of lack of comprehensive characterization of the heterogeneous phenotypic features. We describe the clinical manifestations in a male currently 13 years of age, who exhibited symptoms including epilepsy, severe developmental delay, distinct facial features, and hand anomalies, without a positive genetic diagnosis. Subsequent exome sequencing identified a novel de novo heterozygous single base pair duplication (c.6015dupA) in ANKRD11, which was validated by Sanger sequencing. This single-nucleotide duplication is predicted to lead to a premature stop codon and loss of function in ANKRD11, thereby implicating it as contributing to the proband's symptoms and yielding a molecular diagnosis of KBG syndrome. Before molecular diagnosis, this syndrome was not recognized in the proband, as several key features of the disorder were mild and were not recognized by clinicians, further supporting the concept of variable expressivity in many disorders. Although a diagnosis of cerebral folate deficiency has also been given, its significance for the proband's condition remains uncertain.
Affiliation(s)
- Robert Kleyner
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Janet Malcolmson
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
  - Genetic Counseling Graduate Program, Long Island University (LIU), Brookville, New York 11548, USA
- David Tegay
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Kenneth Ward
  - Affiliated Genetics, Inc., Salt Lake City, Utah 84109, USA
- Glenn Maughan
  - KBG Syndrome Foundation, West Jordan, Utah 84088, USA
- Lesa Nelson
  - Affiliated Genetics, Inc., Salt Lake City, Utah 84109, USA
- Kai Wang
  - Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California 90089, USA
  - Department of Psychiatry & Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA
  - Utah Foundation for Biomedical Research, Salt Lake City, Utah 84107, USA
- Reid Robison
  - Utah Foundation for Biomedical Research, Salt Lake City, Utah 84107, USA
- Gholson J Lyon
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
  - Utah Foundation for Biomedical Research, Salt Lake City, Utah 84107, USA
|
16
|
Malcolmson J, Kleyner R, Tegay D, Adams W, Ward K, Coppinger J, Nelson L, Meisler MH, Wang K, Robison R, Lyon GJ. SCN8A mutation in a child presenting with seizures and developmental delays. Cold Spring Harb Mol Case Stud 2017; 2:a001073. [PMID: 27900360 PMCID: PMC5111007 DOI: 10.1101/mcs.a001073] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
Abstract
The SCN8A gene encodes the sodium voltage-gated channel alpha subunit 8. Mutations in this gene have been associated with early infantile epileptic encephalopathy type 13. With the use of whole-exome sequencing, a de novo missense mutation in SCN8A was identified in a 4-yr-old female who initially exhibited symptoms of epilepsy at the age of 5 mo that progressed to a severe condition with very little movement, including being unable to sit or walk on her own.
Affiliation(s)
- Janet Malcolmson
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
  - Genetic Counseling Graduate Program, Long Island University (LIU), Brookville, New York 11548, USA
- Robert Kleyner
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- David Tegay
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Whit Adams
  - Utah Foundation for Biomedical Research, Salt Lake City, Utah 84107, USA
- Kenneth Ward
  - Affiliated Genetics, Salt Lake City, Utah 84109, USA
- Lesa Nelson
  - Affiliated Genetics, Salt Lake City, Utah 84109, USA
- Miriam H Meisler
  - Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109-5618, USA
- Kai Wang
  - Utah Foundation for Biomedical Research, Salt Lake City, Utah 84107, USA
  - Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California 90089, USA
  - Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA
- Reid Robison
  - Utah Foundation for Biomedical Research, Salt Lake City, Utah 84107, USA
- Gholson J Lyon
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
  - Utah Foundation for Biomedical Research, Salt Lake City, Utah 84107, USA
|
17
|
Degois J, Clerc F, Simon X, Bontemps C, Leblond P, Duquenne P. First Metagenomic Survey of the Microbial Diversity in Bioaerosols Emitted in Waste Sorting Plants. Ann Work Expo Health 2017; 61:1076-1086. [DOI: 10.1093/annweh/wxx075] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 04/19/2017] [Accepted: 09/03/2017] [Indexed: 11/13/2022]
|
18
|
Sarig O, Sprecher E. The Molecular Revolution in Cutaneous Biology: Era of Next-Generation Sequencing. J Invest Dermatol 2017; 137:e79-e82. [PMID: 28411851 DOI: 10.1016/j.jid.2016.02.818] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/16/2015] [Revised: 12/22/2015] [Accepted: 02/01/2016] [Indexed: 11/20/2022]
Abstract
Like any true conceptual revolution, next-generation sequencing (NGS) has not only radically changed research and clinical practice, it has also modified scientific culture. With the possibility of investigating the DNA content of any organism in any context, including somatic disorders or tissues carrying complex microbial populations, it initially seemed as if the genetic underpinning of any biological phenomenon could now be deciphered in an almost streamlined fashion. However, over recent years, we have once again come to understand that there is no such thing as great opportunities without great challenges. The steadily expanding use of NGS and related applications now confronts biologists and physicians with novel technological obstacles, analytical hurdles and increasingly pressing ethical questions.
Affiliation(s)
- Ofer Sarig
  - Department of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Eli Sprecher
  - Department of Dermatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  - Department of Human Molecular Genetics & Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
19
|
Österlund T, Jonsson V, Kristiansson E. HirBin: high-resolution identification of differentially abundant functions in metagenomes. BMC Genomics 2017; 18:316. [PMID: 28431529 PMCID: PMC5399828 DOI: 10.1186/s12864-017-3686-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/02/2016] [Accepted: 04/06/2017] [Indexed: 12/16/2022]
Abstract
Background: Gene-centric analysis of metagenomics data provides information about the biochemical functions present in a microbiome under a certain condition. The ability to identify significant differences in functions between metagenomes is dependent on accurate classification and quantification of the sequence reads (binning). However, biological effects acting on specific functions may be overlooked if the classes are too general. Methods: Here we introduce High-Resolution Binning (HirBin), a new method for gene-centric analysis of metagenomes. HirBin combines supervised annotation with unsupervised clustering to bin sequence reads at a higher resolution. The supervised annotation is performed by matching sequence fragments to genes using well-established protein domains, such as TIGRFAM, PFAM or COGs, followed by unsupervised clustering where each functional domain is further divided into sub-bins based on sequence similarity. Finally, differential abundance of the sub-bins is statistically assessed. Results: We show that HirBin is able to identify biological effects that are only present at more specific functional levels. Furthermore, we show that changes affecting more specific functional levels are often diluted at the more general level and therefore overlooked when analyzed using standard binning approaches. Conclusions: HirBin improves the resolution of the gene-centric analysis of metagenomes and facilitates the biological interpretation of the results. HirBin is implemented as a Python package and is freely available for download at http://bioinformatics.math.chalmers.se/hirbin.
Affiliation(s)
- Tobias Österlund
  - Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-41296, Gothenburg, Sweden
- Viktor Jonsson
  - Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-41296, Gothenburg, Sweden
- Erik Kristiansson
  - Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-41296, Gothenburg, Sweden
|
20
|
Marine genomics: News and views. Mar Genomics 2017; 31:1-8. [DOI: 10.1016/j.margen.2016.09.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/01/2016] [Revised: 09/08/2016] [Accepted: 09/09/2016] [Indexed: 11/23/2022]
|
21
|
Limit theorems for empirical Rényi entropy and divergence with applications to molecular diversity analysis. TEST-SPAIN 2016. [DOI: 10.1007/s11749-016-0489-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
22
|
From next-generation resequencing reads to a high-quality variant data set. Heredity (Edinb) 2016; 118:111-124. [PMID: 27759079 DOI: 10.1038/hdy.2016.102] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 04/24/2016] [Revised: 09/03/2016] [Accepted: 09/06/2016] [Indexed: 12/11/2022]
Abstract
Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.
|
23
|
Paradis E, Gosselin T, Goudet J, Jombart T, Schliep K. Linking genomics and population genetics with R. Mol Ecol Resour 2016; 17:54-66. [PMID: 27461508 DOI: 10.1111/1755-0998.12577] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/29/2016] [Revised: 07/01/2016] [Accepted: 07/19/2016] [Indexed: 11/29/2022]
Abstract
Population genetics and genomics have developed and been treated as independent fields of study despite having common roots. The continuous progress of sequencing technologies is helping to (re-)connect these two disciplines. We review the challenges faced by data analysts and software developers when handling very large genetic data sets collected on many individuals. We then show how R, as a computing language and development environment, offers some solutions to meet these challenges. We focus on some specific issues that are often encountered in practice: handling and analysing single-nucleotide polymorphism data, handling and reading variant call format files, analysing haplotypes and linkage disequilibrium, and performing multivariate analyses. We illustrate these implementations with some analyses of three recently published data sets that contain between 60 000 and 1 000 000 loci. We conclude with some perspectives on future developments of R software for population genomics.
Affiliation(s)
- Emmanuel Paradis
  - Institut des Sciences de l'Évolution, Université Montpellier - CNRS - IRD - EPHE, Place Eugène Bataillon - CC 065, 34095, Montpellier cédex 05, France
- Thierry Gosselin
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, G1V 0A6, Canada
- Jérôme Goudet
  - Department of Ecology and Evolution, Swiss Institute of Bioinformatics, Lausanne, CH-1015, Switzerland
- Thibaut Jombart
  - MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College, London, W2 1PG, UK
- Klaus Schliep
  - Department of Biology, University of Massachusetts Boston, Boston, MA, 02125, USA
|
24
|
Cammen KM, Andrews KR, Carroll EL, Foote AD, Humble E, Khudyakov JI, Louis M, McGowen MR, Olsen MT, Van Cise AM. Genomic Methods Take the Plunge: Recent Advances in High-Throughput Sequencing of Marine Mammals. J Hered 2016; 107:481-95. [PMID: 27511190 DOI: 10.1093/jhered/esw044] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 05/09/2016] [Accepted: 07/12/2016] [Indexed: 12/18/2022]
Abstract
The dramatic increase in the application of genomic techniques to non-model organisms (NMOs) over the past decade has yielded numerous valuable contributions to evolutionary biology and ecology, many of which would not have been possible with traditional genetic markers. We review this recent progression with a particular focus on genomic studies of marine mammals, a group of taxa that represent key macroevolutionary transitions from terrestrial to marine environments and for which available genomic resources have recently undergone notable rapid growth. Genomic studies of NMOs utilize an expanding range of approaches, including whole genome sequencing, restriction site-associated DNA sequencing, array-based sequencing of single nucleotide polymorphisms and target sequence probes (e.g., exomes), and transcriptome sequencing. These approaches generate different types and quantities of data, and many can be applied with limited or no prior genomic resources, thus overcoming one traditional limitation of research on NMOs. Within marine mammals, such studies have thus far yielded significant contributions to the fields of phylogenomics and comparative genomics, as well as enabled investigations of fitness, demography, and population structure. Here we review the primary options for generating genomic data, introduce several emerging techniques, and discuss the suitability of each approach for different applications in the study of NMOs.
Affiliation(s)
- Kristina M Cammen
  - School of Marine Sciences, University of Maine, Orono, ME 04469
- Kimberly R Andrews
  - Department of Fish and Wildlife Sciences, University of Idaho, 875 Perimeter Drive MS 1136, Moscow, ID 83844-1136
- Emma L Carroll
  - Scottish Oceans Institute, University of St Andrews, East Sands, St Andrews, Fife KY16 8LB, UK
- Andrew D Foote
  - Computational and Molecular Population Genetics Lab, Institute of Ecology and Evolution, University of Bern, Bern CH-3012, Switzerland
- Emily Humble
  - Department of Animal Behaviour, University of Bielefeld, Postfach 100131, 33501 Bielefeld, Germany
  - British Antarctic Survey, High Cross, Madingley Road, Cambridge CB3 0ET, UK
- Jane I Khudyakov
  - Department of Biology, Sonoma State University, Rohnert Park, CA 94928
- Marie Louis
  - Scottish Oceans Institute, University of St Andrews, East Sands, St Andrews, Fife KY16 8LB, UK
- Michael R McGowen
  - School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK
- Morten Tange Olsen
  - Evolutionary Genomics Section, Natural History Museum of Denmark, University of Copenhagen, DK-1353 Copenhagen K, Denmark
- Amy M Van Cise
  - Scripps Institution of Oceanography, University of California San Diego, 8622 Kennel Way, La Jolla, CA 92037
|
25
|
Endrullat C, Glökler J, Franke P, Frohme M. Standardization and quality management in next-generation sequencing. Appl Transl Genom 2016; 10:2-9. [PMID: 27668169 PMCID: PMC5025460 DOI: 10.1016/j.atg.2016.06.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 15.1] [Received: 04/13/2016] [Revised: 05/13/2016] [Accepted: 06/29/2016] [Indexed: 11/30/2022]
Abstract
DNA sequencing continues to evolve quickly even after > 30 years. Many new platforms suddenly appeared and former established systems have vanished in almost the same manner. Since establishment of next-generation sequencing devices, this progress gains momentum due to the continually growing demand for higher throughput, lower costs and better quality of data. In consequence of this rapid development, standardized procedures and data formats as well as comprehensive quality management considerations are still scarce. Here, we listed and summarized current standardization efforts and quality management initiatives from companies, organizations and societies in form of published studies and ongoing projects. These comprise on the one hand quality documentation issues like technical notes, accreditation checklists and guidelines for validation of sequencing workflows. On the other hand, general standard proposals and quality metrics are developed and applied to the sequencing workflow steps with the main focus on upstream processes. Finally, certain standard developments for downstream pipeline data handling, processing and storage are discussed in brief. These standardization approaches represent a first basis for continuing work in order to prospectively implement next-generation sequencing in important areas such as clinical diagnostics, where reliable results and fast processing is crucial. Additionally, these efforts will exert a decisive influence on traceability and reproducibility of sequence data.
Key Words
- ABRF, Association of Biomolecular Resource Facilities
- BAM, binary alignment/map
- CAP, College of American Pathologists
- CEN, European Committee for Standardization
- CLIA, Clinical Laboratory Improvement Amendments
- Data quality
- ERCC, External RNA Controls Consortium
- FDA, Food and Drug Administration
- FFPE, formalin-fixed, paraffin-embedded
- FMEA, failure mode and effects analysis
- GATK, genome analysis toolkit
- GSC, Genomic Standards Consortium
- Guideline
- HGP, Human Genome Project
- Indel, insertion or deletion
- MAQC, MicroArray Quality Control Project
- MIGS, minimum information about a genome sequence
- MOL, molecular pathology checklist
- NGS, next-generation sequencing
- NIST, National Institute of Standards and Technology
- NTC, no-template control
- Nex-StoCT, next generation sequencing — standardization of clinical testing
- Next-generation sequencing
- PT, proficiency testing
- QA, quality assurance
- QC, quality control
- QM, quality management
- QMS, quality management system
- Quality management
- RIN, RNA integrity number
- SAM, sequence alignment/map
- SEQC, sequencing quality control
- SNP, single nucleotide polymorphism
- SOP, standard operating procedure
- Standardization
- TN, technical note
- VCF, variant call format
- Validation
- ddPCR, digital droplet PCR
- mtDNA, mitochondrial DNA
- qPCR, quantitative PCR
Affiliation(s)
- Christoph Endrullat
- Molecular Biotechnology and Functional Genomics, Institute of Applied Biosciences, Technical University of Applied Sciences Wildau, Hochschulring 1, 15745 Wildau, Germany
- Jörn Glökler
- Molecular Biotechnology and Functional Genomics, Institute of Applied Biosciences, Technical University of Applied Sciences Wildau, Hochschulring 1, 15745 Wildau, Germany
- Philipp Franke
- Molecular Biotechnology and Functional Genomics, Institute of Applied Biosciences, Technical University of Applied Sciences Wildau, Hochschulring 1, 15745 Wildau, Germany
- Marcus Frohme
- Molecular Biotechnology and Functional Genomics, Institute of Applied Biosciences, Technical University of Applied Sciences Wildau, Hochschulring 1, 15745 Wildau, Germany
|
26
|
Schang C, Henry R, Kolotelo PA, Prosser T, Crosbie N, Grant T, Cottam D, O’Brien P, Coutts S, Deletic A, McCarthy DT. Evaluation of Techniques for Measuring Microbial Hazards in Bathing Waters: A Comparative Study. PLoS One 2016; 11:e0155848. [PMID: 27213772 PMCID: PMC4877094 DOI: 10.1371/journal.pone.0155848] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 01/26/2016] [Accepted: 05/05/2016] [Indexed: 11/23/2022] Open
Abstract
Recreational water quality is commonly monitored by means of culture-based faecal indicator organism (FIO) assays. However, these methods are costly and time-consuming, a serious disadvantage when combined with issues such as non-specificity and user bias. New culture and molecular methods have been developed to counter these drawbacks. This study compared industry-standard IDEXX methods (Colilert and Enterolert) with three alternative approaches: 1) the TECTA™ system for E. coli and enterococci; 2) US EPA’s 1611 method (qPCR-based enterococci enumeration); and 3) Next Generation Sequencing (NGS). Water samples (233) were collected from riverine, estuarine and marine environments over the 2014–2015 summer period and analysed by the four methods. The results demonstrated that E. coli and coliform densities, inferred by the IDEXX system, correlated strongly with the TECTA™ system. The TECTA™ system had further advantages in faster turnaround times (~12 hrs from sample receipt to result compared to 24 hrs), no staff time required for interpretation, and less user bias (results are automatically calculated, compared to subjective colorimetric decisions). The US EPA Method 1611 qPCR method also showed significant correlation with the IDEXX enterococci method, but had significant disadvantages such as highly technical analysis and higher operational costs (330% of IDEXX). The NGS method demonstrated statistically significant correlations between IDEXX and the proportions of sequences belonging to FIOs, Enterobacteriaceae, and Enterococcaceae. While costs (3,000% of IDEXX) and analysis time (300% of IDEXX) were found to be significant drawbacks of NGS, rapid technological advances in this field will soon see it widely adopted.
Affiliation(s)
- Christelle Schang
- Environmental and Public Health Microbiology Laboratory (EPHM Lab), Monash University, Clayton, Victoria, Australia
- Rebekah Henry
- Environmental and Public Health Microbiology Laboratory (EPHM Lab), Monash University, Clayton, Victoria, Australia
- Peter A. Kolotelo
- Environmental and Public Health Microbiology Laboratory (EPHM Lab), Monash University, Clayton, Victoria, Australia
- Trish Grant
- Melbourne Water, Docklands, Victoria, Australia
- Darren Cottam
- Environment Protection Authority Victoria, Melbourne, Victoria, Australia
- Peter O’Brien
- Mornington Peninsula Shire, Rosebud, Victoria, Australia
- Scott Coutts
- Micromon, Monash University, Clayton, Victoria, Australia
- Ana Deletic
- Environmental and Public Health Microbiology Laboratory (EPHM Lab), Monash University, Clayton, Victoria, Australia
- David T. McCarthy
- Environmental and Public Health Microbiology Laboratory (EPHM Lab), Monash University, Clayton, Victoria, Australia
|
27
|
Vieira FG, Albrechtsen A, Nielsen R. Estimating IBD tracts from low coverage NGS data. Bioinformatics 2016; 32:2096-102. [DOI: 10.1093/bioinformatics/btw212] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 11/23/2015] [Accepted: 04/12/2016] [Indexed: 11/13/2022] Open
|
28
|
Remnant EJ, Ashe A, Young PE, Buchmann G, Beekman M, Allsopp MH, Suter CM, Drewell RA, Oldroyd BP. Parent-of-origin effects on genome-wide DNA methylation in the Cape honey bee (Apis mellifera capensis) may be confounded by allele-specific methylation. BMC Genomics 2016; 17:226. [PMID: 26969617 PMCID: PMC4788913 DOI: 10.1186/s12864-016-2506-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 10/27/2015] [Accepted: 02/19/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Intersexual genomic conflict sometimes leads to unequal expression of paternal and maternal alleles in offspring, resulting in parent-of-origin effects. In honey bees reciprocal crosses can show strong parent-of-origin effects, supporting theoretical predictions that genomic imprinting occurs in this species. Mechanisms behind imprinting in honey bees are unclear but differential DNA methylation in eggs and sperm suggests that DNA methylation could be involved. Nonetheless, because DNA methylation is multifunctional, it is difficult to separate imprinting from other roles of methylation. Here we use a novel approach to investigate parent-of-origin DNA methylation in honey bees. In the subspecies Apis mellifera capensis, reproduction of females occurs either sexually by fertilization of eggs with sperm, or via thelytokous parthenogenesis, producing female embryos derived from two maternal genomes. RESULTS We compared genome-wide methylation patterns of sexually-produced, diploid embryos laid by a queen, with parthenogenetically-produced diploid embryos laid by her daughters. Thelytokous embryos inheriting two maternal genomes had fewer hypermethylated genes compared to fertilized embryos, supporting the prediction that fertilized embryos have increased methylation due to inheritance of a paternal genome. However, bisulfite PCR and sequencing of a differentially methylated gene, Stan (GB18207) showed strong allele-specific methylation that was maintained in both fertilized and thelytokous embryos. For this gene, methylation was associated with haplotype, not parent of origin. CONCLUSIONS The results of our study are consistent with predictions from the kin theory of genomic imprinting. However, our demonstration of allele-specific methylation based on sequence shows that genome-wide differential methylation studies can potentially confound imprinting and allele-specific methylation. It further suggests that methylation patterns are heritable or that specific sequence motifs are targets for methylation in some genes.
Affiliation(s)
- Emily J. Remnant
- Behavior and Genetics of Social Insects Laboratory, School of Life and Environmental Sciences A12, University of Sydney, Room 248, Macleay Building (A12), Sydney, NSW 2006 Australia
- Alyson Ashe
- School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006 Australia
- Paul E. Young
- Victor Chang Cardiac Research Institute, Lowy Packer Building, 405 Liverpool Street, Darlinghurst, NSW 2010 Australia
- University of New South Wales, Kensington, NSW 2033 Australia
- Gabriele Buchmann
- Behavior and Genetics of Social Insects Laboratory, School of Life and Environmental Sciences A12, University of Sydney, Room 248, Macleay Building (A12), Sydney, NSW 2006 Australia
- Madeleine Beekman
- Behavior and Genetics of Social Insects Laboratory, School of Life and Environmental Sciences A12, University of Sydney, Room 248, Macleay Building (A12), Sydney, NSW 2006 Australia
- Michael H. Allsopp
- Honey Bee Research Section, ARC-Plant Protection Research Institute, Private Bag X5017, Stellenbosch, South Africa
- Catherine M. Suter
- Victor Chang Cardiac Research Institute, Lowy Packer Building, 405 Liverpool Street, Darlinghurst, NSW 2010 Australia
- University of New South Wales, Kensington, NSW 2033 Australia
- Robert A. Drewell
- Biology Department, Clark University, 950 Main Street, Worcester, MA 01610 USA
- Benjamin P. Oldroyd
- Behavior and Genetics of Social Insects Laboratory, School of Life and Environmental Sciences A12, University of Sydney, Room 248, Macleay Building (A12), Sydney, NSW 2006 Australia
|
29
|
Chen Z, Huang C, Liu W, Zhang L, Tong P, Zhang L. Simultaneous determination of nucleoside and purine compounds in human urine based on a hydrophobic monolithic column using capillary electrochromatography. Electrophoresis 2015; 36:2727-2735. [DOI: 10.1002/elps.201500194] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/15/2015] [Revised: 08/05/2015] [Accepted: 08/06/2015] [Indexed: 12/26/2022]
Affiliation(s)
- Zongbao Chen
- Ministry of Education Key Laboratory of Analysis and Detection for Food Safety, College of Chemistry, Testing Center; Fuzhou University; Fuzhou Fujian China
- Key Laboratory of Applied Organic Chemistry, Department of Chemistry; Shangrao Normal University; Shangrao Jiangxi China
- Chuanghui Huang
- Ministry of Education Key Laboratory of Analysis and Detection for Food Safety, College of Chemistry, Testing Center; Fuzhou University; Fuzhou Fujian China
- Wei Liu
- Ministry of Education Key Laboratory of Analysis and Detection for Food Safety, College of Chemistry, Testing Center; Fuzhou University; Fuzhou Fujian China
- Lin Zhang
- Ministry of Education Key Laboratory of Analysis and Detection for Food Safety, College of Chemistry, Testing Center; Fuzhou University; Fuzhou Fujian China
- Ping Tong
- Ministry of Education Key Laboratory of Analysis and Detection for Food Safety, College of Chemistry, Testing Center; Fuzhou University; Fuzhou Fujian China
- Lan Zhang
- Ministry of Education Key Laboratory of Analysis and Detection for Food Safety, College of Chemistry, Testing Center; Fuzhou University; Fuzhou Fujian China
|
30
|
Korneliussen TS, Moltke I. NgsRelate: a software tool for estimating pairwise relatedness from next-generation sequencing data. Bioinformatics 2015; 31:4009-11. [PMID: 26323718 PMCID: PMC4673978 DOI: 10.1093/bioinformatics/btv509] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Received: 06/23/2015] [Accepted: 08/24/2015] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Pairwise relatedness estimation is important in many contexts such as disease mapping and population genetics. However, all existing estimation methods are based on called genotypes, which is not ideal for next-generation sequencing (NGS) data of low depth from which genotypes cannot be called with high certainty. RESULTS We present a software tool, NgsRelate, for estimating pairwise relatedness from NGS data. It provides maximum likelihood estimates that are based on genotype likelihoods instead of genotypes and thereby takes the inherent uncertainty of the genotypes into account. Using both simulated and real data, we show that NgsRelate provides markedly better estimates for low-depth NGS data than two state-of-the-art genotype-based methods. AVAILABILITY NgsRelate is implemented in C++ and is available under the GNU license at www.popgen.dk/software.
Affiliation(s)
- Ida Moltke
- Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
|
31
|
Tattini L, D'Aurizio R, Magi A. Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Front Bioeng Biotechnol 2015; 3:92. [PMID: 26161383 PMCID: PMC4479793 DOI: 10.3389/fbioe.2015.00092] [Citation(s) in RCA: 155] [Impact Index Per Article: 17.2] [Received: 12/10/2014] [Accepted: 06/10/2015] [Indexed: 01/16/2023] Open
Abstract
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Affiliation(s)
- Lorenzo Tattini
- Department of Neurosciences, Psychology, Pharmacology and Child Health, University of Florence, Florence, Italy
- Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Alberto Magi
- Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy
|