1
Dolzhenko E, Weisburd B, Ibañez K, Rajan-Babu IS, Anyansi C, Bennett MF, Billingsley K, Carroll A, Clamons S, Danzi MC, Deshpande V, Ding J, Fazal S, Halman A, Jadhav B, Qiu Y, Richmond PA, Saunders CT, Scheffler K, van Vugt JJFA, Zwamborn RRAJ, Chong SS, Friedman JM, Tucci A, Rehm HL, Eberle MA. REViewer: haplotype-resolved visualization of read alignments in and around tandem repeats. Genome Med 2022; 14:84. [PMID: 35948990; PMCID: PMC9367089; DOI: 10.1186/s13073-022-01085-z] [Received: 10/20/2021] [Accepted: 07/11/2022]
Abstract
BACKGROUND Expansions of short tandem repeats cause many neurogenetic disorders, including familial amyotrophic lateral sclerosis and Huntington disease. Multiple recently developed methods can identify repeat expansions in whole genome or exome sequencing data. Despite the widely recognized need for visual assessment of variant calls in clinical settings, current computational tools cannot produce such visualizations for repeat expansions. Expanded repeats are difficult to visualize because they correspond to large insertions relative to the reference genome and involve many misaligning and ambiguously aligning reads.
RESULTS We implemented REViewer, a computational method for visualization of sequencing data in genomic regions containing long repeat expansions, and FlipBook, a companion image viewer designed for manual curation of large collections of REViewer images. To generate a read pileup, REViewer reconstructs local haplotype sequences and distributes reads to these haplotypes in a way that is most consistent with the fragment lengths and evenness of read coverage. To create appropriate training materials for onboarding new users, we performed a concordance study with 12 scientists who work in short tandem repeat research. We used the results of this study to create a user guide that describes the basic principles of using REViewer as well as a guide to the typical features of read pileups that correspond to low confidence repeat genotype calls. Additionally, we demonstrated that REViewer can be used to annotate clinically relevant repeat interruptions by comparing visual assessment results of 44 FMR1 repeat alleles with the results of triplet repeat primed PCR. For 38 of these alleles, the results of visual assessment were consistent with triplet repeat primed PCR.
CONCLUSIONS Read pileup plots generated by REViewer offer an intuitive way to visualize sequencing data in regions containing long repeat expansions. Laboratories can use REViewer and FlipBook to assess the quality of repeat genotype calls as well as to visually detect interruptions or other imperfections in the repeat sequence and the surrounding flanking regions. REViewer and FlipBook are available under open-source licenses at https://github.com/illumina/REViewer and https://github.com/broadinstitute/flipbook respectively.
Affiliation(s)
- Egor Dolzhenko
  - Illumina Inc., San Diego, CA 92122 USA
- Ben Weisburd
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Kristina Ibañez
  - William Harvey Research Institute, Queen Mary University of London, London, EC1M 6BQ UK
- Indhu-Shree Rajan-Babu
  - Department of Medical Genetics, University of British Columbia and Children’s & Women’s Hospital, Vancouver, BC V6H3N1 Canada
  - Department of Medical and Molecular Genetics, King’s College London, Strand, London, WC2R 2LS UK
- Christine Anyansi
  - Illumina Inc., San Diego, CA 92122 USA
- Mark F. Bennett
  - Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052 Australia
  - Department of Medical Biology, University of Melbourne, Parkville, VIC 3052 Australia
  - Epilepsy Research Centre, Department of Medicine, University of Melbourne, Austin Health, Heidelberg, VIC 3084 Australia
- Kimberley Billingsley
  - Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD USA
  - Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD USA
- Ashley Carroll
  - Illumina Inc., San Diego, CA 92122 USA
- Samuel Clamons
  - Illumina Inc., San Diego, CA 92122 USA
- Matt C. Danzi
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL 33136 USA
- Viraj Deshpande
  - Illumina Inc., San Diego, CA 92122 USA
- Jinhui Ding
  - Computational Biology Group, Laboratory of Neurogenetics, National Institute on Aging, NIH, Bethesda, MD 20892 USA
- Sarah Fazal
  - Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL 33136 USA
- Andreas Halman
  - Peter MacCallum Cancer Centre, Melbourne, VIC 3000 Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Parkville, VIC 3010 Australia
- Bharati Jadhav
  - Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029 USA
- Yunjiang Qiu
  - Illumina Inc., San Diego, CA 92122 USA
- Phillip A. Richmond
  - BC Children’s Hospital Research Institute, Vancouver, BC V5Z 4H4 Canada
- Konrad Scheffler
  - Illumina Inc., San Diego, CA 92122 USA
- Joke J. F. A. van Vugt
  - Department of Neurology, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Ramona R. A. J. Zwamborn
  - Department of Neurology, University Medical Center Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Samuel S. Chong
  - Department of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228 Singapore
  - Department of Obstetrics and Gynecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228 Singapore
  - Department of Laboratory Medicine, National University Hospital, Singapore, 119074 Singapore
- Jan M. Friedman
  - Department of Medical Genetics, University of British Columbia and Children’s & Women’s Hospital, Vancouver, BC V6H3N1 Canada
- Arianna Tucci
  - William Harvey Research Institute, Queen Mary University of London, London, EC1M 6BQ UK
- Heidi L. Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Michael A. Eberle
  - Illumina Inc., San Diego, CA 92122 USA
2
Abstract
Models of well-mixed chemical reaction networks (CRNs) have provided a solid foundation for the study of programmable molecular systems, but the importance of spatial organization in such systems has increasingly been recognized. In this paper, we explore an alternative chemical computing model introduced by Qian & Winfree in 2014, the surface CRN, which uses molecules attached to a surface such that each molecule only interacts with its immediate neighbours. Expanding on the constructions in that work, we first demonstrate that surface CRNs can emulate asynchronous and synchronous deterministic cellular automata and implement continuously active Boolean logic circuits. We introduce three new techniques for enforcing synchronization within local regions, each with a different trade-off in spatial and chemical complexity. We also demonstrate that surface CRNs can manufacture complex spatial patterns from simple initial conditions and implement interesting swarm robotic behaviours using simple local rules. Throughout all example constructions of surface CRNs, we highlight the trade-off between the ability to precisely place molecules and the ability to precisely control molecular interactions. Finally, we provide a Python simulator for surface CRNs with an easy-to-use web interface, so that readers may follow along with our examples or create their own surface CRN designs.
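The surface CRN idea summarized in this abstract, where each molecule sits at a site and reacts only with an immediate neighbour, can be illustrated with a minimal sketch. This is not the simulator the authors provide. It assumes a square grid with four-neighbour adjacency, a Gillespie-style choice of the next reaction, and a single hypothetical rule `I + S -> I + I`; the names `RULES`, `neighbours`, `enabled_events`, and `simulate` are invented for illustration.

```python
import random

# Each rule rewrites an ordered pair of adjacent site states at a given rate.
# The single rule below is a hypothetical example: state "I" converts a neighbouring "S".
RULES = [
    (("I", "S"), ("I", "I"), 1.0),
]


def neighbours(r, c, rows, cols):
    """Yield the in-bounds four-neighbourhood of site (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def enabled_events(grid, rules):
    """List every (site, neighbour, products, rate) whose reactants match a rule."""
    rows, cols = len(grid), len(grid[0])
    events = []
    for r in range(rows):
        for c in range(cols):
            for nr, nc in neighbours(r, c, rows, cols):
                for (a, b), products, rate in rules:
                    if grid[r][c] == a and grid[nr][nc] == b:
                        events.append(((r, c), (nr, nc), products, rate))
    return events


def simulate(grid, rules, rng, max_steps=10_000):
    """Run until no rule applies (or max_steps); return the final grid and elapsed time."""
    grid = [row[:] for row in grid]  # work on a copy so the caller's grid is untouched
    elapsed = 0.0
    for _ in range(max_steps):
        events = enabled_events(grid, rules)
        if not events:
            break
        rates = [event[3] for event in events]
        elapsed += rng.expovariate(sum(rates))  # waiting time until the next reaction
        (r, c), (nr, nc), (x, y), _ = rng.choices(events, weights=rates)[0]
        grid[r][c], grid[nr][nc] = x, y
    return grid, elapsed


start = [["S"] * 4 for _ in range(3)]
start[0][0] = "I"
final, elapsed = simulate(start, RULES, random.Random(0))
```

Because every site is paired with each of its neighbours in turn, a rule matches in either orientation. One consequence of this simplification is that a rule whose two reactants are identical would be listed once per orientation, so its effective rate per neighbouring pair is doubled.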
Affiliation(s)
- Samuel Clamons
  - Bioengineering, California Institute of Technology, Pasadena, CA 91125, USA
- Lulu Qian
  - Bioengineering, California Institute of Technology, Pasadena, CA 91125, USA
  - Computer Science, California Institute of Technology, Pasadena, CA 91125, USA
- Erik Winfree
  - Bioengineering, California Institute of Technology, Pasadena, CA 91125, USA
  - Computer Science, California Institute of Technology, Pasadena, CA 91125, USA
  - Computation and Neural Systems, California Institute of Technology, Pasadena, CA 91125, USA
3
Perez R, Luccioni M, Kamakaka R, Clamons S, Gaut N, Stirling F, Adamala KP, Silver PA, Endy D. Enabling community-based metrology for wood-degrading fungi. Fungal Biol Biotechnol 2020; 7:2. [PMID: 32206323; PMCID: PMC7081594; DOI: 10.1186/s40694-020-00092-2] [Received: 12/04/2019] [Accepted: 02/25/2020]
Abstract
BACKGROUND Lignocellulosic biomass could support a greatly expanded bioeconomy. Current strategies for using biomass typically rely on single-cell organisms and extensive ancillary equipment to produce precursors for downstream manufacturing processes. Alternative forms of bioproduction based on solid-state fermentation and wood-degrading fungi could enable more direct means of manufacture. However, basic methods for cultivating wood-degrading fungi are often ad hoc and not readily reproducible. Here, we developed standard reference strains, substrates, measurements, and methods sufficient to begin to enable reliable reuse of mycological materials and products in simple laboratory settings.
RESULTS We show that a widely available and globally regularized consumer product (Pringles™) can support the growth of wood-degrading fungi, and that growth on Pringles™-broth can be correlated with growth on media made from a fully traceable and compositionally characterized substrate (National Institute of Standards and Technology Reference Material 8492 Eastern Cottonwood Whole Biomass Feedstock). We also establish a Relative Extension Unit (REU) framework that is designed to reduce variation in quantification of radial growth measurements. So enabled, we demonstrate that five laboratories were able to compare measurements of wood-fungus performance via a simple radial extension growth rate assay, and that our REU-based approach reduced variation in reported measurements by up to ~75%.
CONCLUSIONS Reliable reuse of materials, measures, and methods is necessary to enable distributed bioproduction processes that can be adopted at all scales, from local to industrial. Our community-based measurement methods incentivize practitioners to coordinate the reuse of standard materials, methods, and strains, and to share information supporting work with wood-degrading fungi.
Affiliation(s)
- Rolando Perez
  - Department of Bioengineering, Schools of Engineering and Medicine, Stanford University, Room 252, Shriram Center, 443 Via Ortega, Stanford, CA 94305 USA
- Marina Luccioni
  - Department of Bioengineering, Schools of Engineering and Medicine, Stanford University, Room 252, Shriram Center, 443 Via Ortega, Stanford, CA 94305 USA
- Rohinton Kamakaka
  - Department of MCD Biology, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA 95064 USA
- Samuel Clamons
  - Department of Chemistry and Molecular Biophysics, California Institute of Technology, 1200 E. California Blvd, MC 138-78, Pasadena, CA 91125 USA
  - Department of Control and Dynamical Systems, California Institute of Technology, 1200 E. California Blvd, MC 138-78, Pasadena, CA 91125 USA
- Nathaniel Gaut
  - Department of Genetics, Cell Biology, and Development, College of Biological Sciences, University of Minnesota, 420 Washington Ave. SE, 5-178 MCB, Minneapolis, MN 55455 USA
- Finn Stirling
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert Building, Boston, MA 02115 USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 200 Longwood Avenue, Warren Alpert Building, Boston, MA 02115 USA
- Katarzyna P. Adamala
  - Department of Genetics, Cell Biology, and Development, College of Biological Sciences, University of Minnesota, 420 Washington Ave. SE, 5-178 MCB, Minneapolis, MN 55455 USA
- Pamela A. Silver
  - Department of Systems Biology, Harvard Medical School, 200 Longwood Avenue, Warren Alpert Building, Boston, MA 02115 USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, 200 Longwood Avenue, Warren Alpert Building, Boston, MA 02115 USA
- Drew Endy
  - Department of Bioengineering, Schools of Engineering and Medicine, Stanford University, Room 252, Shriram Center, 443 Via Ortega, Stanford, CA 94305 USA
4
Halleran A, Clamons S, Saha M. Transcriptomic Characterization of an Infection of Mycobacterium smegmatis by the Cluster A4 Mycobacteriophage Kampy. PLoS One 2015; 10:e0141100. [PMID: 26513661; PMCID: PMC4626039; DOI: 10.1371/journal.pone.0141100] [Received: 06/29/2015] [Accepted: 10/04/2015]
Abstract
The mycobacteriophages, phages that infect the genus Mycobacterium, display profound genetic diversity and widespread geographical distribution, and possess significant medical and ecological importance. However, the functions of most mycobacteriophage proteins and the identity of most genetic regulatory elements remain unknown. We characterized the gene expression profile of Kampy, a cluster A4 mycobacteriophage, during infection of its host, Mycobacterium smegmatis, using RNA-Seq and mass spectrometry. We show that mycobacteriophage Kampy transcription occurs in roughly two phases: an early phase consisting of genes for metabolism, DNA synthesis, and gene regulation, and a late phase consisting of structural genes and lysis genes. Additionally, we identify the earliest genes transcribed during infection, along with several other possible regulatory units not obvious from inspection of Kampy's genomic structure. The transcriptional profile of Kampy appears similar to that of mycobacteriophage TM4 but unlike that of mycobacteriophage Giles, a result that further expands our understanding of the diversity of mycobacteriophage gene expression programs during infection.
Affiliation(s)
- Andrew Halleran
  - Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America
- Samuel Clamons
  - Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America
- Margaret Saha
  - Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America
5
Vasiliu D, Clamons S, McDonough M, Rabe B, Saha M. A regression-based differential expression detection algorithm for microarray studies with ultra-low sample size. PLoS One 2015; 10:e0118198. [PMID: 25738861; PMCID: PMC4349782; DOI: 10.1371/journal.pone.0118198] [Received: 08/27/2014] [Accepted: 01/08/2015]
Abstract
Global gene expression analysis using microarrays and, more recently, RNA-seq has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of the tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but is not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely used method for microarray analysis. Our method detected a significant number of differentially expressed genes in this dataset and suggested future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
Affiliation(s)
- Daniel Vasiliu
  - Department of Mathematics, College of William and Mary, Williamsburg, Virginia, United States of America
- Samuel Clamons
  - Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America
- Molly McDonough
  - Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America
- Brian Rabe
  - Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America
- Margaret Saha
  - Department of Biology, College of William and Mary, Williamsburg, Virginia, United States of America