1
Sobral AF, Dinis-Oliveira RJ, Barbosa DJ. CRISPR-Cas technology in forensic investigations: Principles, applications, and ethical considerations. Forensic Sci Int Genet 2025; 74:103163. [PMID: 39437497 DOI: 10.1016/j.fsigen.2024.103163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 10/08/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024]
Abstract
CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated proteins) systems are adaptive immune systems originally found in bacteria, where they protect against foreign genetic elements, including viruses and plasmids. Building on this system, CRISPR-Cas-based technologies have emerged as powerful tools for precise genome editing and have significantly advanced several research fields. Forensic science is a multidisciplinary field that applies scientific methods to investigate and resolve legal issues, particularly criminal investigations and subject identification. It therefore plays a critical role in the justice system, providing scientific evidence to support judicial investigations. Although less explored, CRISPR-Cas-based methodologies show strong potential in forensic science because of their high accuracy and sensitivity, with possible applications including DNA profiling and identification, interpretation of crime scene investigations, detection of food contamination or fraud, and other aspects of environmental forensics. However, using CRISPR-Cas-based methodologies on human samples raises several ethical issues and concerns about the potential misuse of individual genetic information. In this manuscript, we provide an overview of potential applications of CRISPR-Cas-based methodologies in several areas of forensic science and discuss the legal implications that challenge their routine implementation in this field.
Affiliation(s)
- Ana Filipa Sobral
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Toxicologic Pathology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal.
- Ricardo Jorge Dinis-Oliveira
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Translational Toxicology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal; Department of Public Health and Forensic Sciences and Medical Education, Faculty of Medicine, University of Porto, Porto 4200-319, Portugal; FOREN - Forensic Science Experts, Dr. Mário Moutinho Avenue, No. 33-A, Lisbon 1400-136, Portugal.
- Daniel José Barbosa
- Associate Laboratory i4HB - Institute for Health and Bioeconomy, University Institute of Health Sciences - CESPU, Gandra 4585-116, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Translational Toxicology Research Laboratory, University Institute of Health Sciences (1H-TOXRUN, IUCS-CESPU), Gandra 4585-116, Portugal.
2
Andersson D, Kebede FT, Escobar M, Österlund T, Ståhlberg A. Principles of digital sequencing using unique molecular identifiers. Mol Aspects Med 2024; 96:101253. [PMID: 38367531 DOI: 10.1016/j.mam.2024.101253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Revised: 01/26/2024] [Accepted: 02/03/2024] [Indexed: 02/19/2024]
Abstract
Massively parallel sequencing technologies have long been used in both basic research and clinical routine. The recent introduction of digital sequencing has made previously challenging applications possible by significantly improving sensitivity and specificity to now allow detection of rare sequence variants, even at single molecule level. Digital sequencing utilizes unique molecular identifiers (UMIs) to minimize sequencing-induced errors and quantification biases. Here, we discuss the principles of UMIs and how they are used in digital sequencing. We outline the properties of different UMI types and the consequences of various UMI approaches in relation to experimental protocols and bioinformatics. Finally, we describe how digital sequencing can be applied in specific research fields, focusing on cancer management where it can be used in screening of asymptomatic individuals, diagnosis, treatment prediction, prognostication, monitoring treatment efficacy and early detection of treatment resistance as well as relapse.
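The UMI principle summarized in this abstract can be sketched in a few lines: reads that share a UMI are treated as copies of one original molecule and collapsed by majority vote, so an error present in a minority of copies is voted out. This is only an illustration under stated assumptions, not the authors' pipeline. The function name `umi_consensus`, the `min_family_size` parameter, and the toy reads are hypothetical, and the sketch ignores errors in the UMI itself, indels, and base qualities.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    Hypothetical helper for illustration only.
    """
    # Group reads into families that share the same UMI.
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to out-vote a sequencing error
        if len({len(s) for s in seqs}) != 1:
            continue  # this sketch does not handle length differences
        # Majority vote at each position across the family.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy data (assumed): three molecules, one read carrying an error.
reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACTTAC"),  # simulated sequencing error at position 2
    ("CCTA", "ACGAAC"),
    ("CCTA", "ACGAAC"),
    ("GGGG", "ACGTAC"),  # singleton family, dropped by min_family_size
]
result = umi_consensus(reads)
molecule_count = len(result)  # molecules counted once each, not once per read
```

Counting consensus families rather than raw reads is what reduces the quantification bias the abstract mentions, because PCR duplicates of one molecule no longer inflate its apparent abundance.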
Affiliation(s)
- Daniel Andersson
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden
- Firaol Tamiru Kebede
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden
- Mandy Escobar
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden
- Tobias Österlund
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 413 90, Gothenburg, Sweden; Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, 413 45, Gothenburg, Sweden
- Anders Ståhlberg
- Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 413 90, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 413 90, Gothenburg, Sweden; Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, 413 45, Gothenburg, Sweden.
3
Malekshoar M, Azimi SA, Kaki A, Mousazadeh L, Motaei J, Vatankhah M. CRISPR-Cas9 Targeted Enrichment and Next-Generation Sequencing for Mutation Detection. J Mol Diagn 2023; 25:249-262. [PMID: 36841425 DOI: 10.1016/j.jmoldx.2023.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 01/08/2023] [Accepted: 01/27/2023] [Indexed: 02/27/2023]
Abstract
Despite the rapid adoption of next-generation sequencing (NGS) technologies, targeted sequencing of specific genomic regions is often required to diagnose genetic diseases. Target enrichment can reduce both the cost and the duration of sequencing. Recently, several clustered regularly interspaced short palindromic repeats (CRISPR)-based, amplification-free methods have been developed for target enrichment in combination with one of the NGS platforms. CRISPR-based target enrichment strategies act as an auxiliary tool to improve NGS analytical performance, thereby indirectly facilitating nucleic acid detection. Direct DNA cleavage by CRISPR-Cas at specific genomic sites makes it possible to isolate large native fragments from disease-related genomic regions. Because CRISPR-Cas can isolate the target region without any amplification, these methods have also been combined with long-read sequencing technologies. They are promising tools for assessing genetic and epigenetic composition for clinical application and treatment responses in cancer precision medicine. Modified CRISPR-based enrichment protocols have made it possible to identify different types of mutations, including structural variants, short tandem repeats, fusion genes, and mobile elements. Cas9 can also specifically eliminate wild-type sequences, enabling the enrichment and detection of small amounts of tumor DNA fragments among the highly heterogeneous fragments of wild-type DNA.
Affiliation(s)
- Mehrdad Malekshoar
- Anesthesiology, Critical Care and Pain Management Research Center, Hormozgan University of Medical Sciences, Bandar Abbas, Iran
- Sajad Ataei Azimi
- Department of Hematology-Oncology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Arastoo Kaki
- Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Leila Mousazadeh
- Department of Medical Biotechnology, School of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Jamshid Motaei
- Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
- Majid Vatankhah
- Anesthesiology, Critical Care and Pain Management Research Center, Hormozgan University of Medical Sciences, Bandar Abbas, Iran.
4
Biezuner T, Brilon Y, Arye AB, Oron B, Kadam A, Danin A, Furer N, Minden MD, Hwan Kim DD, Shapira S, Arber N, Dick J, Thavendiranathan P, Moskovitz Y, Kaushansky N, Chapal-Ilani N, Shlush LI. An improved molecular inversion probe based targeted sequencing approach for low variant allele frequency. NAR Genom Bioinform 2022; 4:lqab125. [PMID: 35156021 PMCID: PMC8826764 DOI: 10.1093/nargab/lqab125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2021] [Revised: 11/25/2021] [Accepted: 01/25/2022] [Indexed: 11/23/2022]
Abstract
Deep targeted sequencing technologies are still not widely used in clinical practice because of the complexity of the methods and their cost. Molecular Inversion Probe (MIP) technology is cost-effective and scalable in the number of targets but suffers from low overall performance, especially in GC-rich regions. To improve MIP performance, we sequenced a large cohort of healthy individuals (n = 4417) with a panel of 616 MIPs, at high depth and in duplicate. To improve the previous state-of-the-art statistical model for low variant allele frequency, we selected 4635 potentially positive variants and validated them using amplicon sequencing. Using machine learning prediction tools, we significantly improved precision of 10–56.25% (P < 0.0004) to detect variants with VAF > 0.005. We further developed a biochemically modified MIP protocol and improved its turnaround time to ∼4 h. Our new biochemistry significantly improved uniformity and coverage of GC-rich regions, and enabled 95% on-target reads in a large MIP panel of 8349 genomic targets. Overall, we demonstrate an enhancement of the MIP targeted sequencing approach in both detection of low-frequency variants and other key parameters, paving its way to become an ultrafast, cost-effective research and clinical diagnostic tool.
Affiliation(s)
- Tamir Biezuner
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Yardena Brilon
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Asaf Ben Arye
- Department of Statistics and Operations Research, Tel Aviv University, Ramat Aviv, Israel
- Barak Oron
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Aditee Kadam
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Adi Danin
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Nili Furer
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Mark D Minden
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Medical Oncology & Hematology, Toronto, ON, Canada
- Dennis Dong Hwan Kim
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Medical Oncology & Hematology, Toronto, ON, Canada
- John Dick
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Molecular Genetics, Toronto, ON, Canada
- Paaladinesh Thavendiranathan
- Department of Medicine, Division of Cardiology, Ted Rogers Program in Cardiotoxicity Prevention, Peter Munk Cardiac Center, Toronto General Hospital, University Health Network, University of Toronto, Toronto, ON, Canada
- Yoni Moskovitz
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Nathali Kaushansky
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Noa Chapal-Ilani
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
- Liran I Shlush
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
5
PolyG-DS: An ultrasensitive polyguanine tract-profiling method to detect clonal expansions and trace cell lineage. Proc Natl Acad Sci U S A 2021; 118:2023373118. [PMID: 34330826 DOI: 10.1073/pnas.2023373118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Polyguanine tracts (PolyGs) are short guanine homopolymer repeats that are prone to accumulating mutations when cells divide. This feature makes them especially suitable for cell lineage tracing, which has been exploited to detect and characterize precancerous and cancerous somatic evolution. PolyG genotyping, however, is challenging because of the inherent biochemical difficulties in amplifying and sequencing repetitive regions. To overcome this limitation, we developed PolyG-DS, a next-generation sequencing (NGS) method that combines the error-correction capabilities of duplex sequencing (DS) with enrichment of PolyG loci using CRISPR-Cas9-targeted genomic fragmentation. PolyG-DS markedly reduces technical artifacts by comparing the sequences derived from the complementary strands of each original DNA molecule. We demonstrate that PolyG-DS genotyping is accurate, reproducible, and highly sensitive, enabling the detection of low-frequency alleles (<0.01) in spike-in samples using a panel of only 19 PolyG markers. PolyG-DS replicated prior results based on PolyG fragment length analysis by capillary electrophoresis, and exhibited higher sensitivity for identifying clonal expansions in the nondysplastic colon of patients with ulcerative colitis. We illustrate the utility of this method for resolving the phylogenetic relationship among precancerous lesions in ulcerative colitis and for tracing the metastatic dissemination of ovarian cancer. PolyG-DS enables the study of tumor evolution without prior knowledge of tumor driver mutations and provides a tool to perform cost-effective and easily scalable ultra-accurate NGS-based PolyG genotyping for multiple applications in biology, genetics, and cancer research.
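The strand-comparison step that duplex sequencing relies on can be illustrated as follows: a base is accepted only when the consensus from the top strand and the reverse-complemented consensus from the bottom strand agree, and is otherwise masked. This is a sketch under stated assumptions rather than the PolyG-DS implementation. The function names, the use of `N` as the mask character, and the toy sequences are hypothetical.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(top_consensus, bottom_consensus):
    """Combine per-strand consensus sequences of one original molecule.

    `bottom_consensus` is given in its own 5'->3' orientation, so it is
    reverse-complemented before comparison. Disagreements become 'N'.
    Hypothetical helper for illustration only.
    """
    bottom_as_top = reverse_complement(bottom_consensus)
    if len(top_consensus) != len(bottom_as_top):
        raise ValueError("strand consensus sequences differ in length")
    return "".join(
        a if a == b else "N" for a, b in zip(top_consensus, bottom_as_top)
    )


# Toy molecule (assumed) containing a short polyG tract.
true_top = "ACGGGGGTA"
bottom = reverse_complement(true_top)

# An artifact present on one strand only (G->T inside the tract).
top_with_error = "ACGGTGGTA"

clean = duplex_consensus(true_top, bottom)
masked = duplex_consensus(top_with_error, bottom)
```

A true mutation appears on both strands and survives the comparison, while an error introduced on a single strand during amplification or sequencing is masked, which is the reason the method can report low-frequency alleles with fewer technical artifacts.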
6
Tao L, Raz O, Marx Z, Ghosh MS, Huber S, Greindl-Junghans J, Biezuner T, Amir S, Milo L, Adar R, Levy R, Onn A, Chapal-Ilani N, Berman V, Ben Arie A, Rom G, Oron B, Halaban R, Czyz ZT, Werner-Klein M, Klein CA, Shapiro E. Retrospective cell lineage reconstruction in humans by using short tandem repeats. Cell Rep Methods 2021; 1. [PMID: 34341783 PMCID: PMC8313865 DOI: 10.1016/j.crmeth.2021.100054] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/29/2020] [Revised: 04/17/2021] [Accepted: 06/24/2021] [Indexed: 12/18/2022]
Abstract
Cell lineage analysis aims to uncover the developmental history of an organism back to its cell of origin. Recently, novel in vivo methods utilizing genome editing enabled important insights into the cell lineages of animals. In contrast, human cell lineage analysis remains restricted to retrospective approaches, which still lack resolution and cost-efficient solutions. Here, we demonstrate a scalable platform based on short tandem repeats targeted by duplex molecular inversion probes. With this human cell lineage tracing method, we accurately reproduced a known lineage of DU145 cells and reconstructed lineages of healthy and metastatic single cells from a melanoma patient; the reconstructed lineages matched the anatomical reference while adding further refinements. This platform allowed us to faithfully recapitulate lineages of developmental tissue formation in healthy cells. In summary, our lineage discovery platform can profile informative somatic mutations efficiently and provides solid lineage reconstructions even in challenging low-mutation-rate healthy single cells.
Affiliation(s)
- Liming Tao
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Ofir Raz
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Zipora Marx
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Manjusha S. Ghosh
- Experimental Medicine and Therapy Research, University of Regensburg, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
- Sandra Huber
- Experimental Medicine and Therapy Research, University of Regensburg, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
- Julia Greindl-Junghans
- Experimental Medicine and Therapy Research, University of Regensburg, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
- Tamir Biezuner
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Shiran Amir
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Lilach Milo
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Rivka Adar
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Ron Levy
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Amos Onn
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Noa Chapal-Ilani
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Veronika Berman
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Asaf Ben Arie
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Guy Rom
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Barak Oron
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
- Ruth Halaban
- Department of Dermatology, Yale University School of Medicine, New Haven, CT 06520-8059, USA
- Zbigniew T. Czyz
- Experimental Medicine and Therapy Research, University of Regensburg, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
- Melanie Werner-Klein
- Experimental Medicine and Therapy Research, University of Regensburg, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
- Christoph A. Klein
- Experimental Medicine and Therapy Research, University of Regensburg, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany
- Division of Personalized Tumor Therapy, Fraunhofer Institute for Experimental Medicine and Toxicology Regensburg, Am Biopark 9, 93053 Regensburg, Germany
- Ehud Shapiro
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
7
Ciosi M, Cumming SA, Chatzi A, Larson E, Tottey W, Lomeikaite V, Hamilton G, Wheeler VC, Pinto RM, Kwak S, Morton AJ, Monckton DG. Approaches to Sequence the HTT CAG Repeat Expansion and Quantify Repeat Length Variation. J Huntingtons Dis 2021; 10:53-74. [PMID: 33579864 PMCID: PMC7990409 DOI: 10.3233/jhd-200433] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
BACKGROUND: Huntington's disease (HD) is an autosomal dominant neurodegenerative disorder caused by the expansion of the HTT CAG repeat. Affected individuals inherit ≥36 repeats and longer alleles cause earlier onset, greater disease severity and faster disease progression. The HTT CAG repeat is genetically unstable in the soma in a process that preferentially generates somatic expansions, the proportion of which is associated with disease onset, severity and progression. Somatic mosaicism of the HTT CAG repeat has traditionally been assessed by semi-quantitative PCR-electrophoresis approaches that have limitations (e.g., no information about sequence variants). Genotyping-by-sequencing could allow for some of these limitations to be overcome.
OBJECTIVE: To investigate the utility of PCR sequencing to genotype large (>50 CAGs) HD alleles and to quantify the associated somatic mosaicism.
METHODS: We have applied MiSeq and PacBio sequencing to PCR products of the HTT CAG repeat in transgenic R6/2 mice carrying ∼55, ∼110, ∼255 and ∼470 CAGs. For each of these alleles, we compared the repeat length distributions generated for different tissues at two ages.
RESULTS: We were able to sequence the CAG repeat full length in all samples. However, the repeat length distributions for samples with ∼470 CAGs were biased towards shorter repeat lengths.
CONCLUSION: PCR sequencing can be used to sequence all the HD alleles considered, but this approach cannot be used to estimate modal allele size or quantify somatic expansions for alleles ≫250 CAGs. We review the limitations of PCR sequencing and alternative approaches that may allow the quantification of somatic contractions and very large somatic expansions.
Affiliation(s)
- Marc Ciosi
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Sarah A. Cumming
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Afroditi Chatzi
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Eloise Larson
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- William Tottey
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Vilija Lomeikaite
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Graham Hamilton
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Glasgow Polyomics, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Vanessa C. Wheeler
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
- Ricardo Mouro Pinto
- Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Harvard Medical School, Boston, MA, USA
- Seung Kwak
- CHDI Management/CHDI Foundation, Princeton, NJ, USA
- A. Jennifer Morton
- Department of Physiology, Development and Neuroscience, University of Cambridge, Tennis Court Road, Cambridge, UK
- Darren G. Monckton
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
8
Oh S, Jo Y, Jung S, Yoon S, Yoo KH. From genome sequencing to the discovery of potential biomarkers in liver disease. BMB Rep 2020. [PMID: 32475383 PMCID: PMC7330805 DOI: 10.5483/bmbrep.2020.53.6.074] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/15/2023]
Abstract
Chronic liver disease progresses over a long period through several stages (fatty liver, steatohepatitis, and cirrhosis) and eventually leads to hepatocellular carcinoma (HCC). Because a large proportion of patients with HCC also have cirrhosis, cirrhosis is considered an important factor in the diagnosis of liver cancer. Cirrhosis causes irreversible damage, whereas the early stages of chronic liver disease can be reversed to a healthy state. Therefore, the discovery of biomarkers that can identify the early stages of chronic liver disease is important for preventing serious liver damage. Biomarker discovery in liver cancer and cirrhosis has advanced alongside the development of sequencing technology. Next-generation sequencing (NGS) is one of the representative technical innovations in biology in recent decades, and the key design decisions in a study are which sequencing methods are suitable and how to handle the analysis steps for data integration. In this review, we comprehensively summarize NGS techniques for profiling the genome, transcriptome, DNA methylome, and 3D/4D chromatin structure, and introduce a framework for processing data sets and integrating multi-omics data to uncover biomarkers.
Affiliation(s)
- Sumin Oh
- Laboratory of Biomedical Genomics, Department of Biological Sciences, Sookmyung Women’s University, Seoul 04310, Korea
- Research Institute of Women’s Health, Sookmyung Women’s University, Seoul 04310, Korea
- Yeeun Jo
- Laboratory of Biomedical Genomics, Department of Biological Sciences, Sookmyung Women’s University, Seoul 04310, Korea
- Sungju Jung
- Laboratory of Biomedical Genomics, Department of Biological Sciences, Sookmyung Women’s University, Seoul 04310, Korea
- Sumin Yoon
- Laboratory of Biomedical Genomics, Department of Biological Sciences, Sookmyung Women’s University, Seoul 04310, Korea
- Kyung Hyun Yoo
- Laboratory of Biomedical Genomics, Department of Biological Sciences, Sookmyung Women’s University, Seoul 04310, Korea
- Research Institute of Women’s Health, Sookmyung Women’s University, Seoul 04310, Korea
9
Abstract
Individuals within a species can exhibit vast variation in copy number of repetitive DNA elements. This variation may contribute to complex traits such as lifespan and disease, yet it is only infrequently considered in genotype-phenotype associations. Although the possible importance of copy number variation is widely recognized, accurate copy number quantification remains challenging. Here, we assess the technical reproducibility of several major methods for copy number estimation as they apply to the large repetitive ribosomal DNA array (rDNA). rDNA encodes the ribosomal RNAs and exists as a tandem gene array in all eukaryotes. Repeat units of rDNA are kilobases in size, often with several hundred units comprising the array, making rDNA particularly intractable to common quantification techniques. We evaluate pulsed-field gel electrophoresis, droplet digital PCR, and Nextera-based whole genome sequencing as approaches to copy number estimation, comparing techniques across model organisms and spanning wide ranges of copy numbers. Nextera-based whole genome sequencing, though commonly used in recent literature, produced high error. We explore possible causes for this error and provide recommendations for best practices in rDNA copy number estimation. We present a resource of high-confidence rDNA copy number estimates for a set of S. cerevisiae and C. elegans strains for future use. We furthermore explore the possibility for FISH-based copy number estimation, an alternative that could potentially characterize copy number on a cellular level.
10
11
Christopher J, Thorsen AS, Abujudeh S, Lourenço FC, Kemp R, Potter PK, Morrissey E, Hazelwood L, Winton DJ. Quantifying Microsatellite Mutation Rates from Intestinal Stem Cell Dynamics in Msh2-Deficient Murine Epithelium. Genetics 2019; 212:655-665. [PMID: 31126976 PMCID: PMC6614890 DOI: 10.1534/genetics.119.302268] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/07/2019] [Accepted: 05/14/2019] [Indexed: 12/12/2022]
Abstract
Microsatellite sequences have an enhanced susceptibility to mutation, and can act as sentinels indicating elevated mutation rates and increased risk of cancer. The probability of mutant fixation within the intestinal epithelium is dictated by a combination of stem cell dynamics and mutation rate. Here, we exploit this relationship to infer microsatellite mutation rates. First, a sensitive, multiplexed, and quantitative method for detecting somatic changes in microsatellite length was developed that allowed the parallel detection of mutant [CA]n sequences from hundreds of low-input tissue samples at up to 14 loci. The method was applied to colonic crypts in Mus musculus, and enabled detection of mutant subclones down to 20% of the cellularity of the crypt (∼50 of 250 cells). By quantifying age-related increases in clone frequencies for multiple loci, microsatellite mutation rates in wild-type and Msh2-deficient epithelium were established. An average 388-fold increase in the mutation rate per mitosis was observed in Msh2-deficient epithelium (2.4 × 10⁻²) compared to wild-type epithelium (6.2 × 10⁻⁵).
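As a quick arithmetic check on the figures quoted in this abstract, the ratio of the two rounded per-mitosis rates and the stated detection limit can be recomputed directly. The variable names are hypothetical. The ratio of the rounded values comes out at about 387, close to the reported average of 388; the small difference is presumably due to rounding or to averaging across loci, which is an assumption here since the abstract does not say.

```python
# Per-mitosis mutation rates as quoted (rounded) in the abstract.
msh2_deficient_rate = 2.4e-2
wild_type_rate = 6.2e-5

# Fold increase implied by the two rounded rates.
fold_increase = msh2_deficient_rate / wild_type_rate

# Detection limit: roughly 50 mutant cells in a crypt of 250 cells.
detection_fraction = 50 / 250

print(f"fold increase from rounded rates: {fold_increase:.0f}")
print(f"detectable clone fraction: {detection_fraction:.0%}")
```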
Affiliation(s)
- Joseph Christopher
- Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Ann-Sofie Thorsen
- Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Sam Abujudeh
- Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Filipe C Lourenço
- Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Richard Kemp
- Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Paul K Potter
- Department Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, OX3 0BP, United Kingdom
- Edward Morrissey
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, OX3 9DS, United Kingdom
- Lee Hazelwood
- Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
- Douglas J Winton
- Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE, United Kingdom
12
Lv J, Jiao W, Guo H, Liu P, Wang R, Zhang L, Zeng Q, Hu X, Bao Z, Wang S. HD-Marker: a highly multiplexed and flexible approach for targeted genotyping of more than 10,000 genes in a single-tube assay. Genome Res 2018; 28:1919-1930. [PMID: 30409770 PMCID: PMC6280760 DOI: 10.1101/gr.235820.118] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/11/2018] [Accepted: 10/25/2018] [Indexed: 01/03/2023]
Abstract
Targeted genotyping of transcriptome-scale genetic markers is highly attractive for genetic, ecological, and evolutionary studies, but achieving this goal in a cost-effective manner remains a major challenge, especially for laboratories working on nonmodel organisms. Here, we develop a high-throughput, sequencing-based GoldenGate approach (called HD-Marker), which addresses the array-related issues of original GoldenGate methodology and allows for highly multiplexed and flexible targeted genotyping of more than 12,000 loci in a single-tube assay (in contrast to fewer than 3100 in the original GoldenGate assay). We perform extensive analyses to demonstrate the power and performance of HD-Marker on various multiplex levels (296, 795, 1293, and 12,472 genic SNPs) across two sequencing platforms in two nonmodel species (the scallops Chlamys farreri and Patinopecten yessoensis), with extremely high capture rate (98%-99%) and genotyping accuracy (97%-99%). We also demonstrate the potential of HD-Marker for high-throughput targeted genotyping of alternative marker types (e.g., microsatellites and indels). With its remarkable cost-effectiveness (as low as $0.002 per genotype) and high flexibility in choice of multiplex levels and marker types, HD-Marker provides a highly attractive tool over array-based platforms for fulfilling genome/transcriptome-wide targeted genotyping applications, especially in nonmodel organisms.
Affiliation(s)
- Jia Lv
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Wenqian Jiao
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Haobing Guo
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Pingping Liu
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Ruijia Wang
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
| | - Lingling Zhang
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Qifan Zeng
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Xiaoli Hu
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Zhenmin Bao
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| | - Shi Wang
- MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China.,Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
| |
|
13
|
Press MO, McCoy RC, Hall AN, Akey JM, Queitsch C. Massive variation of short tandem repeats with functional consequences across strains of Arabidopsis thaliana. Genome Res 2018; 28:1169-1178. [PMID: 29970452 PMCID: PMC6071631 DOI: 10.1101/gr.231753.117] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/01/2017] [Accepted: 06/26/2018] [Indexed: 11/24/2022]
Abstract
Short tandem repeat (STR) mutations may comprise more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assessed this contribution across a collection of 96 strains of Arabidopsis thaliana, genotyping 2046 STR loci each, using highly parallel STR sequencing with molecular inversion probes. We found that 95% of examined STRs are polymorphic, with a median of six alleles per STR across these strains. STR expansions (large copy number increases) are found in most strains, several of which have evident functional effects. These include three of six intronic STR expansions we found to be associated with intron retention. Coding STRs were depleted of variation relative to noncoding STRs, and we detected a total of 56 coding STRs (11%) showing low variation consistent with the action of purifying selection. In contrast, some STRs show hypervariable patterns consistent with diversifying selection. Finally, we detected 133 novel STR-phenotype associations under stringent criteria, most of which could not be detected with SNPs alone, and validated some with follow-up experiments. Our results support the conclusion that STRs constitute a large, unascertained reservoir of functionally relevant genomic variation.
Affiliation(s)
- Maximilian O Press
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Rajiv C McCoy
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Ashley N Hall
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.,Molecular and Cellular Biology Program, University of Washington, Seattle, Washington 98195, USA
| | - Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| |
|
14
|
Genovese LM, Geraci F, Corrado L, Mangano E, D'Aurizio R, Bordoni R, Severgnini M, Manzini G, De Bellis G, D'Alfonso S, Pellegrini M. A Census of Tandemly Repeated Polymorphic Loci in Genic Regions Through the Comparative Integration of Human Genome Assemblies. Front Genet 2018; 9:155. [PMID: 29770143 PMCID: PMC5941971 DOI: 10.3389/fgene.2018.00155] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/18/2017] [Accepted: 04/13/2018] [Indexed: 11/29/2022] Open
Abstract
Polymorphic Tandem Repeat (PTR) is a common form of polymorphism in the human genome. A PTR is a variation, found in an individual (or in a population), in the number of repeating units of a Tandem Repeat (TR) locus of the genome with respect to the reference genome. Several phenotypic traits and diseases have been discovered to be strongly associated with or caused by specific PTR loci. PTR are further divided into two main classes: Short Tandem Repeats (STR) when the repeating unit has size up to 6 base pairs, and Variable Number Tandem Repeats (VNTR) for repeating units of size above 6 base pairs. As larger and larger populations are screened via high throughput sequencing projects, it becomes technically feasible and desirable to explore the association between PTR and a panoply of such traits and conditions. In order to facilitate these studies, we have devised a method for compiling catalogs of PTR from assembled genomes, and we have produced a catalog of PTR for genic regions (exons, introns, UTR and adjacent regions) of the human genome (GRCh38). We applied four different TR discovery software tools to uncover in the first phase 55,223,485 TR (after duplicate removal) in GRCh38, of which 373,173 were determined to be PTR in the second phase by comparison with five assembled human genomes. Of these, 263,266 are not included by state-of-the-art PTR catalogs. The new methodology is mainly based on a hierarchical and systematic application of alignment-based sequence comparisons to identify and measure the polymorphism of TR. While previous catalogs focus on the class of STR of small total size, we remove any size restrictions, aiming at the more general class of PTR, and we also target fuzzy TR by using specific detection tools. Similarly to other previous catalogs of human polymorphic loci, we focus our catalog toward applications in the discovery of disease-associated loci.
Validation by cross-referencing with existing catalogs on common clinically-relevant loci shows good concordance. Overall, this proposed census of human PTR in genic regions is a shared resource (web accessible), complementary to existing catalogs, facilitating future genome-wide studies involving PTR.
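The definitions in this abstract (STR for repeat units up to 6 bp, VNTR above 6 bp, and a PTR as a locus whose copy number differs from the reference) can be sketched in a few lines. This is a minimal illustration only: the function names are hypothetical, copy numbers are treated as plain numbers, and it does not reproduce the authors' alignment-based pipeline.

```python
from typing import Iterable


def classify_tandem_repeat(unit_length_bp: int) -> str:
    """Classify a tandem repeat by repeat-unit length, per the abstract's definition."""
    if unit_length_bp < 1:
        raise ValueError("unit_length_bp must be at least 1")
    # Units of up to 6 bp are STRs; longer units are VNTRs.
    return "STR" if unit_length_bp <= 6 else "VNTR"


def is_polymorphic(reference_copies: float, observed_copies: Iterable[float]) -> bool:
    """Return True if any observed genome differs in copy number from the reference."""
    return any(copies != reference_copies for copies in observed_copies)


print(classify_tandem_repeat(2))          # dinucleotide repeat unit
print(classify_tandem_repeat(30))         # 30 bp repeat unit
print(is_polymorphic(10, [10, 10, 12]))   # one genome carries 12 copies
```

A real catalog would also need to handle imperfect ("fuzzy") repeats, which the abstract notes are targeted with dedicated detection tools.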
Affiliation(s)
| | - Filippo Geraci
- Institute for Informatics and Telematics of CNR, Pisa, Italy
| | - Lucia Corrado
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
| | | | | | - Roberta Bordoni
- Institute for Biomedical Technologies of CNR, Segrate, Italy
| | | | - Giovanni Manzini
- Institute for Informatics and Telematics of CNR, Pisa, Italy.,Department of Science and Technological Innovation, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
| | | | - Sandra D'Alfonso
- Department of Health Sciences, University of Eastern Piedmont Amedeo Avogadro, Novara, Italy
| | | |
|
15
|
Salk JJ, Schmitt MW, Loeb LA. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 2018; 19:269-285. [PMID: 29576615 PMCID: PMC6485430 DOI: 10.1038/nrg.2017.117] [Citation(s) in RCA: 338] [Impact Index Per Article: 48.3] [Indexed: 01/07/2023]
Abstract
Mutations, the fuel of evolution, are first manifested as rare DNA changes within a population of cells. Although next-generation sequencing (NGS) technologies have revolutionized the study of genomic variation between species and individual organisms, most have limited ability to accurately detect and quantify rare variants among the different genome copies in heterogeneous mixtures of cells or molecules. We describe the technical challenges in characterizing subclonal variants using conventional NGS protocols and the recent development of error correction strategies, both computational and experimental, including consensus sequencing of single DNA molecules. We also highlight major applications for low-frequency mutation detection in science and medicine, describe emerging methodologies and provide our vision for the future of DNA sequencing.
Affiliation(s)
- Jesse J Salk
- Department of Pathology, University of Washington School of Medicine, Seattle, WA, USA
- Department of Medicine, Divisions of Hematology and Medical Oncology, University of Washington School of Medicine, Seattle, WA, USA
- Fred Hutchinson Cancer Research Center, Clinical Research Division, Seattle, WA, USA
| | - Michael W Schmitt
- Department of Pathology, University of Washington School of Medicine, Seattle, WA, USA
- Department of Medicine, Divisions of Hematology and Medical Oncology, University of Washington School of Medicine, Seattle, WA, USA
- Fred Hutchinson Cancer Research Center, Clinical Research Division, Seattle, WA, USA
| | - Lawrence A Loeb
- Department of Pathology, University of Washington School of Medicine, Seattle, WA, USA
- Department of Biochemistry, University of Washington School of Medicine, Seattle, WA, USA
| |
|
16
|
Waalkes A, Smith N, Penewit K, Hempelmann J, Konnick EQ, Hause RJ, Pritchard CC, Salipante SJ. Accurate Pan-Cancer Molecular Diagnosis of Microsatellite Instability by Single-Molecule Molecular Inversion Probe Capture and High-Throughput Sequencing. Clin Chem 2018; 64:950-958. [PMID: 29632127 DOI: 10.1373/clinchem.2017.285981] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 12/19/2017] [Accepted: 03/08/2018] [Indexed: 11/06/2022]
Abstract
BACKGROUND Microsatellite instability (MSI) is an emerging actionable phenotype in oncology that informs tumor response to immune checkpoint pathway immunotherapy. However, there remains a need for MSI diagnostics that are low cost, highly accurate, and generalizable across cancer types. We developed a method for targeted high-throughput sequencing of numerous microsatellite loci with pan-cancer informativity for MSI using single-molecule molecular inversion probes (smMIPs). METHODS We designed a smMIP panel targeting 111 loci highly informative for MSI across cancers. We developed an analytical framework taking advantage of smMIP-mediated error correction to specifically and sensitively detect instability events without the need for typing matched normal material. RESULTS Using synthetic DNA mixtures, smMIPs were sensitive to at least 1% MSI-positive cells and were highly consistent across replicates. The fraction of identified unstable microsatellites discriminated tumors exhibiting MSI from those lacking MSI with high accuracy across colorectal (100% diagnostic sensitivity and specificity), prostate (100% diagnostic sensitivity and specificity), and endometrial cancers (95.8% diagnostic sensitivity and 100% specificity). MSI-PCR, the current standard-of-care molecular diagnostic for MSI, proved equally robust for colorectal tumors but evidenced multiple false-negative results in prostate (81.8% diagnostic sensitivity and 100% specificity) and endometrial (75.0% diagnostic sensitivity and 100% specificity) tumors. CONCLUSIONS smMIP capture provides an accurate, diagnostically sensitive, and economical means to diagnose MSI across cancer types without reliance on patient-matched normal material. The assay is readily scalable to large numbers of clinical samples, enables automated and quantitative analysis of microsatellite instability, and is readily standardized across clinical laboratories.
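The abstract describes calling MSI from the fraction of unstable microsatellite loci and reports diagnostic sensitivity and specificity. The sketch below shows how those quantities are computed in general. The threshold value and all counts are hypothetical placeholders chosen for illustration; the paper's actual cutoff and cohort counts are not given in the abstract.

```python
def unstable_fraction(unstable_loci: int, typed_loci: int) -> float:
    """Fraction of successfully typed loci that were scored as unstable."""
    if typed_loci <= 0:
        raise ValueError("typed_loci must be positive")
    return unstable_loci / typed_loci


def call_msi(fraction: float, threshold: float) -> bool:
    """Call a sample MSI-positive when its unstable fraction reaches the threshold."""
    return fraction >= threshold


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of truly MSI-positive tumors that the assay calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of truly MSI-negative tumors that the assay calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical example: 30 of 100 typed loci unstable, assumed cutoff of 0.2.
HYPOTHETICAL_THRESHOLD = 0.2
fraction = unstable_fraction(30, 100)
print(call_msi(fraction, HYPOTHETICAL_THRESHOLD))

# Hypothetical confusion-matrix counts, not taken from the study.
print(f"sensitivity = {sensitivity(9, 1):.3f}")
print(f"specificity = {specificity(10, 0):.3f}")
```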
Affiliation(s)
- Adam Waalkes
- Department of Laboratory Medicine, University of Washington, Seattle, WA
| | - Nahum Smith
- Department of Laboratory Medicine, University of Washington, Seattle, WA
| | - Kelsi Penewit
- Department of Laboratory Medicine, University of Washington, Seattle, WA
| | | | - Eric Q Konnick
- Department of Laboratory Medicine, University of Washington, Seattle, WA
| | - Ronald J Hause
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Colin C Pritchard
- Department of Laboratory Medicine, University of Washington, Seattle, WA
| | | |
|
17
|
Abstract
Accumulating evidence suggests that many classes of DNA repeats exhibit attributes that distinguish them from other genetic variants, including the fact that they are more liable to mutation; this enables them to mediate genetic plasticity. The expansion of tandem repeats, particularly of short tandem repeats, can cause a range of disorders (including Huntington disease, various ataxias, motor neuron disease, frontotemporal dementia, fragile X syndrome and other neurological disorders), and emerging data suggest that tandem repeat polymorphisms (TRPs) can also regulate gene expression in healthy individuals. TRPs in human genomes may also contribute to the missing heritability of polygenic disorders. A better understanding of tandem repeats and their associated repeatome, as well as their capacity for genetic plasticity via both germline and somatic mutations, is needed to transform our understanding of the role of TRPs in health and disease.
Affiliation(s)
- Anthony J Hannan
- Florey Institute of Neuroscience and Mental Health, University of Melbourne.,Department of Anatomy and Neuroscience, University of Melbourne, Parkville, Victoria, Australia
| |
|
18
|
Xu L, Haasl RJ, Sun J, Zhou Y, Bickhart DM, Li J, Song J, Sonstegard TS, Van Tassell CP, Lewin HA, Liu GE. Systematic Profiling of Short Tandem Repeats in the Cattle Genome. Genome Biol Evol 2018; 9:20-31. [PMID: 28172841 PMCID: PMC5381564 DOI: 10.1093/gbe/evw256] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 10/21/2016] [Indexed: 12/13/2022] Open
Abstract
Short tandem repeats (STRs), or microsatellites, are genetic variants with repetitive 2–6 base pair motifs in many mammalian genomes. Using high-throughput sequencing and experimental validations, we systematically profiled STRs in five Holsteins. We identified a total of 60,106 microsatellites and generated the first high-resolution STR map, representing a substantial pool of polymorphism in dairy cattle. We observed significant overlap of STRs with functional genes and quantitative trait loci (QTL). We performed evolutionary and population genetic analyses using over 20,000 common dinucleotide STRs. Besides corroborating the well-established positive correlation between allele size and variance in allele size, these analyses also identified dozens of outlier STRs based on two anomalous relationships that counter expected characteristics of neutral evolution. In addition, one STR locus overlaps with a significant region of a summary statistic designed to detect STR-related selection. Our results also showed that only 57.1% of STRs were located within SNP-based linkage disequilibrium (LD) blocks, whereas the other 42.9% fell outside the blocks. Therefore, a substantial number of STRs are not tagged by SNPs in the cattle genome, likely due to the distinct mutation mechanism and elevated polymorphism of STRs. This study provides the foundation for future STR-based studies of cattle genome evolution and selection.
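To make the "repetitive 2–6 base pair motif" definition concrete, the following sketch scans a DNA string for perfect tandem repeats using a regular-expression backreference. It is an illustration under stated assumptions, not the study's profiling pipeline: `find_strs` and `min_copies` are hypothetical names, only perfect (uninterrupted) repeats are detected, and runs of a single base are skipped because they are mononucleotide repeats rather than 2–6 bp motifs.

```python
import re
from typing import List, Tuple


def find_strs(sequence: str, min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats of 2-6 bp motifs."""
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # Group 1 is the motif (shortest candidate tried first); \1 requires
    # at least min_copies - 1 further adjacent copies of the same motif.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy sequence containing five copies of the dinucleotide CA.
print(find_strs("GGTCACACACACATTG"))
```

Dedicated STR callers additionally handle sequencing errors, imperfect repeats, and flanking-sequence alignment, none of which this toy scan attempts.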
Affiliation(s)
- Lingyang Xu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD.,Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China.,Department of Animal and Avian Sciences, University of Maryland, College Park, MD
| | - Ryan J Haasl
- Department of Biology, University of Wisconsin - Platteville, WI
| | - Jiajie Sun
- College of Animal Science, South China Agricultural University, Guangzhou, China
| | - Yang Zhou
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD.,College of Animal Science and Technology, Northwest A&F University, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Yangling, Shaanxi, China
| | - Derek M Bickhart
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
| | - Junya Li
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD
| | - Tad S Sonstegard
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
| | - Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
| | - Harris A Lewin
- Department of Evolution and Ecology, University of California, Davis, CA
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, Beltsville, MD
| |
|
19
|
Kistler L, Johnson SM, Irwin MT, Louis EE, Ratan A, Perry GH. A massively parallel strategy for STR marker development, capture, and genotyping. Nucleic Acids Res 2017; 45:e142. [PMID: 28666376 PMCID: PMC5587753 DOI: 10.1093/nar/gkx574] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/02/2016] [Accepted: 06/21/2017] [Indexed: 12/11/2022] Open
Abstract
Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci—97.3–99.6% of STRs characterized with ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies.
Affiliation(s)
- Logan Kistler
- Department of Anthropology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA.,Departments of Anthropology and Biology, Pennsylvania State University, University Park, PA 16802, USA
| | - Stephen M Johnson
- Departments of Anthropology and Biology, Pennsylvania State University, University Park, PA 16802, USA
| | - Mitchell T Irwin
- Department of Anthropology, Northern Illinois University, DeKalb, IL 60115, USA
| | - Edward E Louis
- Center for Conservation and Research, Omaha's Henry Doorly Zoo and Aquarium, Omaha, NE 68107, USA
| | - Aakrosh Ratan
- Department of Public Health Sciences and Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
| | - George H Perry
- Departments of Anthropology and Biology, Pennsylvania State University, University Park, PA 16802, USA
| |
|
20
|
Tang H, Nzabarushimana E. STRScan: targeted profiling of short tandem repeats in whole-genome sequencing data. BMC Bioinformatics 2017; 18:398. [PMID: 28984185 PMCID: PMC5629557 DOI: 10.1186/s12859-017-1800-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Short tandem repeats (STRs) are found in many prokaryotic and eukaryotic genomes, and are commonly used as genetic markers, in particular for identity and parental testing in DNA forensics. The unstable expansion of some STRs was associated with various genetic disorders (e.g., Huntington disease), and thus was used in genetic testing for screening individuals at high risk. Traditional STR analyses were based on the PCR amplification of STR loci followed by gel electrophoresis. With the availability of massive whole genome sequencing data, it becomes practical to mine STR profiles in silico from genome sequences. Software tools such as lobSTR and STR-FM have been developed to address these demands; they are, however, built upon whole-genome read-mapping tools, and thus may not be sensitive enough. RESULTS In this paper, we present a standalone software tool, STRScan, that uses a greedy algorithm for targeted STR profiling in next-generation sequencing (NGS) data. STRScan was tested on the whole genome sequencing data from the Venter genome sequencing and the 1000 Genomes Project. The results showed that STRScan can profile 20% more STRs in the target set that are missed by lobSTR. CONCLUSION STRScan is particularly useful for the NGS-based targeted STR profiling, e.g., in genetic and human identity testing. STRScan is available as open-source software at http://darwin.informatics.indiana.edu/str/.
Affiliation(s)
- Haixu Tang
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, 47405, IN, USA.
| | - Etienne Nzabarushimana
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, 47405, IN, USA
| |
|
21
|
Shin G, Grimes SM, Lee H, Lau BT, Xia LC, Ji HP. CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis. Nat Commun 2017; 8:14291. [PMID: 28169275 PMCID: PMC5309709 DOI: 10.1038/ncomms14291] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 06/25/2016] [Accepted: 12/15/2016] [Indexed: 11/09/2022] Open
Abstract
Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.
Affiliation(s)
- GiWon Shin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
| | - Susan M Grimes
- Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
| | - Billy T Lau
- Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
| | - Li C Xia
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, CCSR 1115, 269 Campus Drive, Stanford, California 94305, USA.,Stanford Genome Technology Center, Stanford University, 3165 Porter Drive, Palo Alto, California 94304, USA
| |
|
22
|
Bagshaw ATM, Horwood LJ, Fergusson DM, Gemmell NJ, Kennedy MA. Microsatellite polymorphisms associated with human behavioural and psychological phenotypes including a gene-environment interaction. BMC Med Genet 2017; 18:12. [PMID: 28158988 PMCID: PMC5291968 DOI: 10.1186/s12881-017-0374-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/16/2016] [Accepted: 01/25/2017] [Indexed: 02/05/2023]
Abstract
Background The genetic and environmental influences on human personality and behaviour are a complex matter of ongoing debate. Accumulating evidence indicates that short tandem repeats (STRs) in regulatory regions are good candidates to explain heritability not accessed by genome-wide association studies. Methods We tested for associations between the genotypes of four selected repeats and 18 traits relating to personality, behaviour, cognitive ability and mental health in a well-studied longitudinal birth cohort (n = 458-589) using one way analysis of variance. The repeats were a highly conserved poly-AC microsatellite in the upstream promoter region of the T-box brain 1 (TBR1) gene and three previously studied STRs in the activating enhancer-binding protein 2-beta (AP2-β) and androgen receptor (AR) genes. Where significance was found we used multiple regression to assess the influence of confounding factors. Results Carriers of the shorter, most common, allele of the AR gene’s GGN microsatellite polymorphism had fewer anxiety-related symptoms, which was consistent with previous studies, but in our study this was not significant following Bonferroni correction. No associations with two repeats in the AP2-β gene withstood this correction. A novel finding was that carriers of the minor allele of the TBR1 AC microsatellite were at higher risk of conduct problems in childhood at age 7-9 (p = 0.0007, which did pass Bonferroni correction). Including maternal smoking during pregnancy (MSDP) in models controlling for potentially confounding influences showed that an interaction between TBR1 genotype and MSDP was a significant predictor of conduct problems in childhood and adolescence (p < 0.001), and of self-reported criminal behaviour up to age 25 years (p ≤ 0.02). This interaction remained significant after controlling for possible confounders including maternal age at birth, socio-economic status and education, and offspring birth weight. 
Conclusions The potential functional importance of the TBR1 gene’s promoter microsatellite deserves further investigation. Our results suggest that it participates in a gene-environment interaction with MSDP and antisocial behaviour. However, previous evidence that mothers who smoke during pregnancy carry genes for antisocial behaviour suggests that epistasis may influence the interaction.
Affiliation(s)
- Andrew T M Bagshaw
- Department of Pathology, University of Otago, Christchurch, PO Box 4345, Christchurch, New Zealand.
| | - L John Horwood
- Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
| | - David M Fergusson
- Department of Psychological Medicine, University of Otago, Christchurch, New Zealand
| | - Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin, New Zealand.,Gravida - National Centre for Growth and Development, University of Otago, Dunedin, New Zealand
| | - Martin A Kennedy
- Department of Pathology, University of Otago, Christchurch, PO Box 4345, Christchurch, New Zealand
| |
|
23
|
Chen HY, Ma SL, Huang W, Ji L, Leung VHK, Jiang H, Yao X, Tang NLS. The mechanism of transactivation regulation due to polymorphic short tandem repeats (STRs) using IGF1 promoter as a model. Sci Rep 2016; 6:38225. [PMID: 27910883 PMCID: PMC5133613 DOI: 10.1038/srep38225] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/08/2016] [Accepted: 11/07/2016] [Indexed: 11/09/2022] Open
Abstract
Functional short tandem repeats (STR) are polymorphic in the population, and the number of repeats regulates the expression of nearby genes (known as expression STR, eSTR). The STR in the IGF1 promoter has been extensively studied for its association with IGF1 concentration in blood and various clinical traits and represents an important eSTR. We previously used an in-vitro luciferase reporter model to examine the interaction between STRs and SNPs in the IGF1 promoter. Here, we further explored the mechanism by which the number of repeats of the STR regulates gene transcription. An inverse correlation between the number of repeats and the extent of transactivation was found in a haplotype consisting of three promoter SNPs (C-STR-T-T). We showed that these adjacent SNPs located outside the STR were required for the STR to function as eSTR. The C allele of rs35767 provides a binding site for CCAAT/enhancer-binding-protein δ (C/EBPD), which is essential for the gradational transactivation property of eSTR, and FOXA3 may also be involved. Therefore, we propose a mechanism in which the gradational transactivation by the eSTR is caused by the interaction of one or more transcriptional complexes located outside the STR, rather than by direct binding to a repeat motif of the STR.
Affiliation(s)
- Holly Y Chen
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
| | - Suk Ling Ma
- Department of Psychiatry, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
| | - Wei Huang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Department of Pharmaceutics, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Lindan Ji
- Department of Biochemistry and Molecular Biology, Zhejiang Provincial Key Laboratory of Pathophysiology, Ningbo University School of Medicine, Ningbo, China
| | - Vincent H K Leung
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
| | - Honglin Jiang
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061, USA
| | - Xiaoqiang Yao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
| | - Nelson L S Tang
- Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.,School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China.,Laboratory of Genetics of Disease Susceptibility, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.,Functional Genomics and Biostatistical Computing laboratory, Shenzhen Research Institute, The Chinese University of Hong Kong, China.,KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China
| |
|
24
|
Huddleston J, Chaisson MJP, Steinberg KM, Warren W, Hoekzema K, Gordon D, Graves-Lindsay TA, Munson KM, Kronenberg ZN, Vives L, Peluso P, Boitano M, Chin CS, Korlach J, Wilson RK, Eichler EE. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res 2016; 27:677-685. [PMID: 27895111 PMCID: PMC5411763 DOI: 10.1101/gr.214007.116] [Citation(s) in RCA: 233] [Impact Index Per Article: 25.9] [Received: 08/01/2016] [Accepted: 11/15/2016] [Indexed: 01/07/2023]
Abstract
In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
Affiliation(s)
- John Huddleston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Karyn Meltz Steinberg
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
| | - Wes Warren
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - David Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Tina A Graves-Lindsay
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Zev N Kronenberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Laura Vives
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Paul Peluso
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
| | - Matthew Boitano
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
| | - Chen-Shin Chin
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
| | - Jonas Korlach
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
| | - Richard K Wilson
- Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
|
25
|
Song Z, Zhang M, Li F, Weng Q, Zhou C, Li M, Li J, Huang H, Mo X, Gan S. Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites. Sci Rep 2016; 6:34941. [PMID: 27748400 PMCID: PMC5066178 DOI: 10.1038/srep34941] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/26/2016] [Accepted: 09/20/2016] [Indexed: 11/09/2022] Open
Abstract
Identification of loci or genes under natural selection is important for both understanding the genetic basis of local adaptation and practical applications, and genome scans provide a powerful means for such identification purposes. In this study, genome-wide simple sequence repeat (SSR) markers were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations.
Affiliation(s)
- Zhijiao Song
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Xiangshan Road, Beijing 100091, China
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
- Baoshan University, Yuanzheng Road, Baoshan 678000, China
| | - Miaomiao Zhang
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
- College of Forestry, South China Agricultural University, 284 Block, Wushan Street, Guangzhou 510642, China
| | - Fagen Li
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
| | - Qijie Weng
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
| | - Chanpin Zhou
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
| | - Mei Li
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
| | - Jie Li
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
| | - Huanhua Huang
- Guangdong Academy of Forestry, Longdong, Guangzhou 510520, China
| | - Xiaoyong Mo
- College of Forestry, South China Agricultural University, 284 Block, Wushan Street, Guangzhou 510642, China
| | - Siming Gan
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Xiangshan Road, Beijing 100091, China
- Key Laboratory of State Forestry Administration on Tropical Forestry Research, Research Institute of Tropical Forestry, Chinese Academy of Forestry, Longdong, Guangzhou 510520, China
| |
|
26
|
Biezuner T, Spiro A, Raz O, Amir S, Milo L, Adar R, Chapal-Ilani N, Berman V, Fried Y, Ainbinder E, Cohen G, Barr HM, Halaban R, Shapiro E. A generic, cost-effective, and scalable cell lineage analysis platform. Genome Res 2016; 26:1588-1599. [PMID: 27558250 PMCID: PMC5088600 DOI: 10.1101/gr.202903.115] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 12/06/2015] [Accepted: 08/11/2016] [Indexed: 02/05/2023]
Abstract
Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased by functional dependencies. Here we show an integrated biochemical-computational platform for generic single-cell lineage analysis that is retrospective, cost-effective, and scalable. It consists of a biochemical-computational pipeline that inputs individual cells, produces targeted single-cell sequencing data, and uses it to generate a lineage tree of the input cells. We validated the platform by applying it to cells sampled from an ex vivo grown tree and analyzed its feasibility landscape by computer simulations. We conclude that the platform may serve as a generic tool for lineage analysis and thus pave the way toward large-scale human cell lineage discovery.
Affiliation(s)
- Tamir Biezuner
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Adam Spiro
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Ofir Raz
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Shiran Amir
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Lilach Milo
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Rivka Adar
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Noa Chapal-Ilani
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Veronika Berman
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Yael Fried
- Department of Biological Services, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Elena Ainbinder
- Department of Biological Services, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Galit Cohen
- Maurice and Vivienne Wohl Institute for Drug Discovery, G-INCPM, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Haim M Barr
- Maurice and Vivienne Wohl Institute for Drug Discovery, G-INCPM, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Ruth Halaban
- Department of Dermatology, Yale University School of Medicine, New Haven, Connecticut 06520-8059, USA
| | - Ehud Shapiro
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel.,Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 761001, Israel
| |
|
27
|
Bickhart DM, Xu L, Hutchison JL, Cole JB, Null DJ, Schroeder SG, Song J, Garcia JF, Sonstegard TS, Van Tassell CP, Schnabel RD, Taylor JF, Lewin HA, Liu GE. Diversity and population-genetic properties of copy number variations and multicopy genes in cattle. DNA Res 2016; 23:253-62. [PMID: 27085184 PMCID: PMC4909312 DOI: 10.1093/dnares/dsw013] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 11/05/2015] [Accepted: 02/29/2016] [Indexed: 11/14/2022] Open
Abstract
The diversity and population genetics of copy number variation (CNV) in domesticated animals are not well understood. In this study, we analysed 75 genomes of major taurine and indicine cattle breeds (including Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, and Romagnola), sequenced to 11-fold coverage to identify 1,853 non-redundant CNV regions. Supported by high validation rates in array comparative genomic hybridization (CGH) and qPCR experiments, these CNV regions accounted for 3.1% (87.5 Mb) of the cattle reference genome, representing a significant increase over previous estimates of the area of the genome that is copy number variable (∼2%). Further population genetics and evolutionary genomics analyses based on these CNVs revealed the population structures of the cattle taurine and indicine breeds and uncovered potential diversely selected CNVs near important functional genes, including AOX1, ASZ1, GAT, GLYAT, and KRTAP9-1. Additionally, 121 CNV gene regions were found to be either breed specific or differentially variable across breeds, such as RICTOR in dairy breeds and PNPLA3 in beef breeds. In contrast, clusters of the PRP and PAG genes were found to be duplicated in all sequenced animals, suggesting that subfunctionalization, neofunctionalization, or overdominance play roles in diversifying those fertility-related genes. These CNV results provide a new glimpse into the diverse selection histories of cattle breeds and a basis for correlating structural variation with complex traits in the future.
Affiliation(s)
- Derek M Bickhart
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Lingyang Xu
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
| | - Jana L Hutchison
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - John B Cole
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Daniel J Null
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Steven G Schroeder
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
| | | | - Tad S Sonstegard
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | | | - Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, USA
| | - Jeremy F Taylor
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
| | - Harris A Lewin
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | - George E Liu
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| |
|
28
|
Quilez J, Guilmatre A, Garg P, Highnam G, Gymrek M, Erlich Y, Joshi RS, Mittelman D, Sharp AJ. Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res 2016; 44:3750-62. [PMID: 27060133 PMCID: PMC4857002 DOI: 10.1093/nar/gkw219] [Citation(s) in RCA: 92] [Impact Index Per Article: 10.2] [Received: 11/16/2015] [Accepted: 03/22/2016] [Indexed: 01/23/2023] Open
Abstract
Despite representing an important source of genetic variation, tandem repeats (TRs) remain poorly studied due to technical difficulties. We hypothesized that TRs can operate as expression (eQTLs) and methylation (mQTLs) quantitative trait loci. To test this, we analyzed the effect of variation at 4849 promoter-associated TRs, genotyped in 120 individuals, on neighboring gene expression and DNA methylation. Polymorphic promoter TRs were associated with increased variance in local gene expression and DNA methylation, suggesting functional consequences related to TR variation. We identified >100 TRs associated with expression/methylation levels of adjacent genes. These potential eQTL/mQTL TRs were enriched for overlaps with transcription factor binding and DNaseI hypersensitivity sites, providing a rationale for their effects. Moreover, we showed that most TR variants are poorly tagged by nearby single nucleotide polymorphism (SNP) markers, indicating that many functional TR variants are not effectively assayed by SNP-based approaches. Our study assigns biological significance to TR variations in the human genome, and suggests that a significant fraction of TR variations exert functional effects via alterations of local gene expression or epigenetics. We conclude that targeted studies that focus on genotyping TR variants are required to fully ascertain functional variation in the genome.
Affiliation(s)
- Javier Quilez
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Audrey Guilmatre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Paras Garg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Gareth Highnam
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
| | - Melissa Gymrek
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; New York Genome Center, New York, NY 10038, USA
| | - Yaniv Erlich
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA; Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY 10027, USA
| | - Ricky S Joshi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - David Mittelman
- Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
| | - Andrew J Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| |
|
29
|
Bilgin Sonay T, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T, Wagner A. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res 2015; 25:1591-1599. [PMID: 26290536 DOI: 10.1101/015784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/16/2015] [Accepted: 08/14/2015] [Indexed: 05/25/2023]
Abstract
Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly. They are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics. It has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation in humans and closely related species have been scarce due to technical difficulties derived from short-read technology. Here we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, in a total of six different species, and studied their impact on gene expression evolution. We found that population diversity patterns can be efficiently captured with short TRs (repeat unit length, 1-5 bp). We examined the potential evolutionary role of TRs in gene expression differences between humans and primates by using 30,275 larger TRs (repeat unit length, 2-50 bp). Genes that contained TRs in the promoters, in their 3' untranslated region, in introns, and in exons had higher expression divergence than genes without repeats in the regions. Polymorphic small repeats (1-5 bp) also had higher expression divergence compared with genes with fixed or no TRs in the gene promoters. Our findings highlight the potential contribution of TRs to human evolution through gene regulation.
Affiliation(s)
- Tugce Bilgin Sonay
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tiago Carvalho
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Mark D Robinson
- The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
| | - Maja P Greminger
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
| | - Michael Krützen
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
| | - David Comas
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Gareth Highnam
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - David Mittelman
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - Andrew Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai School, New York, New York 10029, USA
| | - Tomàs Marques-Bonet
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain; Centro Nacional de Análisis Genómico (CNAG), PCB, Barcelona, 08028 Catalonia, Spain; Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
| | - Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, New Mexico 87501, USA
| |
|
30
|
Bilgin Sonay T, Carvalho T, Robinson MD, Greminger MP, Krützen M, Comas D, Highnam G, Mittelman D, Sharp A, Marques-Bonet T, Wagner A. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res 2015; 25:1591-9. [PMID: 26290536 PMCID: PMC4617956 DOI: 10.1101/gr.190868.115] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 02/16/2015] [Accepted: 08/14/2015] [Indexed: 12/20/2022]
Abstract
Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly. They are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics. It has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation in humans and closely related species have been scarce due to technical difficulties derived from short-read technology. Here we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, in a total of six different species, and studied their impact on gene expression evolution. We found that population diversity patterns can be efficiently captured with short TRs (repeat unit length, 1–5 bp). We examined the potential evolutionary role of TRs in gene expression differences between humans and primates by using 30,275 larger TRs (repeat unit length, 2–50 bp). Genes that contained TRs in the promoters, in their 3′ untranslated region, in introns, and in exons had higher expression divergence than genes without repeats in the regions. Polymorphic small repeats (1–5 bp) also had higher expression divergence compared with genes with fixed or no TRs in the gene promoters. Our findings highlight the potential contribution of TRs to human evolution through gene regulation.
Affiliation(s)
- Tugce Bilgin Sonay
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tiago Carvalho
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Mark D Robinson
- The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
| | - Maja P Greminger
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
| | - Michael Krützen
- Evolutionary Genetics Group, Anthropological Institute and Museum, University of Zurich, CH-8057 Zurich, Switzerland
| | - David Comas
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Gareth Highnam
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - David Mittelman
- Department of Biological Science and Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - Andrew Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai School, New York, New York 10029, USA
| | - Tomàs Marques-Bonet
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain; Centro Nacional de Análisis Genómico (CNAG), PCB, Barcelona, 08028 Catalonia, Spain; Catalan Institution for Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
| | - Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, CH-805 Zurich, Switzerland; The Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; The Santa Fe Institute, Santa Fe, New Mexico 87501, USA
| |
|