1
Kundu P, Beura S, Mondal S, Das AK, Ghosh A. Machine learning for the advancement of genome-scale metabolic modeling. Biotechnol Adv 2024; 74:108400. [PMID: 38944218 DOI: 10.1016/j.biotechadv.2024.108400] [Received: 10/25/2023] [Revised: 05/13/2024] [Accepted: 06/23/2024] [Indexed: 07/01/2024]
Abstract
Constraint-based modeling (CBM) has evolved into a core systems biology tool for mapping the interrelations between genotype, phenotype, and the external environment. Recent advances in high-throughput experimental approaches and multi-omics strategies have generated a wealth of new and precise information across wide-ranging biological domains. Meanwhile, the continuously growing field of machine learning (ML) and its specialized branch, deep learning (DL), provide essential computational architectures for decoding complex and heterogeneous biological data. In recent years, both multi-omics and ML have driven the advancement of CBM. Condition-specific omics data, such as transcriptomics and proteomics, have helped contextualize model predictions when analyzing a particular phenotypic signature. At the same time, advanced ML tools have eased model reconstruction and analysis, increasing accuracy and predictive power. However, these multi-disciplinary methodological frameworks have mostly been developed independently, which limits the integration of biological knowledge from different domains. Hence, we review the potential of integrating multi-disciplinary tools and strategies from fields such as synthetic biology, CBM, omics, and ML to explore biochemical phenomena beyond conventional biological dogma. We also highlight how the integrated knowledge of these intersecting domains has improved bioengineering and biomedical applications. We systematically describe conventional genome-scale metabolic model (GEM) reconstruction tools and strategies for improving them through ML paradigms. Further, we briefly discuss the crucial role of ML and DL in restructuring omics data for GEM development. Finally, we elaborate on case-study-based assessments of state-of-the-art methods for improving biomedical and metabolic engineering strategies.
Therefore, this review demonstrates how integrating experimental and in silico strategies can help map the ever-expanding knowledge of biological systems driven by condition-specific cellular information. This multiview approach will elevate the application of ML-based CBM in the biomedical and bioengineering fields for the betterment of society and the environment.
Affiliation(s)
- Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Satyajit Beura
- Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Suman Mondal
- P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Kumar Das
- Department of Bioscience and Biotechnology, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India
2
Foltz SM, Li Y, Yao L, Terekhanova NV, Weerasinghe A, Gao Q, Dong G, Schindler M, Cao S, Sun H, Jayasinghe RG, Fulton RS, Fronick CC, King J, Kohnen DR, Fiala MA, Chen K, DiPersio JF, Vij R, Ding L. Somatic mutation phasing and haplotype extension using linked-reads in multiple myeloma. bioRxiv 2024:2024.08.09.607342. [PMID: 39149342 PMCID: PMC11326269 DOI: 10.1101/2024.08.09.607342] [Indexed: 08/17/2024]
Abstract
Somatic mutation phasing informs our understanding of cancer-related events such as driver mutations. We generated linked-read whole-genome sequencing data for 23 samples across disease stages from 14 multiple myeloma (MM) patients and systematically assigned somatic mutations to haplotypes using linked-reads. Here, we report the reconstructed cancer haplotypes and phase blocks from several MM samples and show how phase block length can be extended by integrating samples from the same individual. We also uncover phasing information in genes frequently mutated in MM, including DIS3, HIST1H1E, KRAS, NRAS, and TP53, and phase 79.4% of 20,705 high-confidence somatic mutations. In some cases, this enabled us to interpret clonal evolution models at higher resolution using pairs of phased somatic mutations. For example, our analysis of one patient suggested that two NRAS hotspot mutations occurred on the same haplotype but were independent events in different subclones. Given sufficient tumor purity and data quality, our framework illustrates how haplotype-aware analysis of somatic mutations can benefit some cancer cases.
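The basic idea of assigning a somatic mutation to a haplotype with linked-reads can be illustrated with a minimal sketch. It assumes that germline heterozygous SNPs in a phase block have already been phased, so each barcode in the block maps to haplotype 1 or 2 (`barcode_to_haplotype`), and that the barcodes of reads carrying the somatic allele are known. The function names, thresholds, and majority-vote rule are hypothetical illustrations, not the authors' pipeline.

```python
from collections import Counter


def assign_haplotype(mutation_barcodes, barcode_to_haplotype,
                     min_support=2, min_fraction=0.9):
    """Phase one somatic mutation by barcode voting (hypothetical rule).

    mutation_barcodes: barcodes of reads that carry the somatic allele.
    barcode_to_haplotype: barcode -> 1 or 2, derived from phased germline SNPs.
    Returns 1 or 2 when the evidence is sufficient and consistent, else None.
    """
    votes = Counter(
        barcode_to_haplotype[bc]
        for bc in mutation_barcodes
        if bc in barcode_to_haplotype  # ignore barcodes with no germline phase
    )
    total = sum(votes.values())
    if total < min_support:
        return None  # too few informative barcodes
    haplotype, count = votes.most_common(1)[0]
    return haplotype if count / total >= min_fraction else None


def phased_fraction(calls):
    """Fraction of mutations that received a haplotype assignment."""
    return sum(call is not None for call in calls) / len(calls)


if __name__ == "__main__":
    phase_block = {"AAC": 1, "AGT": 1, "GGA": 1, "CTT": 2}
    calls = [
        assign_haplotype({"AAC", "AGT", "GGA", "TTT"}, phase_block),  # 3 votes for haplotype 1
        assign_haplotype({"AAC", "CTT"}, phase_block),                # conflicting votes
        assign_haplotype({"TTT"}, phase_block),                      # no informative barcode
    ]
    print(calls, phased_fraction(calls))
```

A tally like `phased_fraction` is the kind of summary behind a figure such as the 79.4% of 20,705 mutations reported in the abstract, although the study's actual phasing criteria may differ.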
Affiliation(s)
- Steven M. Foltz
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Lijun Yao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Nadezhda V. Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Amila Weerasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Qingsong Gao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Guanlan Dong
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Moses Schindler
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Hua Sun
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Reyka G. Jayasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Robert S. Fulton
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Catrina C. Fronick
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Justin King
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Daniel R. Kohnen
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Mark A. Fiala
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- John F. DiPersio
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Ravi Vij
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, 63110, USA
3
Dougan KE, Bellantuono AJ, Kahlke T, Abbriano RM, Chen Y, Shah S, Granados-Cifuentes C, van Oppen MJH, Bhattacharya D, Suggett DJ, Rodriguez-Lanetty M, Chan CX. Whole-genome duplication in an algal symbiont bolsters coral heat tolerance. Sci Adv 2024; 10:eadn2218. [PMID: 39028812 PMCID: PMC11259175 DOI: 10.1126/sciadv.adn2218] [Received: 11/29/2023] [Accepted: 06/14/2024] [Indexed: 07/21/2024]
Abstract
The algal endosymbiont Durusdinium trenchii enhances the resilience of coral reefs under thermal stress. D. trenchii can live freely or in endosymbiosis, and the analysis of genetic markers suggests that this species has undergone whole-genome duplication (WGD). However, the evolutionary mechanisms that underpin the thermotolerance of this species are largely unknown. Here, we present genome assemblies for two D. trenchii isolates, confirm WGD in these taxa, and examine how selection has shaped the duplicated genome regions using gene expression data. We assess how the free-living versus endosymbiotic lifestyles have contributed to the retention and divergence of duplicated genes, and how these processes have enhanced the thermotolerance of D. trenchii. Our combined results suggest that lifestyle is the driver of post-WGD evolution in D. trenchii, with the free-living phase being the most important, followed by endosymbiosis. Adaptations to both lifestyles likely enabled D. trenchii to provide enhanced thermal stress protection to the host coral.
Affiliation(s)
- Katherine E. Dougan
- School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD 4072, Australia
- Department of Biological Sciences, Biomolecular Science Institute, Florida International University, Miami, FL 33099, USA
- Anthony J. Bellantuono
- Department of Biological Sciences, Biomolecular Science Institute, Florida International University, Miami, FL 33099, USA
- Tim Kahlke
- Climate Change Cluster, University of Technology Sydney, Sydney, NSW 2007, Australia
- Raffaela M. Abbriano
- Climate Change Cluster, University of Technology Sydney, Sydney, NSW 2007, Australia
- Yibi Chen
- School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD 4072, Australia
- Sarah Shah
- School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD 4072, Australia
- Camila Granados-Cifuentes
- Department of Biological Sciences, Biomolecular Science Institute, Florida International University, Miami, FL 33099, USA
- Madeleine J. H. van Oppen
- School of Biosciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Australian Institute of Marine Science, Townsville, QLD 4810, Australia
- Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
- David J. Suggett
- Climate Change Cluster, University of Technology Sydney, Sydney, NSW 2007, Australia
- KAUST Reefscape Restoration Initiative (KRRI) and Red Sea Research Center (RSRC), King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Mauricio Rodriguez-Lanetty
- Department of Biological Sciences, Biomolecular Science Institute, Florida International University, Miami, FL 33099, USA
- Cheong Xin Chan
- School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, The University of Queensland, Brisbane, QLD 4072, Australia
4
Fuentes RR, Nieuwenhuis R, Chouaref J, Hesselink T, van Dooijeweert W, van den Broeck HC, Schijlen E, Schouten HJ, Bai Y, Fransz P, Stam M, de Jong H, Trivino SD, de Ridder D, van Dijk ADJ, Peters SA. A catalogue of recombination coldspots in interspecific tomato hybrids. PLoS Genet 2024; 20:e1011336. [PMID: 38950081 PMCID: PMC11244794 DOI: 10.1371/journal.pgen.1011336] [Received: 12/17/2023] [Revised: 07/12/2024] [Accepted: 06/09/2024] [Indexed: 07/03/2024]
Abstract
Increasing natural resistance and resilience in plants is key to ensuring food security in a changing climate. Breeders improve these traits by crossing cultivars with their wild relatives and introgressing specific alleles through meiotic recombination. However, some genomic regions are devoid of recombination, especially in crosses between divergent genomes, which limits the combinations of desirable alleles. Here, we used pooled-pollen sequencing to build a map of recombinant and non-recombinant regions between tomato and five wild relatives commonly used for introgressive tomato breeding. We detected hybrid-specific recombination coldspots that underscore the role of structural variations in modifying recombination patterns and maintaining genetic linkage in interspecific crosses. Crossover regions and coldspots show strong associations with specific TE superfamilies that exhibit differentially accessible chromatin between somatic and meiotic cells. About two-thirds of the genome consists of conserved coldspots, located mostly in the pericentromeres and enriched with retrotransposons. The coldspots also harbor genes associated with agronomic traits and stress resistance, revealing undesired consequences of linkage drag and possible barriers to breeding. We present examples of linkage drag that can potentially be resolved by pairing tomato with other wild species. Overall, this catalogue will help breeders better understand crossover localization and make informed decisions when generating new tomato varieties.
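As a rough illustration of how recombination coldspots can be called once crossovers have been counted along a chromosome, the sketch below reports runs of consecutive fixed-size windows whose crossover count stays at or below a threshold. The window size, the `max_count` and `min_windows` thresholds, and the function name are hypothetical; the abstract does not state how the study defined a coldspot from its pooled-pollen data.

```python
def find_coldspots(crossover_counts, window_size, max_count=0, min_windows=3):
    """Return (start_bp, end_bp) intervals covering runs of low-crossover windows.

    crossover_counts: crossovers observed in each consecutive window, in order.
    window_size: window length in bp, so window i spans [i*size, (i+1)*size).
    A run qualifies if every window has <= max_count crossovers and the run
    spans at least min_windows windows.
    """
    coldspots = []
    run_start = None
    # Append a sentinel window above the threshold so that a run reaching
    # the chromosome end is closed inside the loop.
    for i, count in enumerate(list(crossover_counts) + [max_count + 1]):
        if count <= max_count:
            if run_start is None:
                run_start = i  # a low-crossover run begins here
        else:
            if run_start is not None and i - run_start >= min_windows:
                coldspots.append((run_start * window_size, i * window_size))
            run_start = None
    return coldspots


if __name__ == "__main__":
    counts = [3, 0, 0, 0, 1, 0, 0, 0, 0, 2]  # toy counts per 100-kb window
    print(find_coldspots(counts, window_size=100_000))
    # prints [(100000, 400000), (500000, 900000)]
```

Intersecting such intervals across several hybrids would be one way to separate conserved coldspots from hybrid-specific ones, the distinction the abstract draws.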
Affiliation(s)
- Roven Rommel Fuentes
- Bioinformatics Group, Wageningen University and Research, Wageningen, The Netherlands
- Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Ronald Nieuwenhuis
- Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Jihed Chouaref
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Thamara Hesselink
- Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Willem van Dooijeweert
- Centre for Genetic Resources, Wageningen University and Research, Wageningen, The Netherlands
- Hetty C van den Broeck
- Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Elio Schijlen
- Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Henk J Schouten
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Yuling Bai
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Paul Fransz
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Maike Stam
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Hans de Jong
- Laboratory of Genetics, Wageningen University and Research, Wageningen, The Netherlands
- Sara Diaz Trivino
- Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
- Dick de Ridder
- Bioinformatics Group, Wageningen University and Research, Wageningen, The Netherlands
- Aalt D J van Dijk
- Bioinformatics Group, Wageningen University and Research, Wageningen, The Netherlands
- Sander A Peters
- Business Unit of Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Wageningen, The Netherlands
5
Stuckert AMM, Chouteau M, McClure M, LaPolice TM, Linderoth T, Nielsen R, Summers K, MacManes MD. The genomics of mimicry: Gene expression throughout development provides insights into convergent and divergent phenotypes in a Müllerian mimicry system. Mol Ecol 2024; 33:e17438. [PMID: 38923007 DOI: 10.1111/mec.17438] [Received: 07/13/2023] [Revised: 04/22/2024] [Accepted: 05/24/2024] [Indexed: 06/28/2024]
Abstract
A common goal in evolutionary biology is to discern the mechanisms that produce the astounding diversity of morphologies seen across the tree of life. Aposematic species, those with a conspicuous phenotype coupled with some form of defence, are excellent models for understanding the link between vivid colour pattern variation, the natural selection shaping it, and the genetic mechanisms underpinning this variation. Mimicry systems in which species share a conspicuous phenotype can provide an even better model for understanding the mechanisms of colour production in aposematic species, especially if comimics have divergent evolutionary histories. Here we investigate the genetic mechanisms by which mimicry is produced in poison frogs. We assembled a 6.02-Gbp genome with a contig N50 of 310 Kbp, a scaffold N50 of 390 Kbp, and 85% of expected tetrapod genes. We leveraged this genome to conduct gene expression analyses throughout development of four colour morphs of Ranitomeya imitator and two colour morphs each of R. fantastica and R. variabilis, the species that R. imitator mimics. We identified a large number of pigmentation and patterning genes differentially expressed throughout development, many of them related to melanophores/melanin, iridophore development and guanine synthesis. We also identified the pteridine synthesis pathway (including genes such as qdpr and xdh) as a key driver of the colour variation between morphs of these species, and identified several plausible candidates for colouration in vertebrates (e.g. cd36, ep-cadherin and perlwapin). Finally, we hypothesise that keratin genes (e.g. krt8) are important for producing different structural colours within these frogs.
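Contig and scaffold N50, as quoted in this abstract, summarize assembly contiguity: N50 is the largest length L such that sequences of length L or longer together contain at least half of the total assembly length. A minimal Python sketch of the standard calculation follows; the function name is illustrative, and the same function gives contig N50 when passed contig lengths and scaffold N50 when passed scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    lengths = sorted(lengths, reverse=True)  # longest sequences first
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:  # first point where half the assembly is covered
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Because N50 depends only on the length distribution, it says nothing about correctness; gene-completeness figures such as the 85% of expected tetrapod genes are reported alongside it for that reason.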
Affiliation(s)
- Adam M M Stuckert
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, USA
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, USA
- Department of Biology, East Carolina University, Greenville, North Carolina, USA
- Mathieu Chouteau
- Laboratoire Écologie, Évolution, Interactions Des Systèmes Amazoniens (LEEISA), CNRS, IFREMER, Université de Guyane, Cayenne, France
- Melanie McClure
- Laboratoire Écologie, Évolution, Interactions Des Systèmes Amazoniens (LEEISA), CNRS, IFREMER, Université de Guyane, Cayenne, France
- Troy M LaPolice
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, USA
- Department of Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Tyler Linderoth
- Department of Integrative Biology, University of California, Berkeley, California, USA
- Rasmus Nielsen
- Department of Integrative Biology, University of California, Berkeley, California, USA
- Kyle Summers
- Department of Biology, East Carolina University, Greenville, North Carolina, USA
- Matthew D MacManes
- Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, New Hampshire, USA
6
Margalit S, Tulpová Z, Detinis Zur T, Michaeli Y, Deek J, Nifker G, Haldar R, Gnatek Y, Omer D, Dekel B, Feldman HB, Grunwald A, Ebenstein Y. Long-Read Structural and Epigenetic Profiling of a Kidney Tumor-Matched Sample with Nanopore Sequencing and Optical Genome Mapping. bioRxiv 2024:2024.03.31.587463. [PMID: 38915648 PMCID: PMC11195078 DOI: 10.1101/2024.03.31.587463] [Indexed: 06/26/2024]
Abstract
Carcinogenesis often involves significant alterations in cancer genome architecture, marked by large structural and copy number variations (SVs and CNVs) that are difficult to capture with short-read sequencing. Traditionally, cytogenetic techniques are applied to detect such aberrations, but their resolution is limited and they cannot detect features smaller than several hundred kilobases. Optical genome mapping and nanopore sequencing are attractive technologies that bridge this resolution gap and offer enhanced performance for cytogenetic applications. These methods profile native, individual DNA molecules, thus capturing epigenetic information. We applied both techniques to characterize the structural and copy number landscape of a clear cell renal cell carcinoma (ccRCC) tumor, highlighting the relative strengths of each method in the context of variant size and average read length. Additionally, we assessed their utility for methylome and hydroxymethylome profiling, emphasizing differences in their applicability to epigenetic analysis.
Affiliation(s)
- Sapir Margalit
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Zuzana Tulpová
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Institute of Experimental Botany of the Czech Academy of Sciences, Olomouc, Czech Republic
- Tahir Detinis Zur
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Yael Michaeli
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Jasline Deek
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Gil Nifker
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Rita Haldar
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Yehudit Gnatek
- Pediatric Stem Cell Research Institute, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
- Dorit Omer
- Pediatric Stem Cell Research Institute, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
- Benjamin Dekel
- Pediatric Stem Cell Research Institute, Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
- Pediatric Nephrology Unit, The Edmond and Lily Safra Children’s Hospital, Sheba Medical Center, 52621 Ramat Gan, Israel
- School of Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Hagit Baris Feldman
- School of Medicine, Faculty of Medical and Health Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- The Genetics Institute and Genomics Center, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Assaf Grunwald
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
- Yuval Ebenstein
- Department of Chemistry, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, 6997801 Tel Aviv, Israel
- Department of Biomedical Engineering, Tel Aviv University, 6997801 Tel Aviv, Israel
7
Choi SS, Mc Cartney A, Park D, Roberts H, Brav-Cubitt T, Mitchell C, Buckley TR. Multiple hybridization events and repeated evolution of homoeologue expression bias in parthenogenetic, polyploid New Zealand stick insects. Mol Ecol 2024:e17422. [PMID: 38842022 DOI: 10.1111/mec.17422] [Received: 10/30/2023] [Revised: 03/03/2024] [Accepted: 04/17/2024] [Indexed: 06/07/2024]
Abstract
During hybrid speciation, homoeologues combine in a single genome. Homoeologue expression bias (HEB) occurs when one homoeologue has higher gene expression than another. HEB has been well characterized in plants but rarely investigated in animals, especially invertebrates. Consequently, little is known about the role that HEB plays in allopolyploid invertebrate genomes. If HEB is constrained by features of the parental genomes, then we predict repeated evolution of similar HEB patterns among hybrid genomes formed from the same parental lineages. To test this, we reconstructed the history of hybridization between the New Zealand stick insect genera Acanthoxyla and Clitarchus, using a high-quality genome assembly from Clitarchus hookeri to call variants and phase alleles. These analyses revealed the formation of three independent diploid and triploid hybrid lineages between these genera. RNA sequencing revealed a similar magnitude and direction of HEB among these hybrid lineages, and many enriched functions and pathways were also shared among lineages, consistent with repeated evolution due to parental genome constraints. In most hybrid lineages, a slight majority of the genes involved in mitochondrial function showed HEB towards the maternal homoeologues, consistent with only weak effects of mitonuclear incompatibility. We also observed a proteasome functional enrichment in most lineages and hypothesize that this may result from the need to maintain proteostasis in hybrid genomes. Reference bias was a pervasive problem, and we caution against relying on HEB estimates from a single parental reference genome.
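Homoeologue expression bias is often summarized per gene as a log2 ratio of the expression assigned to each homoeologue. The sketch below assumes a diploid hybrid with two homoeologues (A and B), read counts already assigned to each homoeologue, and an arbitrary |log2 ratio| >= 1 cutoff with a pseudocount; these choices and the function names are hypothetical and are not the statistical approach used in the study. The abstract's warning about reference bias applies here: if reads are assigned using a single parental reference, counts for one homoeologue may be systematically inflated.

```python
import math


def classify_heb(count_a, count_b, threshold=1.0, pseudocount=1.0):
    """Classify one gene as biased toward homoeologue 'A', 'B', or 'none'."""
    # The pseudocount keeps the ratio finite when one homoeologue has no reads.
    log2_ratio = math.log2((count_a + pseudocount) / (count_b + pseudocount))
    if log2_ratio >= threshold:
        return "A"
    if log2_ratio <= -threshold:
        return "B"
    return "none"


def fraction_biased_toward_a(gene_counts, **kwargs):
    """Among genes showing HEB, the fraction biased toward homoeologue A."""
    calls = [classify_heb(a, b, **kwargs) for a, b in gene_counts.values()]
    biased = [call for call in calls if call != "none"]
    if not biased:
        return float("nan")  # no biased genes, so the fraction is undefined
    return biased.count("A") / len(biased)


if __name__ == "__main__":
    counts = {"g1": (63, 15), "g2": (3, 31), "g3": (10, 10), "g4": (127, 31)}
    for gene, (a, b) in counts.items():
        print(gene, classify_heb(a, b))
    print(fraction_biased_toward_a(counts))
```

Designating A as the maternal homoeologue turns `fraction_biased_toward_a` into the kind of "slight majority toward maternal homoeologues" summary the abstract reports for mitochondrial genes.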
Affiliation(s)
- Seung-Sub Choi
- Manaaki Whenua - Landcare Research, Auckland, New Zealand
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand
- Ann Mc Cartney
- Manaaki Whenua - Landcare Research, Auckland, New Zealand
- Duckchul Park
- Manaaki Whenua - Landcare Research, Auckland, New Zealand
- Hester Roberts
- Manaaki Whenua - Landcare Research, Auckland, New Zealand
8
Zhang Z, Xiao J, Wang H, Yang C, Huang Y, Yue Z, Chen Y, Han L, Yin K, Lyu A, Fang X, Zhang L. Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity. Nat Commun 2024; 15:4631. [PMID: 38821971 PMCID: PMC11143213 DOI: 10.1038/s41467-024-49060-z] [Received: 08/20/2023] [Accepted: 05/17/2024] [Indexed: 06/02/2024]
Abstract
Although long-read sequencing enables the generation of complete genomes for unculturable microbes, its high cost limits its widespread adoption in large-scale metagenomic studies. An alternative is to assemble short-reads with long-range connectivity, which can be a cost-effective way to generate high-quality microbial genomes. Here, we develop Pangaea, a bioinformatic approach designed to enhance metagenome assembly using short-reads with long-range connectivity. Pangaea leverages connectivity derived from physical barcodes of linked-reads or from virtual barcodes obtained by aligning short-reads to long-reads. Pangaea uses a deep learning-based read binning algorithm to assemble co-barcoded reads exhibiting similar sequence contexts and abundances, thereby improving the assembly of high- and medium-abundance microbial genomes. Pangaea also uses a multi-thresholding strategy to refine assembly for low-abundance microbes. We benchmark Pangaea on linked-reads and on a combination of short- and long-reads from simulated data, mock communities, and human gut metagenomes. Pangaea achieves significantly higher contig continuity and more near-complete metagenome-assembled genomes (NCMAGs) than existing assemblers. Pangaea also generates three complete and circular NCMAGs from the human gut microbiomes.
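The virtual-barcode idea in this abstract can be sketched in a few lines: every long read acts as a barcode, and the short reads aligned to it form one co-barcoded group that can be binned and assembled together. This is a hypothetical illustration that assumes alignments are already available as `(short_read_id, long_read_id)` pairs; the function name and the `min_reads` filter are assumptions, not Pangaea's actual interface. In this simple version a short read that aligns to several long reads is placed in several groups.

```python
from collections import defaultdict


def virtual_barcodes(alignments, min_reads=2):
    """Group short reads by the long read they align to.

    alignments: iterable of (short_read_id, long_read_id) pairs.
    Returns {long_read_id: set of short_read_ids}, dropping groups with
    fewer than min_reads short reads (too little connectivity to help).
    """
    groups = defaultdict(set)
    for short_read, long_read in alignments:
        groups[long_read].add(short_read)  # the long read ID is the virtual barcode
    return {lr: reads for lr, reads in groups.items() if len(reads) >= min_reads}


if __name__ == "__main__":
    hits = [("r1", "L1"), ("r2", "L1"), ("r2", "L2"), ("r3", "L2"), ("r4", "L3")]
    for barcode, reads in sorted(virtual_barcodes(hits).items()):
        print(barcode, sorted(reads))
```

With physical linked-read barcodes the grouping comes directly from the sequencing library instead, so only the source of the barcode differs between the two input types.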
Grants
- This research was partially supported by the Young Collaborative Research Grant (C2004-23Y, L.Z.), HMRF (11221026, L.Z.), the open project of BGI-Shenzhen, Shenzhen 518000, China (BGIRSZ20220012, L.Z.), the Hong Kong Research Grant Council Early Career Scheme (HKBU 22201419, L.Z.), HKBU Start-up Grant Tier 2 (RC-SGT2/19-20/SCI/007, L.Z.), HKBU IRCMS (No. IRCMS/19-20/D02, L.Z.).
- This research was partially supported by the open project of BGI-Shenzhen, Shenzhen 518000, China (BGIRSZ20220014, KJ.Y.).
- This study was partially supported by the Science Technology and Innovation Committee of Shenzhen Municipality, China (SGDX20190919142801722, XD.F.).
Affiliation(s)
- Zhenmiao Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
| | - Jin Xiao
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
| | - Hongbo Wang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
| | - Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
| | | | - Zhen Yue
- BGI Research, Sanya, 572025, China
| | - Yang Chen
- State Key Laboratory of Dampness Syndrome of Chinese Medicine, The Second Affiliated Hospital of Guangzhou University of Chinese, Guangzhou, China
| | - Lijuan Han
- Department of Scientific Research, Kangmeihuada GeneTech Co., Ltd (KMHD), Shenzhen, China
| | - Kejing Yin
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Xiaodong Fang
- BGI Research, Shenzhen, 518083, China
- BGI Research, Sanya, 572025, China
- Department of Scientific Research, Kangmeihuada GeneTech Co., Ltd (KMHD), Shenzhen, China
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China.
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China.
9
Chen M, Tan MH, Liu J, Yang YM, Yu JL, He LJ, Huang YZ, Sun YX, Qian YQ, Yan K, Dong MY. An efficient molecular genetic testing strategy for incontinentia pigmenti based on single-tube long fragment read sequencing. NPJ Genom Med 2024; 9:32. [PMID: 38811629 PMCID: PMC11137062 DOI: 10.1038/s41525-024-00421-z] [Received: 10/31/2023] [Accepted: 05/18/2024] [Indexed: 05/31/2024]
Abstract
Incontinentia pigmenti (IP) is a rare X-linked dominant neuroectodermal dysplasia that primarily affects females. The only known causative gene is IKBKG, and the most common genetic cause is the recurrent IKBKGΔ4-10 deletion resulting from recombination between two MER67B repeats. Detecting variants in IKBKG is challenging because of the highly homologous, non-pathogenic pseudogene IKBKGP1. In this study, we identified four pathogenic variants in four IP patients using a strategy based on single-tube long fragment read (stLFR) sequencing with a specialized analysis pipeline. Three frameshift variants (c.519-3_519dupCAGG, c.1167dupC, and c.700dupT) were identified and subsequently validated by Sanger sequencing. Notably, c.519-3_519dupCAGG was found in both IKBKG and IKBKGP1, whereas the other two variants were detected only in the functional gene. The IKBKGΔ4-10 deletion was identified and confirmed in one patient. These results demonstrate that the proposed strategy can identify potential pathogenic variants and determine whether they derive from IKBKG or from its pseudogene. Thus, this strategy can be an efficient genetic testing method for IKBKG. Because it surveys the whole genome, it may also enable the exploration of other genes potentially associated with IP. Furthermore, the strategy may provide insights into other diseases whose genetic testing is complicated by pseudogenes.
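The frameshift label for these duplications follows from simple arithmetic: a duplication whose length is not a multiple of three shifts the reading frame. The Python sketch below implements that check. It assumes the duplicated bases are written explicitly after `dup`, as in the three variants above. The function name is hypothetical, and the check ignores splicing effects (c.519-3_519dupCAGG spans an intron-exon boundary). It is no substitute for a proper HGVS parser.

```python
import re


def dup_is_frameshift(hgvs_c):
    """Return True if a c.-level duplication shifts the coding reading frame.

    Assumes the duplicated bases are written explicitly after 'dup'
    (e.g. 'c.1167dupC'); the rule used is simply length % 3 != 0.
    """
    match = re.search(r"dup([ACGT]+)$", hgvs_c)
    if match is None:
        raise ValueError(f"no explicit duplicated bases in {hgvs_c!r}")
    return len(match.group(1)) % 3 != 0


for variant in ["c.519-3_519dupCAGG", "c.1167dupC", "c.700dupT"]:
    print(variant, dup_is_frameshift(variant))
```

All three variants duplicate 4, 1 and 1 bases respectively, so each is flagged as a frameshift. A hypothetical in-frame duplication such as `c.100_102dupGCA` would not be flagged.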
Affiliation(s)
- Min Chen
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China
- Mei-Hua Tan
- BGI Genomics, Shenzhen, Guangdong, 518083, P. R. China
- Jiao Liu
- Lishui Maternity and Child Health Care Hospital, Lishui, Zhejiang, 323000, P. R. China
- Yan-Mei Yang
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China
- Jia-Ling Yu
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China
- Li-Juan He
- BGI Genomics, Shenzhen, Guangdong, 518083, P. R. China
- Ying-Zhi Huang
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China
- Yi-Xi Sun
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China
- Ye-Qing Qian
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China
- Kai Yan
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China
- Min-Yue Dong
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, Zhejiang, 310006, P. R. China.
- Key Laboratory of Reproductive Genetics (Zhejiang University), Ministry of Education, Hangzhou, Zhejiang, 310006, P. R. China.
- Key Laboratory of Women's Reproductive Health of Zhejiang Province, Hangzhou, Zhejiang, 310006, P. R. China.
10
Schrauwen I, Rajendran Y, Acharya A, Öhman S, Arvio M, Paetau R, Siren A, Avela K, Granvik J, Leal SM, Määttä T, Kokkonen H, Järvelä I. Optical genome mapping unveils hidden structural variants in neurodevelopmental disorders. Sci Rep 2024; 14:11239. [PMID: 38755281 PMCID: PMC11099145 DOI: 10.1038/s41598-024-62009-y] [Received: 01/04/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024]
Abstract
While short-read sequencing currently dominates genetic research and diagnostics, it frequently fails to capture certain structural variants (SVs), which are often implicated in the etiology of neurodevelopmental disorders (NDDs). Optical genome mapping (OGM) is an innovative technique capable of capturing SVs that are undetectable or difficult to detect with short-read methods. This study aimed to investigate NDDs using OGM, focusing specifically on cases that remained unsolved after standard exome sequencing. OGM was performed in 47 families using ultra-high molecular weight DNA. Single-molecule maps were assembled de novo, followed by SV and copy number variant calling. We identified 7 variants of interest, of which 5 (10.6% of families) were classified as likely pathogenic or pathogenic, located in BCL11A, OPHN1, PHF8, SON, and NFIA. We also identified an inversion disrupting NAALADL2, a gene previously found to harbor complex rearrangements in two NDD cases. Variants in known NDD genes or candidate variants of interest missed by exome sequencing mainly consisted of larger insertions (>1 kbp), inversions, and deletions/duplications spanning only a few exons (1-4). In conclusion, in addition to improving molecular diagnosis in NDDs, this technique may reveal novel NDD genes harboring complex SVs that standard sequencing techniques often miss.
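The 10.6% figure is the diagnostic yield: 5 solved families out of 47. The Python sketch below computes it together with a Wilson score interval. The interval is not reported in the abstract. It is included only to show how imprecise a percentage based on 47 families is. The function name is hypothetical.

```python
import math


def diagnostic_yield(solved, total, z=1.96):
    """Return the yield and its Wilson score interval, all as percentages.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    p = solved / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (round(100 * p, 1),
            round(100 * (center - half_width), 1),
            round(100 * (center + half_width), 1))


print(diagnostic_yield(5, 47))
```

For 5 of 47, this gives a yield of 10.6% with an interval of roughly 4.6% to 22.6%. The Wilson interval is used here because it behaves better than the simple normal approximation when counts are small.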
Affiliation(s)
- Isabelle Schrauwen
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA.
- Yasmin Rajendran
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Anushree Acharya
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Maria Arvio
- Päijät-Häme Wellbeing Services, Neurology, Lahti, Finland
- Ritva Paetau
- Department of Child Neurology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Auli Siren
- Kanta-Häme Central Hospital, Hämeenlinna, Finland
- Kristiina Avela
- Institute of Biomedicine, University of Turku, Turku, Finland
- Johanna Granvik
- The Wellbeing Services County of Ostrobothnia, Kokkola, Finland
- Suzanne M Leal
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY, USA
- Tuomo Määttä
- The Wellbeing Services County of Kainuu, Kajaani, Finland
- Hannaleena Kokkonen
- Northern Finland Laboratory Centre NordLab and Medical Research Centre, Oulu University Hospital and University of Oulu, Oulu, Finland
- Irma Järvelä
- Department of Medical Genetics, University of Helsinki, Helsinki, Finland
11
Pfeifer SP, Baxter A, Savidge LE, Sedlazeck FJ, Bales KL. De Novo Genome Assembly for the Coppery Titi Monkey (Plecturocebus cupreus): An Emerging Nonhuman Primate Model for Behavioral Research. Genome Biol Evol 2024; 16:evae108. [PMID: 38758096 PMCID: PMC11140417 DOI: 10.1093/gbe/evae108] [Received: 04/09/2024] [Revised: 05/07/2024] [Accepted: 05/09/2024] [Indexed: 05/18/2024]
Abstract
The coppery titi monkey (Plecturocebus cupreus) is an emerging nonhuman primate model system for behavioral and neurobiological research. However, the near-complete absence of genomic resources for the species has hampered insights into the genetic underpinnings of the phenotypic traits of interest. To facilitate future genotype-to-phenotype studies, we present a high-quality, fully annotated de novo genome assembly for the species. It has chromosome-length scaffolds spanning the autosomes and chromosome X (scaffold N50 = 130.8 Mb) and was constructed using data from several orthogonal short- and long-read sequencing and scaffolding techniques. The assembly has a base-level accuracy of ∼99.99% in chromosome-length scaffolds. At the genome level, its benchmarking universal single-copy ortholog (BUSCO) and k-mer completeness scores are >99.0% and 95.1%, respectively. This makes it one of the most complete Pitheciidae genomes to date and an invaluable resource for comparative evolutionary genomics research. Such research can improve our understanding of the lineage-specific changes underlying adaptive traits, as well as of deleterious mutations associated with disease.
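Scaffold N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The Python sketch below implements that standard definition. The toy lengths are made up and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sorts lengths from longest to shortest and returns the length at which
    the running total first reaches at least half of the total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the toy example, the total is 54. The three longest scaffolds (10 + 9 + 8 = 27) first reach half of that total, so the N50 is 8.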
Affiliation(s)
- Susanne P Pfeifer
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Alexander Baxter
- Department of Psychology, University of California, Davis, CA, USA
- California National Primate Research Center, Neuroscience and Behavior Division, Davis, CA, USA
- Logan E Savidge
- Department of Psychology, University of California, Davis, CA, USA
- California National Primate Research Center, Neuroscience and Behavior Division, Davis, CA, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Karen L Bales
- Department of Psychology, University of California, Davis, CA, USA
- California National Primate Research Center, Neuroscience and Behavior Division, Davis, CA, USA
- Department of Neurobiology, Physiology, and Behavior, University of California, Davis, CA, USA
12
Shearman JR, Naktang C, Sonthirod C, Kongkachana W, U-Thoomporn S, Jomchai N, Maknual C, Yamprasai S, Wanthongchai P, Pootakham W, Tangphatsornruang S. De novo assembly and analysis of Sonneratia ovata genome and population analysis. Genomics 2024; 116:110837. [PMID: 38548034 DOI: 10.1016/j.ygeno.2024.110837] [Received: 11/13/2023] [Revised: 02/22/2024] [Accepted: 03/24/2024] [Indexed: 04/01/2024]
Abstract
Mangroves are an important part of coastal and estuarine ecosystems, where they serve as nurseries for marine species and prevent coastal erosion. Here we report the genome of Sonneratia ovata, a true mangrove that grows in estuarine environments and can tolerate moderate salt exposure. We sequenced the S. ovata genome and assembled it into chromosome-level scaffolds using Hi-C. The genome is 212.3 Mb and contains 12 chromosomes ranging in size from 12.2 to 23.2 Mb. Annotation identified 29,829 genes with a BUSCO completeness of 95.9%. We identified salt-related genes and found copy number expansion of some of them, such as ADP-ribosylation factor 1 and elongation factor 1-alpha. Population analysis revealed a low level of genetic variation and no population structure within S. ovata.
Affiliation(s)
- Jeremy R Shearman
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Chaiwat Naktang
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Chutima Sonthirod
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Wasitthee Kongkachana
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Sonicha U-Thoomporn
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Nukoon Jomchai
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Chatree Maknual
- Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Suchart Yamprasai
- Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Poonsri Wanthongchai
- Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Wirulda Pootakham
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand
- Sithichoke Tangphatsornruang
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani 12120, Thailand.
13
Villani F, Guarracino A, Ward RR, Green T, Emms M, Pravenec M, Prins P, Garrison E, Williams RW, Chen H, Colonna V. Pangenome reconstruction in rats enhances genotype-phenotype mapping and novel variant discovery. bioRxiv: The Preprint Server for Biology 2024:2024.01.10.575041. [PMID: 38260597 PMCID: PMC10802574 DOI: 10.1101/2024.01.10.575041] [Indexed: 01/24/2024]
Abstract
The HXB/BXH family of recombinant inbred rat strains is a unique genetic resource that has been extensively phenotyped over 25 years, resulting in a vast dataset of quantitative molecular and physiological phenotypes. We built a pangenome graph from 10x Genomics Linked-Read data for 31 recombinant inbred rats to study genetic variation and association mapping. The pangenome includes 0.2 Gb of sequence not present in the reference mRatBN7.2, confirming the capture of substantial additional variation. We validated variants in challenging regions, including complex structural variants resolving into multiple haplotypes. Phenome-wide association analysis of validated SNPs uncovered variants associated with glucose/insulin levels and hippocampal gene expression. We propose an interaction between Pirl1l1, chromogranin expression, TNF-α levels, and insulin regulation. This study demonstrates the utility of linked-read pangenomes for comprehensive variant detection and for mapping phenotypic diversity in a widely used rat genetic reference panel.
Affiliation(s)
- Flavia Villani
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Andrea Guarracino
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Rachel R Ward
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center
- Tomomi Green
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center
- Madeleine Emms
- Institute of Genetics and Biophysics, National Research Council, Naples, 80111, Italy
- Michal Pravenec
- Institute of Physiology, Czech Academy of Sciences, 14200 Prague, Czech Republic
- Pjotr Prins
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Erik Garrison
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Robert W. Williams
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Hao Chen
- Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center
- Vincenza Colonna
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Institute of Genetics and Biophysics, National Research Council, Naples, 80111, Italy
14
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Paasch A, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted phasing of 2-200 kilobase DNA fragments with a short-read sequencer and a single-tube linked-read library method. Sci Rep 2024; 14:7988. [PMID: 38580715 PMCID: PMC10997766 DOI: 10.1038/s41598-024-58733-0] [Received: 08/15/2023] [Accepted: 04/02/2024] [Indexed: 04/07/2024]
Abstract
In the human genome, heterozygous sites are genomic positions that carry a different allele or nucleotide variant on the maternal and paternal chromosomes. Resolving these allelic differences by chromosomal copy, known as phasing, is achievable on a short-read sequencer when the library preparation method captures long-range genomic information. TELL-Seq is a library preparation method that captures long-range genomic information with the aid of molecular identifiers (barcodes). The same barcode tags all reads derived from the same long DNA fragment, up to 200 kilobases (kb) long, generating linked reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2-200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2-200 kb targets using a short-read sequencer.
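Phasing with linked reads rests on a simple idea. Reads carrying the same barcode came from one long DNA fragment, so the alleles they observe at two heterozygous sites lie on the same chromosomal copy. The Python sketch below shows that voting logic for a single pair of sites. It is an illustrative simplification, not the TELL-Seq analysis pipeline. The input format and the function name `phase_pair` are hypothetical, and the sketch ignores sequencing errors and barcodes shared by more than one fragment.

```python
from collections import defaultdict


def phase_pair(observations, site_a, site_b):
    """Vote on whether the alt alleles at two heterozygous sites are in cis or trans.

    observations: iterable of (barcode, site, allele), allele 0 = ref, 1 = alt.
    A barcode seeing the same allele index at both sites votes 'cis';
    a barcode seeing different indices votes 'trans'.
    Returns (call, cis_votes, trans_votes).
    """
    alleles = defaultdict(dict)
    for barcode, site, allele in observations:
        alleles[barcode][site] = allele  # last observation per barcode/site wins

    cis = trans = 0
    for calls in alleles.values():
        if site_a in calls and site_b in calls:  # only barcodes spanning both sites
            if calls[site_a] == calls[site_b]:
                cis += 1
            else:
                trans += 1
    if cis == trans:
        return "undetermined", cis, trans
    return ("cis" if cis > trans else "trans"), cis, trans


obs = [("BC1", 100, 1), ("BC1", 5000, 1), ("BC2", 100, 0), ("BC2", 5000, 0),
       ("BC3", 100, 1), ("BC3", 5000, 0), ("BC4", 100, 1)]
print(phase_pair(obs, 100, 5000))  # ('cis', 2, 1)
```

A fragment carrying the reference allele at both sites still votes `cis`, because it implies that the other copy carries both alternate alleles. BC4 covers only one site and therefore contributes no vote.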
Affiliation(s)
- Madison Rzepka
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Yu Xia
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Peter L Chang
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Amber Paasch
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Long Pham
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Naisarg Modi
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA
- Likun Yao
- Department of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Adrian Perez-Agustin
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Ming Lei
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
- Yong Wang
- Universal Sequencing Technology Corp., Canton, MA, 02021, USA
- Zhoutao Chen
- Universal Sequencing Technology Corp., Carlsbad, CA, 92011, USA.
15
Bhattarai UR, Poulin R, Gemmell NJ, Dowle E. Genome assembly and annotation of the mermithid nematode Mermis nigrescens. G3 (Bethesda) 2024; 14:jkae023. [PMID: 38301266 PMCID: PMC10989877 DOI: 10.1093/g3journal/jkae023] [Received: 12/29/2023] [Revised: 01/21/2024] [Accepted: 01/22/2024] [Indexed: 02/03/2024]
Abstract
Genetic studies of nematodes have been dominated by Caenorhabditis elegans as a model species. A lack of genomic resources has limited the expansion of genetic research to other groups of nematodes. Here, we report a draft genome assembly of a mermithid nematode, Mermis nigrescens. Mermithidae are insect-parasitic nematodes whose hosts include a wide range of terrestrial arthropods. We sequenced, assembled, and annotated the whole genome of M. nigrescens using nanopore long reads and 10x Chromium linked reads. The assembly is 524 Mb in size and consists of 867 scaffolds. The N50 value is 2.42 Mb, and half of the assembly is contained in the 30 longest scaffolds. The assembly BUSCO score against the eukaryotic database (eukaryota_odb10) indicates that the genome is 86.7% complete and 5.1% partial. The genome has a high level of heterozygosity (6.6%) and a repeat content of 83.98%. mRNA-seq reads from nematodes of different sizes (≤2 cm, 3.5-7 cm, and >7 cm body length), representing different developmental stages, were also generated and used for genome annotation. Using ab initio and evidence-based gene model predictions, 12,313 protein-coding genes and 24,186 mRNAs were annotated. These genomic resources will help researchers investigate various aspects of the biology and host-parasite interactions of mermithid nematodes.
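The statement that half of the assembly lies in the 30 longest scaffolds corresponds to an L50 of about 30. L50 is the count statistic that accompanies the N50 length of 2.42 Mb. The Python sketch below computes L50 under the usual definition. The toy lengths are for illustration only.

```python
def l50(lengths):
    """Return the L50: how many of the longest sequences hold half the total length."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Stop at the first scaffold whose running total reaches half the assembly.
        if 2 * running >= total:
            return count
    raise ValueError("lengths must be non-empty")


print(l50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 3
```

The length of the scaffold at which the loop stops is the N50, so both statistics come from the same cumulative pass.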
Affiliation(s)
- Upendra R Bhattarai
- Department of Anatomy, University of Otago, Dunedin 9016, New Zealand
- Department of Organismic & Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Robert Poulin
- Department of Zoology, University of Otago, Dunedin 9016, New Zealand
- Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin 9016, New Zealand
- Eddy Dowle
- Department of Anatomy, University of Otago, Dunedin 9016, New Zealand
16
Mostafa S, Rafizadeh R, Polasek TM, Bousman CA, Rostami‐Hodjegan A, Stowe R, Carrion P, Sheffield LJ, Kirkpatrick CMJ. Virtual twins for model-informed precision dosing of clozapine in patients with treatment-resistant schizophrenia. CPT Pharmacometrics Syst Pharmacol 2024; 13:424-436. [PMID: 38243630 PMCID: PMC10941576 DOI: 10.1002/psp4.13093] [Received: 07/18/2023] [Revised: 10/14/2023] [Accepted: 11/02/2023] [Indexed: 01/21/2024]
Abstract
Model-informed precision dosing using virtual twins (MIPD-VTs) is an emerging strategy to predict target drug concentrations in clinical practice. Using a high virtualization MIPD-VT approach (Simcyp version 21), we predicted the steady-state clozapine concentration and the clozapine dosage range needed to achieve a target concentration of 350 to 600 ng/mL in hospitalized patients with treatment-resistant schizophrenia (N = 11). We confirmed that high virtualization MIPD-VT can reasonably predict clozapine concentrations in individual patients, with a coefficient of determination (R²) ranging between 0.29 and 0.60. Importantly, our approach predicted the final dosage range that achieved the desired target clozapine concentrations in 73% of patients. In two thirds of patients treated with fluvoxamine augmentation, steady-state clozapine concentrations were overpredicted two- to fourfold. This work supports the application of a high virtualization MIPD-VT approach to inform the titration of clozapine doses in clinical practice. However, refinement is required to improve the prediction of pharmacokinetic drug-drug interactions, particularly with fluvoxamine augmentation.
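The abstract summarizes prediction quality with R², a fold overprediction and a 350 to 600 ng/mL target window. The Python sketch below shows how such metrics can be computed from paired observed and predicted concentrations. It assumes the textbook definition R² = 1 - SS_res/SS_tot, whereas the authors may have derived R² differently (for example, from a regression fit). The concentrations are made up, and the function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def fold_error(observed, predicted):
    """Predicted / observed concentration; values above 1 mean overprediction."""
    return [p / o for o, p in zip(observed, predicted)]


def in_target(concentration, low=350.0, high=600.0):
    """True if a clozapine level (ng/mL) lies inside the target window."""
    return low <= concentration <= high


# Made-up concentrations (ng/mL) for illustration only.
observed = [300.0, 420.0, 510.0]
predicted = [900.0, 450.0, 480.0]
print(fold_error(observed, predicted))
print([in_target(c) for c in observed])
```

In this toy data, the first patient is overpredicted threefold, which falls within the two- to fourfold range the abstract reports for fluvoxamine co-treatment.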
Affiliation(s)
- Sam Mostafa
- Centre for Medicine Use and Safety, Monash University, Parkville, Victoria, Australia
- MyDNA Life Australia Limited, Victoria, Australia
- Reza Rafizadeh
- BC Mental Health and Substance Use Services, BC Psychosis Program, Lower Mainland Pharmacy Services, Vancouver, British Columbia, Canada
- Thomas M. Polasek
- Centre for Medicine Use and Safety, Monash University, Parkville, Victoria, Australia
- Certara, Princeton, New Jersey, USA
- Department of Clinical Pharmacology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Chad A. Bousman
- Department of Psychiatry, Melbourne Neuropsychiatry Centre, University of Melbourne and Melbourne Health, Melbourne, Victoria, Australia
- Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Departments of Medical Genetics, Psychiatry, Physiology and Pharmacology, and Community Health Sciences, University of Calgary, Calgary, Alberta, Canada
- Amin Rostami‐Hodjegan
- Centre for Applied Pharmacokinetic Research (CAPKR), School of Health Sciences, University of Manchester, Manchester, UK
- Simcyp Division, Certara UK Limited, Sheffield, UK
- Robert Stowe
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
- Djavid Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Neurology (Medicine), University of British Columbia, Vancouver, British Columbia, Canada
- Prescilla Carrion
- Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada
17
Orteu A, Kucka M, Gordon IJ, Ng’iru I, van der Heijden ESM, Talavera G, Warren IA, Collins S, ffrench-Constant RH, Martins DJ, Chan YF, Jiggins CD, Martin SH. Transposable Element Insertions Are Associated with Batesian Mimicry in the Pantropical Butterfly Hypolimnas misippus. Mol Biol Evol 2024; 41:msae041. [PMID: 38401262 PMCID: PMC10924252 DOI: 10.1093/molbev/msae041] [Received: 08/04/2023] [Revised: 02/14/2024] [Accepted: 02/16/2024] [Indexed: 02/26/2024]
Abstract
Hypolimnas misippus is a Batesian mimic of the toxic African Queen butterfly (Danaus chrysippus). Female H. misippus butterflies use two major wing patterning loci (M and A) to imitate three color morphs of D. chrysippus found in different regions of Africa. In this study, we examine the evolution of the M locus and identify it as an example of adaptive atavism, a morphological reversion to an ancestral character that results in an adaptive phenotype. We show that H. misippus has re-evolved an ancestral wing pattern present in other Hypolimnas species, repurposing it for Batesian mimicry of a D. chrysippus morph. Using haplotagging, a linked-read sequencing technology, and our new analytical tool, Wrath, we discover two large transposable element insertions at the M locus. We further establish that these insertions are present in the dominant allele responsible for the mimetic phenotype. A comparative analysis including additional Hypolimnas species demonstrates that the dominant allele is derived. This suggests that, in the derived allele, the transposable elements disrupt a cis-regulatory element. The disruption leads to a reversion to the ancestral phenotype, which is then used for Batesian mimicry of a distinct model, a different morph of D. chrysippus. Our findings present a compelling instance of convergent evolution and adaptive atavism, in which the same pattern element has evolved independently multiple times in Hypolimnas butterflies, repeatedly playing a role in Batesian mimicry of diverse model species.
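Linked-read methods such as haplotagging can reveal structural variants through barcode sharing. Genomic windows that are adjacent in the sequenced genome share many barcodes, so unexpected sharing between distant windows, or its loss between neighboring ones, hints at a rearrangement. The Python sketch below computes a pairwise Jaccard barcode-sharing table. It illustrates the general idea only and is an assumption on our part, not Wrath's implementation. The input format, window size and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def barcode_sharing(reads, window_size):
    """Jaccard similarity of barcode sets between genomic windows.

    reads: iterable of (position, barcode) for reads on one chromosome.
    Returns {(window_i, window_j): jaccard} for every window pair with i < j.
    """
    barcodes = defaultdict(set)
    for position, barcode in reads:
        barcodes[position // window_size].add(barcode)

    sharing = {}
    for i, j in combinations(sorted(barcodes), 2):
        shared = barcodes[i] & barcodes[j]
        union = barcodes[i] | barcodes[j]  # never empty: each window has a barcode
        sharing[(i, j)] = len(shared) / len(union)
    return sharing


reads = [(10, "A"), (20, "B"), (1500, "A"), (1600, "B"), (2500, "C")]
print(barcode_sharing(reads, window_size=1000))
```

Plotted as a heatmap over all window pairs, high values typically sit along the diagonal. An inversion shows up as off-diagonal sharing between its breakpoints.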
Affiliation(s)
- Anna Orteu
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, UK
- Marek Kucka
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Ian J Gordon
- Centre of Excellence in Biodiversity, University of Rwanda, Huye, Rwanda
- Ivy Ng’iru
- Mpala Research Centre, Nanyuki 10400, Laikipia, Kenya
- School of Biosciences, Cardiff University, Cardiff CF10 3AX, UK
- UK Centre for Ecology and Hydrology, Wallingford OX10 8BB, UK
- Eva S M van der Heijden
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Tree of Life Programme, Wellcome Sanger Institute, Hinxton, UK
- Gerard Talavera
- Institut Botànic de Barcelona (IBB), CSIC-CMCNB, Barcelona, Catalonia, Spain
- Ian A Warren
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Steve Collins
- African Butterfly Research Institute, Nairobi, Kenya
- Dino J Martins
- Turkana Basin Institute, Stony Brook University, Stony Brook, NY 11794, USA
- Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Simon H Martin
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK
18
Hall HN, Parry D, Halachev M, Williamson KA, Donnelly K, Campos Parada J, Bhatia S, Joseph J, Holden S, Prescott TE, Bitoun P, Kirk EP, Newbury-Ecob R, Lachlan K, Bernar J, van Heyningen V, FitzPatrick DR, Meynert A. Short-read whole genome sequencing identifies causative variants in most individuals with previously unexplained aniridia. J Med Genet 2024; 61:250-261. [PMID: 38050128 PMCID: PMC7615962 DOI: 10.1136/jmg-2023-109181] [Received: 03/12/2023] [Accepted: 09/25/2023] [Indexed: 12/06/2023]
Abstract
BACKGROUND Classic aniridia is a highly penetrant autosomal dominant disorder characterised by congenital absence of the iris, foveal hypoplasia, optic disc anomalies and progressive opacification of the cornea. More than 90% of cases of classic aniridia are caused by heterozygous, loss-of-function variants affecting the PAX6 locus. METHODS Short-read whole genome sequencing was performed on 51 (39 affected) individuals from 37 different families who had screened negative for mutations in the PAX6 coding region. RESULTS Likely causative mutations were identified in 22 out of 37 (59%) families. In 19 out of 22 families, the causative genomic changes have an interpretable deleterious impact on the PAX6 locus. Of these 19 families, 1 has a novel heterozygous PAX6 frameshift variant missed on previous screens, 4 have single nucleotide variants (SNVs) (one novel) affecting essential splice sites of PAX6 5' non-coding exons and 2 have deep intronic SNVs (one novel) resulting in gain of a donor splice site. In 12 out of 19, the causative variants are large-scale structural variants; 5 have partial or whole gene deletions of PAX6, 3 have deletions encompassing critical PAX6 cis-regulatory elements, 2 have balanced inversions with disruptive breakpoints within the PAX6 locus and 2 have complex rearrangements disrupting PAX6. The remaining 3 of 22 families have deletions encompassing FOXC1 (a known cause of atypical aniridia). Seven of the causative variants occurred de novo and one cosegregated with familial aniridia. We were unable to establish inheritance status in the remaining probands. No plausibly causative SNVs were identified in PAX6 cis-regulatory elements. CONCLUSION Whole genome sequencing proves to be an effective diagnostic test in most individuals with previously unexplained aniridia.
Affiliation(s)
- Hildegard Nikki Hall
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- David Parry
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- Illumina United Kingdom, Edinburgh, UK
- Mihail Halachev
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- Kathleen A Williamson
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- Kevin Donnelly
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- Jose Campos Parada
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- Shipra Bhatia
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- Jeffrey Joseph
- MRC Human Genetics Unit, The University of Edinburgh, Edinburgh, UK
- Simon Holden
- East Anglia Regional Genetics Service, Addenbrooke's Hospital, Cambridge, UK
- Trine E Prescott
- Department of Medical Genetics, Telemark Hospital, Skien, Norway
- Pierre Bitoun
- Consultations de Génétique médicale, Service de Pédiatrie, CHU Paris-Nord, Hôpital Jean Verdier, Bondy, France
- Edwin P Kirk
- Centre for Clinical Genetics, Sydney Children's Hospital Randwick, Randwick, New South Wales, Australia
- Ruth Newbury-Ecob
- Department of Clinical Genetics, University Hospitals Bristol NHS Foundation Trust, Bristol, UK
- Katherine Lachlan
- University Hospital Southampton, NHS Foundation Trust Wessex Clinical Genetics Service, Southampton, UK
- Juan Bernar
- Department of Genetics, Hospital Ruber Internacional, Madrid, Spain
- Veronica van Heyningen
- MRC Human Genetics Unit, The University of Edinburgh, Edinburgh, UK
- Institute of Ophthalmology, University College London, London, UK
- David R FitzPatrick
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK
- Alison Meynert
- Institute of Genetics and Cancer, The University of Edinburgh MRC Human Genetics Unit, Edinburgh, UK

19
Winters NP, Wafula EK, Knollenberg BJ, Hämälä T, Timilsena PR, Perryman M, Zhang D, Sheaffer LL, Praul CA, Ralph PE, Prewitt S, Leandro-Muñoz ME, Delgadillo-Duran DA, Altman NS, Tiffin P, Maximova SN, dePamphilis CW, Marden JH, Guiltinan MJ. A combination of conserved and diverged responses underlies Theobroma cacao's defense response to Phytophthora palmivora. BMC Biol 2024; 22:38. [PMID: 38360697 PMCID: PMC10870529 DOI: 10.1186/s12915-024-01831-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 01/23/2024] [Indexed: 02/17/2024]
Abstract
BACKGROUND Plants have complex and dynamic immune systems that have evolved to resist pathogens. Humans have worked to enhance these defenses in crops through breeding. However, many crops harbor only a fraction of the genetic diversity present in their wild relatives. Increased use of diverse germplasm to search for desirable traits, such as disease resistance, is therefore a valuable step towards breeding crops that are adapted to both current and emerging threats. Here, we examine the diversity of defense responses across four populations of the long-generation tree crop Theobroma cacao L., as well as four non-cacao Theobroma species, with the goal of identifying genetic elements essential for protection against the oomycete pathogen Phytophthora palmivora. RESULTS We began by creating a new, highly contiguous genome assembly for the P. palmivora-resistant genotype SCA 6 (Additional file 1: Tables S1-S5), deposited in GenBank under accessions CP139290-CP139299. We then used this high-quality assembly to combine RNA and whole-genome sequencing data to discover several genes and pathways associated with resistance. Many of these are unique, i.e., differentially regulated in only one of the four populations (diverged 40k–900k generations ago). Among the pathways shared across all populations is phenylpropanoid biosynthesis, a metabolic pathway with well-documented roles in plant defense. One gene in this pathway, caffeoyl shikimate esterase (CSE), was upregulated across all four populations following pathogen treatment, indicating its broad importance for cacao's defense response. Further experimental evidence suggests this gene hydrolyzes caffeoyl shikimate to create caffeic acid, an antimicrobial compound and known inhibitor of Phytophthora spp. CONCLUSIONS Our results indicate that most expression variation associated with resistance is unique to individual populations.
Moreover, our findings demonstrate the value of using a broad sample of evolutionarily diverged populations for revealing the genetic bases of cacao resistance to P. palmivora. This approach has promise for further revealing and harnessing valuable genetic resources in this and other long-generation plants.
Affiliation(s)
- Noah P Winters
- IGDP Ecology, The Pennsylvania State University, 422 Huck Life Sciences Building, University Park, PA, 16803, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Eric K Wafula
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Tuomas Hämälä
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, USA
- Department of Ecology and Genetics, University of Oulu, Oulu, Finland
- Prakash R Timilsena
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Melanie Perryman
- Department of Plant Science, The Pennsylvania State University, University Park, PA, USA
- Dapeng Zhang
- Sustainable Perennial Crops Laboratory, U.S. Department of Agriculture-Agricultural Research Service, Beltsville, MD, USA
- Lena L Sheaffer
- Department of Plant Science, The Pennsylvania State University, University Park, PA, USA
- Craig A Praul
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Paula E Ralph
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Sarah Prewitt
- Department of Plant Science, The Pennsylvania State University, University Park, PA, USA
- Naomi S Altman
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Peter Tiffin
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, MN, USA
- Siela N Maximova
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Plant Science, The Pennsylvania State University, University Park, PA, USA
- Claude W dePamphilis
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- IGDP Plant Biology, The Pennsylvania State University, University Park, PA, USA
- James H Marden
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- Mark J Guiltinan
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
- IGDP Plant Biology, The Pennsylvania State University, University Park, PA, USA
- Department of Plant Science, The Pennsylvania State University, University Park, PA, USA

20
Yang C, Zhang Z, Huang Y, Xie X, Liao H, Xiao J, Veldsman WP, Yin K, Fang X, Zhang L. LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome. Gigascience 2024; 13:giae028. [PMID: 38869148 PMCID: PMC11170215 DOI: 10.1093/gigascience/giae028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 03/15/2024] [Accepted: 05/09/2024] [Indexed: 06/14/2024]
Abstract
BACKGROUND Linked-read sequencing technologies generate short reads with high base quality that also carry information on long-range DNA connectivity. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to a single sequencing platform. FINDINGS To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform-agnostic processing of linked-read sequencing data from both human genomes and metagenomes. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK can process multiple samples automatically and gives users the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK to linked reads from simulated, mock community, and real datasets for both human genome and metagenome. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results as publication-ready plots in HTML documents. CONCLUSIONS LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the gap left by platform-centric, genome-specific linked-read data analysis tools.
Affiliation(s)
- Chao Yang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR 999077, Hong Kong
- Zhenmiao Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR 999077, Hong Kong
- Yufen Huang
- BGI Research, Shenzhen 518083, China
- BGI Genomics, Shenzhen 518083, China
- Herui Liao
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR 999077, Hong Kong
- Jin Xiao
- Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR 999077, Hong Kong
- Werner Pieter Veldsman
- Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR 999077, Hong Kong
- Kejing Yin
- Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR 999077, Hong Kong
- Xiaodong Fang
- BGI Genomics, Shenzhen 518083, China
- BGI Research, Sanya 572025, China
- Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR 999077, Hong Kong
- Institute for Research and Continuing Education, Hong Kong Baptist University, Hong Kong SAR 999077, Hong Kong

21
Höjer P, Frick T, Siga H, Pourbozorgi P, Aghelpasand H, Martin M, Ahmadian A. BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies. Nucleic Acids Res 2023; 51:e114. [PMID: 37941142 PMCID: PMC10711428 DOI: 10.1093/nar/gkad1010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 10/04/2023] [Accepted: 10/18/2023] [Indexed: 11/10/2023]
Abstract
Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-Seq and stLFR. Running BLR on DBS linked reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13,616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when switch errors were taken into account. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
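The comparison with Long Ranger is expressed in phase block N50, a standard contiguity statistic: the largest block length L such that blocks of length at least L together cover at least half of the total phased length. The Python sketch below shows one common way to compute it from a list of phase block lengths; the function name `n50` and the example lengths are illustrative assumptions and are not taken from BLR itself.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of block lengths (e.g. phase blocks in bp).

    N50 is the largest length L such that blocks of length >= L together
    cover at least half of the summed length of all blocks.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk blocks from longest to shortest, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2*running with total to avoid float rounding at exactly half.
        if 2 * running >= total:
            return length
    # Unreachable for non-empty input: running eventually equals total.
    raise AssertionError("unreachable")


# Hypothetical phase block lengths in base pairs (illustrative only).
blocks = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(blocks))  # 8
```

Here the lengths sum to 54; the three longest blocks (10 + 9 + 8 = 27) are the first to reach half of that, so the N50 is 8. A higher phase block N50 therefore means that more of the genome sits in long, continuously phased blocks.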
Affiliation(s)
- Pontus Höjer
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Tobias Frick
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Humam Siga
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Parham Pourbozorgi
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Hooman Aghelpasand
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Marcel Martin
- Stockholm University, Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Afshin Ahmadian
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden

22
Pérez-Umphrey AA, Settlecowski AE, Elbers JP, Williams ST, Jonsson CB, Bonisoli-Alquati A, Snider AM, Taylor SS. Genetic variants associated with hantavirus infection in a reservoir host are related to regulation of inflammation and immune surveillance. Infect Genet Evol 2023; 116:105525. [PMID: 37956745 DOI: 10.1016/j.meegid.2023.105525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/14/2023] [Accepted: 11/10/2023] [Indexed: 11/15/2023]
Abstract
The immunogenetics of wildlife populations influence the epidemiology and evolutionary dynamics of host-pathogen systems. Profiling the immune gene diversity present in wildlife may be especially important for species that, while not themselves at risk of disease or extinction, host diseases that threaten humans, other wildlife, or livestock. Hantaviruses (genus: Orthohantavirus) are globally distributed zoonotic RNA viruses with pathogenic strains carried by a diverse group of rodent hosts. The marsh rice rat (Oryzomys palustris) is the reservoir host of Orthohantavirus bayoui, a hantavirus that causes fatal cases of hantavirus cardiopulmonary syndrome in humans. We performed a genome-wide association study (GWAS) using the rice rat "immunome" (i.e., all exons related to the immune response) to identify genetic variants associated with infection status in wild-caught rice rats naturally infected with their endemic strain of hantavirus. First, we created an annotated reference genome using 10× Chromium Linked Reads sequencing technology. This reference genome was used to design custom baits, which were then used for target enrichment of prepared rice rat libraries (n = 128) to isolate their immunomes prior to sequencing. Top SNPs in the association test were present in four genes (Socs5, Eprs, Mrc1, and Il1f8) that have not previously been implicated in hantavirus infection. However, these genes correspond with other loci or pathways of established importance in hantavirus susceptibility or infection tolerance in reservoir hosts: JAK/STAT, MHC, and NF-κB. These variants serve as informative markers for future exploration and highlight the importance of immune pathways that repeatedly emerge across hantavirus systems. Our work aids in creating cross-species comparisons for better understanding mechanisms of genetic susceptibility and host-pathogen coevolution in hantavirus systems.
Affiliation(s)
- Anna A Pérez-Umphrey
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
- Amie E Settlecowski
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
- Jean P Elbers
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA; Institute of Medical Genetics, Center for Pathobiochemistry and Genetics, Medical University of Vienna, Währinger Straße 10, 1090 Vienna, Austria
- S Tyler Williams
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
- Colleen B Jonsson
- Department of Microbiology, Immunology and Biochemistry, College of Medicine, University of Tennessee Health Science Center, University of Tennessee, 858 Madison Ave., Memphis, TN 38163, USA
- Andrea Bonisoli-Alquati
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA; Department of Biological Sciences, California State Polytechnic University-Pomona, Pomona, CA 91768, USA
- Allison M Snider
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
- Sabrina S Taylor
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA

23
Nulsen J, Hussain N, Al-Deka A, Yap J, Uddin K, Yau C, Ahmed AA. Completing a genomic characterisation of microscopic tumour samples with copy number. BMC Bioinformatics 2023; 24:453. [PMID: 38036971 PMCID: PMC10688092 DOI: 10.1186/s12859-023-05576-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/11/2023] [Accepted: 11/21/2023] [Indexed: 12/02/2023]
Abstract
BACKGROUND Genomic insights in settings where tumour samples are limited to just hundreds or even tens of cells hold great clinical potential, but also present significant technical challenges. We previously developed the DigiPico sequencing platform to accurately identify somatic mutations from such samples. RESULTS Here, we complete this genomic characterisation with copy number. We present a novel protocol, PicoCNV, to call allele-specific somatic copy number alterations from picogram quantities of tumour DNA. We find that PicoCNV calls exactly the correct copy number across 84% of the genome even for the smallest samples, and we demonstrate its clinical potential in the context of maintenance therapy. CONCLUSIONS PicoCNV complements our existing platform, allowing for accurate and comprehensive genomic characterisation of cancers in settings where only microscopic samples are available.
Affiliation(s)
- Joel Nulsen
- Weatherall Institute for Molecular Medicine, University of Oxford, Oxford, UK
- Nuffield Department for Women's and Reproductive Health, University of Oxford, Oxford, UK
- Singula Bio Ltd., Oxford, UK
- Nosheen Hussain
- Weatherall Institute for Molecular Medicine, University of Oxford, Oxford, UK
- Nuffield Department for Women's and Reproductive Health, University of Oxford, Oxford, UK
- Singula Bio Ltd., Oxford, UK
- Aws Al-Deka
- Weatherall Institute for Molecular Medicine, University of Oxford, Oxford, UK
- Nuffield Department for Women's and Reproductive Health, University of Oxford, Oxford, UK
- Singula Bio Ltd., Oxford, UK
- Jason Yap
- University of Birmingham, Birmingham, UK
- Christopher Yau
- Nuffield Department for Women's and Reproductive Health, University of Oxford, Oxford, UK
- Health Data Research UK, London, UK
- Ahmed Ashour Ahmed
- Weatherall Institute for Molecular Medicine, University of Oxford, Oxford, UK
- Nuffield Department for Women's and Reproductive Health, University of Oxford, Oxford, UK
- Singula Bio Ltd., Oxford, UK
- Oxford Biomedical Research Centre, National Institute of Health Research, Oxford, UK

24
Lin D, Zou Y, Li X, Wang J, Xiao Q, Gao X, Lin F, Zhang N, Jiao M, Guo Y, Teng Z, Li S, Wei Y, Zhou F, Yin R, Zhang S, Xing L, Xu W, Wu X, Yang B, Xiao K, Wu C, Tao Y, Yang X, Zhang J, Hu S, Dong S, Li X, Ye S, Hong Z, Pan Y, Yang Y, Sun H, Cao G. MGA-seq: robust identification of extrachromosomal DNA and genetic variants using multiple genetic abnormality sequencing. Genome Biol 2023; 24:247. [PMID: 37904244 PMCID: PMC10614391 DOI: 10.1186/s13059-023-03081-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 10/04/2023] [Indexed: 11/01/2023]
Abstract
Genomic abnormalities are strongly associated with cancer and infertility. In this study, we develop a simple and efficient method, multiple genetic abnormality sequencing (MGA-Seq), to simultaneously detect structural variations, copy number variations, single-nucleotide polymorphisms, homogeneously staining regions, and extrachromosomal DNA (ecDNA) from a single tube. MGA-Seq directly sequences proximity-ligated genomic fragments, yielding a dataset with concurrent three-dimensional genome and whole-genome sequencing information, which enables approximate localization of genomic structural variations and facilitates breakpoint identification. Additionally, using MGA-Seq, we map focal amplification and oncogene coamplification, thereby facilitating exploration of the transcriptional regulatory function of ecDNA.
Affiliation(s)
- Da Lin
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yanyan Zou
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Informatics, Huazhong Agricultural University, Wuhan, China
- Xinyu Li
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jinyue Wang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Qin Xiao
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xiaochen Gao
- Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Fei Lin
- Reproductive Medical Center, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Ningyuan Zhang
- Reproductive Medical Center, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Ming Jiao
- Department of Laboratory Animal Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yu Guo
- Department of Laboratory Animal Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhaowei Teng
- The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, China
- Shiyi Li
- Baylor College of Medicine, Houston, TX, USA
- Department of Radiation & Medical Oncology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Yongchang Wei
- Department of Radiation & Medical Oncology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Hubei Key Laboratory of Tumor Biological Behaviors, Zhongnan Hospital of Wuhan University, Wuhan, China
- Fuling Zhou
- Department of Hematology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Rong Yin
- Department of Hematology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Siheng Zhang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Lingyu Xing
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Weize Xu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xiaofeng Wu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Bing Yang
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Ke Xiao
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Chengchao Wu
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Yingfeng Tao
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xiaoqing Yang
- Hospital of Huazhong Agricultural University, Wuhan, China
- Jing Zhang
- Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Sheng Hu
- Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shuang Dong
- Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xiaoyu Li
- Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shengwei Ye
- Department of Gastrointestinal Surgery, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Zhidan Hong
- Department of Reproductive Medicine Center, Zhongnan Hospital of Wuhan University, Wuhan, China
- Yihang Pan
- Precision Medicine Center, Scientific Research Center, School of Medicine, The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Yuqin Yang
- Department of Laboratory Animal Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Haixiang Sun
- Reproductive Medical Center, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Gang Cao
- State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
- College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

25
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023]
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performance of sequencing technologies and analytical methods and highlight future challenges. However, they depend on the sequencing technology, the reference genome, and the available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and extensive manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes that are medically relevant and genomically complex.
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
- Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
26
Li C, Chen L, Pan G, Zhang W, Li SC. Deciphering complex breakage-fusion-bridge genome rearrangements with Ambigram. Nat Commun 2023; 14:5528. [PMID: 37684230 PMCID: PMC10491683 DOI: 10.1038/s41467-023-41259-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 08/28/2023] [Indexed: 09/10/2023]
Abstract
Breakage-fusion-bridge (BFB) is a complex rearrangement that leads to tumor malignancy. Existing models for detecting BFBs rely on the ideal BFB hypothesis, ruling out the possibility of BFBs entangled with other structural variations, that is, complex BFBs. We propose an algorithm, Ambigram, to identify complex BFBs and reconstruct the rearranged structure of the local genome during the cancer subclone evolution process. Ambigram handles data from short-read, linked-read, long-read, and single-cell sequencing, as well as from optical mapping technologies. Compared with state-of-the-art methods, Ambigram successfully deciphers gold- or silver-standard complex BFBs in multiple cancers. Ambigram dissects the intratumor heterogeneity of complex BFB events with single-cell reads from melanoma and gastric cancer. Furthermore, applying Ambigram to liver and cervical cancer data suggests that the BFB mechanism may mediate oncovirus integrations. BFB also exists in noncancer genomes. Investigating the complete human genome reference with Ambigram suggests that the BFB mechanism may be involved in two genome reorganizations of Homo sapiens during evolution. Moreover, Ambigram discovers signals of recurrent foldback inversions and complex BFBs in whole-genome data from the 1000 Genomes Project and from patients with congenital heart disease, respectively.
Affiliation(s)
- Chaohui Li
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Lingxi Chen
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Guangze Pan
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Wenqian Zhang
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Hong Kong, China.
27
Thomas GWC, Hughes JJ, Kumon T, Berv JS, Nordgren CE, Lampson M, Levine M, Searle JB, Good JM. The genomic landscape, causes, and consequences of extensive phylogenomic discordance in Old World mice and rats. bioRxiv 2023:2023.08.28.555178. [PMID: 37693498 PMCID: PMC10491188 DOI: 10.1101/2023.08.28.555178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
A species tree is a central concept in evolutionary biology whereby a single branching phylogeny reflects relationships among species. However, the phylogenies of different genomic regions often differ from the species tree. Although tree discordance is often widespread in phylogenomic studies, we still lack a clear understanding of how variation in phylogenetic patterns is shaped by genome biology or the extent to which discordance may compromise comparative studies. We characterized patterns of phylogenomic discordance across the murine rodents (Old World mice and rats), a large and ecologically diverse group that gave rise to the mouse and rat model systems. Combining new linked-read genome assemblies for seven murine species with eleven published rodent genomes, we first used ultra-conserved elements (UCEs) to infer a robust species tree. We then used whole genomes to examine finer-scale patterns of discordance and found that phylogenies built from proximate chromosomal regions were similar to one another. However, there was no relationship between tree similarity and local recombination rates in house mice, suggesting that genetic linkage influences phylogenetic patterns over deeper timescales. This signal may be independent of contemporary recombination landscapes. We also detected a strong influence of linked selection, whereby purifying selection at UCEs led to less discordance, while genes experiencing positive selection showed more discordant and variable phylogenetic signals. Finally, we show that assuming a single species tree can result in high error rates when testing for positive selection under different models. Collectively, our results highlight the complex relationship between phylogenetic inference and genome biology and underscore how failure to account for this complexity can mislead comparative genomic studies.
Affiliation(s)
- Gregg W. C. Thomas
- Division of Biological Sciences, University of Montana, Missoula, MT, 59801
- Informatics Group, Harvard University, Cambridge, MA, 02138
- Jonathan J. Hughes
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853
- Department of Evolution, Ecology, and Organismal Biology, University of California Riverside, Riverside, CA, 92521
- Tomohiro Kumon
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Jacob S. Berv
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109
- C. Erik Nordgren
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Michael Lampson
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Mia Levine
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104
- Jeremy B. Searle
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853
- Jeffrey M. Good
- Division of Biological Sciences, University of Montana, Missoula, MT, 59801
28
Mak L, Meleshko D, Danko DC, Barakzai WN, Maharjan S, Belchikov N, Hajirasouliha I. Ariadne: synthetic long read deconvolution using assembly graphs. Genome Biol 2023; 24:197. [PMID: 37641111 PMCID: PMC10463629 DOI: 10.1186/s13059-023-03033-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Accepted: 08/07/2023] [Indexed: 08/31/2023]
Abstract
Synthetic long read sequencing techniques such as UST's TELL-Seq and Loop Genomics' LoopSeq combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3′ unique molecular identifier confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm that can be used to extract single-species read clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.
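The deconvolution problem described above can be pictured with a toy example: reads sharing one barcode may come from several long fragments, and reads that align to nearby nodes of an assembly graph are more plausibly from the same fragment. The sketch below groups one barcode's reads by single-linkage clustering on graph distance. It is a simplified illustration of that general idea, not Ariadne's published algorithm; the read-to-node mapping, the `max_hops` threshold, and the function names are hypothetical, and the graph is assumed to be undirected (symmetric adjacency lists).

```python
from collections import deque


def nodes_within(graph: dict[str, list[str]], start: str, max_hops: int) -> set[str]:
    """Return every node reachable from `start` in at most `max_hops` edges (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def deconvolve_cloud(read_to_node: dict[str, str], graph: dict[str, list[str]],
                     max_hops: int) -> list[list[str]]:
    """Split one barcode's reads into sub-clouds by single-linkage on graph distance."""
    reads = sorted(read_to_node)
    parent = {r: r for r in reads}

    def find(r: str) -> str:
        # Union-find root lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for i, a in enumerate(reads):
        near = nodes_within(graph, read_to_node[a], max_hops)
        for b in reads[i + 1:]:
            if read_to_node[b] in near:
                parent[find(a)] = find(b)  # merge the two reads' clusters

    clouds: dict[str, list[str]] = {}
    for r in reads:
        clouds.setdefault(find(r), []).append(r)
    return sorted(clouds.values())
```

Choosing `max_hops` trades off two failure modes: too small a value splits reads from one fragment into several clouds, while too large a value merges distinct fragments that happened to share a barcode.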
Affiliation(s)
- Lauren Mak
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Dmitry Meleshko
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- David C. Danko
- Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine of Cornell University, New York, USA
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Salil Maharjan
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Natan Belchikov
- Physiology, Biophysics & Systems Biology Program, Weill Cornell Medicine of Cornell University, New York, USA
- Iman Hajirasouliha
- Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University, New York, USA
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine of Cornell University, New York, USA
29
Tschernoster N, Erger F, Kohl S, Reusch B, Wenzel A, Walsh S, Thiele H, Becker C, Franitza M, Bartram MP, Kömhoff M, Schumacher L, Kukat C, Borodina T, Quedenau C, Nürnberg P, Rinschen MM, Driller JH, Pedersen BP, Schlingmann KP, Hüttel B, Bockenhauer D, Beck B, Altmüller J. Long-read sequencing identifies a common transposition haplotype predisposing for CLCNKB deletions. Genome Med 2023; 15:62. [PMID: 37612755 PMCID: PMC10464140 DOI: 10.1186/s13073-023-01215-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 07/27/2023] [Indexed: 08/25/2023]
Abstract
BACKGROUND Long-read sequencing is increasingly used to uncover structural variants in the human genome, both functionally neutral and deleterious. Structural variants occur more frequently in regions with high homology or repetitive segments, and one rearrangement may predispose to additional events. Bartter syndrome type 3 (BS 3) is a monogenic tubulopathy caused by deleterious variants in the chloride channel gene CLCNKB, a high proportion of which are large gene deletions. Multiplex ligation-dependent probe amplification, the current diagnostic gold standard for this type of mutation, indicates a simple homozygous gene deletion in biallelic deletion carriers. However, since the phenotypic spectrum of BS 3 is broad even among biallelic deletion carriers, we undertook a more detailed analysis of precise breakpoint regions and genomic structure. METHODS Structural variants in 32 BS 3 patients from 29 families and one BS4b patient with CLCNKB deletions were investigated using long-read and synthetic long-read sequencing, as well as targeted long-read sequencing approaches. RESULTS We report a ~3 kb duplication of 3'-UTR CLCNKB material transposed to the corresponding locus of the neighbouring CLCNKA gene, also found on ~50% of alleles in healthy control individuals. This previously unknown common haplotype is significantly enriched in our cohort of patients with CLCNKB deletions (45 of 51 alleles with haplotype information; 2.2 kb and 3.0 kb transpositions taken together; p = 9.16 × 10⁻⁹). Breakpoint coordinates for the CLCNKB deletion were identifiable in 28 patients, three of whom were compound heterozygous. In total, eight different alleles were found, one of them a complex rearrangement with three breakpoint regions. Two patients had different CLCNKA/CLCNKB hybrid genes encoding a predicted CLCNKA/CLCNKB hybrid protein with likely residual function.
CONCLUSIONS The presence of multiple different deletion alleles in our cohort suggests that large CLCNKB gene deletions originated from many independently recurring genomic events clustered in a few hot spots. The uncovered associated sequence transposition haplotype apparently predisposes to these additional events. The spectrum of CLCNKB deletion alleles is broader than expected and likely still incomplete, but represents an obvious candidate for future genotype/phenotype association studies. We suggest a sensitive and cost-efficient approach, consisting of indirect sequence capture and long-read sequencing, to analyse disease-relevant structural variant hotspots in general.
Affiliation(s)
- Nikolai Tschernoster
- Cologne Center for Genomics (CCG), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Institute of Human Genetics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Kerpener Str. 34, 50931, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Florian Erger
- Institute of Human Genetics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Kerpener Str. 34, 50931, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Stefan Kohl
- Department of Pediatrics, Cologne Children's Hospital, Cologne, Germany
- Björn Reusch
- Institute of Human Genetics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Kerpener Str. 34, 50931, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Andrea Wenzel
- Institute of Human Genetics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Kerpener Str. 34, 50931, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Stephen Walsh
- Department of Renal Medicine, UCL, University College London, London, UK
- Holger Thiele
- Cologne Center for Genomics (CCG), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Christian Becker
- Cologne Center for Genomics (CCG), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Marek Franitza
- Cologne Center for Genomics (CCG), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Malte P Bartram
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Department II of Internal Medicine, University of Cologne, Cologne, Germany
- Martin Kömhoff
- Department of Pediatrics, University Marburg, Marburg, Germany
- Lena Schumacher
- FACS & Imaging Core Facility, Max Planck Institute for Biology of Ageing, Cologne, Germany
- Christian Kukat
- FACS & Imaging Core Facility, Max Planck Institute for Biology of Ageing, Cologne, Germany
- Tatiana Borodina
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Straße 28, 10115, Berlin, Germany
- Claudia Quedenau
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Straße 28, 10115, Berlin, Germany
- Peter Nürnberg
- Cologne Center for Genomics (CCG), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
- Markus M Rinschen
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Aarhus Institute of Advanced Studies, Aarhus University, Aarhus, Denmark
- Department III of Medicine, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Jan H Driller
- Department of Molecular Biology and Genetics, Aarhus University, Universitetsbyen 81, DK-8000, Aarhus C, Denmark
- Bjørn P Pedersen
- Department of Molecular Biology and Genetics, Aarhus University, Universitetsbyen 81, DK-8000, Aarhus C, Denmark
- Karl P Schlingmann
- Department of General Pediatrics, University Children's Hospital, Münster, Germany
- Bruno Hüttel
- Max Planck Genome-Centre Cologne, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Detlef Bockenhauer
- Department of Renal Medicine, UCL, University College London, London, UK
- Great Ormond Street Hospital for Children, NHS Foundation Trust, London, UK
- Bodo Beck
- Institute of Human Genetics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Kerpener Str. 34, 50931, Cologne, Germany.
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany.
- Janine Altmüller
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany.
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Straße 28, 10115, Berlin, Germany.
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Core Facility Genomics, Berlin, Germany.
30
Sahu SK, Liu M, Chen Y, Gui J, Fang D, Chen X, Yang T, He C, Cheng L, Yang J, Sahu DN, Li L, Wang H, Mu W, Wei J, Liu J, Zhao Y, Zhang S, Lisby M, Liu X, Xu X, Li L, Wang S, Liu H. Chromosome-scale genomes of commercial timber trees (Ochroma pyramidale, Mesua ferrea, and Tectona grandis). Sci Data 2023; 10:512. [PMID: 37537171 PMCID: PMC10400565 DOI: 10.1038/s41597-023-02420-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/29/2023] [Accepted: 07/26/2023] [Indexed: 08/05/2023]
Abstract
Wood is the most important natural and endlessly renewable source of energy. Despite the ecological and economic importance of wood, many aspects of its formation have not yet been investigated. We performed chromosome-scale genome assemblies of three timber trees (Ochroma pyramidale, Mesua ferrea, and Tectona grandis), which exhibit different wood properties such as wood density, hardness, growth rate, and fiber cell wall thickness. By combining 10X, stLFR, and HiFi sequencing with Hi-C data, we assembled high-quality genomes, as evidenced by scaffold N50 lengths of 55.97 Mb (O. pyramidale), 22.37 Mb (M. ferrea), and 14.55 Mb (T. grandis) and >97% BUSCO completeness of the assemblies. A total of 35,774, 24,027, and 44,813 protein-coding genes were identified in M. ferrea, T. grandis, and O. pyramidale, respectively. The data generated in this study are anticipated to serve as a valuable genetic resource, to promote comparative genomic analyses, and to be of practical importance in further understanding wood properties in non-model woody species.
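Scaffold N50, the contiguity statistic reported above, is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal sketch of the calculation follows; the function name is illustrative and the example lengths are made up, not taken from these assemblies.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths (e.g. scaffold lengths)."""
    total = sum(lengths)
    if not lengths or total <= 0:
        raise ValueError("need at least one sequence with positive total length")
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first length where the running sum reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    print(n50([10, 8, 5, 3, 2]))  # total 28; 10 + 8 = 18 >= 14, so N50 = 8
```

Because N50 depends only on the length distribution, it measures contiguity rather than correctness, which is why it is usually reported alongside a completeness measure such as BUSCO.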
Affiliation(s)
- Sunil Kumar Sahu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Min Liu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- BGI Life Science Joint Research Center, Northeast Forestry University, Harbin, 150400, China
- Yewen Chen
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Jinshan Gui
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, 311300, Hangzhou, China
- Dongming Fang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Xiaoli Chen
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Ting Yang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Chengzhong He
- Southwest Forestry University, Kunming, Yunnan, 650224, China
- Le Cheng
- BGI Research, Kunming, Yunnan, 650106, China
- Jinlong Yang
- BGI Research, Kunming, Yunnan, 650106, China
- College of Forensic Science, Xi'an Jiaotong University, Xi'an, China
- Durgesh Nandini Sahu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Linzhou Li
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Hongli Wang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Weixue Mu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Jinpu Wei
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Jie Liu
- Forestry Bureau of Ruili, Yunnan Dehong, Ruili, 678600, China
- Shouzhou Zhang
- Laboratory of Southern Subtropical Plant Diversity, Fairy Lake Botanical Garden, Shenzhen, Chinese Academy of Sciences, Shenzhen, 518004, China
- Michael Lisby
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Xin Liu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Xun Xu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, BGI Research, Shenzhen, 518083, China
- Laigeng Li
- National Key Laboratory of Plant Molecular Genetics and CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, 200032, China.
- Sibo Wang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China.
- Huan Liu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen, 518083, China.
- BGI Life Science Joint Research Center, Northeast Forestry University, Harbin, 150400, China.
31
McClinton B, Watson CM, Crinnion LA, McKibbin M, Ali M, Inglehearn CF, Toomes C. Haplotyping Using Long-Range PCR and Nanopore Sequencing to Phase Variants: Lessons Learned From the ABCA4 Locus. Lab Invest 2023; 103:100160. [PMID: 37088464 DOI: 10.1016/j.labinv.2023.100160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 01/11/2023] [Accepted: 04/17/2023] [Indexed: 04/25/2023]
Abstract
Short-read next-generation sequencing has revolutionized our ability to identify variants underlying inherited diseases; however, it does not allow variants to be phased to clarify their diagnostic interpretation. The advent of widespread, increasingly accurate long-read sequencing has opened up new applications not currently available through short-read next-generation sequencing. One such use is the ability to phase variants, both to clarify their diagnostic interpretation and to investigate the increasingly recognized role of cis-acting variants, so-called complex alleles, in the pathogenesis of inherited disease. Complex alleles are becoming an increasingly prominent part of the study of genes associated with inherited diseases, for example, in ABCA4-related diseases. We sought to establish a cost-effective method to phase contiguous segments of the 130-kb ABCA4 locus by long-read sequencing of overlapping amplification products. Using the comprehensively characterized CEPH sample NA12878, we verified the accuracy and robustness of our assay. However, in-field assessment of its utility using clinical test cases was hampered by the paucity and distribution of identified variants and by PCR chimerism, particularly where the number of PCR cycles was high. Despite this, we were able to construct robust phase blocks of up to 94.9 kb, representing 73% of the ABCA4 locus. We conclude that, although haplotype analysis of variants located within discrete amplification products was robust and informative, stitching together larger phase blocks using overlapping single-molecule reads remained practically challenging.
Affiliation(s)
- Benjamin McClinton
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, UK
- Christopher M Watson
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, UK; North East and Yorkshire Genomic Laboratory Hub, Central Lab, St. James's University Hospital, Leeds, UK
- Laura A Crinnion
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, UK; North East and Yorkshire Genomic Laboratory Hub, Central Lab, St. James's University Hospital, Leeds, UK
- Martin McKibbin
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, UK; Department of Ophthalmology, St. James's University Hospital, Leeds, UK
- Manir Ali
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, UK
- Chris F Inglehearn
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, UK
- Carmel Toomes
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, UK.
32
Guichard A, Legeai F, Tagu D, Lemaitre C. MTG-Link: leveraging barcode information from linked-reads to assemble specific loci. BMC Bioinformatics 2023; 24:284. [PMID: 37452278 PMCID: PMC10347852 DOI: 10.1186/s12859-023-05395-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 06/21/2023] [Indexed: 07/18/2023]
Abstract
BACKGROUND Local assembly with short and long reads has proven very useful in many applications: reconstruction of the sequence of a locus of interest, gap-filling in draft assemblies, and alternative allele reconstruction of large structural variants. Although linked-read technologies have great potential for assembling specific loci, as they provide long-range information while maintaining the power and accuracy of short-read sequencing, there is a lack of local assembly tools for linked-read data. RESULTS We present MTG-Link, a novel local assembly tool dedicated to linked reads. The originality of the method lies in its read subsampling step, which takes advantage of the barcode information contained in linked reads mapped in flanking regions. We validated our approach on several datasets from different linked-read technologies. We show that MTG-Link can successfully assemble large sequences of up to dozens of kilobases. We also demonstrate that the read subsampling step of MTG-Link considerably improves the local assembly of specific loci compared with other existing short-read local assembly tools. Furthermore, MTG-Link was able to fully characterize large insertion variants and deletion breakpoints in a human genome and to reconstruct dark regions in clinically relevant human genes. It also improved the contiguity of a 1.3 Mb locus of biological interest in several individual genomes of the mimetic butterfly Heliconius numata. CONCLUSIONS MTG-Link is an efficient local assembly tool designed for different linked-read sequencing technologies. MTG-Link source code is available at https://github.com/anne-gcd/MTG-Link and as a Bioconda package.
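The barcode-based read subsampling described above can be sketched as two steps: collect the barcodes of reads aligned in the regions flanking the target locus, then keep every read in the dataset that carries one of those barcodes, including unmapped reads, as input for local assembly. The sketch assumes reads have already been aligned and their barcodes parsed; the `LinkedRead` type, the flank size, the `min_reads` filter, and the function names are hypothetical and do not reproduce MTG-Link's code or command-line options.

```python
from collections import Counter
from typing import NamedTuple, Optional


class LinkedRead(NamedTuple):
    name: str
    barcode: str
    chrom: Optional[str]  # None for unmapped reads
    pos: Optional[int]    # 0-based alignment start, None if unmapped


def flank_barcodes(reads, chrom, gap_start, gap_end, flank, min_reads=1):
    """Barcodes seen on reads aligned within `flank` bp on either side of [gap_start, gap_end)."""
    counts = Counter(
        r.barcode
        for r in reads
        if r.chrom == chrom
        and r.pos is not None
        and (gap_start - flank <= r.pos < gap_start or gap_end <= r.pos < gap_end + flank)
    )
    # Optionally drop barcodes supported by too few flanking reads.
    return {bc for bc, n in counts.items() if n >= min_reads}


def subsample_reads(reads, barcodes):
    """Keep every read, mapped or not, whose barcode was seen in the flanks."""
    return [r for r in reads if r.barcode in barcodes]
```

The reason to pull in reads by barcode rather than by alignment position is that reads originating inside the gap often fail to align to the reference or draft assembly at all, yet they share a barcode with flanking reads from the same long DNA fragment.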
Affiliation(s)
- Anne Guichard
- IGEPP, INRAE, Institut Agro, Univ Rennes, 35653, Le Rheu, France.
- Univ Rennes, Inria, CNRS, IRISA, 35000, Rennes, France.
- Fabrice Legeai
- IGEPP, INRAE, Institut Agro, Univ Rennes, 35653, Le Rheu, France
- Univ Rennes, Inria, CNRS, IRISA, 35000, Rennes, France
- Denis Tagu
- IGEPP, INRAE, Institut Agro, Univ Rennes, 35653, Le Rheu, France
33
Li JX, Fernandez KX, Ritland C, Jancsik S, Engelhardt DB, Coombe L, Warren RL, van Belkum MJ, Carroll AL, Vederas JC, Bohlmann J, Birol I. Genomic virulence features of Beauveria bassiana as a biocontrol agent for the mountain pine beetle population. BMC Genomics 2023; 24:390. [PMID: 37430186 DOI: 10.1186/s12864-023-09473-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023]
Abstract
BACKGROUND The mountain pine beetle, Dendroctonus ponderosae, is an irruptive bark beetle that causes extensive mortality to many pine species within the forests of western North America. Driven by climate change and wildfire suppression, a recent mountain pine beetle (MPB) outbreak has spread across more than 18 million hectares, including areas east of the Rocky Mountains with pine populations and species not previously affected. Despite its impacts, few tactics are available to control MPB populations. Beauveria bassiana is an entomopathogenic fungus used as a biological control agent in agriculture and forestry and has potential as a management tactic for the mountain pine beetle population. This work investigates the phenotypic and genomic variation between B. bassiana strains to identify optimal strains against a specific insect. RESULTS Using comparative genome and transcriptome analyses of eight B. bassiana isolates, we identified the genetic basis of virulence, which includes oosporein production. Genes unique to the more virulent strains included genes involved in mycotoxin biosynthesis and genes encoding membrane transporters and transcription factors. Significant differential expression of genes related to virulence, transmembrane transport, and stress response was identified between the different strains, as well as up to nine-fold upregulation of genes involved in the biosynthesis of oosporein. Differential correlation analysis revealed transcription factors that may be involved in regulating oosporein production. CONCLUSION This study provides a foundation for the selection and/or engineering of the most effective strain of B. bassiana for the biological control of mountain pine beetle and other insect pest populations.
Affiliation(s)
- Janet X Li
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada.
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada.
- Kleinberg X Fernandez
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
- Carol Ritland
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Sharon Jancsik
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada
| | - Daniel B Engelhardt
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
| | - Lauren Coombe
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada
| | - René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada
| | - Marco J van Belkum
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
| | - Allan L Carroll
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - John C Vederas
- Department of Chemistry, University of Alberta, 11227 Saskatchewan Drive NW, Edmonton, AB, T6G 2G2, Canada
| | - Joerg Bohlmann
- Michael Smith Laboratories, University of British Columbia, 2185 East Mall, Vancouver, BC, V6T 1Z4, Canada
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, 570 W 7th Ave #100, Vancouver, BC, V5Z 4S6, Canada
34
Bista I, Wood JMD, Desvignes T, McCarthy SA, Matschiner M, Ning Z, Tracey A, Torrance J, Sims Y, Chow W, Smith M, Oliver K, Haggerty L, Salzburger W, Postlethwait JH, Howe K, Clark MS, William Detrich H, Christina Cheng CH, Miska EA, Durbin R. Genomics of cold adaptations in the Antarctic notothenioid fish radiation. Nat Commun 2023; 14:3412. [PMID: 37296119 PMCID: PMC10256766 DOI: 10.1038/s41467-023-38567-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/27/2023] [Accepted: 05/05/2023] [Indexed: 06/12/2023]
Abstract
Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.
Affiliation(s)
- Iliana Bista
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK.
- Wellcome/CRUK Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge, CB2 1QN, UK.
- Naturalis Biodiversity Center, Leiden, 2333 CR, the Netherlands.
- Jonathan M D Wood
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Thomas Desvignes
- University of Oregon, Institute of Neuroscience, 1254 University of Oregon, 13th Avenue, Eugene, OR, 97403, USA
- Shane A McCarthy
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Michael Matschiner
- University of Oslo, Natural History Museum, University of Oslo, Sars' gate 1, 0562, Oslo, Norway
- University of Zurich, Department of Palaeontology and Museum, University of Zurich, Karl-Schmid-Strasse 4, 8006, Zurich, Switzerland
- Zemin Ning
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Alan Tracey
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- James Torrance
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Ying Sims
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- William Chow
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Michelle Smith
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Karen Oliver
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Walter Salzburger
- University of Basel, Zoological Institute, Department of Environmental Sciences, Vesalgasse 1, 4051, Basel, Switzerland
- John H Postlethwait
- University of Oregon, Institute of Neuroscience, 1254 University of Oregon, 13th Avenue, Eugene, OR, 97403, USA
- Kerstin Howe
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Melody S Clark
- British Antarctic Survey, High Cross, Madingley Road, Cambridge, CB3 0ET, UK
- H William Detrich
- Northeastern University, Department of Marine and Environmental Sciences, Marine Science Centre, 430 Nahant Rd., Nahant, MA, 01908, USA
- C-H Christina Cheng
- Department of Evolution, Ecology, and Behaviour, University of Illinois, Urbana-Champaign, IL, 61801, USA
- Eric A Miska
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Wellcome/CRUK Gurdon Institute, University of Cambridge, Tennis Court Rd, Cambridge, CB2 1QN, UK
- Richard Durbin
- Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK.
35
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
36
Jin S, Han Z, Hu Y, Si Z, Dai F, He L, Cheng Y, Li Y, Zhao T, Fang L, Zhang T. Structural variation (SV)-based pan-genome and GWAS reveal the impacts of SVs on the speciation and diversification of allotetraploid cottons. Mol Plant 2023; 16:678-693. [PMID: 36760124 DOI: 10.1016/j.molp.2023.02.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/20/2022] [Revised: 01/22/2023] [Accepted: 02/05/2023] [Indexed: 06/18/2023]
Abstract
Structural variations (SVs) have long been described as being involved in the origin, adaptation, and domestication of species. However, the underlying genetic and genomic mechanisms are poorly understood. Here, we report a high-quality genome assembly of Gossypium barbadense acc. Tanguis, a landrace closely associated with the formation of extra-long-staple (ELS) cultivated cotton. An SV-based pan-genome (Pan-SV) was then constructed from a total of 182 593 non-redundant SVs, including 2236 inversions, 97 398 insertions, and 82 959 deletions, identified in 11 assembled genomes of allopolyploid cotton. The utility of this Pan-SV was demonstrated through population structure analysis and genome-wide association studies (GWASs). Using segregating mapping populations produced by crossing ELS cotton with the landrace, together with an SV-based GWAS, we identified SVs responsible for speciation, domestication, and improvement in tetraploid cottons. Importantly, some of the SVs identified here as associated with yield and fiber quality improvement had not been detected in previous SNP-based GWASs. In particular, a 9-bp insertion/deletion was found to be associated with the elimination of interspecific reproductive isolation between Gossypium hirsutum and G. barbadense. Collectively, this study provides new insights into genome-wide, gene-scale SVs linked to important agronomic traits in a major crop species and highlights the importance of SVs during the speciation, domestication, and improvement of cultivated crop species.
Affiliation(s)
- Shangkun Jin
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China; Hainan Institute of Zhejiang University, Sanya 572025, China
- Zegang Han
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China; Hainan Institute of Zhejiang University, Sanya 572025, China
- Yan Hu
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China; Hainan Institute of Zhejiang University, Sanya 572025, China
- Zhanfeng Si
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Fan Dai
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Lu He
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yu Cheng
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yiqian Li
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Ting Zhao
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Lei Fang
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China; Hainan Institute of Zhejiang University, Sanya 572025, China
- Tianzhen Zhang
- Zhejiang Provincial Engineering Center for Crop Precision Breeding, Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China; Hainan Institute of Zhejiang University, Sanya 572025, China.
37
Popic V, Rohlicek C, Cunial F, Hajirasouliha I, Meleshko D, Garimella K, Maheshwari A. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat Methods 2023; 20:559-568. [PMID: 36959322 PMCID: PMC10152467 DOI: 10.1038/s41592-023-01799-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 01/29/2023] [Indexed: 03/25/2023]
Abstract
Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which cannot scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.
Affiliation(s)
- Fabio Cunial
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Iman Hajirasouliha
- Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Dmitry Meleshko
- Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Tri-Institutional Computational Biology and Medicine Program, Weill Cornell Medicine, New York, NY, USA
- Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
38
Mikhaylova V, Rzepka M, Kawamura T, Xia Y, Chang PL, Zhou S, Pham L, Modi N, Yao L, Perez-Agustin A, Pagans S, Boles TC, Lei M, Wang Y, Garcia-Bassets I, Chen Z. Targeted Phasing of 2-200 Kilobase DNA Fragments with a Short-Read Sequencer and a Single-Tube Linked-Read Library Method. bioRxiv 2023:2023.03.05.531179. [PMID: 36945366 PMCID: PMC10028795 DOI: 10.1101/2023.03.05.531179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
In the human genome, heterozygous sites are genomic positions with different alleles inherited from each parent. On average, there is a heterozygous site every 1-2 kilobases (kb). Determining whether two alleles at neighboring heterozygous positions are physically linked (that is, phased) is possible with a short-read sequencer if the sequencing library captures long-range information. TELL-Seq is a library preparation method based on millions of barcoded micro-sized beads that enables instrument-free phasing of a whole human genome in a single PCR tube. TELL-Seq attaches a unique molecular identifier (barcode) to the short reads generated from the same high-molecular-weight (HMW) DNA fragment (known as 'linked-reads'). However, genome-scale TELL-Seq is not cost-effective for applications focusing on a single locus or a few loci. Here, we present an optimized TELL-Seq protocol that enables cost-effective phasing of enriched loci (targets) of varying sizes, purity levels, and heterozygosity. Targeted TELL-Seq maximizes linked-read efficiency and library yield while minimizing input requirements, fragment collisions on microbeads, and sequencing burden. To validate the targeted protocol, we phased seven 180-200 kb loci enriched by CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis, four 20 kb loci enriched by CRISPR/Cas9-mediated protection from exonuclease digestion, and six 2-13 kb loci amplified by PCR. The selected targets have clinical and research relevance (BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA). These analyses show that targeted TELL-Seq provides a reliable way of phasing allelic variants within targets 2-200 kb in length, with the low cost and high accuracy of short-read sequencing.
Affiliation(s)
- Madison Rzepka
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Yu Xia
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Peter L. Chang
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Long Pham
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Naisarg Modi
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
- Likun Yao
- Department of Medicine, University of California, San Diego, La Jolla, CA 92093 USA
- Adrian Perez-Agustin
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Sara Pagans
- Department of Medical Sciences, School of Medicine, University of Girona, Girona, Spain
- Ming Lei
- Universal Sequencing Technology Corp., Canton, MA 02021, USA
- Yong Wang
- Universal Sequencing Technology Corp., Canton, MA 02021, USA
- Zhoutao Chen
- Universal Sequencing Technology Corp., Carlsbad, CA 92011, USA
39
Weisweiler M, Stich B. Benchmarking of structural variant detection in the tetraploid potato genome using linked-read sequencing. Genomics 2023; 115:110568. [PMID: 36702293 DOI: 10.1016/j.ygeno.2023.110568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/12/2023] [Accepted: 01/18/2023] [Indexed: 01/25/2023]
Abstract
It has recently been shown that structural variants (SV) can have a greater impact on gene expression variation than single nucleotide variants (SNV) in various plant species. Additionally, SV have been associated with phenotypic variation in several crops. However, compared with established SV detection based on short-read sequencing, fewer approaches have been described for linked-read-based SV calling. We therefore evaluated the performance of six linked-read SV callers against an established short-read SV caller, using simulated linked reads in tetraploid potato. The objectives of our study were to i) compare the performance of SV callers based on linked-read sequencing with that of short-read sequencing, ii) examine the influence of SV type, SV length, haplotype incidence (HI), and sequencing coverage on SV calling performance in the tetraploid potato genome, and iii) evaluate the accuracy of detecting insertions by linked-read compared with short-read sequencing. We observed high breakpoint resolution (BPR) when detecting short SV and slightly lower BPR for large SV. Our observations highlighted the importance of the short-read signals used by Manta and LinkedSV for detecting short SV. Manta and NAIBR performed well in detecting larger deletions, inversions, and duplications. Detection of large SV was only weakly influenced by HI. Furthermore, we illustrated that large insertions can be assembled by Novel-X. Our results support using the short-read and linked-read SV callers Manta, NAIBR, LinkedSV, and Novel-X with at least 90x linked-read sequencing coverage to ensure detection of a broad range of SV in the tetraploid potato genome.
Affiliation(s)
- Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany; Max Planck Institute for Plant Breeding Research, Carl-von-Linne-Weg 10, 50829 Köln, Germany.
40
Lundberg M, Mackintosh A, Petri A, Bensch S. Inversions maintain differences between migratory phenotypes of a songbird. Nat Commun 2023; 14:452. [PMID: 36707538 PMCID: PMC9883250 DOI: 10.1038/s41467-023-36167-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/22/2021] [Accepted: 01/18/2023] [Indexed: 01/28/2023]
Abstract
Structural rearrangements have been shown to be important in local adaptation and speciation, but have been difficult to reliably identify and characterize in non-model species. Here we combine long reads, linked reads and optical mapping to characterize three divergent chromosome regions in the willow warbler Phylloscopus trochilus, of which two are associated with differences in migration and one with an environmental gradient. We show that there are inversions (0.4-13 Mb) in each of the regions and that the divergence times between inverted and non-inverted haplotypes are similar across the regions (~1.2 Myrs), which is compatible with a scenario where inversions arose in either of two allopatric populations that subsequently hybridized. The improved genomes allow us to detect additional functional differences in the divergent regions, providing candidate genes for migration and adaptations to environmental gradients.
Affiliation(s)
- Max Lundberg
- Department of Biology, Lund University, Lund, Sweden.
- Anna Petri
- Science for Life Laboratory, Uppsala Genome Center, Uppsala University, Uppsala, Sweden
41
Mastromatteo S, Chen A, Gong J, Lin F, Thiruvahindrapuram B, Sung WW, Whitney J, Wang Z, Patel RV, Keenan K, Halevy A, Panjwani N, Avolio J, Wang C, Côté-Maurais G, Bégin S, Adam D, Brochiero E, Bjornson C, Chilvers M, Price A, Parkins M, van Wylick R, Mateos-Corral D, Hughes D, Smith MJ, Morrison N, Tullis E, Stephenson AL, Wilcox P, Quon BS, Leung WM, Solomon M, Sun L, Ratjen F, Strug LJ. High-quality read-based phasing of cystic fibrosis cohort informs genetic understanding of disease modification. HGG Adv 2023; 4:100156. [PMID: 36386424 PMCID: PMC9647008 DOI: 10.1016/j.xhgg.2022.100156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022]
Abstract
Phasing of heterozygous alleles is critical for interpreting the cis-effects of disease-relevant variation. We sequenced 477 individuals with cystic fibrosis (CF) using linked-read sequencing, achieving an average phase block N50 of 4.39 Mb. We used these samples to construct a graph representation of CFTR haplotypes, demonstrating its utility for understanding complex CF alleles. These haplotypes are visualized in a web app, CFTbaRcodes, that enables interactive exploration of the CFTR haplotypes present in this cohort. We performed fine-mapping and phasing of the chr7q35 trypsinogen locus associated with CF meconium ileus, an intestinal obstruction at birth associated with more severe CF outcomes and pancreatic disease. A 20-kb deletion polymorphism and a PRSS2 missense variant p.Thr8Ile (rs62473563) are shown to independently contribute to meconium ileus risk (p = 0.0028 and p = 0.011, respectively) and are PRSS2 pancreas eQTLs (p = 9.5 × 10⁻⁷ and p = 1.4 × 10⁻⁴, respectively), suggesting the mechanism by which these polymorphisms contribute to CF. The phase information from linked reads provides a putative causal explanation for variation at a CF-relevant locus, which also has implications for the genetic basis of non-CF pancreatitis, to which this locus has been reported to contribute.
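The phase block N50 quoted above is the usual contiguity statistic: the largest length L such that blocks of length at least L together cover at least half of the total phased sequence. The Python sketch below only illustrates that definition; it is not the study's pipeline, and the function name `n50` and the block lengths used are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of block (or contig) lengths.

    N50 is the largest length L such that blocks of length >= L
    together account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk blocks from longest to shortest, accumulating covered sequence;
    # the first block that brings coverage to >= 50% of the total is the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Not reached for non-empty, non-negative input: the last block
    # brings the running sum up to the full total.
    return 0


# Hypothetical phase block lengths in base pairs (illustrative only).
blocks = [5_000_000, 3_000_000, 1_000_000, 500_000, 500_000]
print(n50(blocks))
```

Sorting dominates, so the sketch runs in O(n log n) for n blocks, and the same function applies unchanged to contig or scaffold lengths.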
42
Höök L, Näsvall K, Vila R, Wiklund C, Backström N. High-density linkage maps and chromosome level genome assemblies unveil direction and frequency of extensive structural rearrangements in wood white butterflies (Leptidea spp.). Chromosome Res 2023; 31:2. [PMID: 36662301 PMCID: PMC9859909 DOI: 10.1007/s10577-023-09713-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 10/21/2022] [Revised: 12/19/2022] [Accepted: 12/28/2022] [Indexed: 01/21/2023]
Abstract
Karyotypes are generally conserved between closely related species, and large chromosome rearrangements typically have negative fitness consequences in heterozygotes, potentially driving speciation. In the order Lepidoptera, most investigated species have the ancestral karyotype and gene synteny is often conserved across deep divergence, although examples of extensive genome reshuffling have recently been demonstrated. The genus Leptidea has an unusual level of chromosome variation and rearranged sex chromosomes, but the extent of restructuring across the rest of the genome is so far unknown. To explore the genomes of the wood white (Leptidea) species complex, we generated eight genome assemblies using a combination of 10X linked reads and Hi-C data, and improved them using linkage maps for two populations of the common wood white (L. sinapis) with distinct karyotypes. Synteny analysis revealed extensive rearrangements, both relative to the ancestral karyotype and between the Leptidea species, and only one of the three Z chromosomes was conserved across all comparisons. Most restructuring was explained by fissions and fusions, while translocations appear to be relatively rare. We further detected several segregating rearrangement polymorphisms, supporting highly dynamic genome evolution in this clade. Fusion breakpoints were enriched for LINEs and LTR elements, which suggests that ectopic recombination might be an important driver in the formation of new chromosomes. Our results show that chromosome count alone may conceal the extent of genome restructuring, and we propose that the amount of genome evolution in Lepidoptera might still be underestimated due to a lack of taxonomic sampling.
Affiliation(s)
- L. Höök
- Evolutionary Biology Program, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- K. Näsvall
- Evolutionary Biology Program, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- R. Vila
- Butterfly Diversity and Evolution Lab, Institut de Biologia Evolutiva (CSIC-UPF), Barcelona, Spain
- C. Wiklund
- Department of Zoology, Division of Ecology, Stockholm University, Stockholm, Sweden
- N. Backström
- Evolutionary Biology Program, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
43
Etherington GJ, Nash W, Ciezarek A, Mehta TK, Barria A, Peñaloza C, Khan MGQ, Durrant A, Forrester N, Fraser F, Irish N, Kaithakottil GG, Lipscombe J, Trong T, Watkins C, Swarbreck D, Angiolini E, Cnaani A, Gharbi K, Houston RD, Benzie JAH, Haerty W. Chromosome-level genome sequence of the Genetically Improved Farmed Tilapia (GIFT, Oreochromis niloticus) highlights regions of introgression with O. mossambicus. BMC Genomics 2022; 23:832. [PMID: 36522771 PMCID: PMC9756657 DOI: 10.1186/s12864-022-09065-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2022] [Accepted: 12/05/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND The Nile tilapia (Oreochromis niloticus) is the third most important freshwater fish for aquaculture. Its success is directly linked to continuous breeding efforts focusing on production traits such as growth rate and weight. Among the resulting elite strains, the Genetically Improved Farmed Tilapia (GIFT) strain, from the programme initiated by WorldFish, is now distributed worldwide. To accelerate the development of the GIFT strain through genomic selection, a high-quality reference genome is necessary. RESULTS Using a combination of short-read (10X Genomics) and long-read (PacBio HiFi, PacBio CLR) sequencing and a genetic map for the GIFT strain, we generated a chromosome-level genome assembly for GIFT. Using the genomes of two closely related species (O. mossambicus, O. aureus), we characterised the extent of introgression between these species and O. niloticus that occurred during the breeding process. Over 11 Mb of O. mossambicus genomic material was identified within the GIFT genome, including genes associated with immunity and with traits of interest such as growth rate. CONCLUSION Because of the breeding history of elite strains, current reference genomes might not be the most suitable for further studies of the GIFT strain. We generated a chromosome-level assembly of the GIFT strain, characterising its mixed origins and the potential contributions of introgressed regions to selected traits.
Affiliation(s)
- G J Etherington
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- W Nash
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- A Ciezarek
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- T K Mehta
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- A Barria
- The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- C Peñaloza
- The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- M G Q Khan
- The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Department of Fisheries Biology and Genetics, Bangladesh Agricultural University, Mymensingh, 2202, Bangladesh
- A Durrant
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- N Forrester
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- F Fraser
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- N Irish
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- G G Kaithakottil
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- J Lipscombe
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- T Trong
- WorldFish, 10670, Penang, Malaysia
- C Watkins
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- D Swarbreck
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- E Angiolini
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- A Cnaani
- Department of Poultry and Aquaculture, Institute of Animal Science, Agricultural Research Organization - Volcani Institute, Rishon LeTsiyon, Israel
- K Gharbi
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK
- R D Houston
- The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Benchmark Genetics, 1 Pioneer Building, Edinburgh Technopole, Penicuik, EH26 0GB, UK
- W Haerty
- Earlham Institute, Norwich Research Park, Colney Ln, Norwich, NR4 7UZ, UK.
- School of Biological Sciences, University of East Anglia, Norwich, UK.
44
Rapid molecular diversification and homogenization of clustered major ampullate silk genes in Argiope garden spiders. PLoS Genet 2022; 18:e1010537. [PMID: 36508456 PMCID: PMC9779670 DOI: 10.1371/journal.pgen.1010537] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/14/2022] [Revised: 12/22/2022] [Accepted: 11/18/2022] [Indexed: 12/14/2022]
Abstract
The evolutionary diversification of orb-web weaving spiders is closely tied to the mechanical performance of dragline silk. This proteinaceous fiber provides the primary structural framework of orb web architecture, and its extraordinary toughness allows these structures to absorb the high energy of aerial prey impact. The dominant model of dragline silk molecular structure involves the combined function of two highly repetitive, spider-specific silk genes (spidroins), MaSp1 and MaSp2. Recent genomic studies, however, have suggested this framework is overly simplistic, and our understanding of how MaSp genes evolve is limited. Here we present a comprehensive analysis of MaSp structural and evolutionary diversity across species of Argiope (garden spiders). This genomic analysis reveals the largest catalog of MaSp genes found in any spider, driven largely by an expansion of MaSp2 genes. The rapid diversification of Argiope MaSp genes, located primarily in a single genomic cluster, is associated with profound changes in silk gene structure. MaSp2 genes, in particular, have evolved complex hierarchically organized repeat units (ensemble repeats) delineated by novel introns that exhibit remarkable evolutionary dynamics. These repetitive introns have arisen independently within the genus, are highly homogenized within a gene, but diverge rapidly between genes. In some cases, these iterated introns are organized in an alternating structure in which every other intron is nearly identical in sequence. We hypothesize that this intron structure has evolved to facilitate homogenization of the coding sequence. We also find evidence of intergenic gene conversion and identify a more diverse array of stereotypical amino acid repeats than previously recognized. 
Overall, the extreme diversification found among MaSp genes requires changes in the structure-function model of dragline silk performance that focuses on the differential use and interaction among various MaSp paralogs as well as the impact of ensemble repeat structure and different amino acid motifs on mechanical behavior.
45
Peel E, Silver L, Brandies P, Zhu Y, Cheng Y, Hogg CJ, Belov K. Best genome sequencing strategies for annotation of complex immune gene families in wildlife. Gigascience 2022; 11:giac100. [PMID: 36310247 PMCID: PMC9618407 DOI: 10.1093/gigascience/giac100] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/24/2022] [Revised: 08/10/2022] [Accepted: 09/29/2022] [Indexed: 11/04/2022]
Abstract
BACKGROUND The biodiversity crisis and the increasing impact of wildlife disease on animal and human health provide impetus for studying immune genes in wildlife. Despite the recent boom in genomes for wildlife species, immune genes are poorly annotated in nonmodel species owing to their high level of polymorphism and complex genomic organisation. Our research over the past decade and a half on Tasmanian devils and koalas highlights the importance of genomics and accurate immune annotations to investigate disease in wildlife. Given this, we have increasingly been asked about the minimum level of genome quality required to effectively annotate immune genes in order to study immunogenetic diversity. Here we set out to answer this question by manually annotating immune genes in 5 marsupial genomes and 1 monotreme genome to determine the impact of sequencing data type, assembly quality, and automated annotation on accurate immune annotation. RESULTS Genome quality is directly linked to our ability to annotate complex immune gene families, with long reads and scaffolding technologies required to reassemble immune gene clusters and elucidate the evolution, organisation, and true gene content of the immune repertoire. Draft-quality genomes generated from short reads with HiC or 10× Chromium linked reads were unable to achieve this. Despite mammalian BUSCO v5 scores of up to 94.1% amongst the 6 genomes, automated annotation pipelines incorrectly annotated up to 59% of manually annotated immune genes regardless of assembly quality or method of automated annotation. CONCLUSIONS Our results demonstrate that long reads and scaffolding technologies, alongside manual annotation, are required to accurately study the immune gene repertoire of wildlife species.
Affiliation(s)
- Emma Peel
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
- Luke Silver
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Parice Brandies
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Ying Zhu
  - Sichuan Provincial Academy of Natural Resource Sciences, Chengdu, Sichuan 610000, China
- Yuanyuan Cheng
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Carolyn J Hogg
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
- Katherine Belov
  - School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, University of Sydney, Sydney NSW 2006, Australia
46
Ketchum RN, Davidson PL, Smith EG, Wray GA, Burt JA, Ryan JF, Reitzel AM. A Chromosome-level Genome Assembly of the Highly Heterozygous Sea Urchin Echinometra sp. EZ Reveals Adaptation in the Regulatory Regions of Stress Response Genes. Genome Biol Evol 2022; 14:evac144. [PMID: 36161313 PMCID: PMC9557091 DOI: 10.1093/gbe/evac144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/20/2022] [Indexed: 11/14/2022]
Abstract
Echinometra is the most widespread genus of sea urchin and has been the focus of a wide range of studies in ecology, speciation, and reproduction. However, available genetic data for this genus are generally limited to a few select loci. Here, we present a chromosome-level genome assembly based on 10x Genomics, PacBio, and Hi-C sequencing for Echinometra sp. EZ from the Persian/Arabian Gulf. The genome is assembled into 210 scaffolds totaling 817.8 Mb with an N50 of 39.5 Mb. From this assembly, we determined that the E. sp. EZ genome consists of 2n = 42 chromosomes. BUSCO analysis showed that 95.3% of BUSCO genes were complete. Ab initio and transcript-informed gene modeling and annotation identified 29,405 genes, including a conserved Hox cluster. E. sp. EZ can be found in high-temperature and high-salinity environments, and we therefore compared E. sp. EZ gene families and transcription factors associated with environmental stress response ("defensome") with other echinoid species with similar high-quality genomic resources. While the number of defensome genes was broadly similar for all species, we identified strong signatures of positive selection in E. sp. EZ noncoding elements near genes involved in environmental response pathways as well as losses of transcription factors important for environmental response. These data provide key insights into the biology of E. sp. EZ as well as the diversification of Echinometra more widely and will serve as a useful tool for the community to explore questions in this taxonomic group and beyond.
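The assembly statistics above (210 scaffolds totaling 817.8 Mb, N50 of 39.5 Mb) rely on the N50 metric: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the scaffold lengths are toy values, not the E. sp. EZ assembly, and individual tools may differ slightly in how they handle the exact halfway point.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of non-negative sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # First scaffold (longest first) at which the cumulative length reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


print(n50([2, 3, 4, 5, 6]))  # 5
```

Here the total is 20 bases; the two longest scaffolds (6 + 5 = 11) are the first to cover at least half of it, so the N50 is 5.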
Affiliation(s)
- Remi N Ketchum
  - Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
  - Whitney Laboratory for Marine Bioscience, University of Florida, Marineland, Florida, USA
- Edward G Smith
  - Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
- Gregory A Wray
  - Department of Biology, Duke University, Durham, North Carolina, USA
- John A Burt
  - Water Research Center & Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, UAE
- Joseph F Ryan
  - Whitney Laboratory for Marine Bioscience, University of Florida, Marineland, Florida, USA
- Adam M Reitzel
  - Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, North Carolina, USA
47
Weisweiler M, Arlt C, Wu PY, Van Inghelandt D, Hartwig T, Stich B. Structural variants in the barley gene pool: precision and sensitivity to detect them using short-read sequencing and their association with gene expression and phenotypic variation. Theor Appl Genet 2022; 135:3511-3529. [PMID: 36029318 PMCID: PMC9519679 DOI: 10.1007/s00122-022-04197-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2022] [Accepted: 08/03/2022] [Indexed: 06/15/2023]
Abstract
Structural variants (SV) of 23 barley inbreds, detected by the best combination of SV callers based on short-read sequencing, were associated with genome-wide and gene-specific gene expression and were thus evaluated for predicting agronomic traits. In human genetics, several studies have shown that phenotypic variation is more likely to be caused by SV than by single nucleotide variants. However, accurate yet cost-efficient discovery of SV in complex genomes remains challenging. The objectives of our study were to (i) facilitate SV discovery studies by benchmarking SV callers and their combinations with respect to their sensitivity and precision to detect SV in the barley genome, (ii) characterize the occurrence and distribution of SV clusters in the genomes of 23 barley inbreds that are the parents of a unique resource for mapping quantitative traits, the double round robin population, (iii) quantify the association of SV clusters with transcript abundance, and (iv) evaluate the use of SV clusters for the prediction of phenotypic traits. In our computer simulations based on a sequencing coverage of 25x, a sensitivity > 70% and a precision > 95% were observed for all combinations of SV types and SV length categories when the best combination of SV callers was used. We observed a significant (P < 0.05) association of gene-associated SV clusters with global gene-specific gene expression. Furthermore, about 9% of all SV clusters within 5 kb of a gene were significantly (P < 0.05) associated with the expression of the corresponding gene. The prediction ability of SV clusters was higher than that of single-nucleotide polymorphisms from an array across the seven studied phenotypic traits. 
These findings suggest the usefulness of exploiting SV information when fine mapping and cloning the causal genes underlying quantitative traits as well as the high potential of using SV clusters for the prediction of phenotypes in diverse germplasm sets.
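The sensitivity and precision quoted above are the usual benchmarking ratios, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), where a call is a true positive only if it matches a simulated (true) SV. The sketch below assumes a simple matching rule: same chromosome, same SV type, both breakpoints within `max_dist` bp, each true SV claimed by at most one call, greedy first match. That rule and the 100 bp default are illustrative assumptions, not the criteria used in the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def is_match(call: SV, truth: SV, max_dist: int) -> bool:
    """A call matches a true SV if chromosome and type agree and both breakpoints are close."""
    return (
        call.chrom == truth.chrom
        and call.svtype == truth.svtype
        and abs(call.start - truth.start) <= max_dist
        and abs(call.end - truth.end) <= max_dist
    )


def sensitivity_precision(
    calls: List[SV], truths: List[SV], max_dist: int = 100
) -> Tuple[float, float]:
    """Greedy one-to-one matching of calls to true SVs; returns (sensitivity, precision)."""
    used = set()  # indices of true SVs already claimed by a call
    tp = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in used and is_match(call, truth, max_dist):
                used.add(i)
                tp += 1
                break
    # TP + FN equals the number of true SVs; TP + FP equals the number of calls.
    sensitivity = tp / len(truths) if truths else 0.0
    precision = tp / len(calls) if calls else 0.0
    return sensitivity, precision


truths = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 5300, "INV"), SV("chr2", 100, 900, "DUP")]
calls = [
    SV("chr1", 1050, 1980, "DEL"),  # matches the first true SV
    SV("chr1", 1010, 2010, "DEL"),  # same event called twice: counted as a false positive
    SV("chr2", 120, 880, "DUP"),    # matches the third true SV
    SV("chr3", 10, 50, "DEL"),      # no true SV here: false positive
]
print(sensitivity_precision(calls, truths))
```

With these toy inputs the function returns a sensitivity of 2/3 (the inversion is missed) and a precision of 0.5. Greedy matching is simple but not optimal; an assignment-based matcher can score slightly higher when calls compete for the same true SV.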
Affiliation(s)
- Marius Weisweiler
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Christopher Arlt
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Po-Ya Wu
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Delphine Van Inghelandt
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Thomas Hartwig
  - Institute for Molecular Physiology, Universitätsstraße 1, 40225, Düsseldorf, Germany
- Benjamin Stich
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225, Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225, Düsseldorf, Germany
48
Zhou M, Ko M, Hoge AC, Luu K, Liu Y, Russell ML, Hannon WW, Zhang Z, Carrot-Zhang J, Beroukhim R, Van Allen EM, Choudhury AD, Nelson PS, Freedman ML, Taplin ME, Meyerson M, Viswanathan SR, Ha G. Patterns of structural variation define prostate cancer across disease states. JCI Insight 2022; 7:e161370. [PMID: 35943799 PMCID: PMC9536266 DOI: 10.1172/jci.insight.161370] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/02/2022] [Accepted: 08/04/2022] [Indexed: 11/19/2022]
Abstract
The complex genomic landscape of prostate cancer evolves across disease states under therapeutic pressure directed toward inhibiting androgen receptor (AR) signaling. While significantly altered genes in prostate cancer have been extensively defined, there have been fewer systematic analyses of how structural variation shapes the genomic landscape of this disease across disease states. We uniformly characterized structural alterations across 531 localized and 143 metastatic prostate cancers profiled by whole genome sequencing, of which 125 metastatic samples were also profiled by whole transcriptome sequencing. We observed distinct, significantly recurrent breakpoints in localized and metastatic castration-resistant prostate cancers (mCRPC), with pervasive alterations in noncoding regions flanking the AR, MYC, FOXA1, and LSAMP genes enriched in mCRPC, and TMPRSS2-ERG rearrangements enriched in localized prostate cancer. We defined 9 subclasses of mCRPC based on signatures of structural variation, each associated with distinct genetic features and clinical outcomes. Our results comprehensively define patterns of structural variation in prostate cancer and identify clinically actionable subgroups based on whole genome profiling.
Affiliation(s)
- Meng Zhou
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Minjeong Ko
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Anna C.H. Hoge
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Kelsey Luu
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Yuzhen Liu
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Magdalena L. Russell
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- William W. Hannon
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
- Zhenwei Zhang
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Pathology, UMass Memorial Medical Center, Worcester, Massachusetts, USA
- Jian Carrot-Zhang
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Rameen Beroukhim
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Eliezer M. Van Allen
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Center for Cancer Genomics, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Atish D. Choudhury
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Peter S. Nelson
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Matthew L. Freedman
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Mary-Ellen Taplin
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Matthew Meyerson
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Srinivas R. Viswanathan
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Harvard Medical School, Boston, Massachusetts, USA
- Gavin Ha
  - Public Health Sciences and Human Biology Divisions, Fred Hutchinson Cancer Center, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
49
Mackintosh A, Laetsch DR, Baril T, Ebdon S, Jay P, Vila R, Hayward A, Lohse K. The genome sequence of the scarce swallowtail, Iphiclides podalirius. G3 (Bethesda) 2022; 12:jkac193. [PMID: 35929795 PMCID: PMC9434224 DOI: 10.1093/g3journal/jkac193] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/28/2022] [Accepted: 07/20/2022] [Indexed: 12/04/2022]
Abstract
The scarce swallowtail, Iphiclides podalirius (Linnaeus, 1758), is a species of butterfly in the family Papilionidae. Here, we present a chromosome-level genome assembly for Iphiclides podalirius as well as gene and transposable element annotations. We investigate how the density of genomic features differs between the 30 Iphiclides podalirius chromosomes. We find that shorter chromosomes have higher heterozygosity at four-fold-degenerate sites and a greater density of transposable elements. While the first result is an expected consequence of differences in recombination rate, the second suggests a counter-intuitive relationship between recombination and transposable element evolution. This high-quality genome assembly, the first for any species in the tribe Leptocircini, will be a valuable resource for population genomics in the genus Iphiclides and comparative genomics more generally.
Affiliation(s)
- Alexander Mackintosh
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
- Dominik R Laetsch
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
- Tobias Baril
  - Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Cornwall TR10 9FE, UK
- Sam Ebdon
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
- Paul Jay
  - Ecologie Systématique Evolution, Bâtiment 360, CNRS, AgroParisTech, Université Paris-Saclay, 91400 Orsay, France
- Roger Vila
  - Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
- Alex Hayward
  - Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Cornwall TR10 9FE, UK
- Konrad Lohse
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh EH9 3FL, UK
50
Löytynoja A. Thousands of human mutation clusters are explained by short-range template switching. Genome Res 2022; 32:1437-1447. [PMID: 35760560 PMCID: PMC9435742 DOI: 10.1101/gr.276478.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 06/21/2022] [Indexed: 02/03/2023]
Abstract
Variation within human genomes is unevenly distributed, and variants show spatial clustering. DNA replication-related template switching is a poorly known mutational mechanism capable of causing major chromosomal rearrangements as well as creating short inverted sequence copies that appear as local mutation clusters in sequence comparisons. In this study, haplotype-resolved genome assemblies representing 25 human populations and multinucleotide variants aggregated from 140,000 human sequencing experiments were reanalyzed. Local template switching could explain thousands of complex mutation clusters across the human genome, with the loci segregating within and between populations. During the study, computational tools were developed for identification of template switch events using both short-read sequencing data and genotype data, and for genotyping candidate loci using short-read data. The characteristics of template-switch mutations complicate their detection, and widely used analysis pipelines for short-read sequencing data, normally capable of identifying single nucleotide changes, were found to miss template-switch mutations spanning tens of base pairs, potentially invalidating medical genetic studies searching for the causative allele behind a genetic disease. Combined with the massive sequencing data now available for humans, the novel tools described here enable building catalogs of affected loci and studying the cellular mechanisms behind template switching in both healthy organisms and disease.
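The abstract describes short-range template switching as a mechanism that creates short inverted (reverse-complement) copies of nearby sequence, which then appear as a local cluster of mismatches. The toy sketch below is not one of the tools developed in the study: it only checks whether the mismatch cluster between two gap-free, equal-length sequences is an exact reverse complement of a reference segment within `window` bp. The `window` and `min_len` defaults and the function names are illustrative assumptions, and real data involve indels, partial flanking matches and sequencing errors that this sketch ignores.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_cluster(ref: str, query: str) -> Optional[Tuple[int, int]]:
    """Half-open span from the first to the last mismatch, or None if identical."""
    if len(ref) != len(query):
        raise ValueError("sequences must be gap-free and of equal length")
    diffs = [i for i, (a, b) in enumerate(zip(ref, query)) if a != b]
    if not diffs:
        return None
    return diffs[0], diffs[-1] + 1


def template_switch_source(
    ref: str, query: str, window: int = 50, min_len: int = 4
) -> Optional[int]:
    """Reference start of a nearby reverse-complement source for the cluster, or None."""
    cluster = mismatch_cluster(ref, query)
    if cluster is None:
        return None
    start, end = cluster
    if end - start < min_len:
        return None  # very short clusters are likely to match somewhere by chance
    target = reverse_complement(query[start:end])
    lo = max(0, start - window)
    hi = min(len(ref) - len(target), end + window)
    for k in range(lo, hi + 1):
        if ref[k:k + len(target)] == target:
            return k
    return None


ref = "GATCC" + "T" * 10 + "A" * 10 + "C" * 5
# Positions 15-19 replaced by the reverse complement of ref[0:5] ("GATCC" -> "GGATC").
query = ref[:15] + reverse_complement(ref[0:5]) + ref[20:]
print(template_switch_source(ref, query))  # 0
```

In the example, four of the five copied bases differ from the reference, so a standard variant caller would see a cluster of nearby substitutions; the sketch instead recovers a single event whose source is the inverted segment starting at reference position 0.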
Affiliation(s)
- Ari Löytynoja
  - Institute of Biotechnology, University of Helsinki, FI-00014 Helsinki, Finland