1
Singh L, San JE, Tegally H, Brzoska PM, Anyaneji UJ, Wilkinson E, Clark L, Giandhari J, Pillay S, Lessells RJ, Martin DP, Furtado M, Kiran AM, de Oliveira T. Targeted Sanger sequencing to recover key mutations in SARS-CoV-2 variant genome assemblies produced by next-generation sequencing. Microb Genom 2022; 8:000774. [PMID: 35294336 PMCID: PMC9176282 DOI: 10.1099/mgen.0.000774] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/26/2021] [Accepted: 01/06/2022] [Indexed: 12/19/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is adaptively evolving to ensure its persistence within human hosts. It is therefore necessary to continuously monitor the emergence and prevalence of novel variants. Importantly, some mutations have been associated with both molecular diagnostic failures and reduced or abrogated next-generation sequencing (NGS) read coverage in some genomic regions. Such impacts are particularly problematic when they occur in genomic regions such as those that encode the spike (S) protein, which are crucial for identifying and tracking the prevalence and dissemination dynamics of concerning viral variants. Targeted Sanger sequencing presents a fast and cost-effective means to accurately extend the coverage of whole-genome sequences. We designed a custom set of primers to amplify a 401 bp segment of the receptor-binding domain (RBD) (between positions 22698 and 23098 relative to the Wuhan-Hu-1 reference). We then designed a Sanger sequencing wet-laboratory protocol. We applied the primer set and wet-laboratory protocol to sequence 222 samples that were missing positions with key mutations K417N, E484K, and N501Y due to poor coverage after NGS. Finally, we developed SeqPatcher, a Python-based computational tool that either analyses the trace files yielded by Sanger sequencing to generate consensus sequences or takes preanalysed consensus sequences in FASTA format, and then merges them with their corresponding whole-genome assemblies. We successfully sequenced 153 of 222 samples (69 %) using Sanger sequencing and confirmed the occurrence of key Beta variant mutations (K417N, E484K, N501Y) in the S genes of 142 of 153 (93 %) samples. Additionally, one sample had the Y508F mutation and four samples had the S477N mutation. Samples with RT-PCR cycle threshold (Ct) values ranging from 13.85 to 37.47 (mean = 25.70) could be Sanger sequenced efficiently.
These results show that our method and pipeline can be used to improve the quality of whole-genome assemblies produced using NGS and can be used with any pair of the most widely used NGS and Sanger sequencing platforms.
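The abstract combines two computational ideas: an amplicon defined by reference coordinates that must cover the codons of interest, and a merge step that patches poorly covered NGS positions with Sanger calls. Below is a minimal sketch of both under stated assumptions. It is not SeqPatcher's implementation. The function names are hypothetical. The S gene start (position 21563 in Wuhan-Hu-1, NC_045512.2) is an assumption not stated in the abstract. The merge assumes the Sanger consensus is already aligned to the reference with no indels and overwrites only bases the assembly left as `N`.

```python
"""Hypothetical sketch, not SeqPatcher's code: (1) check that spike codons
fall inside the 22698-23098 Sanger amplicon, (2) fill N-masked positions of
an NGS assembly with bases from an aligned Sanger consensus."""

# Assumption: 1-based start of the S gene ORF in Wuhan-Hu-1 (NC_045512.2).
S_GENE_START = 21563
# Inclusive, 1-based amplicon coordinates taken from the abstract.
AMPLICON = (22698, 23098)


def amplicon_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval."""
    return end - start + 1


def codon_span(aa_pos: int, gene_start: int = S_GENE_START) -> tuple[int, int]:
    """1-based, inclusive genomic coordinates of codon `aa_pos` in the gene."""
    first = gene_start + 3 * (aa_pos - 1)
    return first, first + 2


def in_amplicon(aa_pos: int, amplicon: tuple[int, int] = AMPLICON) -> bool:
    """True if all three bases of the codon lie inside the amplicon."""
    lo, hi = codon_span(aa_pos)
    return amplicon[0] <= lo and hi <= amplicon[1]


def patch_assembly(assembly: str, sanger: str, start: int) -> str:
    """Replace 'N' bases in `assembly` with Sanger calls.

    Assumes `sanger` is already aligned to the reference and begins at
    1-based genomic position `start` with no indels. Only positions the NGS
    assembly left as N are overwritten; ambiguous Sanger calls ('N') are
    skipped, so existing NGS calls are never changed.
    """
    bases = list(assembly)
    offset = start - 1
    for i, base in enumerate(sanger):
        j = offset + i
        if j >= len(bases):
            break  # Sanger read runs past the end of the assembly
        if bases[j].upper() == "N" and base.upper() != "N":
            bases[j] = base
    return "".join(bases)


if __name__ == "__main__":
    print(amplicon_length(*AMPLICON))  # 401
    for name, pos in [("K417N", 417), ("S477N", 477), ("E484K", 484),
                      ("N501Y", 501), ("Y508F", 508)]:
        print(name, codon_span(pos), in_amplicon(pos))
    print(patch_assembly("ACGTNNNNACGT", "GGCA", 5))  # ACGTGGCAACGT
```

Under these assumptions, every mutation named in the abstract (K417N, S477N, E484K, N501Y, Y508F) maps to a codon inside the 401 bp amplicon. A real merge step would first need to align the Sanger consensus to the reference, because insertions and deletions would break the fixed-offset mapping used here.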
Affiliation(s)
- Lavanya Singh
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- James E. San
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- Houriiyah Tegally
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- Ugochukwu J. Anyaneji
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- Eduan Wilkinson
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, 7600, South Africa
- Lindsay Clark
- HPCBio, Roy J. Carver Biotechnology Center, University of Illinois, IL, USA
- Jennifer Giandhari
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- Sureshnee Pillay
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- Richard J. Lessells
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- Darren Patrick Martin
- Institute of Infectious Diseases and Molecular Medicine, Division of Computational Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town 7701, South Africa
- Anmol M. Kiran
- Malawi-Liverpool-Wellcome Trust, Chichiri, Blantyre 3, Malawi
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool CH64 7TE, UK
- Tulio de Oliveira
- KwaZulu-Natal Research Innovation and Sequencing Platform, University of KwaZulu-Natal, Durban, South Africa
- Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, 7600, South Africa
- Department of Global Health, University of Washington, Seattle, WA, USA
2
Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CLG, Davis C, Ewing B, Oommen S, Lau C, Yu HC, Li J, Roe BA, Green P, Gerhard DS, Temple G, Haussler D, Brent MR. Targeted discovery of novel human exons by comparative genomics. Genome Res 2007; 17:1763-73. [PMID: 17989246 PMCID: PMC2099585 DOI: 10.1101/gr.7128207] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 09/10/2007] [Accepted: 10/15/2007] [Indexed: 01/20/2023]
Abstract
A complete and accurate set of human protein-coding gene annotations is perhaps the single most important resource for genomic research after the human-genome sequence itself, yet the major gene catalogs remain incomplete and imperfect. Here we describe a genome-wide effort, carried out as part of the Mammalian Gene Collection (MGC) project, to identify human genes not yet in the gene catalogs. Our approach was to produce gene predictions by algorithms that rely on comparative sequence data but do not require direct cDNA evidence, then to test predicted novel genes by RT-PCR. We have identified 734 novel gene fragments (NGFs) containing 2188 exons with, at most, weak prior cDNA support. These NGFs correspond to an estimated 563 distinct genes, of which >160 are completely absent from the major gene catalogs, while hundreds of others represent significant extensions of known genes. The NGFs appear to be predominantly protein-coding genes rather than noncoding RNAs, unlike novel transcribed sequences identified by technologies such as tiling arrays and CAGE. They tend to be expressed at low levels and in a tissue-specific manner, and they are enriched for roles in motor activity, cell adhesion, connective tissue, and central nervous system development. Our results demonstrate that many important genes and gene fragments have been missed by traditional approaches to gene discovery but can be identified by their evolutionary signatures using comparative sequence data. However, they suggest that hundreds, not thousands, of protein-coding genes are completely missing from the current gene catalogs.
Affiliation(s)
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA.
4
Skotheim RI, Nees M. Alternative splicing in cancer: Noise, functional, or systematic? Int J Biochem Cell Biol 2007; 39:1432-49. [PMID: 17416541 DOI: 10.1016/j.biocel.2007.02.016] [Citation(s) in RCA: 157] [Impact Index Per Article: 9.2] [Received: 12/18/2006] [Revised: 02/13/2007] [Accepted: 02/22/2007] [Indexed: 12/22/2022]
Abstract
Pre-messenger RNA splicing is a fine-tuned process that generates multiple functional variants from individual genes. Various cell types and developmental stages regulate alternative splicing patterns differently in their generation of specific gene functions. In cancers, splicing is significantly altered, and understanding the underlying mechanisms and patterns in cancer will shed new light on cancer biology. Cancer-specific transcript variants are promising biomarkers and targets for diagnostic, prognostic, and treatment purposes. In this review, we explore how alternative splicing cannot simply be considered noise or an innocent bystander, but is actively regulated or deregulated in cancers. A special focus will be on aspects of the cell biology and biochemistry of alternative splicing in cancer cells, addressing differences in splicing mechanisms between normal and malignant cells. Systems biology approaches to splicing are only now being applied to cancer research. We explore functional annotations for some of the most intensely spliced gene classes and provide a literature-mining and clustering analysis that reflects the most intensely investigated genes. A few well-established cancer-specific splice events, such as those of the CD44 antigen, are used to illustrate the potential of exploring the mechanisms of their regulation. Accordingly, we describe the functional connection between the regulatory machinery (i.e., the spliceosome and its accessory proteins) and its global impact on qualitative transcript variation, which is only now emerging from the use of genomic technologies such as microarrays. These studies are expected to open an entirely new level of genetic information that is currently still poorly understood.
Affiliation(s)
- Rolf I Skotheim
- Department of Cancer Prevention, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway