1
Lorenzi JN, Graner F, Courtier-Orgogozo V, Achaz G. CNCA aligns small annotated genomes. BMC Bioinformatics 2024; 25:89. [PMID: 38424511 PMCID: PMC10905818 DOI: 10.1186/s12859-024-05700-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 02/12/2024] [Indexed: 03/02/2024]
Abstract
BACKGROUND: To explore the evolutionary history of sequences, a sequence alignment is a first and necessary step, and its quality is crucial. In the context of the study of the proximal origins of SARS-CoV-2 coronavirus, we wanted to construct an alignment of genomes closely related to SARS-CoV-2 using both coding and non-coding sequences. To our knowledge, there is no tool that can be used to construct this type of alignment, which motivated the creation of CNCA.
RESULTS: CNCA is a web tool that aligns annotated genomes from GenBank files. It generates a nucleotide alignment that is then updated based on the protein sequence alignment. The final output nucleotide alignment matches the protein alignment and guarantees no frameshift. CNCA was designed to align closely related small genome sequences up to 50 kb (typically viruses) for which the gene order is conserved.
CONCLUSIONS: CNCA constructs multiple alignments of small genomes by integrating both coding and non-coding sequences. This preserves regions traditionally ignored in conventional back-translation methods, such as non-coding regions.
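The idea that a nucleotide alignment can be made to match a protein alignment without frameshifts can be illustrated with a minimal back-translation sketch. This is not CNCA's implementation; the function name `thread_codons`, the toy sequences, and the use of `-` as the gap character are assumptions made for illustration only.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Place the codons of an ungapped CDS under a gapped protein alignment row.

    Each residue receives its codon and each protein gap becomes three
    nucleotide gaps, so gaps always come in multiples of three and the
    reading frame is preserved.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein row")
    codon_iter = iter(codons)
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


# Hypothetical toy input: (aligned protein row, ungapped coding sequence).
rows = {
    "seqA": ("MK-L", "ATGAAACTG"),
    "seqB": ("MKTL", "ATGAAGACCCTG"),
}
codon_alignment = {name: thread_codons(p, c) for name, (p, c) in rows.items()}
for name, row in codon_alignment.items():
    print(name, row)
```

A sketch like this covers only coding regions; per the abstract, CNCA additionally retains the non-coding regions in the final alignment.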
Affiliation(s)
- Jean-Noël Lorenzi
  - Université Paris Cité, Paris, France
  - CNRS, Institut Jacques Monod, 75013, Paris, France
  - SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, 75006, Paris, France
- François Graner
  - Université Paris Cité, Paris, France
  - CNRS, Matière Et Systèmes Complexes, 75013, Paris, France
- Guillaume Achaz
  - SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, 75006, Paris, France
2
Mallory MA, Hymas WC, Simmon KE, Pyne MT, Stevenson JB, Barker AP, Hillyard DR, Hanson KE. Development and validation of a next-generation sequencing assay with open-access analysis software for detecting resistance-associated mutations in CMV. J Clin Microbiol 2023; 61:e0082923. [PMID: 38092673 PMCID: PMC10729743 DOI: 10.1128/jcm.00829-23] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/30/2023] [Accepted: 09/29/2023] [Indexed: 12/20/2023]
Abstract
Cytomegalovirus (CMV) resistance testing by targeted next-generation sequencing (NGS) allows for the simultaneous analysis of multiple genes. We developed and validated an amplicon-based Ion Torrent NGS assay to detect CMV resistance mutations in UL27, UL54, UL56, and UL97 and compared the results to standard Sanger sequencing. NGS primers were designed to generate 83 overlapping amplicons of four CMV genes (~10 kb encompassing 138 mutation sites). An open-access software plugin was developed to perform read alignment, call variants, and interpret drug resistance. Plasmids were tested to determine NGS error rate and minor variant limit of detection. NGS limit of detection was determined using the CMV WHO International Standard and quantified clinical specimens. Reproducibility was also assessed. After establishing quality control metrics, 185 patient specimens previously tested using Sanger were reanalyzed by NGS. The NGS assay had a low error rate (<0.05%) and high accuracy (95%) for detecting CMV-associated resistance mutations present at ≥5% in contrived mixed populations. Mutation sites were reproducibly sequenced with 40× coverage when plasma viral loads were ≥2.6 log IU/mL. NGS detected the same resistance-associated mutations identified by Sanger in 68/69 (98.6%) specimens. In 16 specimens, NGS detected 18 resistance mutations that Sanger failed to detect; 14 were low-frequency variants (<20%), and six would have changed the drug resistance interpretation. The NGS assay showed excellent agreement with Sanger and generated high-quality sequence from low viral load specimens. Additionally, the higher resolution and analytic sensitivity of NGS potentially enables earlier detection of antiviral resistance.
Affiliation(s)
- Melanie A. Mallory
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Weston C. Hymas
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Keith E. Simmon
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Michael T. Pyne
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Jeffery B. Stevenson
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
- Adam P. Barker
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
  - Department of Pathology, University of Utah, Salt Lake City, Utah, USA
- David R. Hillyard
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
  - Department of Pathology, University of Utah, Salt Lake City, Utah, USA
- Kimberly E. Hanson
  - ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, Utah, USA
  - Department of Pathology, University of Utah, Salt Lake City, Utah, USA
3
Yao Y, Frith MC. Improved DNA-Versus-Protein Homology Search for Protein Fossils. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1691-1699. [PMID: 35617174 DOI: 10.1109/tcbb.2022.3177855] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
Protein fossils, i.e., noncoding DNA descended from coding DNA, arise frequently from transposable elements (TEs), decayed genes, and viral integrations. They can reveal, and mislead about, evolutionary history and relationships. They have been detected by comparing DNA to protein sequences, but current methods are not optimized for this task. We describe a powerful DNA-protein homology search method. We use a 64×21 substitution matrix, which is fitted to sequence data, automatically learning the genetic code. We detect subtly homologous regions by considering alternative possible alignments between them, and calculate significance (probability of occurring by chance between random sequences). Our method detects TE protein fossils much more sensitively than blastx, and faster. Of the ~7 major categories of eukaryotic TE, three were long thought absent in mammals: we find two of them in the human genome, polinton and DIRS/Ngaro. This method increases our power to find ancient fossils, and perhaps to detect non-standard genetic codes. The alternative-alignments and significance paradigm is not specific to DNA-protein comparison, and could benefit homology search generally. This is an extended version of a conference paper (Yao & Frith, 2021).
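To make the shape of a 64×21 codon-versus-residue matrix concrete, here is a minimal sketch. It is not the authors' fitted matrix: the scores are a toy +1/-1 derived from the standard genetic code rather than learned from sequence data, and the reading of the 21 columns as 20 amino acids plus a stop symbol `*` is an assumption. The names `CODONS`, `SYMBOLS`, `matrix`, and `score` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# 64 codons in TCAG order (first base varies slowest).
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Standard genetic code in the same TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO))

# Assumed column alphabet: 20 amino acids plus the stop symbol.
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY*"

# Toy scores: +1 if the codon encodes the symbol, otherwise -1.
# A fitted matrix would hold data-derived scores in these same cells.
matrix = [[1 if GENETIC_CODE[codon] == sym else -1 for sym in SYMBOLS]
          for codon in CODONS]


def score(codon: str, residue: str) -> int:
    """Look up the score for aligning one codon to one protein symbol."""
    return matrix[CODONS.index(codon)][SYMBOLS.index(residue)]


print(len(matrix), len(matrix[0]))
print(score("ATG", "M"), score("ATG", "L"))
```

Scoring whole codons against residues, rather than translating first, is what lets a data-fitted matrix of this shape encode the genetic code implicitly.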
4
Howe AY, Rodrigo C, Cunningham E, Douglas MW, Dietz J, Grebely J, Popping S, Sfalcin JA, Parczewski M, Sarrazin C, de Salazar A, Fuentes A, Sayan M, Quer J, Kjellin M, Kileng H, Mor O, Lennerstrand J, Fourati S, di Maio VC, Chulanov V, Pawlotsky JM, Harrigan PR, Ceccherini-Silberstein F, Garcia F, Martinello M, Matthews G, Fernando FF, Esteban JI, Müllhaupt B, Wiesch JSZ, Buggisch P, Neumann-Haefelin C, Berg T, Berg CP, Schattenberg JM, Moreno C, Stauber R, Lloyd A, Dore G, Applegate T, Ignacio J, Garcia-Cehic D, Gregori J, Rodriguez-Frias F, Rando A, Angelico M, Andreoni M, Babudieri S, Bertoli A, Cento V, Coppola N, Craxì A, Paolucci S, Parruti G, Pasquazzi C, Perno CF, Teti E, Vironet C, Lannergård A, Duberg AS, Aleman S, Gutteberg T, Soulier A, Gourgeon A, Chevaliez S, Pol S, Carrat F, Salmon D, Kaiser R, Knopes E, Gomes P, de Kneght R, Rijnders B, Poljak M, Lunar M, Usubillaga R, Seguin C, Tay E, Wilson C, Wang DS, George J, Kok J, Pérez AB, Chueca N, García-Deltoro M, Martínez-Sapiña AM, Lara-Pérez MM, García-Bujalance S, Aldámiz-Echevarría T, Vera-Méndez FJ, Pineda JA, Casado M, Pascasio JM, Salmerón J, Alados-Arboledas JC, Poyato A, Téllez F, Rivero-Juárez A, Merino D, Vivancos-Gallego MJ, Rosales-Zábal JM, Ocete MD, Simón MÁ, Rincón P, Reus S, De la Iglesia A, García-Arata I, Jiménez M, Jiménez F, Hernández-Quero J, Galera C, Balghata MO, Primo J, Masiá M, Espinosa N, Delgado M, von-Wichmann MÁ, Collado A, Santos J, Mínguez C, Díaz-Flores F, Fernández E, Bernal E, De Juan J, Antón JJ, Vélez M, Aguilera A, Navarro D, Arenas JI, Fernández C, Espinosa MD, Ríos MJ, Alonso R, Hidalgo C, Hernández R, Téllez MJ, Rodríguez FJ, Antequera P, Delgado C, Martín P, Crespo J, Becerril B, Pérez O, García-Herola A, Montero J, Freyre C, Grau C, Cabezas J, Jimenez M, Rodriguez MAM, Quilez C, Pardo MR, Muñoz-Medina L, Figueruela B. Characteristics of hepatitis C virus resistance in an international cohort after a decade of direct-acting antivirals. JHEP Rep 2022; 4:100462. 
[PMID: 35434589 PMCID: PMC9010635 DOI: 10.1016/j.jhepr.2022.100462] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/02/2022] [Accepted: 02/05/2022] [Indexed: 10/24/2022]
Abstract
Background & Aims: Direct-acting antiviral (DAA) regimens provide a cure in >95% of patients with chronic HCV infection. However, in some patients in whom therapy fails, resistance-associated substitutions (RASs) can develop, limiting retreatment options and risking onward resistant virus transmission. In this study, we evaluated RAS prevalence and distribution, including novel NS5A RASs and clinical factors associated with RAS selection, among patients who experienced DAA treatment failure.
Methods: SHARED is an international consortium of clinicians and scientists studying HCV drug resistance. HCV sequence-linked metadata from 3,355 patients were collected from 22 countries. NS3, NS5A, and NS5B RASs in virologic failures, including novel NS5A substitutions, were examined. Associations of clinical and demographic characteristics with RAS selection were investigated.
Results: The frequency of RASs increased from its natural prevalence following DAA exposure: 37% to 60% in NS3, 29% to 80% in NS5A, 15% to 22% in NS5B for sofosbuvir, and 24% to 37% in NS5B for dasabuvir. Among 730 virologic failures, most of which were treated with first-generation DAAs, 94% had drug resistance in ≥1 DAA class: 31% single-class resistance, 42% dual-class resistance (predominantly against protease and NS5A inhibitors), and 21% triple-class resistance. Distinct patterns containing ≥2 highly resistant RASs were common. New potential NS5A RASs and adaptive changes were identified in genotypes 1a, 3, and 4. Following DAA failure, RAS selection was more frequent in older people with cirrhosis and those infected with genotypes 1b and 4.
Conclusions: Drug resistance in HCV is frequent after DAA treatment failure. Previously unrecognized substitutions continue to emerge and remain uncharacterized.
Lay summary: Although direct-acting antiviral medications effectively cure hepatitis C in most patients, sometimes treatment selects for resistant viruses, causing antiviral drugs to be either ineffective or only partially effective. Multidrug resistance is common in patients for whom DAA treatment fails. Older patients and patients with advanced liver diseases are more likely to select drug-resistant viruses. Collective efforts from international communities and governments are needed to develop an optimal approach to managing drug resistance and preventing the transmission of resistant viruses.
Key Words
- DAA
- DAA, direct-acting antiviral
- DCV, daclatasvir
- DSV, dasabuvir
- GT, genotype
- HCV
- LDV, ledipasvir
- NI, nucleoside
- NNI, non-nucleoside
- NS5A
- NS5AI, NS5A replication complex inhibitor
- OR, odds ratio
- PI, NS3 protease inhibitor
- PIB, pibrentasvir
- RAS
- RASs, resistance-associated substitutions
- SHARED, The Surveillance of Hepatitis C Antiviral Resistance, Epidemiology and methoDologies
- SOF, sofosbuvir
- SVR, sustained virologic response
- VEL, velpatasvir
- aOR, adjusted odds ratio
- sFC, substitution frequency change
- virologic failure
5
Rangan R, Zheludev IN, Hagey RJ, Pham EA, Wayment-Steele HK, Glenn JS, Das R. RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses: a first look. RNA (New York, N.Y.) 2020; 26:937-959. [PMID: 32398273 PMCID: PMC7373990 DOI: 10.1261/rna.076141.120] [Citation(s) in RCA: 182] [Impact Index Per Article: 36.4] [Received: 04/28/2020] [Accepted: 05/11/2020] [Indexed: 05/11/2023]
Abstract
As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Here we present a first look at RNA sequence conservation and structural propensities in the SARS-CoV-2 genome. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nt as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak. We then confirm the conservation of the majority of these genome regions across 739 SARS-CoV-2 sequences subsequently reported from the COVID-19 outbreak, and we present a curated list of 30 "SARS-related-conserved" regions. We find that known RNA structured elements curated as Rfam families and in prior literature are enriched in these conserved genome regions, and we predict additional conserved, stable secondary structures across the viral genome. We provide 106 "SARS-CoV-2-conserved-structured" regions as potential targets for antivirals that bind to structured RNA. We further provide detailed secondary structure models for the extended 5' UTR, frameshifting stimulation element, and 3' UTR. Lastly, we predict regions of the SARS-CoV-2 viral genome that have low propensity for RNA secondary structure and are conserved within SARS-CoV-2 strains. These 59 "SARS-CoV-2-conserved-unstructured" genomic regions may be most easily accessible by hybridization in primer-based diagnostic strategies.
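The notion of regions "of length at least 15 nt" that are "exactly conserved" across an alignment can be sketched as a scan for runs of identical, gap-free columns. This is an illustrative sketch only, not the authors' pipeline; the function name `conserved_runs`, the gap character `-`, and the toy alignment are assumptions.

```python
def conserved_runs(alignment: list[str], min_len: int = 15) -> list[tuple[int, int]]:
    """Return half-open (start, end) column ranges in which every row carries
    the same non-gap character, keeping only runs of at least min_len columns."""
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and equal in length")
    first = alignment[0]
    n_cols = len(first)
    runs, start = [], None
    # Iterate one past the end so a run touching the last column is closed.
    for col in range(n_cols + 1):
        identical = (col < n_cols and first[col] != "-"
                     and all(row[col] == first[col] for row in alignment))
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    return runs


# Hypothetical toy alignment; a small min_len is used so the example is readable.
toy = ["ACGTACGTAA",
       "ACGTTCGTAA",
       "ACGTACGTAA"]
print(conserved_runs(toy, min_len=4))
```

On real genome alignments the default `min_len=15` corresponds to the threshold quoted in the abstract; ranking regions by conservation would need an additional scoring step not shown here.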
Affiliation(s)
- Ramya Rangan
  - Biophysics Program, Stanford University, Stanford, California 94305, USA
- Ivan N Zheludev
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, California 94305, USA
- Rachel J Hagey
  - Departments of Medicine (Division of Gastroenterology and Hepatology) and Microbiology & Immunology, Stanford School of Medicine, Stanford, California 94305, USA
- Edward A Pham
  - Departments of Medicine (Division of Gastroenterology and Hepatology) and Microbiology & Immunology, Stanford School of Medicine, Stanford, California 94305, USA
- Jeffrey S Glenn
  - Departments of Medicine (Division of Gastroenterology and Hepatology) and Microbiology & Immunology, Stanford School of Medicine, Stanford, California 94305, USA
  - Palo Alto Veterans Administration, Palo Alto, California 94304, USA
- Rhiju Das
  - Biophysics Program, Stanford University, Stanford, California 94305, USA
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Physics, Stanford University, Stanford, California 94305, USA
6
Libin PJK, Deforche K, Abecasis AB, Theys K. VIRULIGN: fast codon-correct alignment and annotation of viral genomes. Bioinformatics 2020; 35:1763-1765. [PMID: 30295730 PMCID: PMC6513156 DOI: 10.1093/bioinformatics/bty851] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 05/03/2018] [Revised: 09/24/2018] [Accepted: 10/05/2018] [Indexed: 12/11/2022]
Abstract
Summary: Virus sequence data are an essential resource for reconstructing spatiotemporal dynamics of viral spread as well as to inform treatment and prevention strategies. However, the potential benefit of these applications critically depends on accurate and correctly annotated alignments of genetically heterogeneous data. VIRULIGN was built for fast codon-correct alignments of large datasets, with standardized and formalized genome annotation and various alignment export formats.
Availability and implementation: VIRULIGN is freely available at https://github.com/rega-cev/virulign as an open source software project.
Supplementary information: Supplementary data is available at Bioinformatics online.
Affiliation(s)
- Pieter J K Libin
  - KU Leuven, Rega Institute for Medical, Laboratorium of Clinical and Evolutionary Virology, Leuven, Belgium
  - Artificial Intelligence Lab, Department of Computer Science, Vrije Universiteit Brussel, Brussels, Belgium
- Ana B Abecasis
  - Center for Global Health and Tropical Medicine, Institute for Hygiene and Tropical Medicine, Lisboa, Portugal
- Kristof Theys
  - KU Leuven, Rega Institute for Medical, Laboratorium of Clinical and Evolutionary Virology, Leuven, Belgium