1
Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024; 42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.

2
Structural and genetic diversity in the secreted mucins, MUC5AC and MUC5B. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.18.585560. [PMID: 38562829 PMCID: PMC10983947 DOI: 10.1101/2024.03.18.585560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5761-5762aa); however, seven haplotypes have expanded VNTRs (6291-7019aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5249-6325aa) with cysteine-rich domain and VNTR copy number variation. We grouped MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5654aa), H2 (33%, ~5742aa), and H3 (7%, ~6325aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium (LD) and Tajima's D analyses reveal that East Asians carry exceptionally large MUC5AC LD blocks with an excess of rare variation (p<0.05). To validate this result, we used Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observed signatures of positive selection in H1 and H2 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Africans and Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium, consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein coding VNTRs for improved disease associations.
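A deviation from Hardy-Weinberg equilibrium, as reported above for H3, is classically assessed with a goodness-of-fit test on genotype counts. The minimal sketch below assumes a biallelic simplification (for example, H3 versus all other haplogroups) and uses the textbook chi-square statistic with one degree of freedom; it is an illustration only, not the analysis performed in the study, and the counts are invented. An observed heterozygote count above the expected 2pqN is the direction consistent with heterozygote advantage.

```python
from typing import NamedTuple


class HweResult(NamedTuple):
    chi_square: float    # goodness-of-fit statistic, 1 degree of freedom
    expected_het: float  # 2pqN expected under Hardy-Weinberg equilibrium
    observed_het: int    # observed heterozygote count


# Chi-square critical value for df = 1 at alpha = 0.05.
CHI2_CRITICAL_1DF_05 = 3.841


def hwe_test(n_aa: int, n_ab: int, n_bb: int) -> HweResult:
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic locus."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (only possible when one allele is fixed).
    chi = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return HweResult(chi, expected[1], n_ab)


# Hypothetical genotype counts with a strong heterozygote excess.
result = hwe_test(10, 80, 10)
print(result.chi_square > CHI2_CRITICAL_1DF_05, result.observed_het > result.expected_het)
```

With these invented counts the allele frequency is 0.5, the expected genotype counts are 25, 50, and 25, and the statistic is 36, far above the 3.841 threshold, with more heterozygotes observed than expected.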

3
Structurally divergent and recurrently mutated regions of primate genomes. Cell 2024; 187:1547-1562.e13. [PMID: 38428424 PMCID: PMC10947866 DOI: 10.1016/j.cell.2024.01.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 11/26/2023] [Accepted: 01/31/2024] [Indexed: 03/03/2024]
Abstract
Using multiple long-read sequencing technologies, we sequenced and assembled the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.

4
Complete chromosome 21 centromere sequences from a Down syndrome family reveal size asymmetry and differences in kinetochore attachment. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.25.581464. [PMID: 38464314 PMCID: PMC10925182 DOI: 10.1101/2024.02.25.581464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Down syndrome is the most common form of human intellectual disability caused by precocious segregation and nondisjunction of chromosome 21. Differences in centromere structure have been hypothesized to play a potential role in this process in addition to the well-established risk of advancing maternal age. Using long-read sequencing, we completely sequenced and assembled the centromeres from a parent-child trio where Trisomy 21 arose in the child as a result of a meiosis I error. The proband carries three distinct chromosome 21 centromere haplotypes that vary 11-fold in length, with both the largest (H1) and smallest (H2) originating from the mother. The longest H1 allele harbors a less clearly defined centromere dip region (CDR) as defined by CpG methylation and a significantly reduced signal by CENP-A chromatin immunoprecipitation sequencing when compared to H2 or paternal H3 centromeres. These epigenetic signatures suggest less competent kinetochore attachment for the maternally transmitted H1. Analysis of H1 in the mother indicates that the reduced CENP-A ChIP-seq signal, but not the CDR profile, pre-existed the meiotic nondisjunction event. A comparison of the three proband centromeres to a population sampling of 35 completely sequenced chromosome 21 centromeres shows that H2 is the smallest centromere sequenced to date and all three haplotypes (H1-H3) share a common origin of ~15 thousand years ago. These results suggest that recent asymmetry in size and epigenetic differences of chromosome 21 centromeres may contribute to nondisjunction risk.

5
Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall. Genome Res 2023; 33:2029-2040. [PMID: 38190646 PMCID: PMC10760522 DOI: 10.1101/gr.278070.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/03/2023] [Indexed: 01/10/2024]
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage, with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
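The F1 score used above to summarize call-set quality is the harmonic mean of precision and recall. A minimal Python sketch, assuming a benchmark comparison has already reduced each call set to true-positive, false-positive, and false-negative counts; the function name and the counts are hypothetical and are not taken from the study.

```python
from typing import NamedTuple


class CallSetMetrics(NamedTuple):
    precision: float  # TP / (TP + FP): fraction of reported calls that are true
    recall: float     # TP / (TP + FN): fraction of true variants that were reported
    f1: float         # harmonic mean of precision and recall


def score_callset(tp: int, fp: int, fn: int) -> CallSetMetrics:
    """Compute precision, recall, and F1 from benchmark comparison counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        # No true positives at all: F1 is defined as 0 here.
        return CallSetMetrics(precision, recall, 0.0)
    f1 = 2 * precision * recall / (precision + recall)
    return CallSetMetrics(precision, recall, f1)


# Hypothetical counts, not taken from the study.
metrics = score_callset(tp=80, fp=20, fn=40)
print(f"precision={metrics.precision:.3f} recall={metrics.recall:.3f} F1={metrics.f1:.3f}")
```

This prints `precision=0.800 recall=0.667 F1=0.727`. Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why a threshold such as 0.5 is a compact way to flag usable call sets.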

6
The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X, shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

7
Common lizard microhabitat selection varies by sex, parity mode, and colouration. BMC Ecol Evol 2023; 23:47. [PMID: 37667183 PMCID: PMC10478496 DOI: 10.1186/s12862-023-02158-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 08/22/2023] [Indexed: 09/06/2023]
Abstract
BACKGROUND Animals select and interact with their environment in various ways, including to ensure their physiology is at its optimal capacity, access to prey is possible, and predators can be avoided. Often conflicting, the balance of choices made may vary depending on an individual's life-history and condition. The common lizard (Zootoca vivipara) has egg-laying and live-bearing lineages and displays a variety of dorsal patterns and colouration. How colouration and reproductive mode affect habitat selection decisions on the landscape is not known. In this study, we first tested if co-occurring male and female viviparous and oviparous common lizards differ in their microhabitat selection. Second, we tested if the dorsal colouration of an individual lizard matched its basking site choice within the microhabitat where it was encountered, which could be related to camouflage and crypsis. RESULTS We found that site use differed from the habitat otherwise available, suggesting lizards actively choose the composition and structure of their microhabitat. Females were found in areas with more wood and less bare ground compared to males; we speculate that this may be for better camouflage and reducing predation risk during pregnancy, when females are less mobile. Microhabitat use also differed by parity mode: viviparous lizards were found in areas with a higher density of flowering plants, while oviparous lizards were found in areas that were wetter and had more moss. This may relate to differing habitat preferences of viviparous vs. oviparous lizards for clutch lay sites. We found that an individual's dorsal colouration matched that of the substrate of its basking site. This could indicate that individuals may choose their basking site to optimise camouflage within microhabitat. Further, all individuals were found basking in areas close to cover, which we expect could be used to escape predation.
CONCLUSIONS Our study suggests that common lizards may actively choose their microhabitat and basking site, balancing physiological requirements, escape response and camouflage as a tactic for predator avoidance. This varies for parity modes, sexes, and dorsal colourations, suggesting that individual optimisation strategies are influenced by inter-individual variation within populations as well as determined by evolutionary differences associated with life history.

8
Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date[1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes[2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established[1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.

9
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications[1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished[4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome[4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

10
Characterization of large-scale genomic differences in the first complete human genome. Genome Biol 2023; 24:157. [PMID: 37403156 PMCID: PMC10320979 DOI: 10.1186/s13059-023-02995-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Accepted: 06/23/2023] [Indexed: 07/06/2023]
Abstract
BACKGROUND The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies have not yet been characterized in detail. RESULTS Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp), excluding telomeric and centromeric regions, are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution. CONCLUSION Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.

11
The SARS-CoV-2 Spike Protein Mutation Explorer: using an interactive application to improve the public understanding of SARS-CoV-2 variants of concern. J Vis Commun Med 2023; 46:122-132. [PMID: 37526402 PMCID: PMC10726978 DOI: 10.1080/17453054.2023.2237087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 06/23/2023] [Indexed: 08/02/2023]
Abstract
Due to the COVID-19 pandemic, the virus responsible, SARS-CoV-2, became a source of intense interest for non-expert audiences. The viral spike protein gained particular public interest as the main target for protective immune responses, including those elicited by vaccines. The rapid evolution of SARS-CoV-2 resulted in variations in the spike that enhanced transmissibility or weakened vaccine protection. This created new variants of concern (VOCs). The emergence of VOCs was studied using viral sequence data that were shared through portals such as the online Mutation Explorer of the COVID-19 Genomics UK consortium (COG-UK/ME). This was designed for an expert audience, but the information it contained could be of general interest if suitably communicated. Visualisations, interactivity and animation can improve engagement and understanding of molecular biology topics, and so we developed a graphical educational resource, the SARS-CoV-2 Spike Protein Mutation Explorer (SSPME), which used interactive 3D molecular models and animations to explain the molecular biology underpinning VOCs. User testing showed that the SSPME had better usability and improved participant knowledge confidence and knowledge acquisition compared to COG-UK/ME. This demonstrates how interactive visualisations can be used for effective molecular biology communication, as well as improving the public understanding of SARS-CoV-2 VOCs.

12
Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.04.539448. [PMID: 37205567 PMCID: PMC10187267 DOI: 10.1101/2023.05.04.539448] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/21/2023]
Abstract
Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage, with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.

13
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals[1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

14
Increased mutation and gene conversion within human segmental duplications. Nature 2023; 617:325-334. [PMID: 37165237 PMCID: PMC10172114 DOI: 10.1038/s41586-023-05895-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 07/06/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data[1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions[3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences[5,6].

15
Abstract
The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity, improving the mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.

16
Gaps and complex structurally variant loci in phased genome assemblies. Genome Res 2023; 33:496-510. [PMID: 37164484 PMCID: PMC10234299 DOI: 10.1101/gr.277334.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/19/2022] [Accepted: 12/07/2022] [Indexed: 05/12/2023]
Abstract
There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still contains more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misoriented per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.

17
A Bayesian approach to incorporate structural data into the mapping of genotype to antigenic phenotype of influenza A(H3N2) viruses. PLoS Comput Biol 2023; 19:e1010885. [PMID: 36972311 PMCID: PMC10079231 DOI: 10.1371/journal.pcbi.1010885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 04/06/2023] [Accepted: 01/20/2023] [Indexed: 03/29/2023]
Abstract
Surface antigens of pathogens are commonly targeted by vaccine-elicited antibodies, but antigenic variability, notably in RNA viruses such as influenza, HIV and SARS-CoV-2, poses challenges for control by vaccination. For example, influenza A(H3N2) entered the human population in 1968 causing a pandemic and has since been monitored, along with other seasonal influenza viruses, for the emergence of antigenic drift variants through intensive global surveillance and laboratory characterisation. Statistical models of the relationship between genetic differences among viruses and their antigenic similarity provide useful information to inform vaccine development, though accurate identification of causative mutations is complicated by highly correlated genetic signals that arise due to the evolutionary process. Here, using a sparse hierarchical Bayesian analogue of an experimentally validated model for integrating genetic and antigenic data, we identify the genetic changes in influenza A(H3N2) virus that underpin antigenic drift. We show that incorporating protein structural data into variable selection helps resolve ambiguities arising due to correlated signals, with the proportion of variables representing haemagglutinin positions decisively included, or excluded, increased from 59.8% to 72.4%. The accuracy of variable selection judged by proximity to experimentally determined antigenic sites was improved simultaneously. Structure-guided variable selection thus improves confidence in the identification of genetic explanations of antigenic variation and we also show that prioritising the identification of causative mutations is not detrimental to the predictive capability of the analysis. Indeed, incorporating structural information into variable selection resulted in a model that could more accurately predict antigenic assay titres for phenotypically uncharacterised viruses from genetic sequence.
Combined, these analyses have the potential to inform choices of reference viruses, the targeting of laboratory assays, and predictions of the evolutionary success of different genotypes, and can therefore be used to inform vaccine selection processes.
18
Structurally divergent and recurrently mutated regions of primate genomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.07.531415. [PMID: 36945442 PMCID: PMC10028934 DOI: 10.1101/2023.03.07.531415] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/10/2023]
Abstract
To better understand the pattern of primate genome structural variation, we sequenced and assembled using multiple long-read sequencing technologies the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs based on analysis of these primate lineages. We identify 1,607 structurally divergent regions (SDRs) wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (CARDs, ABCD7, OLAH) and new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species for the first time.
19
Abstract
In late 2020, after circulating for almost a year in the human population, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exhibited a major step change in its adaptation to humans. These highly mutated forms of SARS-CoV-2 had enhanced rates of transmission relative to previous variants and were termed 'variants of concern' (VOCs). Designated Alpha, Beta, Gamma, Delta and Omicron, the VOCs emerged independently from one another, and in turn each rapidly became dominant, regionally or globally, outcompeting previous variants. The success of each VOC relative to the previously dominant variant was enabled by altered intrinsic functional properties of the virus and, to various degrees, changes to virus antigenicity conferring the ability to evade a primed immune response. The increased virus fitness associated with VOCs is the result of a complex interplay of virus biology in the context of changing human immunity due to both vaccination and prior infection. In this Review, we summarize the literature on the relative transmissibility and antigenicity of SARS-CoV-2 variants, the role of mutations at the furin spike cleavage site and of non-spike proteins, the potential importance of recombination to virus success, and SARS-CoV-2 evolution in the context of T cells, innate immunity and population immunity. SARS-CoV-2 shows a complicated relationship among virus antigenicity, transmission and virulence, which has unpredictable implications for the future trajectory and disease burden of COVID-19.
20
Abstract
Monoclonal antibodies (mAbs) offer a treatment option for individuals with severe COVID-19 and are especially important in high-risk individuals for whom vaccination is not an option. Given the importance of understanding how SARS-CoV-2 evolves resistance to mAbs, we reviewed the available in vitro neutralization data for mAbs against live variants and viral constructs containing spike mutations of interest. Unfortunately, evasion of mAb-induced protection is being reported with new SARS-CoV-2 variants. The magnitude of neutralization reduction varied greatly among mAb-variant pairs. For example, sotrovimab retained its neutralization capacity against Omicron BA.1 but showed reduced efficacy against BA.2, BA.4 and BA.5, and BA.2.12.1. At present, only bebtelovimab has been reported to retain its efficacy against all SARS-CoV-2 variants considered here. Resistance to mAb neutralization was dominated by single amino acid substitutions in spike protein epitopes. Not all observed epitope mutations result in increased mAb evasion, however, and amino acid substitutions at non-epitope positions, as well as combinations of mutations, also contribute to evasion of neutralization. This Review highlights the implications for the rational design of viral genomic surveillance and factors to consider for the development of novel mAb therapies.
21
SARS-CoV-2 Evolution and Patient Immunological History Shape the Breadth and Potency of Antibody-Mediated Immunity. J Infect Dis 2022; 227:40-49. [PMID: 35920058 PMCID: PMC9384671 DOI: 10.1093/infdis/jiac332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 07/28/2022] [Accepted: 08/01/2022] [Indexed: 01/19/2023]
Abstract
Since the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), humans have been exposed to distinct SARS-CoV-2 antigens through infection with different variants, vaccination, or both. Population immunity is thus highly heterogeneous, but the impact of such heterogeneity on the effectiveness and breadth of the antibody-mediated response is unclear. We measured antibody-mediated neutralization responses against SARS-CoV-2Wuhan, SARS-CoV-2α, SARS-CoV-2δ, and SARS-CoV-2ο pseudoviruses using sera from patients with distinct immunological histories, including naive, vaccinated, infected with SARS-CoV-2Wuhan, SARS-CoV-2α, or SARS-CoV-2δ, and vaccinated/infected individuals. We show that the breadth and potency of the antibody-mediated response are influenced by the number, the variant, and the nature (infection or vaccination) of exposures, and that individuals with mixed immunity acquired by vaccination and natural exposure exhibit the broadest and most potent responses. Our results suggest that the interplay between host immunity and SARS-CoV-2 evolution will shape the antigenicity and subsequent transmission dynamics of SARS-CoV-2, with important implications for future vaccine design.
22
23
Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat Genet 2022; 54:1305-1319. [PMID: 35982159 PMCID: PMC9470534 DOI: 10.1038/s41588-022-01148-2] [Citation(s) in RCA: 99] [Impact Index Per Article: 49.5] [Received: 09/26/2021] [Accepted: 06/28/2022] [Indexed: 12/16/2022]
Abstract
To capture the full spectrum of genetic risk for autism, we performed a two-stage analysis of rare de novo and inherited coding variants in 42,607 autism cases, including 35,130 new cases recruited online by SPARK. We identified 60 genes with exome-wide significance (P < 2.5 × 10⁻⁶), including five new risk genes (NAV3, ITSN1, MARK2, SCAF1 and HNRNPUL2). The association of NAV3 with autism risk is primarily driven by rare inherited loss-of-function (LoF) variants, with an estimated relative risk of 4, consistent with moderate effect. Autistic individuals with LoF variants in the four moderate-risk genes (NAV3, ITSN1, SCAF1 and HNRNPUL2; n = 95) have less cognitive impairment than 129 autistic individuals with LoF variants in highly penetrant genes (CHD8, SCN2A, ADNP, FOXP1 and SHANK3) (59% vs 88%, P = 1.9 × 10⁻⁶). Power calculations suggest that much larger numbers of autism cases are needed to identify additional moderate-risk genes.
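An exome-wide significance threshold of 2.5 × 10⁻⁶ is conventionally obtained as a Bonferroni correction: a family-wise error rate of 0.05 divided across roughly 20,000 protein-coding genes. The abstract does not state this derivation, so the gene count below is an assumption about the usual convention rather than a detail reported by the study. A minimal sketch:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise error rate <= alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: about 20,000 protein-coding genes, each tested once.
threshold = bonferroni_threshold(0.05, 20_000)
print(f"{threshold:.1e}")  # 2.5e-06
```

Bonferroni is deliberately conservative: it treats every gene-level test as independent, which is why very large cohorts are needed before moderate-effect genes cross the line.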
24
SARS-CoV-2 Omicron is an immune escape variant with an altered cell entry pathway. Nat Microbiol 2022; 7:1161-1179. [PMID: 35798890 PMCID: PMC9352574 DOI: 10.1038/s41564-022-01143-7] [Citation(s) in RCA: 274] [Impact Index Per Article: 137.0] [Received: 01/13/2022] [Accepted: 05/03/2022] [Indexed: 12/12/2022]
Abstract
Vaccines based on the spike protein of SARS-CoV-2 are a cornerstone of the public health response to COVID-19. The emergence of hypermutated, increasingly transmissible variants of concern (VOCs) threatens this strategy. Omicron (B.1.1.529), the fifth VOC to be described, harbours multiple amino acid mutations in spike, half of which lie within the receptor-binding domain. Here we demonstrate substantial evasion of neutralization by Omicron BA.1 and BA.2 variants in vitro using sera from individuals vaccinated with ChAdOx1, BNT162b2 and mRNA-1273. These data were mirrored by a substantial reduction in real-world vaccine effectiveness that was partially restored by booster vaccination. The Omicron variants BA.1 and BA.2 did not induce cell syncytia in vitro and favoured a TMPRSS2-independent endosomal entry pathway, with these phenotypes mapping to distinct regions of the spike protein. Impaired cell fusion was determined by the receptor-binding domain, while endosomal entry mapped to the S2 domain. Such marked changes in antigenicity and replicative biology may underlie the rapid global spread and altered pathogenicity of the Omicron variant.
25
Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 2022; 185:1986-2005.e26. [PMID: 35525246 PMCID: PMC9563103 DOI: 10.1016/j.cell.2022.04.017] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 11/11/2021] [Revised: 02/14/2022] [Accepted: 04/08/2022] [Indexed: 12/13/2022]
Abstract
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.
26
Familial long-read sequencing increases yield of de novo mutations. Am J Hum Genet 2022; 109:631-646. [PMID: 35290762 DOI: 10.1016/j.ajhg.2022.02.014] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 09/29/2021] [Accepted: 02/16/2022] [Indexed: 12/11/2022]
Abstract
Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increase the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM than short-read data alone.
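A per-generation substitution rate of this kind is commonly estimated by dividing the number of de novo substitutions in a child by twice the callable haploid genome length, because a diploid child carries two transmitted genome copies. The abstract gives neither the callable-genome size nor the exact estimator, so this is a sketch of the standard convention with hypothetical inputs, not a reconstruction of the authors' calculation:

```python
def de_novo_rate(n_dnm_snvs: int, callable_bases: float) -> float:
    """Substitutions per nucleotide per generation for one child.

    callable_bases is the haploid length of genome in which DNMs could be
    detected; it is doubled because a diploid child carries two copies.
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_dnm_snvs / (2 * callable_bases)


# Hypothetical inputs, not values from the study.
rate = de_novo_rate(60, 2.5e9)
print(f"{rate:.2e}")  # 1.20e-08
```

Because regions newly made accessible by long reads enlarge both the numerator and the callable denominator, the rate estimate depends on how completely the callable genome is defined.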
27
Abstract
Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.
28
Tracking SARS-CoV-2 mutations and variants through the COG-UK-Mutation Explorer. Virus Evol 2022; 8:veac023. [PMID: 35502202 PMCID: PMC9037374 DOI: 10.1093/ve/veac023] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/22/2021] [Revised: 03/03/2022] [Accepted: 03/17/2022] [Indexed: 11/13/2022]
Abstract
COG-UK Mutation Explorer (COG-UK-ME, https://sars2.cvr.gla.ac.uk/cog-uk/, last accessed 16 March 2022) is a web resource that displays knowledge and analyses on SARS-CoV-2 virus genome mutations and variants circulating in the UK, with a focus on the observed amino acid replacements that have an antigenic role in the context of the human humoral and cellular immune response. This analysis is based on more than 2 million genome sequences (as of March 2022) of UK SARS-CoV-2 data held in the CLIMB-COVID centralised data environment. COG-UK-ME curates these data and displays analyses that are cross-referenced to experimental data collated from the primary literature. The aim is to track mutations of immunological importance that are accumulating in current variants of concern and variants of interest that could alter the neutralising activity of monoclonal antibodies (mAbs), convalescent sera, and vaccines. Changes in epitopes recognised by T cells, including those where reduced T cell binding has been demonstrated, are reported. Mutations that have been shown to confer SARS-CoV-2 resistance to antiviral drugs are also included. Using visualisation tools, COG-UK-ME also allows users to identify the emergence of variants carrying mutations that could decrease the neutralising activity of both mAbs present in therapeutic cocktails, e.g. Ronapreve. COG-UK-ME tracks changes in the frequency of combinations of mutations and brings together the curated literature on the impact of those mutations on various functional aspects of the virus and therapeutics. Given the unpredictable nature of SARS-CoV-2 as exemplified by yet another variant of concern, Omicron, continued surveillance of SARS-CoV-2 remains imperative to monitor virus evolution linked to the efficacy of therapeutics.
29
Mutations that adapt SARS-CoV-2 to mink or ferret do not increase fitness in the human airway. Cell Rep 2022; 38:110344. [PMID: 35093235 PMCID: PMC8768428 DOI: 10.1016/j.celrep.2022.110344] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 09/01/2021] [Revised: 11/11/2021] [Accepted: 01/14/2022] [Indexed: 12/18/2022]
Abstract
SARS-CoV-2 has a broad mammalian species tropism, infecting humans, cats, dogs, and farmed mink. Since the start of the 2019 pandemic, several reverse zoonotic outbreaks of SARS-CoV-2 have occurred in mink, one of which reinfected humans and caused a cluster of infections in Denmark. Here we investigate the molecular basis of mink and ferret adaptation and demonstrate that the spike mutations Y453F, F486L, and N501T all specifically adapt SARS-CoV-2 to use mustelid ACE2. Furthermore, we assess the risk posed by these mutations and conclude that mink-adapted viruses are unlikely to pose an increased threat to humans, as Y453F attenuates virus replication in human cells and all three mink adaptations have minimal antigenic impact. Finally, we show that certain SARS-CoV-2 variants emerging from circulation in humans may naturally have a greater propensity to infect mustelid hosts, and therefore these species should continue to be surveyed for reverse zoonotic infections.
30
Population genomics of Bacillus anthracis from an anthrax hyperendemic area reveals transmission processes across spatial scales and unexpected within-host diversity. Microb Genom 2022; 8:000759. [PMID: 35188453 PMCID: PMC8942019 DOI: 10.1099/mgen.0.000759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Accepted: 12/10/2021] [Indexed: 11/18/2022]
Abstract
Genomic sequencing has revolutionized our understanding of bacterial disease epidemiology, but remains underutilized for zoonotic pathogens in remote endemic settings. Anthrax, caused by the spore-forming bacterium Bacillus anthracis, remains a threat to human and animal health and rural livelihoods in low- and middle-income countries. While the global genomic diversity of B. anthracis has been well-characterized, there is limited information on how its populations are genetically structured at the scale at which transmission occurs, which is critical for understanding the pathogen's evolution and transmission dynamics. Using a uniquely rich dataset, we quantified genome-wide SNPs among 73 B. anthracis isolates derived from 33 livestock carcasses sampled over 1 year throughout the Ngorongoro Conservation Area, Tanzania, a region hyperendemic for anthrax. Genome-wide SNPs distinguished 22 unique B. anthracis genotypes (i.e. SNP profiles) within the study area. However, phylogeographical structure was lacking, as identical SNP profiles were found throughout the study area, likely the result of the long and variable periods of spore dormancy and long-distance livestock movements. Significantly, divergent genotypes were obtained from spatio-temporally linked cases and even individual carcasses. The high number of SNPs distinguishing isolates from the same host is unlikely to have arisen during infection, as supported by our simulation models. This points to an unexpectedly wide transmission bottleneck for B. anthracis, with an inoculum comprising multiple variants being the norm. Our work highlights that inferring transmission patterns of B. anthracis from genomic data will require analytical approaches that account for extended and variable environmental persistence, as well as co-infection.
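The genotype counts above rest on a simple idea: isolates with identical alleles at every genome-wide SNP share a genotype (SNP profile), and a host that yields more than one profile shows within-host diversity. The following is a minimal sketch on hypothetical toy data; the function names and the string encoding of profiles are illustrative assumptions, not the authors' pipeline:

```python
from collections import defaultdict


def snp_distance(a: str, b: str) -> int:
    """Number of SNP positions at which two aligned profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b))


def summarise_genotypes(isolates):
    """Count distinct SNP profiles and list hosts that carry more than one.

    isolates maps isolate_id -> (host_id, snp_profile), where snp_profile is a
    string with one allele character per genome-wide SNP position.
    """
    profiles_by_host = defaultdict(set)
    for host_id, profile in isolates.values():
        profiles_by_host[host_id].add(profile)
    distinct = {profile for _, profile in isolates.values()}
    mixed_hosts = sorted(h for h, profs in profiles_by_host.items() if len(profs) > 1)
    return len(distinct), mixed_hosts


# Hypothetical toy data: three isolates from two carcasses.
toy = {
    "iso1": ("carcass_A", "ACGT"),
    "iso2": ("carcass_A", "ACGA"),
    "iso3": ("carcass_B", "ACGT"),
}
print(summarise_genotypes(toy))                      # (2, ['carcass_A'])
print(snp_distance(toy["iso1"][1], toy["iso2"][1]))  # 1
```

In this toy case the same profile appears in two carcasses (no spatial signal) while one carcass carries two profiles, the two patterns the abstract describes.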
31
Reduced neutralisation of the Delta (B.1.617.2) SARS-CoV-2 variant of concern following vaccination. PLoS Pathog 2021; 17:e1010022. [PMID: 34855916 PMCID: PMC8639073 DOI: 10.1371/journal.ppat.1010022] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 07/01/2021] [Accepted: 10/10/2021] [Indexed: 11/20/2022]
Abstract
Vaccines are proving to be highly effective in controlling hospitalisation and deaths associated with SARS-CoV-2 infection, but the emergence of viral variants with novel antigenic profiles threatens to diminish their efficacy. Assessing the ability of sera from vaccine recipients to neutralise SARS-CoV-2 variants will inform strategies for minimising COVID-19 cases and the design of effective antigenic formulations. Here, we examine the sensitivity of variants of concern (VOCs) representative of the B.1.617.1 and B.1.617.2 (first associated with infections in India) and B.1.351 (first associated with infection in South Africa) lineages of SARS-CoV-2 to neutralisation by sera from individuals vaccinated with the BNT162b2 (Pfizer/BioNTech) and ChAdOx1 (Oxford/AstraZeneca) vaccines. Across all vaccinated individuals, the spike glycoproteins from B.1.617.1 and B.1.617.2 conferred 4.31- and 5.11-fold reductions in neutralisation, respectively. The reduction seen with the B.1.617.2 lineage approached that conferred by the glycoprotein from the B.1.351 (South African) variant (6.29-fold reduction), which is known to be associated with reduced vaccine efficacy. Neutralising antibody titres elicited by vaccination with two doses of BNT162b2 were significantly higher than those elicited by vaccination with two doses of ChAdOx1. Following two doses of BNT162b2, neutralisation titres against B.1.617.1, B.1.617.2 and B.1.351 pseudoviruses were reduced 7.77-, 11.30- and 9.56-fold, respectively, the reduction in neutralisation of the Delta variant B.1.617.2 surpassing that of B.1.351. The corresponding fold changes in those vaccinated with two doses of ChAdOx1 were 0.69, 4.01 and 1.48, respectively. The accumulation of mutations in these VOCs, and others, demonstrates the quantifiable risk of antigenic drift and subsequent reduction in vaccine efficacy.
Accordingly, booster vaccines based on updated variants are likely to be required over time to prevent productive infection. This study also suggests that two dose regimes of vaccine are required for maximal BNT162b2 and ChAdOx1-induced immunity.
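Fold reductions like those above are often computed as the ratio of geometric mean titres (GMTs) against the reference and variant pseudoviruses, since neutralisation titres are roughly log-normally distributed. The abstract does not specify the exact estimator, so this minimal sketch assumes a GMT ratio and uses made-up titres:

```python
from statistics import geometric_mean  # Python 3.8+


def fold_reduction(reference_titres, variant_titres):
    """Fold reduction = GMT against the reference spike / GMT against the variant spike."""
    return geometric_mean(reference_titres) / geometric_mean(variant_titres)


# Hypothetical titres from three sera (not data from the study).
ref = [640, 1280, 320]
var = [160, 320, 80]
print(round(fold_reduction(ref, var), 2))  # 4.0
```

Using geometric rather than arithmetic means keeps a single high-titre serum from dominating the comparison.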
32
Abstract
Although most mutations in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome are expected to be either deleterious and swiftly purged or relatively neutral, a small proportion will affect functional properties and may alter infectivity, disease severity or interactions with host immunity. The emergence of SARS-CoV-2 in late 2019 was followed by a period of relative evolutionary stasis lasting about 11 months. Since late 2020, however, SARS-CoV-2 evolution has been characterized by the emergence of sets of mutations, in the context of 'variants of concern', that impact virus characteristics, including transmissibility and antigenicity, probably in response to the changing immune profile of the human population. There is emerging evidence of reduced neutralization of some SARS-CoV-2 variants by postvaccination serum; however, a greater understanding of correlates of protection is required to evaluate how this may impact vaccine effectiveness. Nonetheless, manufacturers are preparing platforms for a possible update of vaccine sequences, and it is crucial that surveillance of genetic and antigenic changes in the global virus population is done alongside experiments to elucidate the phenotypic impacts of mutations. In this Review, we summarize the literature on mutations of the SARS-CoV-2 spike protein, the primary antigen, focusing on their impacts on antigenicity and contextualizing them in the protein structure, and discuss them in the context of observed mutation frequencies in global sequence datasets.
33
34
35
Recurrent emergence of SARS-CoV-2 spike deletion H69/V70 and its role in the Alpha variant B.1.1.7. Cell Rep 2021; 35:109292. [PMID: 34166617 PMCID: PMC8185188 DOI: 10.1016/j.celrep.2021.109292] [Citation(s) in RCA: 284] [Impact Index Per Article: 94.7] [Received: 02/03/2021] [Revised: 04/29/2021] [Accepted: 06/02/2021] [Indexed: 12/23/2022]
Abstract
We report severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike ΔH69/V70 in multiple independent lineages, often occurring after acquisition of receptor binding motif replacements such as N439K and Y453F, known to increase binding affinity to the ACE2 receptor and confer antibody escape. In vitro, we show that, although ΔH69/V70 itself is not an antibody evasion mechanism, it increases infectivity associated with enhanced incorporation of cleaved spike into virions. ΔH69/V70 is able to partially rescue infectivity of spike proteins that have acquired N439K and Y453F escape mutations by increased spike incorporation. In addition, replacement of the H69 and V70 residues in the Alpha variant B.1.1.7 spike (where ΔH69/V70 occurs naturally) impairs spike incorporation and entry efficiency of the B.1.1.7 spike pseudotyped virus. Alpha variant B.1.1.7 spike mediates faster kinetics of cell-cell fusion than wild-type Wuhan-1 D614G, dependent on ΔH69/V70. Therefore, as ΔH69/V70 compensates for immune escape mutations that impair infectivity, continued surveillance for deletions with functional effects is warranted.
36
A high-quality bonobo genome refines the analysis of hominid evolution. Nature 2021; 594:77-81. [PMID: 33953399 PMCID: PMC8172381 DOI: 10.1038/s41586-021-03519-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/21/2020] [Accepted: 04/07/2021] [Indexed: 12/17/2022]
Abstract
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3–5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show a significant excess of amino acid replacement compared to the rest of the genome. A high-quality bonobo genome assembly provides insights into incomplete lineage sorting in hominids and its relevance to gene evolution and the genetic relationship among living hominids.
37
Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 270] [Impact Index Per Article: 90.0] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
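The contiguity figure in parentheses corresponds to the N50 statistic, or NG50 when measured against the genome size rather than the assembly size. The abstract does not say which variant was used, so this minimal sketch with hypothetical contig lengths shows both:

```python
from typing import Optional, Sequence


def nx50(contig_lengths: Sequence[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return N50 (or NG50 when genome_size is given).

    Contigs are sorted longest first; the result is the length of the contig at
    which the running total first reaches half of the target size. The target is
    the assembly length for N50 or an externally supplied genome size for NG50.
    Returns None if the contigs never cover half of the target.
    """
    target = genome_size if genome_size is not None else sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= target:  # compare 2*covered to avoid halving the target
            return length
    return None


contigs = [10, 8, 5, 3, 2]            # hypothetical contig lengths (Mbp)
print(nx50(contigs))                  # 8 (N50: half of the 28 Mbp assembly)
print(nx50(contigs, genome_size=40))  # 5 (NG50: half of a 40 Mbp genome)
```

NG50 is the fairer comparison across assemblies of different total size, since N50 can be inflated simply by omitting short contigs.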
38
Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat Biotechnol 2021; 39:302-308. [PMID: 33288906 PMCID: PMC7954704 DOI: 10.1038/s41587-020-0719-5] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 11/22/2019] [Accepted: 09/16/2020] [Indexed: 12/18/2022]
Abstract
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
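The switch error rate quoted above counts how often the phase of an assembled haplotype flips relative to a reference phasing. A minimal Python sketch, assuming both haplotypes are given as 0/1 alleles over the same ordered heterozygous sites with no missing calls; published evaluation tools also handle phase blocks and unphased sites, which this sketch omits. The input haplotypes are hypothetical.

```python
def switch_error_rate(predicted, truth):
    """Fraction of adjacent heterozygous-site pairs at which the predicted
    haplotype switches phase relative to the true haplotype."""
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    if len(predicted) < 2:
        return 0.0  # no adjacent pairs, so no switch is possible
    # True where the predicted allele agrees with the truth at this site
    in_phase = [p == t for p, t in zip(predicted, truth)]
    # A switch is any change in agreement between neighbouring sites
    switches = sum(1 for a, b in zip(in_phase, in_phase[1:]) if a != b)
    return switches / (len(in_phase) - 1)


truth = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 1, 0]  # phase flips once, between sites 2 and 3
print(switch_error_rate(predicted, truth))  # 0.25
```

Because only changes in agreement are counted, a haplotype that is inverted along its whole length (the two haplotypes simply swapped) scores zero switch errors.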
39
Spatiotemporal reconstruction and transmission dynamics during the 2016-17 H5N8 highly pathogenic avian influenza epidemic in Italy. Transbound Emerg Dis 2021; 68:37-50. [PMID: 31788978 PMCID: PMC8048528 DOI: 10.1111/tbed.13420] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/28/2019] [Revised: 10/03/2019] [Accepted: 10/29/2019] [Indexed: 11/29/2022]
Abstract
Effective control of avian diseases in domestic populations requires an understanding of the transmission dynamics facilitating viral emergence and spread. In 2016-17, Italy experienced a significant avian influenza epidemic caused by a highly pathogenic A(H5N8) virus, which affected domestic premises housing around 2.7 million birds, primarily in the north-eastern regions with the highest density of poultry farms (Lombardy, Emilia-Romagna and Veneto). We perform integrated analyses of genetic, spatiotemporal and host data within a Bayesian phylogenetic framework. Using continuous and discrete phylogeography, we estimate the locations of movements responsible for the spread and persistence of the epidemic. The information derived from these analyses on rates of transmission between regions through time can be used to assess the success of control measures. Using an approach based on phylogenetic-temporal distances between domestic cases, we infer the presence of cryptic wild bird-mediated transmission, information that can complement existing epidemiological methods for distinguishing transmission within the domestic population from incursions across the wildlife-domestic interface, a common challenge in veterinary epidemiology. Spatiotemporal reconstruction of the epidemic reveals a highly skewed distribution of virus movements, with a high proportion of shorter-distance local movements interspersed with occasional long-distance dispersal events associated with wild birds. We also show how such inference can be used to identify possible instances of human-mediated movements where distances between phylogenetically linked domestic cases are unusually high.
40
Different environmental gradients associated to the spatiotemporal and genetic pattern of the H5N8 highly pathogenic avian influenza outbreaks in poultry in Italy. Transbound Emerg Dis 2021; 68:152-167. [PMID: 32613724 PMCID: PMC8048857 DOI: 10.1111/tbed.13661] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/29/2019] [Revised: 05/28/2020] [Accepted: 05/28/2020] [Indexed: 10/29/2022]
Abstract
Comprehensive understanding of the patterns and drivers of avian influenza outbreaks is pivotal to inform surveillance systems and strengthen nations' ability to quickly detect and respond to the emergence of novel viruses. Starting in early 2017, the Italian poultry sector was involved in the massive H5N8 highly pathogenic avian influenza epidemic that spread across most European countries in 2016/2017. Eighty-three outbreaks were recorded in north-eastern Italy, where a densely populated poultry area stretches across the Lombardy, Emilia-Romagna and Veneto regions. The confirmed cases, affecting both the rural and industrial sectors, occurred in two distinct epidemic waves. We combined multivariate statistical techniques with multi-model regression selection and inference to investigate how environmental factors relate to the spatiotemporal and genetic diversity of the outbreaks. Results showed that a combination of eco-climatic and host density predictors was associated with the outbreak pattern, and that variation along these gradients differed noticeably among genetically and geographically distinct groups of avian influenza cases. These regional contrasts may indicate different mechanisms driving the introduction and spreading routes of the influenza virus in the domestic poultry population. This methodological approach may be extended to other spatiotemporal scales to foster site-specific, ecologically informed risk-mitigation strategies.
41
Genetic Basis of Antigenic Variation of SAT3 Foot-And-Mouth Disease Viruses in Southern Africa. Front Vet Sci 2020; 7:568. [PMID: 33102544 PMCID: PMC7506032 DOI: 10.3389/fvets.2020.00568] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2020] [Accepted: 07/16/2020] [Indexed: 11/13/2022]
Abstract
Foot-and-mouth disease (FMD) continues to be a major burden for livestock owners in endemic countries and a continuous threat to FMD-free countries. The epidemiology and control of FMD in Africa are complicated by the presence of five clinically indistinguishable serotypes. Of these, the Southern African Territories (SAT) type 3 has received limited attention, likely because of its restricted distribution and less frequent detection. We investigated the intratypic genetic variation of the complete P1 capsid-coding region of 22 SAT3 viruses and confirmed the geographical distribution of five of the six SAT3 topotypes. The antigenic cross-reactivity of 12 SAT3 viruses against reference antisera was assessed by performing virus neutralization assays and calculating r1 values, the ratio of the heterologous neutralizing titer to the homologous neutralizing titer. Interestingly, cross-reactivity between the SAT3 reference antisera and many SAT3 viruses was notably high (r1 values >0.3). Moreover, some of the SAT3 viruses reacted more strongly to the reference sera than the homologous virus did (r1 values >1). An increase in the avidity of the reference antisera to the heterologous viruses could explain some of the higher neutralization titers observed. Subsequently, we used the antigenic variability data and corresponding genetic and structural data to predict naturally occurring amino acid positions that correlate with antigenic changes. We identified four unique residues within the VP1, VP2 and VP3 proteins that are associated with a change in cross-reactivity, including two sites that change simultaneously. The analysis of antigenic variation in the context of sequence differences is critical both for surveillance-informed selection of effective vaccines and for the rational design of vaccine antigens tailored to specific geographic localities using reverse genetics.
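The r1 value described above is a simple ratio of neutralizing titres. A minimal Python sketch with hypothetical titres; the 0.3 and 1 cut-offs are only those quoted in the abstract, used here as labels rather than as a formal vaccine-matching rule, and the function names are illustrative.

```python
def r1_value(heterologous_titre, homologous_titre):
    """r1 = titre of a reference antiserum against a field (heterologous)
    virus divided by its titre against the homologous virus.

    Titres are reciprocal dilutions (e.g. 128 for 1/128), not log values.
    """
    if heterologous_titre <= 0 or homologous_titre <= 0:
        raise ValueError("titres must be positive reciprocal dilutions")
    return heterologous_titre / homologous_titre


def describe_r1(r1):
    """Label an r1 value using the two cut-offs quoted in the abstract."""
    if r1 > 1:
        return "stronger than homologous"
    if r1 > 0.3:
        return "high cross-reactivity"
    return "lower cross-reactivity"


# Hypothetical titres for one reference antiserum (homologous titre 256)
for field_titre in (64, 128, 512):
    r1 = r1_value(field_titre, 256)
    print(field_titre, r1, describe_r1(r1))
```

If titres are recorded as log10 values instead, r1 is 10 raised to the difference between the heterologous and homologous log titres.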
42
Genomic and Immunogenic Protein Diversity of Erysipelothrix rhusiopathiae Isolated From Pigs in Great Britain: Implications for Vaccine Protection. Front Microbiol 2020; 11:418. [PMID: 32231655 PMCID: PMC7083082 DOI: 10.3389/fmicb.2020.00418] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/25/2019] [Accepted: 02/27/2020] [Indexed: 12/30/2022]
Abstract
Erysipelas, caused by the bacterium Erysipelothrix rhusiopathiae, is re-emerging in swine and poultry production systems worldwide. While the global genomic diversity of this species has been characterized, how much of this genomic and functional diversity is maintained at smaller scales is unclear. Specifically, while several key immunogenic surface proteins have been identified for E. rhusiopathiae, little is known about their presence among field strains and their divergence from vaccine strains, which could result in vaccine failure. Here, a comparative genomics approach was taken to determine the diversity of E. rhusiopathiae strains in pigs in Great Britain over nearly three decades, as well as to assess the field strains’ divergence from the vaccine strain most commonly used in British pigs. In addition, the presence/absence and variability of 13 previously described immunogenic surface proteins were determined, including SpaA, which is considered a key immunogen. We found a high diversity of E. rhusiopathiae strains in British pigs, similar to the situation described in European poultry but in contrast to swine production systems in Asia. Of the four clades of E. rhusiopathiae found globally, three were represented among British pig isolates, with Clade 2 being the most common. All British pig isolates had one amino acid difference in the immunoprotective domain of the SpaA protein compared to the vaccine strain. However, in silico structural protein analyses confirmed that this difference is unlikely to compromise vaccine protection. Of 12 other known immunogenic surface proteins of E. rhusiopathiae examined, 11 were found to be present in all British pig isolates and the vaccine strain, but with highly variable degrees of conservation at the amino acid sequence level, ranging from 0.3 to 27% variant positions.
Moreover, the phylogenetic incongruence of these proteins suggests that horizontal transfer of genes encoding antigens is commonplace in this bacterium. We hypothesize that the sequence variants in these proteins could be responsible for differences in the efficacy of the immune response. Our results provide the necessary basis for testing this hypothesis through in vitro and in vivo studies.
43
The Microbial Genomes Atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level. Nucleic Acids Res 2018; 46:W282-W288. [PMID: 29905870 PMCID: PMC6031002 DOI: 10.1093/nar/gky467] [Citation(s) in RCA: 304] [Impact Index Per Article: 60.8] [Received: 03/12/2018] [Accepted: 05/24/2018] [Indexed: 12/05/2022]
Abstract
The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called ‘Clade Project’) to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.
44
Improving the identification of antigenic sites in the H1N1 influenza virus through accounting for the experimental structure in a sparse hierarchical Bayesian model. J R Stat Soc Ser C Appl Stat 2019; 68:859-885. [PMID: 31598013 PMCID: PMC6774336 DOI: 10.1111/rssc.12338] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Understanding how genetic changes allow emerging virus strains to escape the protection afforded by vaccination is vital for the maintenance of effective vaccines. We use structural and phylogenetic differences between pairs of virus strains to identify important antigenic sites on the surface of the influenza A(H1N1) virus through the prediction of haemagglutination inhibition (HI) titre: pairwise measures of the antigenic similarity of virus strains. We propose a sparse hierarchical Bayesian model that can deal with the pairwise structure and inherent experimental variability in the H1N1 data through the introduction of latent variables. The latent variables represent the underlying HI titre measurement of any given pair of virus strains and help to account for the fact that, for any HI titre measurement between the same pair of virus strains, the difference in the viral sequence remains the same. Through accurately representing the structure of the H1N1 data, the model can select virus sites which are antigenic, while its latent structure achieves the computational efficiency that is required to deal with large virus sequence data, as typically available for the influenza virus. In addition to the latent variable model, we also propose a new method, the block‐integrated widely applicable information criterion biWAIC, for selecting between competing models. We show how this enables us to select the random effects effectively when used with the model proposed and we apply both methods to an A(H1N1) data set.
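The biWAIC criterion proposed above is a block-integrated variant of the widely applicable information criterion (WAIC), and its construction is not given in this abstract. The sketch below therefore computes only the standard WAIC on the deviance scale, WAIC = -2 (lppd - p_WAIC), where lppd is the log pointwise predictive density and p_WAIC is the summed posterior variance of the pointwise log-likelihood. It assumes a matrix of log-likelihoods indexed by posterior draw and observation; the function name and the toy input are illustrative.

```python
import math
import statistics


def waic(log_lik):
    """Standard WAIC (deviance scale) from pointwise log-likelihoods.

    log_lik[s][i] = log p(y_i | theta_s) for posterior draw s and
    observation i. Lower values indicate better expected predictive fit.
    """
    n_draws = len(log_lik)
    if n_draws < 2:
        raise ValueError("need at least two posterior draws")
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        top = max(column)
        # log of the posterior-mean likelihood, via log-sum-exp for stability
        lppd += top + math.log(sum(math.exp(x - top) for x in column)) - math.log(n_draws)
        # effective number of parameters: sample variance across draws
        p_waic += statistics.variance(column)
    return -2.0 * (lppd - p_waic)


# Hypothetical log-likelihoods: 3 posterior draws x 2 observations
log_lik = [
    [-1.0, -2.0],
    [-1.2, -1.8],
    [-0.9, -2.1],
]
print(round(waic(log_lik), 3))
```

Comparing competing models then amounts to computing this quantity for each fitted model on the same data and preferring the smaller value.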
45
The molecular basis of antigenic variation among A(H9N2) avian influenza viruses. Emerg Microbes Infect 2018; 7:176. [PMID: 30401826 PMCID: PMC6220119 DOI: 10.1038/s41426-018-0178-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/02/2018] [Revised: 10/03/2018] [Accepted: 10/08/2018] [Indexed: 01/02/2023]
Abstract
Avian influenza A(H9N2) viruses are an increasing threat to global poultry production and, through zoonotic infection, to human health, where they are considered viruses with pandemic potential. Vaccination of poultry is a key element of disease control in endemic countries, but vaccine effectiveness is persistently challenged by the emergence of antigenic variants. Here we employed a combination of techniques to investigate the genetic basis of H9N2 antigenic variability and evaluate the role of different molecular mechanisms of immune escape. We systematically tested the influence of published H9N2 monoclonal antibody escape mutants on chicken antisera binding, determining that many have no significant effect. Substitutions introducing additional glycosylation sites were a notable exception, though these are relatively rare among circulating viruses. To identify substitutions responsible for antigenic variation in circulating viruses, we performed an integrated meta-analysis of all published H9 haemagglutinin sequences and antigenic data. We validated this statistical analysis experimentally and allocated several new residues to H9N2 antigenic sites, providing molecular markers that will help explain vaccine breakdown in the field and inform vaccine selection decisions. We find evidence for the importance of alternative mechanisms of immune escape, beyond simple modulation of epitope structure, with substitutions increasing glycosylation or receptor-binding avidity exhibiting the largest impacts on chicken antisera binding. Of these, meta-analysis indicates avidity regulation to be more relevant to the evolution of circulating viruses, suggesting that a specific focus on avidity regulation is required to fully understand the molecular basis of immune escape by influenza, and potentially other viruses.
46
imGLAD: accurate detection and quantification of target organisms in metagenomes. PeerJ 2018; 6:e5882. [PMID: 30405973 PMCID: PMC6216955 DOI: 10.7717/peerj.5882] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/12/2018] [Accepted: 10/03/2018] [Indexed: 12/13/2022]
Abstract
Accurate detection of target microbial species in metagenomic datasets from environmental samples remains limited because the limit of detection of current methods is typically inaccessible, and the frequency of false positives, resulting from inadequate identification of regions of the genome that are either too highly conserved to be diagnostic (e.g., rRNA genes) or prone to frequent horizontal genetic exchange (e.g., mobile elements), remains unknown. To overcome these limitations, we introduce imGLAD, which aims to detect (target) genomic sequences in metagenomic datasets. imGLAD achieves high accuracy because it uses the sequence-discrete population concept to discriminate metagenomic reads originating from the target organism from reads of co-occurring close relatives, masks regions of the genome that are not informative using the MyTaxa engine, and models both the sequencing breadth and depth to determine relative abundance and limit of detection. We validated imGLAD by analyzing metagenomic datasets derived from spinach leaves inoculated with the enteric pathogen Escherichia coli O157:H7 and showed that its limit of detection can be comparable to that of PCR-based approaches for these samples (∼1 cell/gram).
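The two quantities that imGLAD models, sequencing breadth (the fraction of the target genome covered by at least one read) and sequencing depth (mean coverage), can be computed from read alignment coordinates as below. This is a minimal Python sketch of the raw measures only, assuming reads already mapped to a single target genome as 0-based, half-open intervals; it does not reproduce imGLAD's own model or masking, and the input is hypothetical.

```python
def breadth_and_depth(read_intervals, genome_length):
    """Return (breadth, mean depth) for reads mapped to one genome.

    read_intervals: (start, end) pairs, 0-based and half-open, so (0, 4)
    covers positions 0, 1, 2 and 3.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Difference array: +1 where a read starts, -1 just past where it ends
    delta = [0] * (genome_length + 1)
    for start, end in read_intervals:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"invalid interval ({start}, {end})")
        delta[start] += 1
        delta[end] -= 1
    covered_positions = 0
    total_depth = 0
    depth = 0
    for position in range(genome_length):
        depth += delta[position]  # running sum gives per-position depth
        total_depth += depth
        if depth > 0:
            covered_positions += 1
    return covered_positions / genome_length, total_depth / genome_length


# Two hypothetical 4-bp reads on a 10-bp genome
print(breadth_and_depth([(0, 4), (2, 6)], genome_length=10))  # (0.6, 0.8)
```

The difference array keeps the cost linear in genome length plus number of reads, instead of incrementing every base of every read.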
47
Integrating patient and whole-genome sequencing data to provide insights into the epidemiology of seasonal influenza A(H3N2) viruses. Microb Genom 2017; 4. [PMID: 29310750 PMCID: PMC5857367 DOI: 10.1099/mgen.0.000137] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/02/2022]
Abstract
Genetic surveillance of seasonal influenza is largely focused on sequencing of the haemagglutinin gene. Consequently, our understanding of the contribution of the remaining seven gene segments to the evolution and epidemiological dynamics of seasonal influenza is relatively limited. The increased availability of next-generation sequencing technologies allows rapid and economical whole-genome sequencing (WGS) of influenza virus. Here, 150 influenza A(H3N2)-positive clinical specimens with linked epidemiological data, from the 2014/15 season in Scotland, were sequenced directly using both Sanger sequencing of the HA1 region and WGS using the Illumina MiSeq platform. Sequences generated by the two methods were highly correlated, and WGS provided on average >90 % whole genome coverage. As reported in other European countries during 2014/15, all strains belonged to genetic group 3C, with subgroup 3C.2a predominating. Multiple inter-subgroup reassortants were identified, including three 3C.3 viruses descended from a single reassortment event, which had persisted in the population. Cases of severe acute respiratory illness were significantly clustered on phylogenies of multiple gene segments, indicating potential genetic factors warranting further investigation. Severe cases were also more likely to be associated with reassortant viruses and to occur later in the season. These results suggest that WGS provides an opportunity to develop our understanding of the relationship between the influenza genome and disease severity and the epidemiological consequences of within-subtype reassortment. Therefore, increased levels of WGS, linked to clinical and epidemiological data, could improve influenza surveillance.
48
A sparse hierarchical Bayesian model for detecting relevant antigenic sites in virus evolution. Comput Stat 2017. [DOI: 10.1007/s00180-017-0730-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]
49
Abstract
Quantifying and predicting the antigenic characteristics of a virus is something of a holy grail for infectious disease research because of its central importance to the emergence of new strains, the severity of outbreaks, and vaccine selection. However, these characteristics are defined by a complex interplay of viral and host factors, so phylogenetic measures of viral similarity are often poorly correlated with antigenic relationships. Here, we generate antigenic phylogenies that track the phenotypic evolution of two serotypes of foot-and-mouth disease virus by combining host serology and viral sequence data to identify sites that are critical to their antigenic evolution. For serotype SAT1, we validate our antigenic phylogeny against monoclonal antibody escape mutants, which match all of the predicted antigenic sites. For serotype O, we validate it against known sites where available, and otherwise directly evaluate the impact on antigenic phenotype of substitutions in predicted sites using reverse genetics and serology. We also highlight a critical and poorly understood problem for vaccine selection by revealing qualitative differences between assays that are often used interchangeably to determine antigenic match between field viruses and vaccine strains. Our approach provides a tool to identify naturally occurring antigenic substitutions, allowing us to track the genetic diversification and associated antigenic evolution of the virus. Despite the hugely important role vaccines have played in enhancing human and animal health, vaccinology remains a conspicuously empirical science. This study advances the field by providing guidance for tuning vaccine strains via site-directed mutagenesis, based on high-resolution tracking of the antigenic evolution of the virus between rare major shifts in phenotype.
50
Identification of Low- and High-Impact Hemagglutinin Amino Acid Substitutions That Drive Antigenic Drift of Influenza A(H1N1) Viruses. PLoS Pathog 2016; 12:e1005526. [PMID: 27057693 PMCID: PMC4825936 DOI: 10.1371/journal.ppat.1005526] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 08/19/2015] [Accepted: 03/04/2016] [Indexed: 12/20/2022]
Abstract
Determining phenotype from genetic data is a fundamental challenge. Identification of emerging antigenic variants among circulating influenza viruses is critical to the vaccine virus selection process, with vaccine effectiveness maximized when constituents are antigenically similar to circulating viruses. Hemagglutination inhibition (HI) assay data are commonly used to assess influenza antigenicity. Here, sequence and 3-D structural information of hemagglutinin (HA) glycoproteins were analyzed together with corresponding HI assay data for former seasonal influenza A(H1N1) virus isolates (1997–2009) and reference viruses. The models developed identify and quantify the impact of eighteen amino acid substitutions on the antigenicity of HA, two of which were responsible for major transitions in antigenic phenotype. We used reverse genetics to demonstrate the causal effect on antigenicity for a subset of these substitutions. Information on the impact of substitutions allowed us to predict antigenic phenotypes of emerging viruses directly from HA gene sequence data, and prediction accuracy doubled when all substitutions causing antigenic changes were included, compared with a model incorporating only the substitutions with the largest impact. The ability to quantify the phenotypic impact of specific amino acid substitutions should help refine emerging techniques that predict the evolution of virus populations from one year to the next, leading to stronger theoretical foundations for selection of candidate vaccine viruses. These techniques have great potential to be extended to other antigenically variable pathogens. Influenza A viruses are characterized by rapid antigenic drift: structural changes in B-cell epitopes that facilitate escape from pre-existing immunity. Consequently, seasonal influenza continues to impose a major burden on human health.
Accurate quantification of the antigenic impact of specific amino acid substitutions is a prerequisite for predicting the fitness and evolutionary outcome of variant viruses. Using assays to attribute antigenic variation to amino acid sequence changes, we identify substitutions that contribute to antigenic drift and quantify their impact. We show that substitutions identified as low-impact are a critical component of virus antigenic evolution, and that by including these, as well as the high-impact substitutions often focused on, the accuracy of predicting antigenic phenotypes of emerging viruses from genotype is doubled.
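The gain from including low-impact substitutions described above can be illustrated with a deliberately simple additive model. A minimal Python sketch, assuming (for illustration only) that each differing HA position contributes a fixed amount to antigenic distance on a log2 HI titre scale; the sequences, positions and impact values are hypothetical, and the study's own models are not reproduced.

```python
def predicted_antigenic_distance(seq_a, seq_b, impacts):
    """Sum the estimated impact of every position at which two aligned
    HA sequences differ; positions without an estimate contribute 0."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    distance = 0.0
    for position, (aa_a, aa_b) in enumerate(zip(seq_a, seq_b), start=1):
        if aa_a != aa_b:
            distance += impacts.get(position, 0.0)
    return distance


# Hypothetical per-position impacts (log2 HI units), 1-based positions
all_impacts = {3: 2.0, 5: 0.5, 7: 0.25}
high_impact_only = {3: 2.0}

reference = "ACDEFGH"
variant = "ACKEYGW"  # differs at positions 3, 5 and 7

print(predicted_antigenic_distance(reference, variant, all_impacts))       # 2.75
print(predicted_antigenic_distance(reference, variant, high_impact_only))  # 2.0
```

The 0.75 log2 units separating the two predictions is the low-impact contribution that a model restricted to the largest-effect substitution would miss.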