1
Kim H, Chang W, Chae SJ, Park JE, Seo M, Kim JK. scLENS: data-driven signal detection for unbiased scRNA-seq data analysis. Nat Commun 2024; 15:3575. [PMID: 38678050 DOI: 10.1038/s41467-024-47884-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 04/14/2024] [Indexed: 04/29/2024]
Abstract
High dimensionality and noise have limited the new biological insights that can be discovered in scRNA-seq data. While dimensionality reduction tools have been developed to extract biological signals from the data, they often require manual determination of signal dimension, introducing user bias. Furthermore, a common data preprocessing method, log normalization, can unintentionally distort signals in the data. Here, we develop scLENS, a dimensionality reduction tool that circumvents the long-standing issues of signal distortion and manual input. Specifically, we identify the primary cause of signal distortion during log normalization and effectively address it by uniformizing cell vector lengths with L2 normalization. Furthermore, we utilize random matrix theory-based noise filtering and a signal robustness test to enable data-driven determination of the threshold for signal dimensions. Our method outperforms 11 widely used dimensionality reduction tools and performs particularly well for challenging scRNA-seq datasets with high sparsity and variability. To facilitate the use of scLENS, we provide a user-friendly package that automates accurate signal detection of scRNA-seq data without manual time-consuming tuning.
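The abstract credits L2 normalization of cell vectors with correcting the distortion that log normalization introduces. The following is a minimal Python sketch of that single step, not the scLENS package's code. It assumes log(1 + x) as the log transform; the function name `log_l2_normalize` and the toy count matrix are hypothetical, and the random matrix theory filtering and robustness test are not shown.

```python
import math


def log_l2_normalize(counts):
    """Log-transform each cell's counts, then rescale the cell to unit L2 length.

    `counts` is a list of cells, each a list of non-negative gene counts.
    Hypothetical helper for illustration only.
    """
    normalized = []
    for cell in counts:
        # Assumed log transform: log(1 + x), so zero counts stay at zero.
        logged = [math.log1p(c) for c in cell]
        length = math.sqrt(sum(v * v for v in logged))
        if length == 0.0:
            # An all-zero cell has no direction; leave it unchanged.
            normalized.append(logged)
        else:
            normalized.append([v / length for v in logged])
    return normalized


# Toy matrix: the second cell has ten times the depth of the first.
counts = [[0, 3, 10], [0, 30, 100], [5, 0, 0]]
cells = log_l2_normalize(counts)
lengths = [math.sqrt(sum(v * v for v in cell)) for cell in cells]
```

After this step every non-empty cell vector has the same length, so differences in sequencing depth no longer change the vector's magnitude, only its direction.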
Affiliation(s)
- Hyun Kim
  - Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea
- Won Chang
  - Division of Statistics and Data Science, University of Cincinnati, Cincinnati, OH, 45221, USA
- Seok Joo Chae
  - Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea
  - Department of Mathematical Sciences, KAIST, Daejeon, 34141, Republic of Korea
- Jong-Eun Park
  - Graduate School of Medical Science and Engineering, KAIST, Daejeon, 34141, Republic of Korea
- Minseok Seo
  - Department of Computer and Information Science, Korea University, Sejong, 30019, Republic of Korea
- Jae Kyoung Kim
  - Biomedical Mathematics Group, Pioneer Research Center for Mathematical and Computational Sciences, Institute for Basic Science, Daejeon, 34126, Republic of Korea
  - Department of Mathematical Sciences, KAIST, Daejeon, 34141, Republic of Korea

2
Zheng P, Zhou C, Ding Y, Liu B, Lu L, Zhu F, Duan S. Nanopore sequencing technology and its applications. MedComm (Beijing) 2023; 4:e316. [PMID: 37441463 PMCID: PMC10333861 DOI: 10.1002/mco2.316] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/30/2022] [Revised: 05/29/2023] [Accepted: 05/31/2023] [Indexed: 07/15/2023]
Abstract
Since the development of Sanger sequencing in 1977, sequencing technology has played a pivotal role in molecular biology research by enabling the interpretation of biological genetic codes. Today, nanopore sequencing is one of the leading third-generation sequencing technologies. With its long reads, portability, and low cost, nanopore sequencing is widely used in various scientific fields including epidemic prevention and control, disease diagnosis, and animal and plant breeding. Despite initial concerns about high error rates, continuous innovation in sequencing platforms and algorithm analysis technology has effectively addressed its accuracy. During the coronavirus disease (COVID-19) pandemic, nanopore sequencing played a critical role in detecting the severe acute respiratory syndrome coronavirus-2 virus genome and containing the pandemic. However, a lack of understanding of this technology may limit its popularization and application. Nanopore sequencing is poised to become the mainstream choice for preventing and controlling COVID-19 and future epidemics while creating value in other fields such as oncology and botany. This work introduces the contributions of nanopore sequencing during the COVID-19 pandemic to promote public understanding and its use in emerging outbreaks worldwide. We discuss its application in microbial detection, cancer genomes, and plant genomes and summarize strategies to improve its accuracy.
Affiliation(s)
- Peijie Zheng
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Chuntao Zhou
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Yuemin Ding
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
  - Institute of Translational Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Zhejiang University City College, Hangzhou, China
- Bin Liu
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Liuyi Lu
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Feng Zhu
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
- Shiwei Duan
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
  - Institute of Translational Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
  - Key Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang Province, School of Medicine, Zhejiang University City College, Hangzhou, China

3
Zhang J, Zheng N, Liu M, Yao D, Wang Y, Wang J, Xin J. Multi-weight susceptible-infected model for predicting COVID-19 in China. Neurocomputing 2023; 534:161-170. [PMID: 36923265 PMCID: PMC9993734 DOI: 10.1016/j.neucom.2023.02.065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 01/10/2023] [Accepted: 02/26/2023] [Indexed: 03/17/2023]
Abstract
The mutant strains of COVID-19 caused a global explosion of infections, including in many cities in China. In 2020, Zheng et al. proposed a hybrid AI model that accurately predicted the epidemic in Wuhan. As the main part of the hybrid AI model, the ISI method makes two important assumptions to avoid over-fitting. However, these assumptions cannot be effectively applied to new mutant strains. In this paper, a more general method, named the multi-weight susceptible-infected (MSI) model, is proposed to predict COVID-19 in the Chinese Mainland. First, a Gaussian pre-processing method is proposed to solve the problem of data fluctuation based on the quantity consistency of the cumulative infection number and the trend consistency of the daily infection number. Then, we improve the model in two respects: changing the grouped multi-parameter strategy to the multi-weight strategy, and removing the restriction on the weight distribution of viral infectivity. Experiments on outbreaks in many places in China from the end of 2021 to May 2022 show that, in China, an individual infected by the Delta or Omicron strains of SARS-CoV-2 can infect others within 3-4 days of being infected. In particular, the proposed method effectively predicts the trend of the epidemics in Xi'an, Tianjin, Henan, and Shanghai from December 2021 to May 2022.
Affiliation(s)
- Jun Zhang
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Nanning Zheng
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Mingyu Liu
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - Qian Xuesen College, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Dingyi Yao
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
  - Qian Xuesen College, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Yusong Wang
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Jianji Wang
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China
- Jingmin Xin
  - National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China

4
Senchyna F, Singh R. Dynamic Epidemiological Networks: A Data Representation Framework for Modeling and Tracking of SARS-CoV-2 Variants. J Comput Biol 2023; 30:446-468. [PMID: 37098217 DOI: 10.1089/cmb.2022.0469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract
The large-scale real-time sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes has allowed for rapid identification of concerning variants through phylogenetic analysis. However, the nature of phylogenetic reconstruction is typically static, in that the relationships between taxonomic units, once defined, are not subject to alterations. Furthermore, most phylogenetic methods are intrinsically batch mode in nature, requiring the presence of the entire data set. Finally, the emphasis of phylogenetics is on relating taxonomical units. These characteristics complicate the application of classical phylogenetics methods to represent relationships in molecular data collected from rapidly evolving strains of an etiological agent, such as SARS-CoV-2, since the molecular landscape is updated continuously as samples are collected. In such settings, variant definitions are subject to epistemological constraints and may change as data accumulate. Furthermore, representing within-variant molecular relationships may be as important as representing between variant relationships. This article describes a novel data representation framework called dynamic epidemiological networks (DENs) along with algorithms that underpin its construction to address these issues. The proposed representation is applied to study the molecular development underlying the spread of the COVID-19 (coronavirus disease 2019) pandemic in two countries: Israel and Portugal spanning a 2-year period from February 2020 to April 2022. The results demonstrate how this framework could be used to provide a multiscale representation of the data by capturing molecular relationships between samples as well as those between variants, automatically identifying the emergence of high frequency variants (lineages), including variants of concern such as Alpha and Delta, and tracking their growth. 
Additionally, we show how analyzing the evolution of the DEN can help identify changes in the viral population that could not be readily inferred from phylogenetic analysis.
Affiliation(s)
- Fiona Senchyna
  - Department of Computer Science, San Francisco State University, San Francisco, California, USA
- Rahul Singh
  - Department of Computer Science, San Francisco State University, San Francisco, California, USA
  - Center for Discovery and Innovation in Parasitic Diseases, University of California, San Diego, California, USA

5
Evers P, Pezacki JP. Unraveling Complex MicroRNA Signaling Pathways with Activity‐Based Protein Profiling to Guide Therapeutic Discovery. Isr J Chem 2023. [DOI: 10.1002/ijch.202200088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Affiliation(s)
- Parrish Evers
  - Department of Chemistry and Biomolecular Sciences, University of Ottawa, 150 Louis-Pasteur Pvt., K1N 6N5 Ottawa, Canada
- John Paul Pezacki
  - Department of Chemistry and Biomolecular Sciences, University of Ottawa, 150 Louis-Pasteur Pvt., K1N 6N5 Ottawa, Canada
  - Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, 451 Smyth Rd., K1H 8M5 Ottawa, Canada

6
Zheng SY, Zhang YP, Liu YX, Zhao W, Peng XL, Zheng YP, Fu YH, Yu JM, He JS. Tracking of Mutational Signature of SARS-CoV-2 Omicron on Distinct Continents and Little Difference was Found. Viruses 2023; 15:v15020321. [PMID: 36851535 PMCID: PMC9967123 DOI: 10.3390/v15020321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 01/18/2023] [Accepted: 01/19/2023] [Indexed: 01/26/2023]
Abstract
The Omicron variant is currently ravaging the world, raising serious concern globally. Monitoring genomic variations and determining their influence on biological features are critical for tracing its ongoing transmission and facilitating effective measures. Based on large-scale sequences from different continents, this study found that: (i) The genetic diversity of Omicron is much lower than that of the Delta variant. Still, eight deletions (Del 1-8) and 1 insertion, as well as 130 SNPs, were detected on the Omicron genomes, with two deletions (Del 3 and 4) and 38 SNPs commonly detected on all continents and exhibiting high-occurring frequencies. (ii) Four groups of tightly linked SNPs (linkage I-IV) were detected, among which linkage I, containing 38 SNPs, with 6 located in the RBD, increased its occurring frequency remarkably over time. (iii) The third codons of the Omicron shouldered the most mutation pressures, while the second codons presented the least flexibility. (iv) Four major mutants with amino acid substitutions in the RBD were detected, and further structural analysis suggested that the substitutions did not alter the viral receptor binding ability greatly. It was inferred that though the Omicron genome harbored great changes in antigenicity and remarkable ability to evade immunity, it was immune-pressure selected. This study tracked mutational signatures of Omicron variant and the potential biological significance of the SNPs, and the linkages await further functional verification.
Affiliation(s)
- Jie-Mei Yu
  - Correspondence: (J.-M.Y.); (J.-S.H.); Tel.: +86-10-51684358 (J.-M.Y.)
- Jin-Sheng He
  - Correspondence: (J.-M.Y.); (J.-S.H.); Tel.: +86-10-51684358 (J.-M.Y.)

7
Huang Q, Qiu H, Bible PW, Huang Y, Zheng F, Gu J, Sun J, Hao Y, Liu Y. Early detection of SARS-CoV-2 variants through dynamic co-mutation network surveillance. Front Public Health 2023; 11:1015969. [PMID: 36755900 PMCID: PMC9901361 DOI: 10.3389/fpubh.2023.1015969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 01/02/2023] [Indexed: 01/25/2023]
Abstract
Background: Precise public health and clinical interventions for the COVID-19 pandemic have spurred a global rush on SARS-CoV-2 variant tracking, but current approaches to variant tracking are challenged by the flood of viral genome sequences, leading to a loss of timeliness, accuracy, and reliability. Here, we devised a new co-mutation network framework, aiming to tackle these difficulties in variant surveillance. Methods: To avoid simultaneous input and modeling of the whole large-scale data, we dynamically investigate the nucleotide covarying pattern of weekly sequences. A community detection algorithm is applied to a co-occurring genomic alteration network constructed from mutation corpora of weekly collected data. Co-mutation communities are identified, extracted, and characterized as variant markers. They contribute to the creation and weekly updates of a community-based variant dictionary tree representing SARS-CoV-2 evolution, where highly similar ones between weeks have been merged to represent the same variants. Emerging communities imply the presence of novel viral variants or new branches of existing variants. This process was benchmarked with worldwide GISAID data and validated using national-level data from six COVID-19 hotspot countries. Results: A total of 235 co-mutation communities were identified after a 120-week investigation of worldwide sequence data, from March 2020 to mid-June 2022. The dictionary tree progressively developed from these communities perfectly recorded the time course of SARS-CoV-2 branching, coinciding with GISAID clades. The time-varying prevalence of these communities in the viral population showed a good match with the emergence and circulation of the variants they represented. All these benchmark results not only exhibited the methodology features but also demonstrated high efficiency in detection of the pandemic variants. When it was applied to regional variant surveillance, our method displayed significantly earlier identification of feature communities of major WHO-named SARS-CoV-2 variants in contrast with Pangolin's monitoring. Conclusion: An efficient genomic surveillance framework built from weekly co-mutation networks and a dynamic community-based variant dictionary tree enables early detection and continuous investigation of SARS-CoV-2 variants, overcoming the genomic data flood and aiding in the response to the COVID-19 pandemic.
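The Methods part of this abstract describes building a network of co-occurring mutations from one week of sequences and then grouping the mutations into communities. The sketch below illustrates that idea in Python under loud assumptions: it is not the authors' pipeline, connected components are swapped in as a much simpler stand-in for a real community detection algorithm, and the function names, the `min_count` threshold, and the mutation labels are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_mutation_edges(sequences, min_count=2):
    """Count how often each pair of mutations appears in the same sequence.

    Only pairs seen at least `min_count` times become edges (assumed threshold).
    """
    pair_counts = defaultdict(int)
    for mutations in sequences:
        for a, b in combinations(sorted(set(mutations)), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


def connected_groups(edges):
    """Group mutations by connected component (stand-in for community detection)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Made-up toy input: each inner list is the mutation set of one sequence.
week = [
    ["mutA", "mutB", "mutC"],
    ["mutA", "mutB", "mutC"],
    ["mutX", "mutY"],
    ["mutX", "mutY", "mutC"],
]
edges = co_mutation_edges(week)
groups = connected_groups(edges)
```

On this toy input the pairs involving `mutC` with `mutX` or `mutY` occur only once and are dropped, so two groups remain. A real analysis would use a proper community detection method and compare groups between weeks, as the abstract describes.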
Affiliation(s)
- Qiang Huang
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Huining Qiu
  - Guangdong Artificial Intelligence Machine Vision Engineering Technology Research Center, Guangzhou, China
- Paul W. Bible
  - College of Arts and Sciences, Marian University, Indianapolis, IN, United States
- Yong Huang
  - Institute of Public Health, Guangzhou Medical University & Guangzhou Center for Disease Control and Prevention, Guangzhou, China
- Fangfang Zheng
  - School of Traditional Chinese Medicine Healthcare, Guangdong Food and Drug Vocational College, Guangzhou, China
- Jing Gu
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Jian Sun
  - Department of Clinical Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China (corresponding author)
- Yuantao Hao
  - Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing, China (corresponding author)
- Yu Liu
  - Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China (corresponding author)

8
Zhao X, Qin L, Ding X, Zhang Y, Niu X, Gao F, Jiang T, Chen L. Origin and Reversion of Omicron Core Mutations in the Evolution of SARS-CoV-2 Genomes. Viruses 2022; 15:30. [PMID: 36680069 PMCID: PMC9865174 DOI: 10.3390/v15010030] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/22/2022] [Revised: 12/01/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Genetic analyses showed nearly 30 amino acid mutations occurred in the spike protein of the Omicron variant of SARS-CoV-2. However, how these mutations occurred and changed during the generation and development of Omicron remains unclear. In this study, 6.7 million (all publicly available data from 2020/04/01 to 2022/04/01) SARS-CoV-2 genomes were analyzed to track the origin and evolution of Omicron variants and to reveal the genetic pathways of the generation of core mutations in Omicron. The haplotype network visualized the pre-Omicron, intact-Omicron, and post-Omicron variants and revealed their evolutionary direction. The correlation analysis showed the correlation feature of the core mutations in Omicron. Moreover, we found some core mutations, such as 142D, 417N, 440K, and 764K, reversed to ancestral residues (142G, 417K, 440N, and 764N) in the post-Omicron variant, suggesting the reverse mutations provided sources for the emergence of new variants. In summary, our analysis probed the origin and further evolution of Omicron sub-variants, which may add to our understanding of new variants and facilitate the control of the pandemic.
Affiliation(s)
- Xinwei Zhao
  - State Key Laboratory of Respiratory Disease, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Luyao Qin
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
  - Suzhou Institute of Systems Medicine, Suzhou 215123, China
- Xiao Ding
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
  - Suzhou Institute of Systems Medicine, Suzhou 215123, China
- Yudi Zhang
  - State Key Laboratory of Respiratory Disease, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Xuefeng Niu
  - State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510120, China
- Feng Gao
  - Institute of Molecular and Medical Virology, Guangdong Provincial Key Laboratory of Virology, Institute of Medical Microbiology, School of Medicine, Jinan University, Guangzhou 510632, China
- Taijiao Jiang
  - Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
  - Suzhou Institute of Systems Medicine, Suzhou 215123, China
  - Guangzhou Laboratory, Guangzhou 510005, China
- Ling Chen
  - State Key Laboratory of Respiratory Disease, Guangdong Provincial Key Laboratory of Biocomputing, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
  - State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510120, China
  - Guangzhou Laboratory, Guangzhou 510005, China

9
Evolutionary Pattern Comparisons of the SARS-CoV-2 Delta Variant in Countries/Regions with High and Low Vaccine Coverage. Viruses 2022; 14:v14102296. [PMID: 36298851 PMCID: PMC9611485 DOI: 10.3390/v14102296] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/29/2022] [Revised: 10/09/2022] [Accepted: 10/10/2022] [Indexed: 11/16/2022]
Abstract
It has been argued that vaccine-breakthrough infections of SARS-CoV-2 would likely accelerate the emergence of novel variants with immune evasion. This study explored the evolutionary patterns of the Delta variant in countries/regions with relatively high and low vaccine coverage based on large-scale sequences. Our results showed that (i) the sequences were grouped into two clusters (L and R); the R cluster was dominant, its proportion increased over time and was higher in the high-vaccine-coverage areas; (ii) genetic diversities in the countries/regions with low vaccine coverage were higher than those in the ones with high vaccine coverage; (iii) unique mutations and co-mutations were detected in different countries/regions; in particular, common co-mutations were exhibited in highly occurring frequencies in the areas with high vaccine coverage and presented in increasing frequencies over time in the areas with low vaccine coverage; (iv) five sites on the S protein were under strong positive selection in different countries/regions, with three in non-C to U sites (I95T, G142D and T950N), and the occurring frequencies of I95T in high vaccine coverage areas were higher, while G142D and T950N were potentially immune-pressure-selected sites; and (v) mutation at the N6-methyladenosine site 4 on ORF7a (C27527T, P45L) was detected and might be caused by immune pressure. Our study suggested that certain variation differences existed between countries/regions with high and low vaccine coverage, but they were not likely caused by host immune pressure. We inferred that no extra immune pressures on SARS-CoV-2 were generated with high vaccine coverage, and we suggest promoting and strengthening the uptake of the COVID-19 vaccine worldwide, especially in less developed areas.
10
Nucleotide-based genetic networks: Methods and applications. J Biosci 2022. [PMID: 36226367 PMCID: PMC9554864 DOI: 10.1007/s12038-022-00290-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Genomic variations have been acclaimed as among the key players in understanding the biological mechanisms behind migration, evolution, and adaptation to extreme conditions. Due to stochastic evolutionary forces, the frequency of polymorphisms is affected by changes in the frequency of nearby polymorphisms in the same DNA sample, making them connected in terms of evolution. This article presents all the ingredients to understand the cumulative effects and complex behaviors of genetic variations in the human mitochondrial genome by analyzing co-occurrence networks of nucleotides, and shows key results obtained from such analyses. The article emphasizes recent investigations of these co-occurrence networks, describing the role of interactions between nucleotides in fundamental processes of human migration and viral evolution. The corresponding co-mutation-based genetic networks revealed genetic signatures of human adaptation in extreme environments. This article provides the methods of constructing such networks in detail, along with their graph-theoretical properties, and applications of the genomic networks in understanding the role of nucleotide co-evolution in evolution of the whole genome.
11
Al Khalaf R, Bernasconi A, Pinoli P, Ceri S. Analysis of co-occurring and mutually exclusive amino acid changes and detection of convergent and divergent evolution events in SARS-CoV-2. Comput Struct Biotechnol J 2022; 20:4238-4250. [PMID: 35945925 PMCID: PMC9352683 DOI: 10.1016/j.csbj.2022.07.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/28/2022]
Abstract
The inflation of SARS-CoV-2 lineages with a high number of accumulated mutations (such as the recent case of Omicron) has raised concerns about the evolutionary capacity of this virus. Here, we propose a computational study to examine non-synonymous mutations gathered within genomes of SARS-CoV-2 from the beginning of the pandemic until February 2022. We provide both qualitative and quantitative descriptions of this corpus, focusing on statistically significant co-occurring and mutually exclusive mutations within single genomes. Then, we examine in depth the distributions of mutations over defined lineages and compare those of frequently co-occurring mutation pairs. Based on this comparison, we study mutations' convergence/divergence on the phylogenetic tree. As a result, we identify 1,818 co-occurring pairs of non-synonymous mutations showing at least one event of convergent evolution and 6,625 co-occurring pairs with at least one event of divergent evolution. Notable examples of both types are shown by means of a tree-based representation of lineages, visually capturing mutations' behaviors. Our method confirms several well-known cases; moreover, the provided evidence suggests that our workflow can explain aspects of the future mutational evolution of SARS-CoV-2.
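The abstract refers to statistically significant co-occurring mutation pairs but does not say which test was used. One common way to ask whether two mutations appear together more often than chance is a one-sided hypergeometric (Fisher-style) tail probability. The Python sketch below shows that calculation as an assumption, not as the authors' method; the function name and the example counts are hypothetical.

```python
from math import comb


def cooccurrence_p_value(n_both, n_a, n_b, n_total):
    """Probability of seeing at least `n_both` genomes with both mutations
    if mutation A (in `n_a` genomes) and mutation B (in `n_b` genomes) were
    spread independently over `n_total` genomes.

    One-sided hypergeometric upper tail; assumed test for illustration.
    """
    upper = min(n_a, n_b)
    # Number of ways to place B's n_b genomes with exactly k overlapping A.
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_b)


# Hypothetical counts: 10 genomes, each mutation in 5, always together.
p_together = cooccurrence_p_value(n_both=5, n_a=5, n_b=5, n_total=10)

# With no overlap required, the tail covers every outcome.
p_any = cooccurrence_p_value(n_both=0, n_a=5, n_b=5, n_total=10)
```

A small value suggests the pair co-occurs more than independence would predict. Mutual exclusivity could be probed with the lower tail instead (summing from 0 up to `n_both`), and testing many pairs would call for a multiple-testing correction.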
Affiliation(s)
- Ruba Al Khalaf
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Anna Bernasconi
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Pietro Pinoli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Stefano Ceri
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy

12
Qin L, Meng J, Ding X, Jiang T. Mapping Genetic Events of SARS-CoV-2 Variants. Front Microbiol 2022; 13:890590. [PMID: 35910603 PMCID: PMC9329953 DOI: 10.3389/fmicb.2022.890590] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/06/2022] [Accepted: 06/20/2022] [Indexed: 12/18/2022]
Abstract
Genetic mutation and recombination are driving the evolution of SARS-CoV-2, leaving many genetic imprints which could be utilized to track the evolutionary pathway of SARS-CoV-2 and explore the relationships among variants. Here, we constructed a complete genetic map, showing the explicit evolutionary relationship among all SARS-CoV-2 variants including 58 groups and 46 recombination types identified from 3,392,553 sequences, which enables us to keep well informed of the evolution of SARS-CoV-2 and quickly determine the parents of novel variants. We found that the 5' and 3' of the spike and nucleoprotein genes have high frequencies to form the recombination junctions and that the RBD region in S gene is always exchanged as a whole. Although these recombinants did not show advantages in community transmission, it is necessary to keep a wary eye on the novel genetic events, in particular, the mutants with mutations on spike and recombinants with exchanged moieties on spike gene.
Collapse
Affiliation(s)
- Luyao Qin
- Institute of Systems Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
| | - Jing Meng
- Institute of Systems Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
| | - Xiao Ding
- Institute of Systems Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
| | - Taijiao Jiang
- Institute of Systems Medicine, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
- Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Guangzhou Laboratory, Guangzhou, China
|
13
|
Okamoto KW, Ong V, Wallace R, Wallace R, Chaves LF. When might host heterogeneity drive the evolution of asymptomatic, pandemic coronaviruses? Nonlinear Dynamics 2022; 111:927-949. [PMID: 35757097 PMCID: PMC9207439 DOI: 10.1007/s11071-022-07548-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Accepted: 02/05/2022] [Indexed: 06/15/2023]
Abstract
Controlling many infectious diseases, including SARS-Coronavirus-2 (SARS-CoV-2), requires surveillance followed by isolation, contact-tracing and quarantining. These interventions often begin by identifying symptomatic individuals. However, actively removing pathogen strains causing symptomatic infections may inadvertently select for strains less likely to cause symptomatic infections. Moreover, a pathogen's fitness landscape is structured around a heterogeneous host pool; uneven surveillance efforts and distinct transmission risks across host classes can meaningfully alter selection pressures. Here, we explore this interplay between evolution caused by disease control efforts and the evolutionary consequences of host heterogeneity. Using an evolutionary epidemiology model parameterized for coronaviruses, we show that intense symptoms-driven disease control selects for asymptomatic strains, particularly when these efforts are applied unevenly across host groups. Under these conditions, increasing quarantine efforts have diverging effects. If isolation alone cannot eradicate, intensive quarantine efforts combined with uneven detections of asymptomatic infections (e.g., via neglect of some host classes) can favor the evolution of asymptomatic strains. We further show how, when intervention intensity depends on the prevalence of symptomatic infections, higher removal efforts (and isolating symptomatic cases in particular) more readily select for asymptomatic strains than when these efforts do not depend on prevalence. The selection pressures on pathogens caused by isolation and quarantining likely lie between the extremes of no intervention and thoroughly successful eradication. Thus, analyzing how different public health responses can select for asymptomatic pathogen strains is critical for identifying disease suppression efforts that can effectively manage emerging infectious diseases. 
Supplementary Information: The online version contains supplementary material available at 10.1007/s11071-022-07548-7.
Affiliation(s)
- Kenichi W. Okamoto
- Department of Biology, University of St. Thomas, St. Paul, MN 55105 USA
- Agroecology and Rural Economics Research Corps, St. Paul, MN USA
| | - Virakbott Ong
- Department of Biology, University of St. Thomas, St. Paul, MN 55105 USA
| | - Robert Wallace
- Agroecology and Rural Economics Research Corps, St. Paul, MN USA
| | | | - Luis Fernando Chaves
- Instituto Conmemorativo Gorgas de Estudios de la Salud (ICGES), Avenida Justo Arosemena, Panama, Panama
|
14
|
Zhu M, Lai Y. Improvements Achieved by Multiple Imputation for Single-Cell RNA-Seq Data in Clustering Analysis and Differential Expression Analysis. J Comput Biol 2022; 29:634-649. [PMID: 35575729 DOI: 10.1089/cmb.2021.0597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
In a single-cell RNA-seq (scRNA-seq) data set, a high proportion of missing values (or an excessive number of zeroes) is frequently observed. For the related follow-up tasks, such as clustering analysis and differential expression analysis, a data set without missing values is generally required. Many imputation approaches have been proposed for this purpose. Multiple imputation (MI) is a well-established approach to address possible biases in a follow-up analysis result based on one-time imputed data. However, its use in the analysis of scRNA-seq data has not been well investigated. In this study, we investigated how to efficiently apply the MI approach to the clustering analysis and the differential expression analysis of scRNA-seq data. We proposed an MI procedure for clustering analysis and an MI procedure for differential expression analysis. To demonstrate the improvements achieved by MI in clustering analysis and differential expression analysis of scRNA-seq data, we analyzed three well-known scRNA-seq data sets. scIGANs, an imputation method based on generative adversarial networks (GANs), has recently been proposed for scRNA-seq data imputation. Multiple randomly imputed data sets can be conveniently generated by this method. We implemented our MI procedures based on scIGANs. We demonstrated that MI yielded improved performance in the clustering analysis and differential expression analysis results. Our applications to experimental scRNA-seq data illustrated the advantages of MI over one-time imputation of missing values in scRNA-seq data.
Affiliation(s)
- Mengqiu Zhu
- Department of Statistics, The George Washington University, Washington, District of Columbia, USA
| | - Yinglei Lai
- School of Mathematical Science, University of Science and Technology of China, Hefei, China
|
15
|
Sokhansanj BA, Rosen GL. Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences. mSystems 2022; 7:e0003522. [PMID: 35311562 PMCID: PMC9040592 DOI: 10.1128/msystems.00035-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 02/27/2022] [Indexed: 12/22/2022] Open
Abstract
Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a powerful way to build complex sequence-to-phenotype models. Unfortunately, while it can be predictive, deep learning typically produces "black box" models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.
Affiliation(s)
- Bahrad A. Sokhansanj
- Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
| | - Gail L. Rosen
- Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
|
16
|
Huang Q, Zhang Q, Bible PW, Liang Q, Zheng F, Wang Y, Hao Y, Liu Y. A New Way to Trace SARS-CoV-2 Variants Through Weighted Network Analysis of Frequency Trajectories of Mutations. Front Microbiol 2022; 13:859241. [PMID: 35369526 PMCID: PMC8966897 DOI: 10.3389/fmicb.2022.859241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/21/2022] [Accepted: 02/18/2022] [Indexed: 11/13/2022] Open
Abstract
Early detection of SARS-CoV-2 variants enables timely tracking of clinically important strains in order to inform the public health response. Current subtype-based variant surveillance, which depends on prior subtype assignment according to lag features and their continuous risk assessment, may delay this process. We proposed a weighted network framework to model the frequency trajectories of mutations (FTMs) for SARS-CoV-2 variant tracing, without requiring prior subtype assignment. This framework modularizes the FTMs and groups synchronous FTMs to represent the variants. It also generates module clusters to unveil the epidemic stages and their contemporaneous variants. Finally, the module-based variants are assessed using a phylogenetic tree through sub-sampling to facilitate communication and control of the epidemic. This process was benchmarked using worldwide GISAID data, which not only demonstrated all the features of the methodology but also showed that the module-based variant identification mapped to the global phylogenetic tree with high specificity and sensitivity. When applied to regional data, such as data from India and South Africa, for SARS-CoV-2 variant surveillance, the approach clearly elucidated the national dispersal history of the viral variants and their co-circulation pattern, and provided much earlier warning of Beta (B.1.351), Delta (B.1.617.2), and Omicron (B.1.1.529). In summary, our work showed that weighted network modeling of FTMs enables us to track SARS-CoV-2 variants rapidly and easily without relying on prior viral subtyping with lag features, accelerating the understanding and surveillance of COVID-19.
Affiliation(s)
- Qiang Huang
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
| | - Qiang Zhang
- College of Computer, Chengdu University, Chengdu, China
| | - Paul W Bible
- College of Arts and Sciences, Marian University, Indianapolis, IN, United States
| | - Qiaoxing Liang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Fangfang Zheng
- School of Traditional Chinese Medicine Healthcare, Guangdong Food and Drug Vocational College, Guangzhou, China
| | - Ying Wang
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
| | - Yuantao Hao
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
| | - Yu Liu
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
|
17
|
Jiang C, Mu X, Liu S, Liu Z, Du B, Wang J, Xu J. A Study of the Detection of SARS-CoV-2 ORF1ab Gene by the Use of Electrochemiluminescent Biosensor Based on Dual-Probe Hybridization. Sensors 2022; 22:2402. [PMID: 35336572 PMCID: PMC8954742 DOI: 10.3390/s22062402] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/14/2022] [Revised: 03/14/2022] [Accepted: 03/18/2022] [Indexed: 02/01/2023]
Abstract
To satisfy the need for highly sensitive methods for detecting severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) and to further enhance detection efficiency and capability, a new method was created for detecting the open reading frame 1ab (ORF1ab) target gene of SARS-CoV-2 with an electrochemiluminescence (ECL) biosensor based on dual-probe hybridization, using a detection model of "magnetic capture probes-targeted nucleic acids-Ru(bpy)₃²⁺ labeled signal probes". The detection model used magnetic particles coupled with a biotin-labeled nucleic acid sequence complementary to the SARS-CoV-2 ORF1ab target gene as the magnetic capture probes and a second, amino-modified complementary nucleic acid sequence labeled with Ru(bpy)₃²⁺ as the signal probes, combining the advantages of highly specific dual-probe hybridization and highly sensitive ECL biosensor technology. In the range of 0.1 fM to 10 µM, the method enabled rapid and sensitive detection of the ORF1ab gene of SARS-CoV-2 within 30 min, and the limit of detection (LOD) was 0.1 fM. The method can also meet the analytical requirements for simulated samples such as saliva and urine, with the advantages of simple operation without nucleic acid amplification, high sensitivity, reasonable reproducibility, and solid anti-interference ability, offering a new way for efficient and sensitive detection of SARS-CoV-2.
|
18
|
Uğurel OM, Ata O, Turgut-Balik D. Genomic chronicle of SARS-CoV-2: a mutational analysis with over 1 million genome sequences. Turk J Biol 2021; 45:425-435. [PMID: 34803444 PMCID: PMC8573839 DOI: 10.3906/biy-2106-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Accepted: 07/30/2021] [Indexed: 11/26/2022] Open
Abstract
Using information technologies to analyse big data on the SARS-CoV-2 genome provides insight for tracking variations and examining the evolution of the virus. Nevertheless, storing, processing, aligning and analysing these numerous genomes remains a challenge. In this study, over 1 million SARS-CoV-2 genomes were analysed to show the distribution and relationships of variations that could shed light on the development and evolution of the virus. Across all genomes analysed in this study, a total of over 215M single nucleotide variants (SNVs) were detected, and the average number of SNVs per isolate was found to be 21.83. The SNV average reached 31.25 in March 2021 alone. The average variation number of isolates is increasing and compromising with total case numbers around the world. Remarkably, cytosine deamination, which is one of the most important biochemical processes in the evolutionary development of coronaviruses, accounts for 46% of all SNVs seen in SARS-CoV-2 genomes within 16 months. In terms of the number of genomes analysed, this study is one of the most comprehensive SARS-CoV-2 genomic analysis studies in an academic publication so far, and the reported results could be useful in monitoring the development of SARS-CoV-2.
Affiliation(s)
- Osman Mutluhan Uğurel
- Department of Bioengineering, Faculty of Chemical and Metallurgical Engineering, Yıldız Technical University, İstanbul, Turkey
- Department of Basic Sciences, School of Engineering and Natural Sciences, Altınbaş University, İstanbul, Turkey
| | - Oğuz Ata
- Department of Software Engineering, School of Engineering and Natural Sciences, Altınbaş University, İstanbul, Turkey
| | - Dilek Turgut-Balik
- Department of Bioengineering, Faculty of Chemical and Metallurgical Engineering, Yıldız Technical University, İstanbul, Turkey
|