1. Chandra R, Bansal C, Kang M, Blau T, Agarwal V, Singh P, Wilson LOW, Vasan S. Unsupervised machine learning framework for discriminating major variants of concern during COVID-19. PLoS One 2023; 18:e0285719. [PMID: 37200352 PMCID: PMC10194860 DOI: 10.1371/journal.pone.0285719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 04/28/2023] [Indexed: 05/20/2023]
Abstract
Due to the high mutation rate of the virus, the COVID-19 pandemic evolved rapidly. Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to high transmission and death rates. These variants burdened medical systems worldwide, with a major impact on travel, productivity, and the world economy. Unsupervised machine learning methods have the ability to compress, characterize, and visualize unlabelled data. This paper presents a framework that utilizes unsupervised machine learning methods to discriminate and visualize the associations between major COVID-19 variants based on their genome sequences. These methods comprise a combination of selected dimensionality reduction and clustering techniques. The framework processes the RNA sequences by performing a k-mer analysis on the data, and then visualizes and compares the results using selected dimensionality reduction methods that include principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Our framework also employs agglomerative hierarchical clustering to visualize, via dendrograms, the mutational differences among major variants of concern and the country-wise mutational differences for selected variants (Delta and Omicron). We find that the proposed framework can effectively distinguish between the major variants and has the potential to identify emerging variants in the future.
Affiliation(s)
- Rohitash Chandra
- Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics, UNSW Sydney, Sydney, Australia
- Chaarvi Bansal
- Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics, UNSW Sydney, Sydney, Australia
- Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Rajasthan, India
- Mingyue Kang
- Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics, UNSW Sydney, Sydney, Australia
- Tom Blau
- Data 61, CSIRO, Sydney, Australia
- Vinti Agarwal
- Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Rajasthan, India
- Pranjal Singh
- Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India
- Laurence O. W. Wilson
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, North Ryde, Australia
- Seshadri Vasan
- Department of Health Sciences, University of York, York, United Kingdom
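The k-mer analysis step described in the abstract of entry 1 turns each genome sequence into a fixed-length numeric vector that dimensionality reduction and clustering methods can consume. The following is a minimal sketch, not the authors' code. It assumes overlapping k-mers over the alphabet ACGT, counts normalized to sum to one, RNA `U` mapped to `T`, and windows containing ambiguous bases (such as `N`) skipped. The paper's actual choice of k and normalization may differ, and the function names `kmer_frequencies` and `kmer_matrix` are hypothetical.

```python
from itertools import product

import numpy as np


def kmer_frequencies(seq: str, k: int = 3) -> np.ndarray:
    """Return normalized counts of every overlapping k-mer over ACGT."""
    # Enumerate all 4**k possible k-mers in lexicographic order (A < C < G < T).
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = np.zeros(len(vocab))
    # Assumption: RNA input is treated as DNA by mapping U -> T.
    seq = seq.upper().replace("U", "T")
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        # Windows containing ambiguous bases (e.g. N) are not in the vocabulary and are skipped.
        if kmer in index:
            counts[index[kmer]] += 1
    total = counts.sum()
    # Normalize so sequences of different lengths are comparable.
    return counts / total if total > 0 else counts


def kmer_matrix(seqs, k: int = 3) -> np.ndarray:
    """Stack one k-mer frequency vector per sequence into an (n_seqs, 4**k) matrix."""
    return np.vstack([kmer_frequencies(s, k) for s in seqs])
```

The rows of `kmer_matrix(...)` are the kind of feature vectors that PCA, t-SNE, UMAP, or agglomerative clustering would then operate on. Normalizing by the total count is one design choice among several. Raw counts or other transforms are equally plausible.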
2. Lila E, Aston JAD. Functional random effects modeling of brain shape and connectivity. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Eardi Lila
- Department of Biostatistics, University of Washington
3. Greco L, Inverardi PLN, Agostinelli C. Finite mixtures of multivariate Wrapped Normal distributions for model based clustering of p-torus data. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2128808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Affiliation(s)
- Luca Greco
- University G. Fortunato, Benevento, Italy
4. Zoubouloglou P, García-Portugués E, Marron JS. Scaled Torus Principal Component Analysis. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2119985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Pavlos Zoubouloglou
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill
- J. S. Marron
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill
5. Principal Component Analysis and Related Methods for Investigating the Dynamics of Biological Macromolecules. J 2022. [DOI: 10.3390/j5020021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Principal component analysis (PCA) is used to reduce the dimensionalities of high-dimensional datasets in a variety of research areas. For example, biological macromolecules, such as proteins, exhibit many degrees of freedom, allowing them to adopt intricate structures and exhibit complex functions by undergoing large conformational changes. Therefore, molecular simulations of and experiments on proteins generate a large number of structure variations in high-dimensional space. PCA and many PCA-related methods have been developed to extract key features from such structural data, and these approaches have been widely applied for over 30 years to elucidate macromolecular dynamics. This review mainly focuses on the methodological aspects of PCA and related methods and their applications for investigating protein dynamics.
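As a concrete reference point for the review in entry 5, standard PCA can be computed from the singular value decomposition of the mean-centred data matrix. This is a minimal sketch, not taken from the review. It assumes each row of `X` is one observation (for protein dynamics, for example, one flattened, already superimposed structure). The function name `pca` is hypothetical.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int = 2):
    """PCA via the thin SVD of the centred data matrix (rows = observations)."""
    # Centre each feature (column) at zero mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt, with singular values in descending order.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]   # principal axes, one unit-length row each
    scores = Xc @ components.T       # projections of the observations onto the axes
    # Sample variance (ddof = 1) captured by each retained component.
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance
```

Working through the SVD rather than explicitly forming the covariance matrix avoids squaring the condition number. The variance of each column of `scores` equals the corresponding entry of `explained_variance`.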
6. Mardia KV, Wiechers H, Eltzner B, Huckemann SF. Principal component analysis and clustering on manifolds. J Multivariate Anal 2022. [DOI: 10.1016/j.jmva.2021.104862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
7. Dai X, Lopez-Pintado S. Tukey's Depth for Object Data. J Am Stat Assoc 2022; 118:1760-1772. [PMID: 37791295 PMCID: PMC10545316 DOI: 10.1080/01621459.2021.2011298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2021] [Accepted: 11/22/2021] [Indexed: 10/19/2022]
Abstract
We develop a novel exploratory tool for non-Euclidean object data based on data depth, extending the celebrated Tukey depth for Euclidean data. The proposed metric halfspace depth, applicable to data objects in a general metric space, assigns to data points depth values that characterize the centrality of these points with respect to the distribution and provides an interpretable center-outward ranking. Desirable theoretical properties that generalize standard depth properties postulated for Euclidean data are established for the metric halfspace depth. The depth median, defined as the deepest point, is shown to be highly robust as a location descriptor, both in theory and in simulation. We propose an efficient algorithm to approximate the metric halfspace depth and illustrate its ability to adapt to the intrinsic data geometry. The metric halfspace depth was applied to an Alzheimer's disease study, revealing group differences in brain connectivity, modeled as covariance matrices, for subjects in different stages of dementia. Based on phylogenetic trees of 7 pathogenic parasites, our proposed metric halfspace depth was also used to construct a meaningful consensus estimate of the evolutionary history and to identify potential outlier trees.
Affiliation(s)
- Xiongtao Dai
- Department of Statistics, Iowa State University, Ames, Iowa 50011 USA
- Sara Lopez-Pintado
- Department of Health Sciences, Northeastern University, Boston, MA 02115 USA
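To make the metric halfspace idea in entry 7 concrete, here is a brute-force sketch for a finite sample given only its pairwise distance matrix. It assumes the following reading of the definition, which should be checked against the paper. For two objects w1 and w2, the halfspace H(w1, w2) is the set of points x with d(x, w1) <= d(x, w2). The depth of w is the smallest empirical mass of any such halfspace that contains w, with w1 and w2 restricted to sample points. This is not the authors' efficient approximation algorithm. It costs O(n^3) time, and the function name is hypothetical.

```python
import numpy as np


def metric_halfspace_depth(dist: np.ndarray) -> np.ndarray:
    """Brute-force empirical metric halfspace depth from an (n, n) distance matrix."""
    n = dist.shape[0]
    depth = np.ones(n)  # the pair (j, j) yields the whole sample, with mass 1
    for j in range(n):
        for k in range(n):
            # Sample points at least as close to point j as to point k form H(j, k).
            inside = dist[:, j] <= dist[:, k]
            mass = inside.mean()
            # Every point in H(j, k) has depth at most this halfspace's empirical mass.
            depth[inside] = np.minimum(depth[inside], mass)
    return depth
```

For five equally spaced points on the real line with d(x, y) = |x - y|, this yields depths 0.2, 0.4, 0.6, 0.4, 0.2. The middle point is the depth median, which matches the univariate Tukey depth on the same sample.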
8. Jung S, Park K, Kim B. Clustering on the torus by conformal prediction. Ann Appl Stat 2021. [DOI: 10.1214/21-aoas1459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Sungkyu Jung
- Department of Statistics, Seoul National University
- Kiho Park
- Department of Statistics, Seoul National University
- Byungwon Kim
- Department of Statistics, Kyungpook National University
9. Comments on: Recent advances in directional statistics. TEST-SPAIN 2021; 30:59-63. [PMID: 33758495 PMCID: PMC7976687 DOI: 10.1007/s11749-021-00760-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2021] [Accepted: 01/30/2021] [Indexed: 11/02/2022]
11. Rejoinder on: Recent advances in directional statistics. TEST-SPAIN 2021. [DOI: 10.1007/s11749-021-00762-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
12. MacDonald IL. Rejoinder: Fitting a folded normal distribution without EM. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
13. Jung S, Foskey M, Marron JS. Response to ‘Fitting a folded normal distribution without EM’. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
14. Nodehi A, Golalizadeh M, Maadooliat M, Agostinelli C. Estimation of parameters in multivariate wrapped models for data on a p-torus. Comput Stat 2020. [DOI: 10.1007/s00180-020-01006-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
15.
Affiliation(s)
- Zhigang Yao
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
- Zhenyue Zhang
- School of Mathematical Science, Zhejiang University, Hangzhou, China
16. Kim B, Schulz J, Jung S. Kurtosis test of modality for rotationally symmetric distributions on hyperspheres. J Multivariate Anal 2020. [DOI: 10.1016/j.jmva.2020.104603] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
17. Kim B, Huckemann S, Schulz J, Jung S. Small‐sphere distributions for directional data with application to medical imaging. Scand Stat Theory Appl 2019. [DOI: 10.1111/sjos.12381] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/01/2022]
Affiliation(s)
- Byungwon Kim
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Stephan Huckemann
- Felix Bernstein Institute for Mathematical Statistics in the Biosciences, University of Göttingen, Göttingen, Germany
- Jörn Schulz
- Department of Electrical and Computer Engineering, University of Stavanger, Stavanger, Norway
- Sungkyu Jung
- Department of Statistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Department of Statistics, Seoul National University, Seoul, South Korea