1
Yoshidome T. Four-dimensional imaging for cryo-electron microscopy experiments using molecular simulations and manifold learning. J Comput Chem 2024; 45:738-751. [PMID: 38112413 DOI: 10.1002/jcc.27290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 11/20/2023] [Accepted: 12/01/2023] [Indexed: 12/21/2023]
Abstract
Elucidating protein conformational changes is essential because these changes are closely related to protein function. Cryo-electron microscopy (cryo-EM) experiments can be used to reconstruct protein conformational changes from the experimental data (two-dimensional protein images). In this study, a reconstruction method, referred to as "four-dimensional imaging," was proposed. In our four-dimensional imaging technique, the protein conformational change was obtained from the two-dimensional protein images alone; the three-dimensional electron density maps used in previously proposed techniques were not used. The protein conformation for each two-dimensional protein image was obtained using our original protocol with molecular dynamics simulations. Using a manifold-learning technique and the two-dimensional protein images, the protein conformations were arranged according to the conformational change of the protein. Arranging the protein conformations according to the arrangement of the protein images yields the four-dimensional imaging. A simulation of a cryo-EM experiment demonstrated the validity of our four-dimensional imaging technique.
Affiliation(s)
- Takashi Yoshidome
- Department of Applied Physics, Graduate School of Engineering, Tohoku University, Sendai, Japan
2
Busch EL, Conley MI, Baskin-Sommers A. Manifold learning uncovers nonlinear interactions between the adolescent brain and social environment that predict psychopathology. bioRxiv 2024:2024.02.29.582854. [PMID: 38496476 PMCID: PMC10942356 DOI: 10.1101/2024.02.29.582854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Background: Advanced statistical methods to model the interplay between adolescents and their social environments are essential for understanding how differences in brain function contribute to psychopathology. To progress adolescent mental health research beyond our present achievements - a complex account of brain and environmental risk factors without understanding the neurobiological embedding of the social environment - we need methods to unveil relationships between the developing brain and real-world environmental experiences. Methods: Here, we investigated associations among psychopathology, social environments, and brain function using participants from the Adolescent Brain and Cognitive Development Study (N=5,235; 2,672 female). Manifold learning is a promising technique for uncovering latent structure from high-dimensional biomedical data like functional magnetic resonance imaging (fMRI). To model brain-social environment interactions and psychopathology, we developed a manifold learning technique called exogenous PHATE (E-PHATE). We used E-PHATE embeddings of participants' brain activation during emotional and cognitive processing to predict measures of cognition and psychopathology both cross-sectionally and longitudinally. Results: Manifold embeddings of brain activation highlight individual differences in cognition and in psychopathology symptoms which are obscured in high-dimensional (voxel-wise) activity. Specifically, E-PHATE embeddings of participants' brain activation and social environments at baseline relate to overall psychopathology, externalizing, and internalizing behaviors at both the baseline and at a 2-year follow-up. Conclusions: Our findings indicate that the adolescent brain's embedding in the social environment yields enriched insight into psychopathology.
Using E-PHATE, we demonstrate how the harmonization of cutting-edge computational methods with longstanding developmental theories advances detection and prediction of adolescent psychopathology.
Affiliation(s)
- Erica L. Busch
- Yale University, Department of Psychology, New Haven, CT, USA
- May I. Conley
- Yale University, Department of Psychology, New Haven, CT, USA
3
Song W, Zhang X, Yang G, Chen Y, Wang L, Xu H. A Study on Dimensionality Reduction and Parameters for Hyperspectral Imagery Based on Manifold Learning. Sensors (Basel) 2024; 24:2089. [PMID: 38610302 PMCID: PMC11014055 DOI: 10.3390/s24072089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 03/09/2024] [Accepted: 03/23/2024] [Indexed: 04/14/2024]
Abstract
With the rapid advancement of remote-sensing technology, the spectral information obtained from hyperspectral remote-sensing imagery has become increasingly rich, facilitating detailed spectral analysis of Earth's surface objects. However, the abundance of spectral information presents certain challenges for data processing, such as the "curse of dimensionality" leading to the "Hughes phenomenon", "strong correlation" due to high resolution, and "nonlinear characteristics" caused by varying surface reflectances. Consequently, dimensionality reduction of hyperspectral data emerges as a critical task. This paper begins by elucidating the principles and processes of hyperspectral image dimensionality reduction based on manifold theory and learning methods, in light of the nonlinear structures and features present in hyperspectral remote-sensing data, and formulates a dimensionality reduction process based on manifold learning. Subsequently, this study explores the capabilities of feature extraction and low-dimensional embedding for hyperspectral imagery using manifold learning approaches, including principal components analysis (PCA), multidimensional scaling (MDS), and linear discriminant analysis (LDA) for linear methods; and isometric mapping (Isomap), locally linear embedding (LLE), Laplacian eigenmaps (LE), Hessian locally linear embedding (HLLE), local tangent space alignment (LTSA), and maximum variance unfolding (MVU) for nonlinear methods, based on the Indian Pines hyperspectral dataset and Pavia University dataset. Furthermore, the paper investigates the optimal neighborhood computation time and overall algorithm runtime for feature extraction in hyperspectral imagery, varying by the choice of neighborhood k and intrinsic dimensionality d values across different manifold learning methods. 
Based on the outcomes of feature extraction, the study examines the classification experiments of various manifold learning methods, comparing and analyzing the variations in classification accuracy and Kappa coefficient with different selections of neighborhood k and intrinsic dimensionality d values. Building on this, the impact of selecting different bandwidths t for the Gaussian kernel in the LE method and different Lagrange multipliers λ for the MVU method on classification accuracy, given varying choices of neighborhood k and intrinsic dimensionality d, is explored. Through these experiments, the paper investigates the capability and effectiveness of different manifold learning methods in feature extraction and dimensionality reduction within hyperspectral imagery, as influenced by the selection of neighborhood k and intrinsic dimensionality d values, identifying the optimal neighborhood k and intrinsic dimensionality d value for each method. A comparison of classification accuracies reveals that the LTSA method yields superior classification results compared to other manifold learning approaches. The study demonstrates the advantages of manifold learning methods in processing hyperspectral image data, providing an experimental reference for subsequent research on hyperspectral image dimensionality reduction using manifold learning methods.
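The abstract above compares methods by classification accuracy and Kappa coefficient. As a minimal sketch of how those two figures are usually derived from a confusion matrix (the function name and the example matrix are illustrative assumptions, not taken from the paper):

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Hypothetical helper for illustration only.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement: sum over classes of (row share * column share).
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up two-class example: 85 of 100 samples classified correctly.
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
print(accuracy, kappa)
```

Kappa discounts the agreement expected by chance, which is why it is often reported next to raw accuracy when class sizes are unbalanced.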
Affiliation(s)
- Wenhui Song
- College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
- Xin Zhang
- Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100101, China
- Guozhu Yang
- State Grid General Aviation Co., Ltd., Beijing 102209, China
- Yijin Chen
- College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
- Lianchao Wang
- College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
- Hanghang Xu
- College of Geoscience and Surveying Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
4
Choi H, Byeon K, Lee J, Hong S, Park B, Park H. Identifying subgroups of eating behavior traits unrelated to obesity using functional connectivity and feature representation learning. Hum Brain Mapp 2024; 45:e26581. [PMID: 38224537 PMCID: PMC10789215 DOI: 10.1002/hbm.26581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 12/13/2023] [Accepted: 12/20/2023] [Indexed: 01/17/2024] Open
Abstract
Eating behavior is highly heterogeneous across individuals and cannot be fully explained using only the degree of obesity. We utilized unsupervised machine learning and functional connectivity measures to explore the heterogeneity of eating behaviors measured by a self-assessment instrument using 424 healthy adults (mean ± standard deviation [SD] age = 47.07 ± 18.89 years; 67% female). We generated low-dimensional representations of functional connectivity using resting-state functional magnetic resonance imaging and estimated latent features using the feature representation capabilities of an autoencoder by nonlinearly compressing the functional connectivity information. The clustering approaches applied to latent features identified three distinct subgroups. The subgroups exhibited different levels of hunger traits, while their body mass indices were comparable. The results were replicated in an independent dataset consisting of 212 participants (mean ± SD age = 38.97 ± 19.80 years; 35% female). The model interpretation technique of integrated gradients revealed that the between-group differences in the integrated gradient maps were associated with functional reorganization in heteromodal association and limbic cortices and reward-related subcortical structures such as the accumbens, amygdala, and caudate. The cognitive decoding analysis revealed that these systems are associated with reward- and emotion-related systems. Our findings provide insights into the macroscopic brain organization of eating behavior-related subgroups independent of obesity.
Affiliation(s)
- Hyoungshin Choi
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Republic of Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon, Republic of Korea
- Jong‐eun Lee
- Department of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Republic of Korea
- Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon, Republic of Korea
- Seok‐Jun Hong
- Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon, Republic of Korea
- Center for the Developing Brain, Child Mind Institute, New York, USA
- Department of Biomedical Engineering, Sungkyunkwan University, Suwon, Republic of Korea
- Bo‐yong Park
- Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon, Republic of Korea
- Department of Data Science, Inha University, Incheon, Republic of Korea
- Department of Statistics and Data Science, Inha University, Incheon, Republic of Korea
- Hyunjin Park
- Center for Neuroscience Imaging Research, Institute for Basic Science, Suwon, Republic of Korea
- School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, Republic of Korea
5
Gallos IK, Tryfonopoulos D, Shani G, Amditis A, Haick H, Dionysiou DD. Advancing Colorectal Cancer Diagnosis with AI-Powered Breathomics: Navigating Challenges and Future Directions. Diagnostics (Basel) 2023; 13:3673. [PMID: 38132257 PMCID: PMC10743128 DOI: 10.3390/diagnostics13243673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
Early detection of colorectal cancer is crucial for improving outcomes and reducing mortality. While there is strong evidence of effectiveness, currently adopted screening methods present several shortcomings which negatively impact the detection of early stage carcinogenesis, including low uptake due to patient discomfort. As a result, developing novel, non-invasive alternatives is an important research priority. Recent advancements in the field of breathomics, the study of breath composition and analysis, have paved the way for new avenues for non-invasive cancer detection and effective monitoring. Harnessing the utility of Volatile Organic Compounds in exhaled breath, breathomics has the potential to disrupt colorectal cancer screening practices. Our goal is to outline key research efforts in this area focusing on machine learning methods used for the analysis of breathomics data, highlight challenges involved in artificial intelligence application in this context, and suggest possible future directions which are currently considered within the framework of the European project ONCOSCREEN.
Affiliation(s)
- Ioannis K. Gallos
- Institute of Communication and Computer Systems, National Technical University of Athens, Zografos Campus, 15780 Athens, Greece
- Dimitrios Tryfonopoulos
- Institute of Communication and Computer Systems, National Technical University of Athens, Zografos Campus, 15780 Athens, Greece
- Gidi Shani
- Laboratory for Nanomaterial-Based Devices, Technion—Israel Institute of Technology, Haifa 3200003, Israel
- Angelos Amditis
- Institute of Communication and Computer Systems, National Technical University of Athens, Zografos Campus, 15780 Athens, Greece
- Hossam Haick
- Laboratory for Nanomaterial-Based Devices, Technion—Israel Institute of Technology, Haifa 3200003, Israel
- Dimitra D. Dionysiou
- Institute of Communication and Computer Systems, National Technical University of Athens, Zografos Campus, 15780 Athens, Greece
6
Olson RH, Cohen Kalafut N, Wang D. MANGEM: A web app for multimodal analysis of neuronal gene expression, electrophysiology, and morphology. Patterns (N Y) 2023; 4:100847. [PMID: 38035195 PMCID: PMC10682747 DOI: 10.1016/j.patter.2023.100847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 08/07/2023] [Accepted: 09/01/2023] [Indexed: 12/02/2023]
Abstract
Single-cell techniques like Patch-seq have enabled the acquisition of multimodal data from individual neuronal cells, offering systematic insights into neuronal functions. However, these data can be heterogeneous and noisy. To address this, machine learning methods have been used to align cells from different modalities onto a low-dimensional latent space, revealing multimodal cell clusters. The use of those methods can be challenging without computational expertise or suitable computing infrastructure for computationally expensive methods. To address this, we developed a cloud-based web application, MANGEM (multimodal analysis of neuronal gene expression, electrophysiology, and morphology). MANGEM provides a step-by-step accessible and user-friendly interface to machine learning alignment methods of neuronal multimodal data. It can run asynchronously for large-scale data alignment, provide users with various downstream analyses of aligned cells, and visualize the analytic results. We demonstrated the usage of MANGEM by aligning multimodal data of neuronal cells in the mouse visual cortex.
Affiliation(s)
- Noah Cohen Kalafut
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
7
Greenstreet L, Afanassiev A, Kijima Y, Heitz M, Ishiguro S, King S, Yachie N, Schiebinger G. DNA-GPS: A theoretical framework for optics-free spatial genomics and synthesis of current methods. Cell Syst 2023; 14:844-859.e4. [PMID: 37751737 DOI: 10.1016/j.cels.2023.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 04/19/2023] [Accepted: 08/25/2023] [Indexed: 09/28/2023]
Abstract
While single-cell sequencing technologies provide unprecedented insights into genomic profiles at the cellular level, they lose the spatial context of cells. Over the past decade, diverse spatial transcriptomics and multi-omics technologies have been developed to analyze molecular profiles of tissues. In this article, we categorize current spatial genomics technologies into three classes: optical imaging, positional indexing, and mathematical cartography. We discuss trade-offs in resolution and scale, identify limitations, and highlight synergies between existing single-cell and spatial genomics methods. Further, we propose DNA-GPS (global positioning system), a theoretical framework for large-scale optics-free spatial genomics that combines ideas from mathematical cartography and positional indexing. DNA-GPS has the potential to achieve scalable spatial genomics for multiple measurement modalities, and by eliminating the need for optical measurement, it has the potential to position cells in three-dimensions (3D).
Affiliation(s)
- Laura Greenstreet
- Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada
- Anton Afanassiev
- Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada
- Yusuke Kijima
- School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada; Department of Aquatic Bioscience, The University of Tokyo, Tokyo, Japan
- Matthieu Heitz
- Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada
- Soh Ishiguro
- School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
- Samuel King
- School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
- Nozomu Yachie
- School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada; Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan; Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, Osaka, Japan; Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Geoffrey Schiebinger
- Department of Mathematics, The University of British Columbia, Vancouver, BC, Canada; School of Biomedical Engineering, The University of British Columbia, Vancouver, BC, Canada
8
Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. Cell Rep Methods 2023; 3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]
Abstract
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
Affiliation(s)
- Ihuan Gunawan
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Erik Meijering
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- John George Lock
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
9
Cai H, Sheng X, Wu G, Hu B, Cheung YM, Chen J. Brain Network Classification for Accurate Detection of Alzheimer's Disease via Manifold Harmonic Discriminant Analysis. IEEE Trans Neural Netw Learn Syst 2023; PP:10.1109/TNNLS.2023.3301456. [PMID: 37566497 PMCID: PMC10858979 DOI: 10.1109/tnnls.2023.3301456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2023]
Abstract
Mounting evidence shows that Alzheimer's disease (AD) manifests the dysfunction of the brain network much earlier before the onset of clinical symptoms, making its early diagnosis possible. Current brain network analyses treat high-dimensional network data as a regular matrix or vector, which destroys the essential network topology, thereby seriously affecting diagnosis accuracy. In this context, harmonic waves provide a solid theoretical background for exploring brain network topology. However, the harmonic waves are originally intended to discover neurological disease propagation patterns in the brain, which makes it difficult to accommodate brain disease diagnosis with high heterogeneity. To address this challenge, this article proposes a network manifold harmonic discriminant analysis (MHDA) method for accurately detecting AD. Each brain network is regarded as an instance drawn on a Stiefel manifold. Every instance is represented by a set of orthonormal eigenvectors (i.e., harmonic waves) derived from its Laplacian matrix, which fully respects the topological structure of the brain network. An MHDA method within the Stiefel space is proposed to identify the group-dependent common harmonic waves, which can be used as group-specific references for downstream analyses. Extensive experiments are conducted to demonstrate the effectiveness of the proposed method in stratifying cognitively normal (CN) controls, mild cognitive impairment (MCI), and AD.
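The abstract above derives its harmonic waves from the Laplacian matrix of each brain network. As a minimal sketch of that starting point only, assuming the common unnormalized definition L = D - A for a symmetric adjacency matrix (the paper may use a different normalization, and the helper name and toy graph are hypothetical):

```python
def graph_laplacian(adjacency):
    """Return the unnormalized graph Laplacian L = D - A as nested lists.

    adjacency must be a square, symmetric matrix of edge weights.
    Illustrative helper; not the authors' implementation.
    """
    n = len(adjacency)
    for i in range(n):
        if len(adjacency[i]) != n:
            raise ValueError("adjacency matrix must be square")
        for j in range(n):
            if adjacency[i][j] != adjacency[j][i]:
                raise ValueError("adjacency matrix must be symmetric")

    # Degree of node i is the sum of the weights on its edges.
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy 3-node path graph: node 0 - node 1 - node 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
laplacian = graph_laplacian(path)
print(laplacian)
```

Every row of this Laplacian sums to zero, so the constant vector is an eigenvector with eigenvalue 0; the remaining eigenvectors, obtained with any symmetric eigen-solver, play the role of the harmonic waves described above.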
Affiliation(s)
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
- Xiaoqi Sheng
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
- Guorong Wu
- Department of Psychiatry and Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Bin Hu
- School of Medical Technology at Beijing Institute of Technology, Beijing Institute of Technology, Beijing, China
- Yiu-Ming Cheung
- Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
- Jiazhou Chen
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong, China
10
Xiu G, Chen H. Unravelling the variations of the society of England and Wales through diffusion mapping analysis of census 2011. J R Soc Interface 2023; 20:20230081. [PMID: 37608714 PMCID: PMC10445034 DOI: 10.1098/rsif.2023.0081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 07/31/2023] [Indexed: 08/24/2023] Open
Abstract
We propose a new approach to identify geographical clustering and inequality hotspots from decadal census data, with a particular emphasis on the method itself. Our method uses diffusion mapping to study the 181 408 output areas in England and Wales (EW), which enables us to decompose the census data's EW-specific feature structures. We further introduce a localization metric, inspired by statistical physics, to reveal the significance of minority groups in London. Our findings can be adapted to analogous datasets, illuminating spatial patterns and differentiating within datasets, especially when meaningful factors for determining the datasets' structure are scarce and spatially heterogeneous. This approach enhances our ability to describe and explore patterns of social deprivation and segregation across the country, thereby contributing to the development of targeted policies. We also underscore the method's intrinsic objectivity, guaranteeing its ability to offer comprehensive and unbiased analysis, unswayed by preconceived hypotheses or subjective interpretations of data patterns.
Affiliation(s)
- Gezhi Xiu
- School of Earth and Space Sciences, Peking University, Beijing, People’s Republic of China
- Centre for Complexity Sciences and Department of Mathematics, Imperial College London, London, UK
- Huanfa Chen
- Centre for Advanced Spatial Analysis (CASA), University College London, London, UK
11
Shi S, Xu Y, Xu X, Mo X, Ding J. A Preprocessing Manifold Learning Strategy Based on t-Distributed Stochastic Neighbor Embedding. Entropy (Basel) 2023; 25:1065. [PMID: 37510011 PMCID: PMC10378244 DOI: 10.3390/e25071065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/01/2023] [Accepted: 07/05/2023] [Indexed: 07/30/2023]
Abstract
In machine learning and data analysis, dimensionality reduction and high-dimensional data visualization can be accomplished by manifold learning using a t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm. We significantly improve this manifold learning scheme by introducing a preprocessing strategy for the t-SNE algorithm. In our preprocessing, we exploit Laplacian eigenmaps to reduce the high-dimensional data first, which can aggregate each data cluster and reduce the Kullback-Leibler divergence (KLD) remarkably. Moreover, the k-nearest-neighbor (KNN) algorithm is also involved in our preprocessing to enhance the visualization performance and reduce the computation and space complexity. We compare the performance of our strategy with that of the standard t-SNE on the MNIST dataset. The experiment results show that our strategy exhibits a stronger ability to separate different clusters as well as keep data of the same kind much closer to each other. Moreover, the KLD can be reduced by about 30% at the cost of increasing the complexity in terms of runtime by only 1-2%.
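The abstract above reports its improvement as a reduction in Kullback-Leibler divergence (KLD), the quantity t-SNE minimizes between high-dimensional and low-dimensional neighbor distributions. As a minimal sketch of the divergence itself for two discrete distributions (the function name, the `eps` floor, and the sample values are assumptions for illustration, not the paper's code):

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(P || Q) = sum_i p_i * log(p_i / q_i) in nats.

    p and q are sequences of probabilities of equal length. Terms with
    p_i == 0 contribute nothing; q_i is floored at eps (an assumed guard)
    to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Identical distributions give zero; a mismatched Q gives a positive value.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

In t-SNE, P and Q are joint probabilities over pairs of points rather than the short vectors used here, but the sum has the same form, so a lower value means the embedding's neighbor structure matches the original data more closely.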
Affiliation(s)
- Sha Shi
- State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
- Yefei Xu
- State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
- Xiaoyang Xu
- State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
- Xiaofan Mo
- National Astronomical Observatories, Chinese Academy of Sciences, 20A Datun Road, Chaoyang District, Beijing 100101, China
- Jun Ding
- Institute of Information Sensing, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
12
Gonzalez-Castillo J, Fernandez IS, Lam KC, Handwerker DA, Pereira F, Bandettini PA. Manifold learning for fMRI time-varying functional connectivity. Front Hum Neurosci 2023; 17:1134012. [PMID: 37497043 PMCID: PMC10366614 DOI: 10.3389/fnhum.2023.1134012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open
Abstract
Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds [e.g., within-scan time-varying FC (tvFC)]. Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers often seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) hoping those will retain important aspects of the data (e.g., relationships to behavior and disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies, are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate the intrinsic dimension (ID; i.e., minimum number of latent dimensions) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LE), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as its robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
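The abstract stresses feature normalization when pooling tvFC data across subjects but does not say which normalization was used. As a minimal sketch, assuming per-subject z-scoring of each connectivity feature before concatenation (this choice and the function names `zscore_columns` and `combine_subjects` are illustrative assumptions, not the authors' code), the idea could look like this in plain Python:

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score each feature (column) of one subject's samples-by-features matrix."""
    n_features = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_features)]
    mus = [mean(c) for c in cols]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(r[j] - mus[j]) / sds[j] for j in range(n_features)] for r in rows]


def combine_subjects(subject_matrices):
    """Normalize every subject separately, then stack all samples (hypothetical helper)."""
    combined = []
    for rows in subject_matrices:
        combined.extend(zscore_columns(rows))
    return combined


# Two toy "subjects" whose features differ mainly by a subject-specific offset.
subject_a = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
subject_b = [[101.0, 5.0], [102.0, 5.0], [103.0, 5.0]]
pooled = combine_subjects([subject_a, subject_b])
```

After this step each subject's features are centered and scaled on their own, so subject-specific offsets would no longer dominate the pairwise distances that a manifold learning method operates on.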
Affiliation(s)
- Javier Gonzalez-Castillo
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
- Isabel S. Fernandez
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
- Ka Chun Lam
- Machine Learning Group, National Institute of Mental Health, Bethesda, MD, United States
- Daniel A. Handwerker
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
- Francisco Pereira
- Machine Learning Group, National Institute of Mental Health, Bethesda, MD, United States
- Peter A. Bandettini
- Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD, United States
- Functional Magnetic Resonance Imaging (FMRI) Core, National Institute of Mental Health, Bethesda, MD, United States
13
Leon-Medina JX, Anaya M, Tibaduiza DA. New Electronic Tongue Sensor Array System for Accurate Liquor Beverage Classification. Sensors (Basel) 2023; 23:6178. [PMID: 37448027 DOI: 10.3390/s23136178] [Received: 05/29/2023] [Revised: 06/22/2023] [Accepted: 06/28/2023] [Indexed: 07/15/2023]
Abstract
Sensors are needed in many applications to improve the monitoring of a process and its variables, because they allow information to be obtained directly from the process and so help ensure its quality. This is now possible because of advances in sensor fabrication and the development of equipment with high processing capability. These elements enable the development of portable smart systems that can be used directly for process monitoring and for testing variables that, in some cases, must be evaluated by laboratory tests to ensure high-accuracy measurement results. One such process is taste recognition and, more generally, the classification of liquids, where electronic tongues have shown advantages over traditional monitoring: shorter analysis time, the possibility of online monitoring, and the use of artificial intelligence strategies for data analysis. However, although some methods and strategies have been developed, further work is needed to improve the analysis of data from electrochemical sensors. Accordingly, this paper explores the application of an electronic tongue system to the classification of liquor beverages, applied directly to an alcoholic beverage found in specific regions of Colombia. The system uses eight commercial sensors and a data acquisition system together with a machine-learning-based methodology developed for this purpose. Results show the advantages of the system and its accuracy in the analysis and classification of this kind of alcoholic beverage.
Affiliation(s)
- Jersson X Leon-Medina
- Department of Mechanical and Mechatronics Engineering, Universidad Nacional de Colombia-Sede Bogotá, Bogotá 111321, Colombia
- Control, Data and Artificial Intelligence (CoDAlab), Department of Mathematics, Escola d'Enginyeria de Barcelona Est (EEBE), Universitat Politècnica de Catalunya (UPC), 08019 Barcelona, Spain
- Maribel Anaya
- Department of Electrical and Electronic Engineering, Universidad Nacional de Colombia-Sede Bogotá, Bogotá 111321, Colombia
- Diego A Tibaduiza
- Department of Electrical and Electronic Engineering, Universidad Nacional de Colombia-Sede Bogotá, Bogotá 111321, Colombia
14
Villa A, Ingelaere S, Jacobs B, Vandenberk B, Van Huffel S, Willems R, Varon C. A unified framework for multi-lead ECG characterization using Laplacian Eigenmaps. Physiol Meas 2023. [PMID: 37336241 DOI: 10.1088/1361-6579/acdfb4] [Indexed: 06/21/2023]
Abstract
BACKGROUND: The analysis of multi-lead electrocardiographic (ECG) signals requires integrating the information derived from each lead to reach clinically relevant conclusions. This analysis could benefit from methods compacting the information in those leads into lower-dimensional representations (i.e., 2 or 3 dimensions instead of 12). OBJECTIVE: We propose Laplacian Eigenmaps (LE) to create a unified framework where ECGs from different subjects can be compared and their abnormalities are enhanced. APPROACH: We conceive a normal reference ECG space based on LE, calculated using signals of healthy subjects in sinus rhythm. Signals from new subjects can be mapped onto this reference space, creating a loop per heartbeat that captures ECG abnormalities. A set of parameters, based on distance metrics and on the shape of the loops, is proposed to quantify the differences between subjects. MAIN RESULTS: This methodology was applied to find structural and arrhythmogenic changes in the ECG. The LE framework consistently captured the characteristics of healthy ECGs, confirming that normal signals behaved similarly in the LE space. Significant differences between normal signals and those from patients with ischemic heart disease or dilated cardiomyopathy were detected. In contrast, LE biomarkers did not identify differences between patients with cardiomyopathy and a history of ventricular arrhythmia and their matched controls. SIGNIFICANCE: This LE unified framework offers a new representation of multi-lead signals, reducing dimensionality while enhancing imperceptible abnormalities and enabling the comparison of signals of different subjects.
Affiliation(s)
- Amalia Villa
- STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven Department of Electrical Engineering, Kasteelpark Arenberg 10 postbus 2440, Leuven, 3001, Belgium
- Sebastian Ingelaere
- Department of Cardiovascular Diseases, Experimental Cardiology, KU Leuven, Herestraat 49 box 911, Leuven, 3000, Belgium
- Ben Jacobs
- Cochlear Benelux NV, Schaliënhoevedreef 20, Mechelen, 2800, Belgium
- Bert Vandenberk
- Department of Cardiac Sciences, Libin Cardiovascular Institute, Cumming School of Medicine, University of Calgary, University Drive NW, Calgary, Alberta, 2500, Canada
- Sabine Van Huffel
- Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Leuven, 3001, Belgium
- Rik Willems
- Department of Cardiovascular Diseases, Experimental Cardiology, KU Leuven, Herestraat 49 box 911, Leuven, 3000, Belgium
- Carolina Varon
- Microgravity Research Center, Universite Libre de Bruxelles, Campus du Solbosch, Bat. U, Porte D, Niveau 3 Av. F. D. Roosevelt, 50 CP 165/62, Brussels, B-1050, Belgium
15
Gunawardena R, Sarrigiannis PG, Blackburn DJ, He F. Kernel-based Nonlinear Manifold Learning for EEG-based Functional Connectivity Analysis and Channel Selection with Application to Alzheimer's Disease. Neuroscience 2023:S0306-4522(23)00253-1. [PMID: 37301505 DOI: 10.1016/j.neuroscience.2023.05.033] [Received: 12/23/2022] [Revised: 05/15/2023] [Accepted: 05/29/2023] [Indexed: 06/12/2023]
Abstract
Dynamical, causal, and cross-frequency coupling analysis using the electroencephalogram (EEG) has gained significant attention for diagnosing and characterizing neurological disorders. Selecting important EEG channels is crucial for reducing computational complexity in implementing these methods and improving classification accuracy. In neuroscience, measures of (dis)similarity between EEG channels are often used as functional connectivity (FC) features, and important channels are selected via feature selection. Developing a generic measure of (dis)similarity is important for FC analysis and channel selection. In this study, learning of (dis)similarity information within the EEG is achieved using kernel-based nonlinear manifold learning. The focus is on FC changes and, thereby, EEG channel selection. Isomap and Gaussian Process Latent Variable Model (Isomap-GPLVM) are employed for this purpose. The resulting kernel (dis)similarity matrix is used as a novel measure of linear and nonlinear FC between EEG channels. The analysis of EEG from healthy controls (HC) and patients with mild to moderate Alzheimer's disease (AD) is presented as a case study. Classification results are compared with other commonly used FC measures. Our analysis shows significant differences in FC between bipolar channels of the occipital region and other regions (i.e., parietal, centro-parietal, and fronto-central) between AD and HC groups. Furthermore, our results indicate that FC changes between channels along the fronto-parietal region and the rest of the EEG are important in diagnosing AD. Our results and their relation to functional networks are consistent with those obtained from previous studies using fMRI, resting-state fMRI and EEG.
Affiliation(s)
- Rajintha Gunawardena
- Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry, CV1 5FB, UK
- Daniel J Blackburn
- Department of Neuroscience, The University of Sheffield, Sheffield, S10 2HQ, UK
- Fei He
- Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry, CV1 5FB, UK
16
Merkurjev E, Nguyen DD, Wei GW. Multiscale Laplacian Learning. APPL INTELL 2023; 53:15727-15746. [PMID: 38031564 PMCID: PMC10686291 DOI: 10.1007/s10489-022-04333-2] [Accepted: 11/08/2022] [Indexed: 11/29/2022]
Abstract
Machine learning has greatly influenced many fields, including science. However, despite the tremendous accomplishments of machine learning, one of the key limitations of most existing machine learning approaches is their reliance on large labeled sets, and thus, data with limited labeled samples remains a challenge. Moreover, the performance of machine learning methods is often severely hindered in the case of diverse data, usually associated with smaller data sets or with areas of study where the size of the data sets is constrained by high experimental cost and/or ethics. These challenges call for innovative strategies for dealing with these types of data. In this work, the aforementioned challenges are addressed by integrating graph-based frameworks, semi-supervised techniques, multiscale structures, and modified and adapted optimization procedures. This results in two innovative multiscale Laplacian learning (MLL) approaches for machine learning tasks, such as data classification, and for tackling data with limited samples, diverse data, and small data sets. The first approach, multikernel manifold learning (MML), integrates manifold learning with multikernel information and incorporates a warped kernel regularizer using multiscale graph Laplacians. The second approach, the multiscale MBO (MMBO) method, introduces multiscale Laplacians to the modification of the famous classical Merriman-Bence-Osher (MBO) scheme, and makes use of fast solvers. We demonstrate the performance of our algorithms experimentally on a variety of benchmark data sets, and compare them favorably to the state-of-the-art approaches.
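The classical MBO idea referenced in the abstract alternates a short diffusion step with thresholding. The sketch below illustrates a plain single-scale, binary graph version with a simple fidelity term on the labeled nodes; it is not the authors' multiscale MMBO method, it uses explicit Euler steps rather than fast solvers, and the function names, parameter values, and toy graph are all assumptions made for illustration.

```python
def two_cluster_graph(bridge=0.1):
    """Toy weight matrix: two triangles {0,1,2} and {3,4,5} joined by one weak edge."""
    n = 6
    w = [[0.0] * n for _ in range(n)]
    for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
        w[a][b] = w[b][a] = 1.0
    w[2][3] = w[3][2] = bridge
    return w


def graph_mbo(weights, labels, dt=0.1, mu=1.0, diffusion_steps=10, outer_iters=5):
    """Binary graph MBO sketch: alternate graph diffusion with thresholding.

    weights: symmetric n x n list of lists of non-negative edge weights.
    labels:  dict {node_index: 0 or 1} for the few labeled nodes.
    """
    n = len(weights)
    # Unlabeled nodes start undecided at 0.5.
    u = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(outer_iters):
        # Diffusion: explicit Euler steps of du/dt = -L u - mu * (u - label),
        # where the fidelity term acts on labeled nodes only.
        for _ in range(diffusion_steps):
            new_u = []
            for i in range(n):
                lap = sum(weights[i][j] * (u[i] - u[j]) for j in range(n))
                fidelity = mu * (u[i] - labels[i]) if i in labels else 0.0
                new_u.append(u[i] - dt * (lap + fidelity))
            u = new_u
        # Thresholding: snap every node back to a hard class assignment.
        u = [1.0 if x > 0.5 else 0.0 for x in u]
    return [int(x) for x in u]


# One labeled node per cluster is enough to classify the toy graph.
predicted = graph_mbo(two_cluster_graph(), {0: 1, 5: 0})
```

With only one labeled node in each cluster, the labels spread along the strong within-cluster edges while the weak bridge keeps the two classes apart, which is the limited-label setting the abstract targets.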
Affiliation(s)
- Duc Duy Nguyen
- Department of Mathematics, University of Kentucky, KY 40506, USA
- Guo-Wei Wei
- Department of Mathematics, Department of Biochemistry and Molecular Biology, Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
17
Massing JC, Fahimipour AK, Bunse C, Pinhassi J, Gross T. Quantification of metabolic niche occupancy dynamics in a Baltic Sea bacterial community. mSystems 2023:e0002823. [PMID: 37255288 DOI: 10.1128/msystems.00028-23] [Indexed: 06/01/2023]
Abstract
Progress in molecular methods has enabled the monitoring of bacterial populations in time. Nevertheless, understanding community dynamics and its links with ecosystem functioning remains challenging due to the tremendous diversity of microorganisms. Conceptual frameworks that make sense of time-series of taxonomically-rich bacterial communities, regarding their potential ecological function, are needed. A key concept for organizing ecological functions is the niche, the set of strategies that enable a population to persist and define its impacts on the surroundings. Here we present a framework based on manifold learning to organize genomic information into potentially occupied bacterial metabolic niches over time. Manifold learning tries to uncover low-dimensional data structures in high-dimensional datasets that can be used to describe the data in reduced dimensions. We apply the method to reconstruct the dynamics of putatively occupied metabolic niches using a long-term bacterial time-series from the Baltic Sea, the Linnaeus Microbial Observatory (LMO). The results reveal a relatively low-dimensional space of occupied metabolic niches comprising groups of taxa with similar functional capabilities. Time patterns of occupied niches were strongly driven by seasonality. Some metabolic niches were dominated by one bacterial taxon whereas others were occupied by multiple taxa, depending on season. These results illustrate the power of manifold learning approaches to advance our understanding of the links between community composition and functioning in microbial systems. IMPORTANCE: The increase in data availability of bacterial communities highlights the need for conceptual frameworks to advance our understanding of these complex and diverse communities alongside the production of such data. To understand the dynamics of these tremendously diverse communities, we need tools to identify overarching strategies and describe their role and function in the ecosystem in a comprehensive way. Here, we show that a manifold learning approach can coarse-grain bacterial communities in terms of their metabolic strategies and that we can thereby quantitatively organize genomic information in terms of potentially occupied niches over time. This approach therefore advances our understanding of how fluctuations in bacterial abundances and species composition can relate to ecosystem functions, and it can facilitate the analysis, monitoring and future predictions of the development of microbial communities.
Affiliation(s)
- Jana C Massing
- Helmholtz Institute for Functional Marine Biodiversity (HIFMB) at the University of Oldenburg, Oldenburg, Germany
- Helmholtz Centre for Marine and Polar Research, Alfred-Wegener-Institute, Bremerhaven, Germany
- Institute for Chemistry and Biology of the Marine Environment (ICBM) Carl-von-Ossietzky University, Oldenburg, Germany
- Ashkaan K Fahimipour
- Department of Biological Sciences, Florida Atlantic University, Boca Raton, Florida, USA
- Carina Bunse
- Helmholtz Institute for Functional Marine Biodiversity (HIFMB) at the University of Oldenburg, Oldenburg, Germany
- Institute for Chemistry and Biology of the Marine Environment (ICBM) Carl-von-Ossietzky University, Oldenburg, Germany
- Department of Marine Sciences, University of Gothenburg, Gothenburg, Sweden
- Jarone Pinhassi
- Centre for Ecology and Evolution in Microbial Model Systems (EEMiS), Linnaeus University, Kalmar, Sweden
- Thilo Gross
- Helmholtz Institute for Functional Marine Biodiversity (HIFMB) at the University of Oldenburg, Oldenburg, Germany
- Helmholtz Centre for Marine and Polar Research, Alfred-Wegener-Institute, Bremerhaven, Germany
- Institute for Chemistry and Biology of the Marine Environment (ICBM) Carl-von-Ossietzky University, Oldenburg, Germany
18
Chang X, Hallais S, Danas K, Roux S. PeakForce AFM Analysis Enhanced with Model Reduction Techniques. Sensors (Basel) 2023; 23:4730. [PMID: 37430644 DOI: 10.3390/s23104730] [Received: 04/06/2023] [Revised: 04/28/2023] [Accepted: 05/11/2023] [Indexed: 07/12/2023]
Abstract
PeakForce quantitative nanomechanical AFM mode (PF-QNM) is a popular AFM technique designed to measure multiple mechanical features (e.g., adhesion, apparent modulus, etc.) simultaneously at the exact same spatial coordinates with a robust scanning frequency. This paper proposes compressing the initial high-dimensional dataset obtained from the PeakForce AFM mode into a subset of much lower dimensionality by a sequence of proper orthogonal decomposition (POD) reduction and subsequent machine learning on the low-dimensionality data. A substantial reduction in user dependency and subjectivity of the extracted results is obtained. The underlying parameters, or "state variables", governing the mechanical response can be easily extracted from the latter using various machine learning techniques. Two samples are investigated to illustrate the proposed procedure: (i) a polystyrene film with low-density polyethylene nano-pods and (ii) a PDMS film with carbon-iron particles. The heterogeneity of the material, as well as the sharp variation in topography, makes the segmentation challenging. Nonetheless, the underlying parameters describing the mechanical response naturally offer a compact representation allowing for a more straightforward interpretation of the high-dimensional force-indentation data in terms of the nature (and proportion) of phases, interfaces, or topography. Finally, those techniques come with a low processing time cost and do not require a prior mechanical model.
Affiliation(s)
- Xuyang Chang
- Université Paris-Saclay/CentraleSupélec/ENS Paris-Saclay/C.N.R.S., LMPS-Laboratoire de Mécanique Paris-Saclay, 91190 Gif-sur-Yvette, France
- LMS, C.N.R.S., École Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France
- Simon Hallais
- LMS, C.N.R.S., École Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France
- Kostas Danas
- LMS, C.N.R.S., École Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France
- Stéphane Roux
- Université Paris-Saclay/CentraleSupélec/ENS Paris-Saclay/C.N.R.S., LMPS-Laboratoire de Mécanique Paris-Saclay, 91190 Gif-sur-Yvette, France
19
Ma C, Han PK, Zhuo Y, Djebra Y, Marin T, El Fakhri G. Joint spectral quantification of MR spectroscopic imaging using linear tangent space alignment-based manifold learning. Magn Reson Med 2023; 89:1297-1313. [PMID: 36404676 PMCID: PMC9892363 DOI: 10.1002/mrm.29526] [Received: 07/21/2022] [Revised: 10/07/2022] [Accepted: 10/24/2022] [Indexed: 11/22/2022]
Abstract
PURPOSE: To develop a manifold learning-based method that leverages the intrinsic low-dimensional structure of MR Spectroscopic Imaging (MRSI) signals for joint spectral quantification. METHODS: A linear tangent space alignment (LTSA) model was proposed to represent MRSI signals. In the proposed model, the signals of each metabolite were represented using a subspace model and the local coordinates of the subspaces were aligned to the global coordinates of the underlying low-dimensional manifold via linear transform. With the basis functions of the subspaces predetermined via quantum mechanics simulations, the global coordinates and the matrices for the local-to-global coordinate alignment were estimated by fitting the proposed LTSA model to noisy MRSI data with a spatial smoothness constraint on the global coordinates and a sparsity constraint on the matrices. RESULTS: The performance of the proposed method was validated using numerical simulation data and in vivo proton-MRSI experimental data acquired on healthy volunteers at 3T. The results of the proposed method were compared with the QUEST method and the subspace-based method. In all the compared cases, the proposed method achieved superior performance over the QUEST and the subspace-based methods both qualitatively in terms of noise and artifacts in the estimated metabolite concentration maps, and quantitatively in terms of spectral quantification accuracy measured by normalized root mean square errors. CONCLUSION: Joint spectral quantification using linear tangent space alignment-based manifold learning improves the accuracy of MRSI spectral quantification.
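The abstract reports accuracy as a normalized root mean square error but does not state the normalization. A minimal sketch, assuming the error is normalized by the energy of the reference values (other conventions divide by the range or the mean of the reference; the function name `nrmse` is hypothetical), could be:

```python
import math


def nrmse(estimate, reference):
    """Root mean square error normalized by the reference energy (assumed convention)."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("estimate and reference must be non-empty and equally long")
    squared_error = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    reference_energy = sum(r ** 2 for r in reference)
    if reference_energy == 0:
        raise ValueError("reference must not be all zeros")
    return math.sqrt(squared_error / reference_energy)


# Toy concentration values: a perfect estimate and an all-zero estimate.
perfect = nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
all_zero = nrmse([0.0, 0.0], [3.0, 4.0])
```

Under this convention a value of 0 means the estimate matches the reference exactly, and a value of 1 means the error is as large as the reference signal itself.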
Affiliation(s)
- Chao Ma
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Radiology, Harvard Medical School, Boston, Massachusetts, USA
- Paul Kyu Han
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Radiology, Harvard Medical School, Boston, Massachusetts, USA
- Yue Zhuo
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Radiology, Harvard Medical School, Boston, Massachusetts, USA
- Yanis Djebra
- Department of Radiology, Harvard Medical School, Boston, Massachusetts, USA
- LTCI, Telecom Paris, Institut Polytechnique de Paris, Paris, France
- Thibault Marin
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Radiology, Harvard Medical School, Boston, Massachusetts, USA
- Georges El Fakhri
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Radiology, Harvard Medical School, Boston, Massachusetts, USA
20
Chen X, Li Y, Chen C. An Online Hashing Algorithm for Image Retrieval Based on Optical-Sensor Network. Sensors (Basel) 2023; 23:2576. [PMID: 36904780 PMCID: PMC10007520 DOI: 10.3390/s23052576] [Received: 12/27/2022] [Revised: 02/20/2023] [Accepted: 02/24/2023] [Indexed: 06/18/2023]
Abstract
Online hashing is a viable storage and online retrieval scheme that addresses the rapid growth of data in optical-sensor networks and users' real-time processing needs in the era of big data. Existing online-hashing algorithms rely excessively on data tags to construct the hash function and ignore the structural features of the data itself, resulting in a serious loss of image-streaming features and reduced retrieval accuracy. In this paper, an online hashing model that fuses global and local dual semantics is proposed. First, to preserve the local features of the streaming data, an anchor hash model based on the idea of manifold learning is constructed. Second, a global similarity matrix, used to constrain the hash codes, is built from the balanced similarity between the newly arrived data and previous data, so that the hash codes retain global data features as much as possible. Then, under a unified framework, an online hash model that integrates global and local dual semantics is learned, and an effective discrete binary-optimization solution is proposed. Extensive experiments on three datasets (CIFAR10, MNIST, and Places205) show that the proposed algorithm effectively improves the efficiency of image retrieval compared with several existing advanced online-hashing algorithms.
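Hashing-based retrieval of the kind described in the abstract generally ends with a ranking of stored binary codes by Hamming distance to the query code. The sketch below covers only that generic retrieval step, not the learning of the hash function proposed in the paper; the codes, the function names `hamming` and `retrieve`, and the parameter `k` are made up for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database_codes, k=3):
    """Return the indices of the k stored codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )
    return ranked[:k]


# Four hypothetical 4-bit image codes and one query code.
database = [0b1010, 0b1011, 0b0101, 0b1110]
top_matches = retrieve(0b1010, database, k=3)
```

Because Python's sort is stable, codes at equal distance keep their database order, so ties are resolved in favor of earlier entries.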
Affiliation(s)
- Xiao Chen
- Department of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
- Yanlong Li
- Department of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
- Ministry of Education Key Laboratory of Cognitive Radio and Information Processing, Guilin University of Electronic Technology, Guilin 541004, China
- Chen Chen
- Department of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
21
Bernal Oñate CP, Melgarejo Meseguer FM, Carrera EV, Sánchez Muñoz JJ, García Alberola A, Rojo Álvarez JL. Different Ventricular Fibrillation Types in Low-Dimensional Latent Spaces. Sensors (Basel) 2023; 23:2527. [PMID: 36904731 PMCID: PMC10006875 DOI: 10.3390/s23052527] [Received: 12/24/2022] [Revised: 02/08/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
The causes of ventricular fibrillation (VF) are not yet elucidated, and it has been proposed that different mechanisms might exist. Moreover, conventional analysis methods do not seem to provide time or frequency domain features that allow for recognition of different VF patterns in electrode-recorded biopotentials. The present work aims to determine whether low-dimensional latent spaces could exhibit discriminative features for different mechanisms or conditions during VF episodes. For this purpose, manifold learning using autoencoder neural networks was analyzed based on surface ECG recordings. The recordings covered the onset of the VF episode as well as the next 6 min, and comprised an experimental database based on an animal model with five situations, including control, drug intervention (amiodarone, diltiazem, and flecainide), and autonomic nervous system blockade. The results show that latent spaces from unsupervised and supervised learning schemes yielded moderate though quite noticeable separability among the different types of VF according to their type or intervention. In particular, unsupervised schemes reached a multi-class classification accuracy of 66%, while supervised schemes improved the separability of the generated latent spaces, providing a classification accuracy of up to 74%. Thus, we conclude that manifold learning schemes can provide a valuable tool for studying different types of VF while working in low-dimensional latent spaces, as the machine-learning generated features exhibit separability among different VF types. This study confirms that latent variables are better VF descriptors than conventional time or frequency domain features, making this technique useful in current VF research on elucidation of the underlying VF mechanisms.
Affiliation(s)
- Carlos Paúl Bernal Oñate
- Departamento de Eléctrica, Electrónica y Telecomunicaciones, Universidad de las Fuerzas Armadas—ESPE, Sangolqui 171103, Ecuador
- Enrique V. Carrera
- Departamento de Eléctrica, Electrónica y Telecomunicaciones, Universidad de las Fuerzas Armadas—ESPE, Sangolqui 171103, Ecuador
- José Luis Rojo Álvarez
- Department of Signal Theory and Communications, Telematics and Computing Systems, Universidad Rey Juan Carlos, 28943 Madrid, Spain
22
Campanioni S, González-Nóvoa JA, Busto L, Agís-Balboa RC, Veiga C. Data-Driven Phenotyping of Alzheimer's Disease under Epigenetic Conditions Using Partial Volume Correction of PET Studies and Manifold Learning. Biomedicines 2023; 11. [PMID: 36830810 DOI: 10.3390/biomedicines11020273] [Received: 12/05/2022] [Revised: 01/10/2023] [Accepted: 01/16/2023] [Indexed: 01/20/2023]
Abstract
Alzheimer's disease (AD) is the most common form of dementia. An increasing number of studies have confirmed epigenetic changes in AD. Consequently, a robust phenotyping mechanism must take into consideration the environmental effects on the patient in the generation of phenotypes. Positron Emission Tomography (PET) is employed for the quantification of pathological amyloid deposition in brain tissues. The objective is to develop a new methodology for the hyperparametric analysis of changes in cognitive scores and PET features to test whether multiple AD phenotypes exist. We used a computational method to identify phenotypes in a retrospective cohort study (532 subjects), using PET and Magnetic Resonance Imaging (MRI) images and neuropsychological assessments, to develop a novel computational phenotyping method that uses Partial Volume Correction (PVC) and subsets of neuropsychological assessments in a non-biased fashion. Our pipeline is based on a Regional Spread Function (RSF) method for PVC and a t-distributed Stochastic Neighbor Embedding (t-SNE) manifold. The results presented demonstrate that (1) the approach to data-driven phenotyping is valid, (2) the different techniques involved in the pipelines produce different results, and (3) they permit us to identify the best phenotyping pipeline. The method identifies three phenotypes and permits us to analyze them under epigenetic conditions.
23
Gonzalez-Castillo J, Fernandez I, Lam KC, Handwerker DA, Pereira F, Bandettini PA. Manifold Learning for fMRI time-varying FC. bioRxiv 2023:2023.01.14.523992. [PMID: 36789436 PMCID: PMC9928030 DOI: 10.1101/2023.01.14.523992] [Indexed: 01/18/2023]
Abstract
Whole-brain functional connectivity (FC) measured with functional MRI (fMRI) evolves over time in meaningful ways at temporal scales going from years (e.g., development) to seconds (e.g., within-scan time-varying FC (tvFC)). Yet, our ability to explore tvFC is severely constrained by its large dimensionality (several thousands). To overcome this difficulty, researchers seek to generate low dimensional representations (e.g., 2D and 3D scatter plots) expected to retain its most informative aspects (e.g., relationships to behavior, disease progression). Limited prior empirical work suggests that manifold learning techniques (MLTs), namely those seeking to infer a low dimensional non-linear surface (i.e., the manifold) where most of the data lies, are good candidates for accomplishing this task. Here we explore this possibility in detail. First, we discuss why one should expect tvFC data to lie on a low dimensional manifold. Second, we estimate the intrinsic dimension (i.e., minimum number of latent dimensions; ID) of tvFC data manifolds. Third, we describe the inner workings of three state-of-the-art MLTs: Laplacian Eigenmaps (LE), T-distributed Stochastic Neighbor Embedding (T-SNE), and Uniform Manifold Approximation and Projection (UMAP). For each method, we empirically evaluate its ability to generate neuro-biologically meaningful representations of tvFC data, as well as its robustness against hyper-parameter selection. Our results show that tvFC data has an ID that ranges between 4 and 26, and that ID varies significantly between rest and task states. We also show how all three methods can effectively capture subject identity and task being performed: UMAP and T-SNE can capture these two levels of detail concurrently, but LE could only capture one at a time. We observed substantial variability in embedding quality across MLTs, and within-MLT as a function of hyper-parameter selection. To help alleviate this issue, we provide heuristics that can inform future studies. Finally, we also demonstrate the importance of feature normalization when combining data across subjects and the role that temporal autocorrelation plays in the application of MLTs to tvFC data. Overall, we conclude that while MLTs can be useful to generate summary views of labeled tvFC data, their application to unlabeled data such as resting-state remains challenging.
Affiliation(s)
- Isabel Fernandez
  - Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD
- Ka Chun Lam
  - Machine Learning Group, National Institute of Mental Health, Bethesda, MD
- Daniel A Handwerker
  - Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD
- Francisco Pereira
  - Machine Learning Group, National Institute of Mental Health, Bethesda, MD
- Peter A Bandettini
  - Section on Functional Imaging Methods, National Institute of Mental Health, Bethesda, MD
  - Machine Learning Group, National Institute of Mental Health, Bethesda, MD
  - FMRI Core, National Institute of Mental Health, Bethesda, MD
|
24
|
Xu N, Zhou Y, Patel A, Zhang N, Liu Y. Parkinson's Disease Diagnosis beyond Clinical Features: A Bio-marker using Topological Machine Learning of Resting-state Functional Magnetic Resonance Imaging. Neuroscience 2023; 509:43-50. [PMID: 36436700 DOI: 10.1016/j.neuroscience.2022.11.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 11/16/2022] [Accepted: 11/21/2022] [Indexed: 11/26/2022]
Abstract
Parkinson's disease (PD) is one of the leading causes of neurological disability, and its prevalence is expected to increase rapidly in the following few decades. PD diagnosis heavily depends on clinical features using the patient's symptoms. Therefore, an accurate, robust, and non-invasive bio-marker is of critical clinical importance for PD. This study proposes to develop a new bio-marker for PD diagnosis using resting-state functional Magnetic Resonance Imaging (rs-fMRI). Unlike most existing rs-fMRI data analytics using correlational analysis, a Topological Machine Learning approach is proposed to construct the bio-marker. The default functional network is identified first using rs-fMRI. Next, rs-fMRI's high dimensional spatial-temporal data structure is mapped on a Riemann Manifold using topological dimensional reduction. Following the topological dimensional reduction, machine learning is used for classification and sensitivity analysis. The proposed methodology is applied to three open fMRI databases for demonstration and validation. The PD diagnosis accuracy can reach 96.4% when the proposed methodology is used. Thus, rs-fMRI and topological machine learning provide a quantifiable and verifiable bio-marker for future PD early detection and treatment evaluation.
Affiliation(s)
- Nan Xu
  - School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ, USA
- Yuxiang Zhou
  - Department of Radiology, Mayo Clinic, Scottsdale, AZ, USA
- Ameet Patel
  - Department of Radiology, Mayo Clinic, Scottsdale, AZ, USA
- Na Zhang
  - Independent Researcher, Chandler, AZ, USA
- Yongming Liu
  - School for Engineering of Matter, Transport and Energy, Arizona State University, Tempe, AZ, USA
|
25
|
Djebra Y, Marin T, Han PK, Bloch I, El Fakhri G, Ma C. Manifold Learning via Linear Tangent Space Alignment (LTSA) for Accelerated Dynamic MRI With Sparse Sampling. IEEE Trans Med Imaging 2023; 42:158-169. [PMID: 36121938 PMCID: PMC10024645 DOI: 10.1109/tmi.2022.3207774] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/10/2023]
Abstract
The spatial resolution and temporal frame-rate of dynamic magnetic resonance imaging (MRI) can be improved by reconstructing images from sparsely sampled k-space data with mathematical modeling of the underlying spatiotemporal signals. These models include sparsity models, linear subspace models, and non-linear manifold models. This work presents a novel linear tangent space alignment (LTSA) model-based framework that exploits the intrinsic low-dimensional manifold structure of dynamic images for accelerated dynamic MRI. The performance of the proposed method was evaluated and compared to state-of-the-art methods using numerical simulation studies as well as 2D and 3D in vivo cardiac imaging experiments. The proposed method achieved the best performance in image reconstruction among all the compared methods. The proposed method could prove useful for accelerating many MRI applications, including dynamic MRI, multi-parametric MRI, and MR spectroscopic imaging.
Affiliation(s)
- Yanis Djebra
  - Gordon Center for Medical Imaging, Massachusetts General Hospital, and Department of Radiology, Harvard Medical School, Boston, MA 02129 USA and the LTCI, Telecom Paris, Institut Polytechnique de Paris, Paris, France
- Thibault Marin
  - Gordon Center for Medical Imaging, Massachusetts General Hospital, and Department of Radiology, Harvard Medical School, Boston, MA 02129 USA
- Paul K. Han
  - Gordon Center for Medical Imaging, Massachusetts General Hospital, and Department of Radiology, Harvard Medical School, Boston, MA 02129 USA
- Isabelle Bloch
  - LIP6, Sorbonne University, CNRS Paris, France. This work was partly done while I. Bloch was with the LTCI, Telecom Paris, Institut Polytechnique de Paris, Paris, France
- Georges El Fakhri
  - Gordon Center for Medical Imaging, Massachusetts General Hospital, and Department of Radiology, Harvard Medical School, Boston, MA 02129 USA
- Chao Ma
  - Gordon Center for Medical Imaging, Massachusetts General Hospital, and Department of Radiology, Harvard Medical School, Boston, MA 02129 USA
|
26
|
Liu K, Wang F, He Y, Liu Y, Yang J, Yao Y. Data-Augmented Manifold Learning Thermography for Defect Detection and Evaluation of Polymer Composites. Polymers (Basel) 2022; 15. [PMID: 36616523 DOI: 10.3390/polym15010173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 12/25/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
Infrared thermography techniques with thermographic data analysis have been widely applied to non-destructive tests and evaluations of subsurface defects in practical composite materials. However, the performance of these methods is still restricted by limited informative images and difficulties in feature extraction caused by inhomogeneous backgrounds and noise. In this work, a novel generative manifold learning thermography (GMLT) is proposed for defect detection and the evaluation of composites. Specifically, the spectral normalized generative adversarial networks serve as an image augmentation strategy to learn the thermal image distribution, thereby generating virtual images to enrich the dataset. Subsequently, the manifold learning method is employed for the unsupervised dimensionality reduction in all images. Finally, the partial least squares regression is presented to extract the explicit mapping of manifold learning for defect visualization. Moreover, probability density maps and quantitative metrics are proposed to evaluate and explain the obtained defect detection performance. Experimental results on carbon fiber-reinforced polymers demonstrate the superiorities of GMLT, compared with other methods.
|
27
|
Shen WX, Liang SR, Jiang YY, Chen YZ. Enhanced metagenomic deep learning for disease prediction and consistent signature recognition by restructured microbiome 2D representations. Patterns (N Y) 2022; 4:100658. [PMID: 36699735 PMCID: PMC9868677 DOI: 10.1016/j.patter.2022.100658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2022] [Revised: 07/15/2022] [Accepted: 11/15/2022] [Indexed: 12/23/2022]
Abstract
Metagenomic analysis has been explored for disease diagnosis and biomarker discovery. Low sample sizes, high dimensionality, and sparsity of metagenomic data challenge metagenomic investigations. Here, an unsupervised microbial embedding, grouping, and mapping algorithm (MEGMA) was developed to transform metagenomic data into individualized multichannel microbiome 2D representation by manifold learning and clustering of microbial profiles (e.g., composition, abundance, hierarchy, and taxonomy). These 2D representations enable enhanced disease prediction by established ConvNet-based AggMapNet models, outperforming the commonly used machine learning and deep learning models in metagenomic benchmark datasets. These 2D representations combined with AggMapNet explainable module robustly identified more reliable and replicable disease-prediction microbes (biomarkers). Employing the MEGMA-AggMapNet pipeline for biomarker identification from 5 disease datasets, 84% of the identified biomarkers have been described in over 74 distinct works as important for these diseases. Moreover, the method also discovered highly consistent sets of biomarkers in cross-cohort colorectal cancer (CRC) patients and microbial shifts in different CRC stages.
Affiliation(s)
- Wan Xiang Shen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
  - Bioinformatics and Drug Design Group, Department of Pharmacy, and Center for Computational Science and Engineering, National University of Singapore, Singapore 117543, Singapore
- Shu Ran Liang
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
- Yu Yang Jiang
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
  - Corresponding author
- Yu Zong Chen
  - The State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
  - Shenzhen Bay Laboratory, Shenzhen 518000, China
  - Corresponding author
|
28
|
Evangelou N, Wichrowski NJ, Kevrekidis GA, Dietrich F, Kooshkbaghi M, McFann S, Kevrekidis IG. On the parameter combinations that matter and on those that do not: data-driven studies of parameter (non)identifiability. PNAS Nexus 2022; 1:pgac154. [PMID: 36714862 PMCID: PMC9802152 DOI: 10.1093/pnasnexus/pgac154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Accepted: 08/11/2022] [Indexed: 02/01/2023]
Abstract
We present a data-driven approach to characterizing nonidentifiability of a model's parameters and illustrate it through dynamic as well as steady kinetic models. By employing Diffusion Maps and their extensions, we discover the minimal combinations of parameters required to characterize the output behavior of a chemical system: a set of effective parameters for the model. Furthermore, we introduce and use a Conformal Autoencoder Neural Network technique, as well as a kernel-based Jointly Smooth Function technique, to disentangle the redundant parameter combinations that do not affect the output behavior from the ones that do. We discuss the interpretability of our data-driven effective parameters, and demonstrate the utility of the approach both for behavior prediction and parameter estimation. In the latter task, it becomes important to describe level sets in parameter space that are consistent with a particular output behavior. We validate our approach on a model of multisite phosphorylation, where a reduced set of effective parameters (nonlinear combinations of the physical ones) has previously been established analytically.
Affiliation(s)
- George A Kevrekidis
  - Department of Mathematics and Statistics, University of Massachusetts, 710 N Pleasant St, Amherst, MA 01003, USA
- Felix Dietrich
  - Department of Informatics, Technical University of Munich, Boltzmannstr. 3, Garching 85748, Germany
- Mahdi Kooshkbaghi
  - The Program in Applied and Computational Mathematics, Princeton University, Washington Road, Princeton, NJ 08544, USA
- Sarah McFann
  - Department of Chemical and Biological Engineering, Princeton University, 50–70 Olden St, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
- Ioannis G Kevrekidis
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
|
29
|
Harefa E, Zhou W. Laser-Induced Breakdown Spectroscopy Combined with Nonlinear Manifold Learning for Improvement Aluminum Alloy Classification Accuracy. Sensors (Basel) 2022; 22:s22093129. [PMID: 35590818 PMCID: PMC9102175 DOI: 10.3390/s22093129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2022] [Revised: 04/16/2022] [Accepted: 04/18/2022] [Indexed: 02/05/2023]
Abstract
Laser-induced breakdown spectroscopy (LIBS) spectra often include many intensity lines, and obtaining meaningful information from the input dataset and condensing the dimensions of the original data has become a significant challenge in LIBS applications. This study was conducted to classify five different types of aluminum alloys rapidly and noninvasively, utilizing the manifold dimensionality reduction technique and a support vector machine (SVM) classifier model integrated with LIBS technology. The augmented partial residual plot was used to determine the nonlinearity of the LIBS spectra dataset. To circumvent the curse of dimensionality, nonlinear manifold learning techniques, such as local tangent space alignment (LTSA), local linear embedding (LLE), isometric mapping (Isomap), and Laplacian eigenmaps (LE) were used. The performance of linear techniques, such as principal component analysis (PCA) and multidimensional scaling (MDS), was also investigated compared to nonlinear techniques. The reduced dimensions of the dataset were assigned as input datasets in the SVM classifier. The prediction labels indicated that the Isomap-SVM model had the best classification performance with the classification accuracy, the number of dimensions and the number of nearest neighbors being 96.67%, 11, and 18, respectively. These findings demonstrate that the combination of nonlinear manifold learning and multivariate analysis has the potential to classify the samples based on LIBS with reasonable accuracy.
|
30
|
Zhang X, Liu X. Multiview Clustering of Adaptive Sparse Representation Based on Coupled P Systems. Entropy (Basel) 2022; 24:e24040568. [PMID: 35455231 PMCID: PMC9028410 DOI: 10.3390/e24040568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 04/12/2022] [Accepted: 04/15/2022] [Indexed: 12/10/2022]
Abstract
Multiview clustering (MVC) has been a significant technique for handling data mining problems. Most of the existing studies on this topic adopt a fixed number of neighbors when constructing the similarity matrix of each view, as in single-view clustering. However, this may reduce the clustering effect due to the diversity of multiview data sources. Moreover, most MVC methods utilize iterative optimization to obtain clustering results, which consumes a significant amount of time. Therefore, this paper proposes a multiview clustering of adaptive sparse representation based on coupled P system (MVCS-CP) without iteration. The whole algorithm flow runs in the coupled P system. Firstly, the natural neighbor search algorithm without parameters automatically determines the number of neighbors of each view. In turn, manifold learning and sparse representation are employed to construct the similarity matrix, which preserves the internal geometry of the views. Next, a soft thresholding operator is introduced to form the unified graph to gain the clustering results. The experimental results on nine real datasets indicate that the MVCS-CP outperforms other state-of-the-art comparison algorithms.
|
31
|
Stallaert W, Kedziora KM, Taylor CD, Zikry TM, Ranek JS, Sobon HK, Taylor SR, Young CL, Cook JG, Purvis JE. The structure of the human cell cycle. Cell Syst 2022; 13:230-240.e3. [PMID: 34800361 DOI: 10.1016/j.cels.2021.10.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 03/30/2021] [Revised: 08/16/2021] [Accepted: 10/26/2021] [Indexed: 01/01/2023]
Abstract
Understanding the organization of the cell cycle has been a longstanding goal in cell biology. We combined time-lapse microscopy, highly multiplexed single-cell imaging of 48 core cell cycle proteins, and manifold learning to render a visualization of the human cell cycle. This data-driven approach revealed the comprehensive "structure" of the cell cycle: a continuum of molecular states that cells occupy as they transition from one cell division to the next, or as they enter or exit cell cycle arrest. Paradoxically, progression deeper into cell cycle arrest was accompanied by increases in proliferative effectors such as CDKs and cyclins, which can drive cell cycle re-entry by overcoming p21 induction. The structure also revealed the molecular trajectories into senescence and the unique combination of molecular features that define this irreversibly arrested state. This approach will enable the comparison of alternative cell cycles during development, in response to environmental perturbation and in disease. A record of this paper's transparent peer review process is included in the supplemental information.
|
32
|
Abstract
In the last two decades, there has been an explosion of interest in modeling the brain as a network, where nodes correspond variously to brain regions or neurons, and edges correspond to structural or statistical dependencies between them. This kind of network construction, which preserves spatial, or structural, information while collapsing across time, has become broadly known as "network neuroscience." In this work, we provide an alternative application of network science to neural data: network-based analysis of non-linear time series and review applications of these methods to neural data. Instead of preserving spatial information and collapsing across time, network analysis of time series does the reverse: it collapses spatial information, instead preserving temporally extended dynamics, typically corresponding to evolution through some kind of phase/state-space. This allows researchers to infer a, possibly low-dimensional, "intrinsic manifold" from empirical brain data. We will discuss three methods of constructing networks from nonlinear time series, and how to interpret them in the context of neural data: recurrence networks, visibility networks, and ordinal partition networks. By capturing typically continuous, non-linear dynamics in the form of discrete networks, we show how techniques from network science, non-linear dynamics, and information theory can extract meaningful information distinct from what is normally accessible in standard network neuroscience approaches.
Affiliation(s)
- Thomas F. Varley
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, United States
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, United States
- Olaf Sporns
  - Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, United States
|
33
|
Mandivarapu JK, Camp B, Estrada R. Deep Active Learning via Open-Set Recognition. Front Artif Intell 2022; 5:737363. [PMID: 35198969 PMCID: PMC8859322 DOI: 10.3389/frai.2022.737363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2021] [Accepted: 01/11/2022] [Indexed: 11/13/2022]
Abstract
In many applications, data is easy to acquire but expensive and time-consuming to label; prominent examples include medical imaging and NLP. This disparity has only grown in recent years as our ability to collect data improves. Under these constraints, it makes sense to select only the most informative instances from the unlabeled pool and request an oracle (e.g., a human expert) to provide labels for those samples. The goal of active learning is to infer the informativeness of unlabeled samples so as to minimize the number of requests to the oracle. Here, we formulate active learning as an open-set recognition problem. In this paradigm, only some of the inputs belong to known classes; the classifier must identify the rest as unknown. More specifically, we leverage variational neural networks (VNNs), which produce high-confidence (i.e., low-entropy) predictions only for inputs that closely resemble the training data. We use the inverse of this confidence measure to select the samples that the oracle should label. Intuitively, unlabeled samples that the VNN is uncertain about contain features that the network has not been exposed to; thus they are more informative for future training. We carried out an extensive evaluation of our novel, probabilistic formulation of active learning, achieving state-of-the-art results on MNIST, CIFAR-10, CIFAR-100, and FashionMNIST. Additionally, unlike current active learning methods, our algorithm can learn even in the presence of out-of-distribution outliers. As our experiments show, when the unlabeled pool consists of a mixture of samples from multiple datasets, our approach can automatically distinguish between samples from seen vs. unseen datasets. Overall, our results show that high-quality uncertainty measures are key for pool-based active learning.
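The selection step described in this abstract (query the oracle on the samples the model is least confident about) can be illustrated with a minimal sketch. This is not the authors' VNN-based method: it assumes class-probability vectors are already available for each unlabeled sample and uses Shannon entropy as a stand-in uncertainty score. The names `entropy`, `select_for_labeling`, and the toy `pool` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool_probs, budget):
    """Return the ids of the `budget` samples with the highest predictive entropy.

    pool_probs maps a sample id to its predicted class probabilities.
    High entropy means low confidence, so these samples go to the oracle first.
    """
    ranked = sorted(pool_probs.items(), key=lambda kv: entropy(kv[1]), reverse=True)
    return [sample_id for sample_id, _ in ranked[:budget]]

# Hypothetical unlabeled pool with made-up predictions over three classes.
pool = {
    "a": [0.98, 0.01, 0.01],  # confident prediction
    "b": [0.40, 0.35, 0.25],  # uncertain
    "c": [0.70, 0.20, 0.10],  # moderately confident
    "d": [0.34, 0.33, 0.33],  # close to uniform, most uncertain
}

chosen = select_for_labeling(pool, budget=2)
print(chosen)
```

With a labeling budget of two, the near-uniform predictions are queried first and the confident ones are left unlabeled.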
Affiliation(s)
- Rolando Estrada
  - Department of Computer Science, Georgia State University, Atlanta, GA, United States
|
34
|
Chen T, Ma L, Tang Z, Yu LX. Identification of coumarin-based food additives using terahertz spectroscopy combined with manifold learning and improved support vector machine. J Food Sci 2022; 87:1108-1118. [PMID: 35122257 DOI: 10.1111/1750-3841.16064] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/25/2021] [Revised: 12/31/2021] [Accepted: 01/04/2022] [Indexed: 11/30/2022]
Abstract
The purpose of this paper is to use terahertz (THz) spectroscopy combined with manifold learning and an improved support vector machine (SVM) model to identify coumarin-based food additives. The 216 THz absorbance spectra (144 for the calibration set and 72 for the prediction set) of six coumarin-based food additives are measured by using THz time-domain spectroscopy (THz-TDS) in the range of 0.5-2.0 THz. A method (P-t-SNE) that combines principal component analysis (PCA) with the manifold learning technique t-distributed stochastic neighbor embedding (t-SNE) is used for feature extraction of the THz spectra. Then, an improved SVM is proposed whose parameters are optimized by gray wolf optimization (GWO) enhanced with differential evolution (DE). Finally, the results show that the prediction set accuracies of the PCA-DEGWO-SVM, P-t-SNE-DEGWO-SVM, and P-t-SNE-GWO-SVM models are 97.22%, 98.61%, and 95.83%, respectively, indicating that the accuracy with P-t-SNE is about 1.39 percentage points higher than with PCA, and the accuracy with DEGWO is about 2.78 percentage points higher than with GWO. In conclusion, the improved model (P-t-SNE-DEGWO-SVM) has the best identification effect, and it is proved to be an effective method to identify coumarin-based food additives. PRACTICAL APPLICATION: The method used in this paper can be applied in the field of food safety detection. When detecting coumarin-based food additives, the method proposed in this paper is more time-saving and efficient than traditional detection methods. Through some more tests and adjustments, it will be possible to achieve rapid and on-site identification of various food additives.
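The accuracy figures in this abstract can be checked against the stated prediction-set size. The short sketch below works through the arithmetic; the per-model counts of correct predictions are an inference from the reported percentages and the 72-spectrum prediction set, not numbers given in the abstract, and the helper name `implied_correct` is hypothetical.

```python
# Prediction-set size stated in the abstract.
N_PREDICTION = 72

# Reported prediction-set accuracies (percent) from the abstract.
reported = {
    "PCA-DEGWO-SVM": 97.22,
    "P-t-SNE-DEGWO-SVM": 98.61,
    "P-t-SNE-GWO-SVM": 95.83,
}

def implied_correct(accuracy_percent, n=N_PREDICTION):
    """Nearest whole number of correct predictions implied by an accuracy."""
    return round(accuracy_percent / 100 * n)

# Inferred counts of correctly classified spectra per model (an assumption).
implied = {name: implied_correct(acc) for name, acc in reported.items()}

# Differences between models, in percentage points.
gain_tsne = reported["P-t-SNE-DEGWO-SVM"] - reported["PCA-DEGWO-SVM"]
gain_degwo = reported["P-t-SNE-DEGWO-SVM"] - reported["P-t-SNE-GWO-SVM"]

print(implied)
print(round(gain_tsne, 2), round(gain_degwo, 2))
```

Under this reading, the reported gains of about 1.39 and 2.78 correspond to one and two additional correctly classified spectra out of 72, so they are differences in percentage points on a small test set.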
Affiliation(s)
- Tao Chen
  - Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, China
- Lingjie Ma
  - Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, China
- Zongqing Tang
  - Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, China
- Ling Xiao Yu
  - Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, China
|
35
|
Casanova R, Lyday RG, Bahrami M, Burdette JH, Simpson SL, Laurienti PJ. Embedding Functional Brain Networks in Low Dimensional Spaces Using Manifold Learning Techniques. Front Neuroinform 2022; 15:740143. [PMID: 35002665 PMCID: PMC8739961 DOI: 10.3389/fninf.2021.740143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Accepted: 11/19/2021] [Indexed: 11/13/2022]
Abstract
Background: fMRI data is inherently high-dimensional and difficult to visualize. A recent trend has been to find spaces of lower dimensionality where functional brain networks can be projected onto manifolds as individual data points, leading to new ways to analyze and interpret the data. Here, we investigate the potential of two powerful non-linear manifold learning techniques for functional brain network representation: (1) t-distributed stochastic neighbor embedding (t-SNE) and (2) Uniform Manifold Approximation and Projection (UMAP), a recent breakthrough in manifold learning. Methods: fMRI data from the Human Connectome Project (HCP) and an independent study of aging were used to generate functional brain networks. We used fMRI data collected during resting state and during a working memory task. The relative performance of t-SNE and UMAP was investigated by projecting the networks from each study onto 2D manifolds. The levels of discrimination between different tasks and the preservation of the topology were evaluated using different metrics. Results: Both methods effectively discriminated the resting state from the memory task in the embedding space. UMAP discriminated with a higher classification accuracy. However, t-SNE appeared to better preserve the topology of the high-dimensional space. When networks from the HCP and aging studies were combined, the resting state and memory networks in general aligned correctly. Discussion: Our results suggest that UMAP, a more recent development in manifold learning, is an excellent tool to visualize functional brain networks. Despite dramatic differences in data collection and protocols, networks from different studies aligned correctly in the embedding space.
Affiliation(s)
- Ramon Casanova
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Robert G Lyday
  - Laboratory for Complex Brain Networks, Wake Forest School of Medicine, Winston-Salem, NC, United States
  - Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Mohsen Bahrami
  - Laboratory for Complex Brain Networks, Wake Forest School of Medicine, Winston-Salem, NC, United States
  - Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Jonathan H Burdette
  - Laboratory for Complex Brain Networks, Wake Forest School of Medicine, Winston-Salem, NC, United States
  - Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Sean L Simpson
  - Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, United States
  - Laboratory for Complex Brain Networks, Wake Forest School of Medicine, Winston-Salem, NC, United States
- Paul J Laurienti
  - Laboratory for Complex Brain Networks, Wake Forest School of Medicine, Winston-Salem, NC, United States
  - Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, United States
|
36
|
Schuster V, Krogh A. A Manifold Learning Perspective on Representation Learning: Learning Decoder and Representations without an Encoder. Entropy (Basel) 2021; 23:e23111403. [PMID: 34828101 PMCID: PMC8625121 DOI: 10.3390/e23111403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Revised: 10/13/2021] [Accepted: 10/21/2021] [Indexed: 11/16/2022]
Abstract
Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we showed that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derived expressions for the number of samples needed to specify the encoder and decoder and showed that the decoder generally requires much fewer training samples to be well-specified compared to the encoder. We discuss the training of autoencoders in this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrated that the decoder is much better suited to learn a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further showed that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
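The idea of training a decoder without an encoder, as summarized in this abstract, can be sketched on a toy problem. The example below is only an illustration under strong assumptions, not the authors' implementation: it uses a linear decoder with a one-dimensional representation, five hand-made 2D points lying on a line, and plain gradient descent on a sum-of-squares loss. All names (`data`, `loss`, `train`) and hyper-parameters are hypothetical.

```python
# Toy data: five 2-D points on a line through the origin (a 1-D manifold).
direction = (0.6, 0.8)
data = [(t * direction[0], t * direction[1]) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def loss(w, z):
    """Sum-of-squares reconstruction error of the linear decoder x_hat = w * z_i."""
    return sum((x[0] - w[0] * zi) ** 2 + (x[1] - w[1] * zi) ** 2
               for x, zi in zip(data, z))

def train(steps=2000, lr=0.01):
    """Jointly fit decoder weights and per-sample representations (no encoder)."""
    w = [1.0, 0.0]          # decoder weights, deliberately misaligned at the start
    z = [0.0] * len(data)   # one learnable representation per training sample
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_z = []
        for x, zi in zip(data, z):
            r0 = x[0] - w[0] * zi   # residual in the first coordinate
            r1 = x[1] - w[1] * zi   # residual in the second coordinate
            grad_w[0] += -2.0 * r0 * zi
            grad_w[1] += -2.0 * r1 * zi
            grad_z.append(-2.0 * (r0 * w[0] + r1 * w[1]))
        # Gradient step on the decoder weights and on every representation.
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        z = [zi - lr * g for zi, g in zip(z, grad_z)]
    return w, z

initial_loss = loss([1.0, 0.0], [0.0] * len(data))
w, z = train()
final_loss = loss(w, z)
print(initial_loss, final_loss)
```

Each training sample owns a free parameter `z[i]` that is updated together with the decoder weights, so no mapping from input space to representation space is ever learned. Minimizing the sum-of-squares loss moves the decoded line toward the training points, which matches the abstract's description of optimizing the manifold to have the smallest Euclidean distance to the samples.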
Affiliation(s)
- Viktoria Schuster
  - Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark
- Anders Krogh
  - Center for Health Data Science, University of Copenhagen, 2200 Copenhagen, Denmark
  - Department of Computer Science, University of Copenhagen, 2100 Copenhagen, Denmark
|
37
|
Kuchroo M, Godavarthi A, Tong A, Wolf G, Krishnaswamy S. Multimodal Data Visualization and Denoising with Integrated Diffusion. IEEE Int Workshop Mach Learn Signal Process 2021; 2021. [PMID: 35340810 DOI: 10.1109/mlsp52302.2021.9596214] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
We propose a method called integrated diffusion for combining multimodal data, gathered via different sensors on the same system, to create an integrated data diffusion operator. As real world data suffers from both local and global noise, we introduce mechanisms to optimally calculate a diffusion operator that reflects the combined information in data by maintaining low frequency eigenvectors of each modality both globally and locally. We show the utility of this integrated operator in denoising and visualizing multimodal toy data as well as multi-omic data generated from blood cells, measuring both gene expression and chromatin accessibility. Our approach better visualizes the geometry of the integrated data and captures known cross-modality associations. More generally, integrated diffusion is broadly applicable to multimodal datasets generated by noisy sensors collected in a variety of fields.
Affiliation(s)
- Manik Kuchroo
  - Yale University, Dept. of Neuro., Mila - Quebec AI Institute; Dept. of Genetics, Mila - Quebec AI Institute
- Guy Wolf
  - Université de Montréal, Dept. of Math. & Stat., Mila - Quebec AI Institute
- Smita Krishnaswamy
  - Dept. of Genetics, Mila - Quebec AI Institute; Dept. of Comp. Sci., Mila - Quebec AI Institute
|
38
|
Boubehziz T, Quesada-Granja C, Dupont C, Villon P, De Vuyst F, Salsac AV. A Data-Driven Space-Time-Parameter Reduced-Order Model with Manifold Learning for Coupled Problems: Application to Deformable Capsules Flowing in Microchannels. Entropy (Basel) 2021; 23:1193. [PMID: 34573820 DOI: 10.3390/e23091193] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Revised: 09/06/2021] [Accepted: 09/07/2021] [Indexed: 01/12/2023]
Abstract
An innovative data-driven model-order reduction technique is proposed to model dilute micrometric or nanometric suspensions of microcapsules, i.e., microdrops protected in a thin hyperelastic membrane, which are used in healthcare as innovative drug vehicles. We consider a microcapsule flowing in a similar-size microfluidic channel and systematically vary the governing parameters, namely the capillary number (the ratio of viscous to elastic forces) and the confinement ratio (the ratio of capsule to tube size). The resulting space-time-parameter problem is solved using two global POD reduced bases, determined in the offline stage for the space and parameter variables, respectively. A suitable low-order spatial reduced basis is then computed in the online stage for any new parameter instance. The time evolution of the capsule dynamics is achieved by identifying the nonlinear low-order manifold of the reduced variables; for that, a point cloud of reduced data is computed and a diffuse approximation method is used. Numerical comparisons between the full-order fluid-structure interaction model and the reduced-order one confirm both accuracy and stability of the reduction technique over the whole admissible parameter domain. We believe that such an approach can be applied to a broad range of coupled problems, especially those involving quasistatic models of structural mechanics.
|
39
|
Chen JM, Zovko M, Šimurina N, Zovko V. Fear in a Handful of Dust: The Epidemiological, Environmental, and Economic Drivers of Death by PM2.5 Pollution. Int J Environ Res Public Health 2021; 18:8688. [PMID: 34444435 PMCID: PMC8393768 DOI: 10.3390/ijerph18168688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Revised: 08/03/2021] [Accepted: 08/14/2021] [Indexed: 01/13/2023]
Abstract
This study evaluates numerous epidemiological, environmental, and economic factors affecting morbidity and mortality from PM2.5 exposure in the 27 member states of the European Union. This form of air pollution inflicts considerable social and economic damage in addition to loss of life and well-being. This study creates and deploys a comprehensive data pipeline. The first step consists of conventional linear models and supervised machine learning alternatives. Those regression methods do more than predict health outcomes in the EU-27 and relate those predictions to independent variables. Linear regression and its machine learning equivalents also inform unsupervised machine learning methods such as clustering and manifold learning. Lower-dimension manifolds of this dataset's feature space reveal the relationship among EU-27 countries and their success (or failure) in managing PM2.5 morbidity and mortality. Principal component analysis informs further interpretation of variables along economic and health-based lines. A nonlinear environmental Kuznets curve may describe the fuller relationship between economic activity and premature death from PM2.5 exposure. The European Union should bridge the historical, cultural, and economic gaps that impair these countries' collective response to PM2.5 pollution.
Affiliation(s)
- James Ming Chen
  - College of Law, Michigan State University, East Lansing, MI 48824, USA
- Mira Zovko
  - Ministry of Economy and Sustainable Development, 10000 Zagreb, Croatia
- Nika Šimurina
  - Faculty of Economics & Business, University of Zagreb, 10000 Zagreb, Croatia
- Vatroslav Zovko
  - Faculty of Teacher Education, University of Zagreb, 10000 Zagreb, Croatia
|
40
|
Lovrić M, Đuričić T, Tran HTN, Hussain H, Lacić E, Rasmussen MA, Kern R. Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. Pharmaceuticals (Basel) 2021; 14:758. [PMID: 34451855 PMCID: PMC8400160 DOI: 10.3390/ph14080758] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/30/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 02/07/2023] Open
Abstract
Methods for dimensionality reduction are showing significant contributions to knowledge generation in high-dimensional modeling scenarios throughout many disciplines. By achieving a lower dimensional representation (also called embedding), fewer computing resources are needed in downstream machine learning tasks, thus leading to a faster training time, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques (principal component analysis-PCA, uniform manifold approximation and projection-UMAP, and variational autoencoders-VAEs) for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities, and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be utilized alongside known techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative model of VAE shows an advantage in pre-compressing the data with respect to classification accuracy.
Affiliation(s)
- Mario Lovrić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
- Tomislav Đuričić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Han T. N. Tran
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Hussain Hussain
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
- Emanuel Lacić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
- Morten A. Rasmussen
  - Copenhagen Studies on Asthma in Childhood, Herlev-Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820 Gentofte, Denmark
  - Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg, Denmark
- Roman Kern
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
|
41
|
Saberi-Movahed F, Mohammadifard M, Mehrpooya A, Rezaei-Ravari M, Berahmand K, Rostami M, Karami S, Najafzadeh M, Hajinezhad D, Jamshidi M, Abedi F, Mohammadifard M, Farbod E, Safavi F, Dorvash M, Vahedi S, Eftekhari M, Saberi-Movahed F, Tavassoly I. Decoding Clinical Biomarker Space of COVID-19: Exploring Matrix Factorization-based Feature Selection Methods. medRxiv 2021:2021.07.07.21259699. [PMID: 34268522 PMCID: PMC8282111 DOI: 10.1101/2021.07.07.21259699] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
One of the most critical challenges in managing complex diseases like COVID-19 is to establish an intelligent triage system that can optimize clinical decision-making at the time of a global pandemic. The clinical presentation and patients’ characteristics are usually utilized to identify those patients who need more critical care. However, the clinical evidence shows an unmet need to determine more accurate and optimal clinical biomarkers to triage patients under a condition like the COVID-19 crisis. Here we present a machine learning approach to find a group of clinical indicators from the blood tests of a set of COVID-19 patients that are predictive of poor prognosis and morbidity. Our approach consists of two interconnected schemes: Feature Selection and Prognosis Classification. The former is based on different Matrix Factorization (MF)-based methods, and the latter is performed using the Random Forest algorithm. Our model reveals that Arterial Blood Gas (ABG) O2 Saturation and C-Reactive Protein (CRP) are the most important clinical biomarkers determining the poor prognosis in these patients. Our approach paves the way for building quantitative and optimized clinical management systems for COVID-19 and similar diseases.
Affiliation(s)
- Adel Mehrpooya
  - School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Kamal Berahmand
  - School of Computer Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Saeed Karami
  - Department of Mathematics, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731, Iran
- Mohammad Najafzadeh
  - Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Mina Jamshidi
  - Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Farshid Abedi
  - Infectious Diseases Research Center, Birjand University of Medical Sciences, Birjand, Iran
- Elnaz Farbod
  - Baruch College, City University of New York, New York, USA
- Farinaz Safavi
  - Neuroimmunology and Neurovirology Branch, National Institute of Neurological Disorders and Stroke, National Institute of Health, Bethesda, Maryland, USA
- Mohammadreza Dorvash
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Viewbank, VIC, Australia
- Mahdi Eftekhari
  - Department of Computer Engineering, University of Kerman, Kerman, Iran
- Farid Saberi-Movahed
  - Department of Applied Mathematics, Faculty of Sciences and Modern Technologies, Graduate University of Advanced Technology, Kerman, Iran
- Iman Tavassoly
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029
|
42
|
Margazoglou G, Grafke T, Laio A, Lucarini V. Dynamical landscape and multistability of a climate model. Proc Math Phys Eng Sci 2021; 477:20210019. [PMID: 35153562 PMCID: PMC8299554 DOI: 10.1098/rspa.2021.0019] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/08/2021] [Accepted: 05/04/2021] [Indexed: 12/15/2022] Open
Abstract
We apply two independent data analysis methodologies to locate stable climate states in an intermediate complexity climate model and analyse their interplay. First, drawing from the theory of quasi-potentials, and viewing the state space as an energy landscape with valleys and mountain ridges, we infer the relative likelihood of the identified multistable climate states and investigate the most likely transition trajectories as well as the expected transition times between them. Second, harnessing techniques from data science, and specifically manifold learning, we characterize the data landscape of the simulation output to find climate states and basin boundaries within a fully agnostic and unsupervised framework. Both approaches show remarkable agreement, and reveal, apart from the well-known warm and snowball earth states, a third intermediate stable state in one of the two versions of PLASIM, the climate model used in this study. The combination of our approaches allows us to identify how the negative feedback of ocean heat transport and entropy production via the hydrological cycle drastically change the topography of the dynamical landscape of Earth’s climate.
Affiliation(s)
- Georgios Margazoglou
  - Department of Mathematics and Statistics, University of Reading, Reading, UK; Centre for the Mathematics of Planet Earth, University of Reading, Reading, UK
- Tobias Grafke
  - Mathematics Institute, University of Warwick, Coventry, UK
- Alessandro Laio
  - International School for Advanced Studies (SISSA), Trieste, Italy
- Valerio Lucarini
  - Department of Mathematics and Statistics, University of Reading, Reading, UK; Centre for the Mathematics of Planet Earth, University of Reading, Reading, UK
|
43
|
Jaenal A, Moreno FA, Gonzalez-Jimenez J. Appearance-Based Sequential Robot Localization Using a Patchwise Approximation of a Descriptor Manifold. Sensors (Basel) 2021; 21:2483. [PMID: 33918493 PMCID: PMC8038242 DOI: 10.3390/s21072483] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/04/2021] [Revised: 03/26/2021] [Accepted: 03/29/2021] [Indexed: 11/16/2022]
Abstract
This paper addresses appearance-based robot localization in 2D with a sparse, lightweight map of the environment composed of descriptor-pose image pairs. Based on previous research in the field, we assume that image descriptors are samples of a low-dimensional Descriptor Manifold that is locally articulated by the camera pose. We propose a piecewise approximation of the geometry of this Descriptor Manifold through a tessellation of so-called Patches of Smooth Appearance Change (PSACs), which defines our appearance map. Upon this map, the presented robot localization method applies both a Gaussian Process Particle Filter (GPPF) to perform camera tracking and a Place Recognition (PR) technique for relocalization within the most likely PSACs according to the observed descriptor. A specific Gaussian Process (GP) is trained for each PSAC to regress a Gaussian distribution over the descriptor for any particle pose lying within that PSAC. The evaluation of the observed descriptor in this distribution gives us a likelihood, which is used as the weight for the particle. In addition, we model the impact of appearance variations on image descriptors as a white noise distribution within the GP formulation, ensuring adequate operation under lighting and scene appearance changes with respect to the conditions in which the map was constructed. A series of experiments with both real and synthetic images shows that our method outperforms state-of-the-art appearance-based localization methods in terms of robustness and accuracy, with median errors below 0.3 m and 6°.
Affiliation(s)
- Francisco-Angel Moreno
  - Machine Perception and Intelligent Robotics Group (MAPIR), Department of System Engineering and Automation, Biomedical Research Institute of Malaga (IBIMA), University of Malaga, 29071 Málaga, Spain
|
44
|
Nakajima R, Laskaris N, Rhee JK, Baker BJ, Kosmidis EK. GEVI cell-type specific labelling and a manifold learning approach provide evidence for lateral inhibition at the population level in the mouse hippocampal CA1 area. Eur J Neurosci 2021; 53:3019-3038. [PMID: 33675122 DOI: 10.1111/ejn.15177] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/04/2020] [Revised: 02/04/2021] [Accepted: 02/22/2021] [Indexed: 01/04/2023]
Abstract
The CA1 area in the mammalian hippocampus is essential for spatial learning. Pyramidal cells are the hippocampus output neurons and their activities are regulated by inhibition exerted by a diversified population of interneurons. Lateral inhibition has been suggested as the mechanism enabling the reconfiguration of pyramidal cell assembly activity observed during spatial learning tasks in rodents. However, lateral inhibition in the CA1 lacks the overwhelming evidence reported in other hippocampal areas such as the CA3 and the dentate gyrus. The use of genetically encoded voltage indicators and fast optical recordings permits the construction of cell-type specific response maps of neuronal activity. Here, we labelled mouse CA1 pyramidal neurons with the genetically encoded voltage indicator ArcLight and optically recorded their response to Schaffer Collaterals stimulation in vitro. By undertaking a manifold learning approach, we report a hyperpolarization-dominated area focused in the perisomatic region of pyramidal cells receiving late excitatory synaptic input. Functional network organization metrics revealed that information transfer was higher in this area. The localized hyperpolarization disappeared when GABAA receptors were pharmacologically blocked. This is the first report where the spatiotemporal pattern of lateral inhibition is visualized in the CA1 by expressing a genetically encoded voltage indicator selectively in principal neurons. Our analysis suggests a fundamental role of lateral inhibition in CA1 information processing.
Affiliation(s)
- Ryuichi Nakajima
  - Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea
- Nikolaos Laskaris
  - AIIA Lab, Informatics Department, Aristotle University of Thessaloniki, Thessaloniki, Greece; NeuroInformatics GRoup, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Jun Kyu Rhee
  - Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea; Division of Bio-Medical Science and Technology, KIST School, Korea University of Science and Technology (UST), Seoul, Republic of Korea
- Bradley J Baker
  - Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea; Division of Bio-Medical Science and Technology, KIST School, Korea University of Science and Technology (UST), Seoul, Republic of Korea
- Efstratios K Kosmidis
  - NeuroInformatics GRoup, Aristotle University of Thessaloniki, Thessaloniki, Greece; Department of Medicine, Laboratory of Physiology, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
45
|
Gallos IK, Gkiatis K, Matsopoulos GK, Siettos C. ISOMAP and machine learning algorithms for the construction of embedded functional connectivity networks of anatomically separated brain regions from resting state fMRI data of patients with Schizophrenia. AIMS Neurosci 2021; 8:295-321. [PMID: 33709030 PMCID: PMC7940114 DOI: 10.3934/neuroscience.2021016] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/01/2021] [Accepted: 02/18/2021] [Indexed: 11/18/2022] Open
Abstract
We construct Functional Connectivity Networks (FCN) from resting state fMRI (rsfMRI) recordings towards the classification of brain activity between healthy and schizophrenic subjects using a publicly available dataset (the COBRE dataset) of 145 subjects (74 healthy controls and 71 schizophrenic subjects). First, we match the anatomy of the brain of each individual to the Desikan-Killiany brain atlas. Then, we use the conventional approach of correlating the parcellated time series to construct FCN and ISOMAP, a nonlinear manifold learning algorithm, to produce low-dimensional embeddings of the correlation matrices. For the classification analysis, we computed five key local graph-theoretic measures of the FCN and used the LASSO and Random Forest (RF) algorithms for feature selection. For the classification we used standard linear Support Vector Machines. The classification performance is tested by a double cross-validation scheme (consisting of an outer and an inner loop of "Leave one out" cross-validation (LOOCV)). The standard cross-correlation methodology produced a classification rate of 73.1%, while ISOMAP resulted in 79.3%, thus providing a simpler model with a smaller number of features as chosen from LASSO and RF, namely the participation coefficient of the right thalamus and the strength of the right lingual gyrus.
Affiliation(s)
- Ioannis K Gallos
  - School of Applied Mathematical and Physical Sciences, National Technical University of Athens, Greece
- Kostakis Gkiatis
  - School of Electrical and Computer Engineering, National Technical University of Athens, Greece
- George K Matsopoulos
  - School of Electrical and Computer Engineering, National Technical University of Athens, Greece
- Constantinos Siettos
  - Dipartimento di Matematica e Applicazioni “Renato Caccioppoli”, Università degli Studi di Napoli Federico II, Italy
|
46
|
Kohli D, Cloninger A, Mishne G. LDLE: Low Distortion Local Eigenmaps. J Mach Learn Res 2021; 22:282. [PMID: 35873072 PMCID: PMC9307127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
We present Low Distortion Local Eigenmaps (LDLE), a manifold learning technique which constructs a set of low distortion local views of a data set in lower dimension and registers them to obtain a global embedding. The local views are constructed using the global eigenvectors of the graph Laplacian and are registered using Procrustes analysis. The choice of these eigenvectors may vary across the regions. In contrast to existing techniques, LDLE can embed closed and non-orientable manifolds into their intrinsic dimension by tearing them apart. It also provides gluing instructions on the boundary of the torn embedding to help identify the topology of the original manifold. Our experimental results show that LDLE largely preserves distances up to a constant scale while other techniques produce higher distortion. We also demonstrate that LDLE produces high quality embeddings even when the data is noisy or sparse.
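The registration step mentioned in this abstract relies on Procrustes analysis. The following is only a simplified sketch of that building block, not the LDLE algorithm itself: it assumes two-dimensional views, a known point-to-point correspondence, and a rigid motion made of a rotation plus a translation (no reflection or scaling). The function names `procrustes_2d` and `apply_rigid` are hypothetical.

```python
import math


def apply_rigid(points, theta, translation):
    """Rotate 2D points by theta (radians), then translate them."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    tx, ty = translation
    return [(cos_t * x - sin_t * y + tx, sin_t * x + cos_t * y + ty)
            for x, y in points]


def procrustes_2d(source, target):
    """Least-squares rotation and translation mapping source onto target.

    Both inputs are equally long lists of (x, y) pairs with matching order.
    Returns (theta, (tx, ty)).
    """
    n = len(source)
    scx = sum(p[0] for p in source) / n
    scy = sum(p[1] for p in source) / n
    tcx = sum(p[0] for p in target) / n
    tcy = sum(p[1] for p in target) / n

    # Accumulate dot and cross products of the centered point pairs;
    # the optimal angle maximizes dot * cos(theta) + cross * sin(theta).
    dot = 0.0
    cross = 0.0
    for (ax, ay), (bx, by) in zip(source, target):
        ax, ay = ax - scx, ay - scy
        bx, by = bx - tcx, by - tcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)

    # The translation sends the rotated source centroid to the target centroid
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    tx = tcx - (cos_t * scx - sin_t * scy)
    ty = tcy - (sin_t * scx + cos_t * scy)
    return theta, (tx, ty)


# Two "local views" of the same four points, differing by a rigid motion
view_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (-0.5, 1.5)]
view_b = apply_rigid(view_a, 0.7, (2.0, -1.0))
theta, shift = procrustes_2d(view_a, view_b)
aligned = apply_rigid(view_a, theta, shift)
print(theta, shift)
```

In a full pipeline the correspondence would come from points shared by overlapping views, and the general orthogonal case is usually solved with a singular value decomposition rather than the closed-form angle used here.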
Affiliation(s)
- Dhruv Kohli
  - Department of Mathematics, University of California San Diego, CA 92093, USA
- Alexander Cloninger
  - Department of Mathematics, University of California San Diego, CA 92093, USA
- Gal Mishne
  - Halicioğlu Data Science Institute, University of California San Diego, CA 92093, USA
|
47
|
Peterfreund E, Lindenbaum O, Dietrich F, Bertalan T, Gavish M, Kevrekidis IG, Coifman RR. Local conformal autoencoder for standardized data coordinates. Proc Natl Acad Sci U S A 2020; 117:30918-27. [PMID: 33229581 DOI: 10.1073/pnas.2014627117] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022] Open
Abstract
A fundamental issue in empirical science is the ability to calibrate between different types of measurements/observations of the same phenomenon. This naturally suggests the selection of canonical variables, in the spirit of principal components, to enable matching/calibration among different observation modalities/instruments. We develop a method for extracting standardized, nonlinear, intrinsic coordinates from measured data, leading to a generalized isometric embedding of the observations. This is achieved through a local burst data acquisition strategy that allows us to capture the local z-scored structure. We implement this method using a local conformal autoencoder architecture and illustrate it computationally. The proposed embedding is fast, parallelizable, easy to implement using existing open-source neural network implementations and exhibits surprising interpolation and extrapolation capabilities. We propose a local conformal autoencoder (LOCA) for standardized data coordinates. LOCA is a deep learning-based method for obtaining standardized data coordinates from scientific measurements. Data observations are modeled as samples from an unknown, nonlinear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized, latent variables. We assume a repeated measurement sampling strategy, common in scientific measurements, and present a method for learning an embedding in R^d that is isometric to the latent variables of the manifold. The coordinates recovered by our method are invariant to diffeomorphisms of the manifold, making it possible to match between different instrumental observations of the same phenomenon. Our embedding is obtained using LOCA, which is an algorithm that learns to rectify deformations by using a local z-scoring procedure, while preserving relevant geometric information.
We demonstrate the isometric embedding properties of LOCA in various model settings and observe that it exhibits promising interpolation and extrapolation capabilities, superior to the current state of the art. Finally, we demonstrate LOCA’s efficacy in single-site Wi-Fi localization data and for the reconstruction of three-dimensional curved surfaces from two-dimensional projections.
|
48
|
Gu L, Zhang X, You S, Zhao S, Liu Z, Harada T. Semi-Supervised Learning in Medical Images Through Graph-Embedded Random Forest. Front Neuroinform 2020; 14:601829. [PMID: 33240071 PMCID: PMC7683389 DOI: 10.3389/fninf.2020.601829] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/01/2020] [Accepted: 09/23/2020] [Indexed: 11/29/2022] Open
Abstract
One major challenge in medical imaging analysis is the lack of labels and annotations, which usually require medical knowledge and training. This issue is particularly serious in brain image analysis, such as the analysis of retinal vasculature, which directly reflects the vascular condition of the Central Nervous System (CNS). In this paper, we present a novel semi-supervised learning algorithm to boost the performance of random forest under limited labeled data by exploiting the local structure of unlabeled data. We identify the key bottleneck of random forest to be the information gain calculation and replace it with a graph-embedded entropy, which is more reliable when labeled data are insufficient. By properly modifying the training process of standard random forest, our algorithm significantly improves the performance while preserving the virtues of random forest such as low computational burden and robustness to over-fitting. Our method has shown superior performance on both medical imaging analysis and machine learning benchmarks.
Affiliation(s)
- Lin Gu
  - RIKEN AIP, Tokyo, Japan; Research Center for Advanced Science and Technology (RCAST), The University of Tokyo, Tokyo, Japan
- Xiaowei Zhang
  - Bioinformatics Institute (BII), ASTAR, Singapore, Singapore
- Shaodi You
  - Faculty of Science, Institute of Informatics, University of Amsterdam, Amsterdam, Netherlands
- Shen Zhao
  - Department of Medical Physics, Western University, London, ON, Canada
- Zhenzhong Liu
  - Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China; National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin, China
- Tatsuya Harada
  - RIKEN AIP, Tokyo, Japan; Research Center for Advanced Science and Technology (RCAST), The University of Tokyo, Tokyo, Japan
|
49
|
Leon-Medina JX, Anaya M, Pozo F, Tibaduiza D. Nonlinear Feature Extraction Through Manifold Learning in an Electronic Tongue Classification Task. Sensors (Basel) 2020; 20:E4834. [PMID: 32867066 DOI: 10.3390/s20174834] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/21/2020] [Revised: 08/22/2020] [Accepted: 08/23/2020] [Indexed: 11/22/2022]
Abstract
A nonlinear feature extraction approach based on manifold learning algorithms is developed to improve classification accuracy in an electronic tongue sensor array. The signal processing methodology comprises four stages: data unfolding, scaling, feature extraction, and classification. This study compares seven manifold learning algorithms, namely Isomap, Laplacian Eigenmaps, Locally Linear Embedding (LLE), modified LLE, Hessian LLE, Local Tangent Space Alignment (LTSA), and t-Distributed Stochastic Neighbor Embedding (t-SNE), to find the best classification accuracy in a multifrequency large-amplitude pulse voltammetry electronic tongue. A sensitivity study of the parameters of each manifold learning algorithm is also included. A data set of seven different aqueous matrices is used to validate the proposed data processing methodology. Leave-one-out cross-validation was performed on 63 samples. The best accuracy (96.83%) was obtained using Mean-Centered Group Scaling (MCGS) for data normalization, the t-SNE algorithm for feature extraction, and k-nearest neighbors (kNN) as the classifier.
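The validation step described in this abstract, leave-one-out cross-validation with a kNN classifier, can be illustrated with a minimal sketch. It uses only the Python standard library and hypothetical toy data; the unfolding, MCGS scaling, and t-SNE stages are omitted, and the function names are assumptions rather than the authors' code.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=1):
    """Predict the label of x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k=1):
    """Leave-one-out cross-validation: hold out each sample once and train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical toy features: two well-separated classes in 2-D.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = ["a", "a", "a", "b", "b", "b"]
acc = loo_accuracy(X, y, k=1)
```

With 63 samples, each held-out prediction contributes 1/63 to the accuracy, so the reported 96.83% is consistent with 61 of 63 samples being classified correctly.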
|
50
|
McVey C, Hsieh F, Manriquez D, Pinedo P, Horback K. Mind the Queue: A Case Study in Visualizing Heterogeneous Behavioral Patterns in Livestock Sensor Data Using Unsupervised Machine Learning Techniques. Front Vet Sci 2020; 7:523. [PMID: 33134329 PMCID: PMC7518149 DOI: 10.3389/fvets.2020.00523] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/03/2020] [Accepted: 07/07/2020] [Indexed: 12/28/2022]
Abstract
Sensor technologies allow ethologists to continuously monitor the behaviors of large numbers of animals over extended periods of time. This creates new opportunities to study livestock behavior in commercial settings, but also new methodological challenges. Densely sampled behavioral data from large heterogeneous groups can contain a range of complex patterns and stochastic structures that may be difficult to visualize using conventional exploratory data analysis techniques. The goal of this research was to assess the efficacy of unsupervised machine learning tools in recovering complex behavioral patterns from such datasets to better inform subsequent statistical modeling. This methodological case study was carried out using records on milking order, the sequence in which cows arrange themselves as they enter the milking parlor. Data were collected over a 6-month period from a closed group of 200 mixed-parity Holstein cattle on an organic dairy. Cows at the front and rear of the queue proved more consistent in their entry position than animals at the center of the queue, a systematic pattern of heterogeneity more clearly visualized using entropy estimates, a scale- and distribution-free alternative to variance that is robust to outliers. Dimension reduction techniques were then used to visualize relationships between cows. No evidence of social cohesion was recovered, but Diffusion Map embeddings proved more adept than PCA at revealing the underlying linear geometry of this data. Median parlor entry positions from the pre- and post-pasture subperiods were highly correlated (R = 0.91), suggesting a surprising degree of temporal stationarity. Data Mechanics visualizations, however, revealed heterogeneous non-stationarity among subgroups of animals in the center of the group, as well as herd-level temporal outliers. A repeated measures model recovered inconsistent evidence of relationships between entry position and cow attributes. Mutual conditional entropy tests, a permutation-based approach to assessing bivariate correlations that is robust to non-independence, confirmed a significant but non-linear association with peak milk yield, but revealed the age effect to be potentially confounded by health status. Finally, queueing records were related back to behaviors recorded via ear tag accelerometers using linear models and mutual conditional entropy tests. Both approaches recovered consistent evidence of differences in home pen behaviors across subsections of the queue.
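The idea of using entropy as a measure of how consistently a cow enters the parlor can be sketched as below. This is an illustration under stated assumptions, not the authors' estimator: the equal-width binning of the queue into ten segments, the function name, and the toy records are all hypothetical.

```python
import math
from collections import Counter

def entry_entropy(positions, herd_size, n_bins=10):
    """Shannon entropy (bits) of one cow's entry positions, binned into equal-width queue segments.

    Positions are assumed to be 1-based ranks in the queue (1 = first to enter).
    """
    bins = Counter(min((p - 1) * n_bins // herd_size, n_bins - 1) for p in positions)
    n = len(positions)
    return sum((c / n) * math.log2(n / c) for c in bins.values())

# Hypothetical records for a herd of 200 cows over six milkings.
front_cow = [1, 2, 3, 2, 1, 3]             # always near the front of the queue
middle_cow = [10, 60, 110, 160, 30, 190]   # position varies widely between milkings
h_front = entry_entropy(front_cow, herd_size=200)
h_middle = entry_entropy(middle_cow, herd_size=200)
```

Because the entropy depends only on how often each segment is occupied, not on the numeric distance between positions, a single extreme entry shifts it far less than it would shift a variance estimate.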
Affiliation(s)
- Catherine McVey
- Department of Animal Science, University of California, Davis, Davis, CA, United States
- Fushing Hsieh
- Department of Statistics, University of California, Davis, Davis, CA, United States
- Diego Manriquez
- Department of Animal Science, Colorado State University, Fort Collins, CO, United States
- Pablo Pinedo
- Department of Animal Science, Colorado State University, Fort Collins, CO, United States
- Kristina Horback
- Department of Animal Science, University of California, Davis, Davis, CA, United States
|