1. Ma Y, Zhao Y, Ma Y. Kernel Bayesian nonlinear matrix factorization based on variational inference for human-virus protein-protein interaction prediction. Sci Rep 2024; 14:5693. PMID: 38454139; PMCID: PMC10920681; DOI: 10.1038/s41598-024-56208-w.
Abstract
Identification of potential human-virus protein-protein interactions (PPIs) contributes to understanding the mechanisms of viral infection and to the development of antiviral drugs. Existing computational models often have many hyperparameters that must be tuned manually, which limits their computational efficiency and generalization ability. To address this, this study proposes VKBNMF, a kernel Bayesian logistic matrix factorization model with automatic rank determination, for the prediction of human-virus PPIs. VKBNMF introduces auxiliary information into logistic matrix factorization and places prior probabilities on the latent variables to build a Bayesian framework for automatic parameter search. In addition, we construct a variational inference framework for VKBNMF to ensure solution efficiency. The experimental results show that, for the scenario of paired PPIs, VKBNMF achieves average AUPRs of 0.9101, 0.9316, 0.8727, and 0.9517 on the four benchmark datasets, respectively, and for the scenario of new human (viral) proteins it still achieves a high hit rate. A case study further demonstrates that VKBNMF can serve as an effective tool for predicting human-virus PPIs.
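The building block behind this abstract, logistic matrix factorization of a binary interaction matrix, can be sketched in a few lines. This is a deliberately minimal baseline with plain gradient ascent and weight decay; the paper's VKBNMF additionally uses kernels on auxiliary information, priors over the latent factors, and variational inference, none of which is reproduced here. All names and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_mf(Y, rank=5, lr=0.05, reg=0.01, iters=500):
    """Plain logistic matrix factorization: P(Y_ij = 1) = sigmoid(u_i . v_j).

    Gradient ascent on the regularized log-likelihood. Only an illustrative
    baseline, not the paper's VKBNMF (no kernels, no Bayesian priors,
    no variational inference, no automatic rank determination).
    """
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(iters):
        P = 1.0 / (1.0 + np.exp(-U @ V.T))  # predicted interaction probabilities
        E = Y - P                            # residual drives the updates
        U += lr * (E @ V - reg * U)
        V += lr * (E.T @ U - reg * V)
    return 1.0 / (1.0 + np.exp(-U @ V.T))

# Toy "human-virus" interaction matrix with one block of observed interactions.
Y = np.zeros((20, 16))
Y[:10, :8] = 1.0
P = logistic_mf(Y)
```

On this toy block-structured matrix, entries inside the interacting block are driven toward high probabilities and the rest toward low ones, which is the mechanism any PPI matrix-factorization predictor relies on.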
Affiliation(s)
- Yingjun Ma: School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, China
- Yongbiao Zhao: School of Computer, Central China Normal University, Wuhan, China
- Yuanyuan Ma: School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China; Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China

2. Cui W, Wang D. Image semantic learning method based on social heterogeneous graph networks. J Intell Fuzzy Syst 2023. DOI: 10.3233/jifs-222981.
Abstract
Image semantic learning techniques are crucial for image understanding and classification. In social networks, image data spreads widely thanks to its convenient acquisition and intuitive expression. However, because users are free to publish information, images exhibit strong context dependence and semantic fuzziness, which complicates image representation learning. Fortunately, social attributes such as hashtags carry rich semantic relations that can help in understanding the meaning of images. Therefore, this paper proposes a new method named Social Heterogeneous Graph Networks (SHGN) for image semantic learning in social networks. First, a heterogeneous graph is built to expand image semantic relations through social attributes. Then a consistent semantic space is reconstructed through cross-media feature alignment. Finally, an image semantic extended learning network is designed to capture and integrate social semantics and visual features, yielding a rich semantic representation of images from their social context. Experiments demonstrate that SHGN achieves efficient image representation and performs favorably against many baseline algorithms.
Affiliation(s)
- Wanqiu Cui: School of National Security, People’s Public Security University of China, Beijing, China
- Dawei Wang: Institute of Scientific and Technical Information of China, Beijing, China

3. Ma Z, Lai Y, Xie J, Meng D, Kleijn WB, Guo J, Yu J. Dirichlet Process Mixture of Generalized Inverted Dirichlet Distributions for Positive Vector Data With Extended Variational Inference. IEEE Trans Neural Netw Learn Syst 2022; 33:6089-6102. PMID: 34086578; DOI: 10.1109/tnnls.2021.3072209.
Abstract
A Bayesian nonparametric approach is proposed for estimating a Dirichlet process (DP) mixture of generalized inverted Dirichlet distributions, i.e., an infinite generalized inverted Dirichlet mixture model (InGIDMM). The generalized inverted Dirichlet distribution has proven efficient for modeling vectors that contain only positive elements. Under the classical variational inference (VI) framework, the key challenge in the Bayesian estimation of InGIDMM is that the expectation of the joint distribution of data and variables cannot be calculated explicitly, so numerical methods are usually applied to simulate the optimal posterior distributions. With the recently proposed extended VI (EVI) framework, we introduce lower bound approximations to the original variational objective function such that an analytically tractable solution can be derived, removing the need for numerical simulation. By applying the DP mixture technique, InGIDMM can automatically determine the number of mixture components from the observed data. Moreover, the DP mixture model with an infinite number of mixture components avoids the problems of underfitting and overfitting. The performance of the proposed approach is demonstrated with both synthesized data and real-life data applications.
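The property that lets a DP mixture "determine the number of components from data" is usually made concrete through the truncated stick-breaking construction of the DP weights. The sketch below shows that mechanism in isolation (it is a standard construction, not the paper's full InGIDMM inference):

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(alpha, truncation):
    """Truncated stick-breaking construction of Dirichlet-process weights.

    v_k ~ Beta(1, alpha);  pi_k = v_k * prod_{j<k} (1 - v_j).
    Small alpha concentrates mass on a few leading components, which is
    how a DP mixture can effectively switch off unneeded components.
    """
    v = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

pi = stick_breaking(alpha=1.0, truncation=50)
```

The weights are nonnegative and sum to (just under) one; in a full model each `pi_k` multiplies one mixture component's likelihood.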

4. Alameda-Pineda X, Drouard V, Horaud RP. Variational Inference and Learning of Piecewise Linear Dynamical Systems. IEEE Trans Neural Netw Learn Syst 2022; 33:3753-3764. PMID: 33571096; DOI: 10.1109/tnnls.2021.3054407.
Abstract
Modeling the temporal behavior of data is of primordial importance in many scientific and engineering fields. Baseline methods assume that both the dynamic and observation equations follow linear-Gaussian models. However, many real-world processes cannot be characterized by a single linear behavior. Alternatively, one can consider a piecewise linear model which, combined with a switching mechanism, is well suited when several modes of behavior are needed. Nevertheless, switching dynamical systems are intractable because their computational complexity increases exponentially with time. In this article, we propose a variational approximation of piecewise linear dynamical systems. We provide full details of the derivation of two variational expectation-maximization algorithms: a filter and a smoother. We show that the model parameters can be split into two sets, static and dynamic, and that the static parameters can be estimated offline together with the number of linear modes, i.e., the number of states of the switching variable. We apply the proposed method to head-pose tracking and thoroughly compare our algorithms with several state-of-the-art trackers.
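The generative model being approximated is easy to state: at each step, a discrete switch variable selects which linear dynamics matrix drives the continuous state. The forward simulation below (a hypothetical two-mode example with a fixed switching schedule, not the paper's variational EM inference) shows the kind of data such a model produces:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two linear modes: a contraction and a rotation. The switch variable z_t
# selects which dynamics matrix drives the state at each time step.
A = [np.array([[0.95, 0.0], [0.0, 0.95]]),            # mode 0: contraction
     np.array([[np.cos(0.3), -np.sin(0.3)],
               [np.sin(0.3),  np.cos(0.3)]])]         # mode 1: rotation

T = 100
x = np.zeros((T, 2))                                  # continuous state
z = np.zeros(T, dtype=int)                            # discrete switch
x[0] = [1.0, 0.0]
for t in range(1, T):
    z[t] = 0 if t < T // 2 else 1                     # fixed schedule for illustration
    x[t] = A[z[t]] @ x[t - 1] + 0.01 * rng.standard_normal(2)

y = x + 0.05 * rng.standard_normal((T, 2))            # noisy observations
```

Inference reverses this picture: given only `y`, the filter/smoother must recover both the state trajectory `x` and the switch sequence `z`, which is where the exponential complexity (and the variational approximation) comes in.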

5. Deep Learning for Human Disease Detection, Subtype Classification, and Treatment Response Prediction Using Epigenomic Data. Biomedicines 2021; 9:1733. PMID: 34829962; PMCID: PMC8615388; DOI: 10.3390/biomedicines9111733.
Abstract
Deep learning (DL) is a distinct class of machine learning that has achieved first-class performance in many fields of study. For epigenomics, the application of DL to assist physicians and scientists in human disease-relevant prediction tasks has been relatively unexplored until very recently. In this article, we critically review published studies that employed DL models to predict disease detection, subtype classification, and treatment responses using epigenomic data. A comprehensive search of PubMed, Scopus, Web of Science, Google Scholar, and arXiv.org was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Among 1140 initially identified publications, we included 22 articles in our review. DNA methylation and RNA-sequencing data are most frequently used to train the predictive models. The reviewed models achieved high accuracy, ranging from 88.3% to 100.0% for disease detection tasks, from 69.5% to 97.8% for subtype classification tasks, and from 80.0% to 93.0% for treatment response prediction tasks. We also present a workflow for developing a predictive model that encompasses all steps, from first defining human disease-related tasks to finally evaluating model performance. DL holds promise for transforming epigenomic big data into valuable knowledge that will enhance the development of translational epigenomics.

6. Li X, Chang D, Ma Z, Tan ZH, Xue JH, Cao J, Guo J. Deep InterBoost networks for small-sample image classification. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.06.135.

7. Yang J, Wu X, Liang J, Sun X, Cheng MM, Rosin PL, Wang L. Self-Paced Balance Learning for Clinical Skin Disease Recognition. IEEE Trans Neural Netw Learn Syst 2020; 31:2832-2846. PMID: 31199274; DOI: 10.1109/tnnls.2019.2917524.
Abstract
Class imbalance is a challenging problem in many classification tasks. It induces biased results for minority classes, which contain fewer training samples than others. Most existing approaches remedy the imbalanced number of instances among categories by resampling the majority and minority classes accordingly. However, the imbalance in the difficulty of recognizing different categories is also crucial, especially when distinguishing samples among many classes. For example, in clinical skin disease recognition, several rare diseases have few training samples but are easy to diagnose because of their distinct visual properties. On the other hand, some common skin diseases, e.g., eczema, are hard to recognize due to the lack of special symptoms. To address this problem, we propose a self-paced balance learning (SPBL) algorithm. Specifically, we introduce a comprehensive metric, the complexity of an image category, that combines both sample number and recognition difficulty. The complexity is initialized using the model of the first pace, where a pace is one iteration of the self-paced learning paradigm. We then assign each class a penalty weight that is larger for more complex categories and smaller for easier ones, after which the curriculum is reconstructed by rearranging the training samples. Consequently, the model can iteratively learn discriminative representations by balancing the complexity in each pace. Experimental results on the SD-198 and SD-260 benchmark data sets demonstrate that the proposed SPBL algorithm performs favorably against state-of-the-art methods. We also demonstrate the generalization capacity of the SPBL algorithm on other tasks, such as indoor scene image recognition and object classification.
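The key idea, that a class's penalty weight should reflect both its scarcity and its recognition difficulty, can be illustrated with a toy weighting function. The exact complexity metric in the SPBL paper differs; everything below (function name, `gamma`, the combination rule) is an assumption made up for illustration:

```python
import numpy as np

def class_penalty_weights(counts, error_rates, gamma=0.5):
    """Illustrative per-class 'complexity' score: combine scarcity
    (inverse sample count) with recognition difficulty (a validation
    error rate), then normalize into penalty weights. Weights grow with
    both scarcity and difficulty, so a rare-and-hard class gets the
    largest penalty while a common-and-easy one gets the smallest.
    """
    counts = np.asarray(counts, dtype=float)
    error_rates = np.asarray(error_rates, dtype=float)
    scarcity = counts.max() / counts              # larger for rarer classes
    complexity = (scarcity ** gamma) * (1.0 + error_rates)
    return complexity / complexity.sum()

# A rare disease (50 images) vs. a common one (5000 images, harder to recognize).
w = class_penalty_weights(counts=[50, 5000], error_rates=[0.05, 0.40])
```

In SPBL proper, such weights then reorder the curriculum so that each self-paced iteration balances complexity across classes.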

8. He X, Tang J, Du X, Hong R, Ren T, Chua TS. Fast Matrix Factorization With Nonuniform Weights on Missing Data. IEEE Trans Neural Netw Learn Syst 2020; 31:2791-2804. PMID: 30676983; DOI: 10.1109/tnnls.2018.2890117.
Abstract
Matrix factorization (MF) has been widely used to discover the low-rank structure of a data matrix and to predict its missing entries. In many real-world learning systems, the data matrix can be very high dimensional but sparse. This poses an imbalanced learning problem, since the missing entries usually far outnumber the observed ones yet cannot be ignored due to their valuable negative signal. For efficiency, existing work typically applies a uniform weight to missing entries to allow a fast learning algorithm. However, this simplification decreases modeling fidelity, resulting in suboptimal performance for downstream applications. In this paper, we weight the missing data nonuniformly and, more generically, allow any weighting strategy on the missing data. To address the efficiency challenge, we propose a fast learning method whose time complexity is determined by the number of observed entries in the data matrix rather than the matrix size. The key idea is twofold: 1) we apply truncated singular value decomposition to the weight matrix to obtain a more compact representation of the weights, and 2) we learn the MF parameters with elementwise alternating least squares (eALS) and memorize key intermediate variables to avoid unnecessary repeated computations. We conduct extensive experiments on two recommendation benchmarks, demonstrating the correctness, efficiency, and effectiveness of our fast eALS method.
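The first of the two key ideas, compressing the weight matrix with a truncated SVD, is easy to demonstrate on its own. The sketch below builds a hypothetical popularity-based weight matrix (which is exactly rank one) and shows that a rank-2 truncation represents it compactly and accurately; the eALS updates that exploit this factored form are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(3)

# Nonuniform weights on missing entries, e.g. item-popularity times
# user-activity weights. Such a weight matrix is (near) low-rank, so a
# truncated SVD yields a compact W ~ P @ Q.T that fast eALS-style
# updates can exploit instead of touching all n_users * n_items cells.
n_users, n_items, r = 200, 300, 2
item_pop = rng.uniform(0.1, 1.0, n_items)
user_act = rng.uniform(0.5, 1.5, n_users)
W = np.outer(user_act, item_pop)                  # rank-1 weight matrix

U, s, Vt = np.linalg.svd(W, full_matrices=False)  # singular values descend
P = U[:, :r] * s[:r]                              # n_users x r
Q = Vt[:r].T                                      # n_items x r
err = np.linalg.norm(W - P @ Q.T) / np.linalg.norm(W)
```

Storing `P` and `Q` costs `(n_users + n_items) * r` numbers instead of `n_users * n_items`, which is what makes per-entry weighted updates affordable.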

9. Ma Z, Xie J, Lai Y, Taghia J, Xue JH, Guo J. Insights Into Multiple/Single Lower Bound Approximation for Extended Variational Inference in Non-Gaussian Structured Data Modeling. IEEE Trans Neural Netw Learn Syst 2020; 31:2240-2254. PMID: 30908264; DOI: 10.1109/tnnls.2019.2899613.
Abstract
For most non-Gaussian statistical models, the data being modeled exhibit strongly structured properties, such as scalar data with bounded support (e.g., the beta distribution), vector data with unit length (e.g., the Dirichlet distribution), and vector data with positive elements (e.g., the generalized inverted Dirichlet distribution). In practical implementations of non-Gaussian statistical models, it is infeasible to find an analytically tractable solution for estimating the posterior distributions of the parameters. Variational inference (VI) is a widely used framework in Bayesian estimation. Recently, an improved framework, the extended VI (EVI), has been introduced and applied successfully to a number of non-Gaussian statistical models. EVI derives analytically tractable solutions by introducing lower bound approximations to the variational objective function. In this paper, we compare two approximation strategies that can be applied to carry out the EVI: the multiple lower bounds (MLB) approximation and the single lower bound (SLB) approximation. For implementation, two different conditions, weak and strong, are discussed. Convergence of the EVI depends on the selection of the lower bound, regardless of the choice of weak or strong condition. We also discuss the convergence properties to clarify the differences between MLB and SLB. Extensive comparisons are made based on several EVI-based non-Gaussian statistical models. Theoretical analysis is conducted to demonstrate the differences between the weak and strong conditions, and experimental results on real data show advantages of the SLB approximation over the MLB approximation.
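The relation among the evidence, the standard variational objective, and the EVI surrogate that both MLB and SLB instantiate can be stated in one line (notation assumed here: $\mathbf{X}$ the data, $\mathbf{Z}$ all latent variables and parameters, $q$ the variational distribution):

```latex
\ln p(\mathbf{X})
  \;\ge\; \mathcal{L}(q)
  = \mathbb{E}_{q}\!\left[\ln p(\mathbf{X},\mathbf{Z})\right]
    - \mathbb{E}_{q}\!\left[\ln q(\mathbf{Z})\right]
  \;\ge\; \tilde{\mathcal{L}}(q)
  = \mathbb{E}_{q}\!\left[\tilde{p}(\mathbf{X},\mathbf{Z})\right]
    - \mathbb{E}_{q}\!\left[\ln q(\mathbf{Z})\right],
\qquad \ln p(\mathbf{X},\mathbf{Z}) \ge \tilde{p}(\mathbf{X},\mathbf{Z}).
```

Maximizing the tractable $\tilde{\mathcal{L}}(q)$ therefore still maximizes a valid lower bound on the evidence. Informally, MLB replaces several intractable terms of $\ln p(\mathbf{X},\mathbf{Z})$ with separate bounds, while SLB bounds the whole joint at once, which is the distinction the paper's convergence analysis turns on.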

10.

11. Zhao L, Chen Z, Yang Y, Zou L, Wang ZJ. ICFS Clustering With Multiple Representatives for Large Data. IEEE Trans Neural Netw Learn Syst 2019; 30:728-738. PMID: 30047910; DOI: 10.1109/tnnls.2018.2851979.
Abstract
With the prevailing development of cyber-physical-social systems and the Internet of Things, large-scale data have been collected continually. Mining large data effectively and efficiently is increasingly important for promoting the development and improving the service quality of these applications. Clustering, a popular data mining technique, aims to identify underlying patterns hidden in the data. Most clustering methods assume static data, so they are unfavorable for analyzing large, unbalanced, dynamic data. In this paper, to address this concern, we focus on incremental clustering by extending the clustering by fast search and find of density peaks (CFS) method to incrementally handle large-scale dynamic data. Specifically, we first discuss two challenges in incremental CFS (ICFS) clustering: the assignment of newly arriving objects and the dynamic adjustment of clusters. We then propose two ICFS clustering algorithms, ICFS with multiple representatives (ICFSMR) and the enhanced ICFSMR (E_ICFSMR), to tackle the two challenges. In ICFSMR, we exploit convex hull theory to modify the representatives identified for each cluster. E_ICFSMR improves the generality and effectiveness of ICFSMR through a one-time cluster adjustment strategy applied after the integration of each data chunk. We evaluate the proposed methods with extensive experiments on four benchmark data sets, as well as air quality and traffic monitoring time series, comparing against CFS and three other state-of-the-art incremental clustering methods. Experimental results demonstrate that the proposed methods outperform the compared methods in terms of both effectiveness and efficiency.
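The underlying CFS (density peaks) idea is compact enough to sketch: compute each point's local density and its distance to the nearest denser point; cluster centers are the points where both are large. The sketch below uses a Gaussian density kernel and a toy two-blob data set (kernel choice and bandwidth are illustrative assumptions, and none of the incremental ICFSMR machinery is shown):

```python
import numpy as np

rng = np.random.default_rng(4)

# Two well-separated 2-D blobs.
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(4, 0.3, (50, 2))])

# Pairwise distances; local density rho via a Gaussian kernel; delta =
# distance to the nearest point of strictly higher density. Centers are
# the points where rho and delta are simultaneously large.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
rho = np.exp(-(D / 0.5) ** 2).sum(axis=1)
delta = np.empty(len(X))
for i in range(len(X)):
    higher = np.where(rho > rho[i])[0]
    delta[i] = D[i, higher].min() if len(higher) else D[i].max()

centers = np.argsort(rho * delta)[-2:]       # top-2 decision values
labels = np.argmin(D[:, centers], axis=1)    # assign each point to nearest center
```

The same decision values (`rho * delta`) are what make CFS attractive for extension: a new chunk of points only needs densities and nearest-denser-neighbor distances, not a full re-clustering.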

12. Ma Z, Lai Y, Kleijn WB, Song YZ, Wang L, Guo J. Variational Bayesian Learning for Dirichlet Process Mixture of Inverted Dirichlet Distributions in Non-Gaussian Image Feature Modeling. IEEE Trans Neural Netw Learn Syst 2019; 30:449-463. PMID: 29994731; DOI: 10.1109/tnnls.2018.2844399.
Abstract
In this paper, we develop a novel variational Bayesian learning method for the Dirichlet process (DP) mixture of inverted Dirichlet distributions, which has been shown to be very flexible for modeling vectors with positive elements. The recently proposed extended variational inference (EVI) framework is adopted to derive an analytically tractable solution. The convergence of the proposed algorithm is theoretically guaranteed by introducing a single lower bound approximation to the original objective function in the EVI framework. In principle, the proposed model can be viewed as an infinite inverted Dirichlet mixture model that allows the automatic determination of the number of mixture components from data. Therefore, the problem of predetermining the optimal number of mixture components is overcome. Moreover, the problems of overfitting and underfitting are avoided by the Bayesian estimation approach. Compared with several recently proposed DP-related methods and conventional applied methods, the good performance and effectiveness of the proposed method are demonstrated with both synthesized and real data.

13. Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.09.054.

14. Campillo-Funollet E, Venkataraman C, Madzvamuse A. Bayesian Parameter Identification for Turing Systems on Stationary and Evolving Domains. Bull Math Biol 2019; 81:81-104. PMID: 30311137; PMCID: PMC6320356; DOI: 10.1007/s11538-018-0518-z.
Abstract
In this study, we apply the Bayesian paradigm for parameter identification to a well-studied semi-linear reaction-diffusion system with activator-depleted reaction kinetics, posed on stationary as well as evolving domains. We provide a mathematically rigorous framework to study the inverse problem of finding the parameters of a reaction-diffusion system given a final spatial pattern. On the stationary domain the parameters are finite-dimensional, but on the evolving domain we consider the problem of identifying the evolution of the domain, i.e. a time-dependent function. Whilst others have considered these inverse problems using optimisation techniques, the Bayesian approach provides a rigorous mathematical framework for incorporating prior knowledge of the uncertainty in both the observations and the parameters themselves, resulting in an approximation of the full probability distribution for the parameters given the data. Furthermore, the well-posedness of the forward problem, together with previously established results, allows us to prove well-posedness of the inverse problem. Although the numerical approximation of the full probability distribution is computationally expensive, parallelised algorithms make the problem solvable using high-performance computing.
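The shape of such a Bayesian inverse problem, a forward model inside a posterior that is explored by sampling, can be shown with a deliberately cheap stand-in. Below, the "forward map" is a scalar exponential decay rather than a Turing-system PDE solve, and the sampler is plain random-walk Metropolis; both are illustrative substitutions for the far heavier machinery the paper uses:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in forward model: u(t; k) = exp(-k t), observed with Gaussian
# noise. The Bayesian machinery has the same shape as for a PDE forward
# map; only the cost of evaluating the model changes.
t = np.linspace(0.0, 2.0, 25)
k_true, sigma = 1.3, 0.05
data = np.exp(-k_true * t) + sigma * rng.standard_normal(t.size)

def log_post(k):
    if k <= 0:                                  # flat prior on k > 0
        return -np.inf
    resid = data - np.exp(-k * t)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

# Random-walk Metropolis: draws approximate the posterior over k.
k, lp, samples = 1.0, log_post(1.0), []
for _ in range(5000):
    prop = k + 0.1 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # accept/reject step
        k, lp = prop, lp_prop
    samples.append(k)
post = np.array(samples[1000:])                 # discard burn-in
```

The output is an approximation of the full posterior over the parameter, not a single point estimate, which is precisely the advantage over optimisation-based identification that the abstract emphasises.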
Affiliation(s)
- Anotida Madzvamuse: School of Mathematical and Physical Sciences, University of Sussex, Brighton, UK

15.

16. Li X, Ma Z, Peng P, Guo X, Huang F, Wang X, Guo J. Supervised latent Dirichlet allocation with a mixture of sparse softmax. Neurocomputing 2018. DOI: 10.1016/j.neucom.2018.05.077.

17. Liao J, Li B, Wang J, Qi Q, Li T. Rapid Relevance Feedback Strategy Based on Distributed CBIR System. Int J Semant Web Inf Syst 2018. DOI: 10.4018/ijswis.2018040101.
Abstract
This article describes how the emergence of cloud datacenters has enhanced online data storage capabilities. Distributed Hash Table (DHT)-based image retrieval using locality-sensitive hashing (LSH) provides an efficient way to build distributed Content-Based Image Retrieval (CBIR) frameworks. However, because a fixed LSH function is adopted, LSH and other codebook-based distributed retrieval systems lack flexibility and struggle to satisfy users' demands. In this article, LRFMIR is proposed to introduce semantic search into a DHT-based CBIR system. LRFMIR is built on a DHT-based network, where a flexible result-truncating strategy fuses the returned results using multiple feature measurements. Experiments show that LRFMIR provides higher accuracy and recall than single-feature retrieval systems, with good load balancing and query efficiency.
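The LSH primitive that such DHT-based CBIR systems build on is worth seeing concretely. The sketch below uses random-hyperplane LSH (a standard scheme for cosine similarity; the paper does not specify this particular family, so treat it as an assumed example) to turn feature vectors into short binary keys that a DHT can route on:

```python
import numpy as np

rng = np.random.default_rng(6)

def lsh_signature(x, planes):
    """Random-hyperplane LSH: each bit records which side of a random
    hyperplane the feature vector falls on. Similar vectors get
    identical (or nearly identical) signatures, so a DHT can route
    similar images to the same node.
    """
    return tuple((planes @ x > 0).astype(int))

planes = rng.standard_normal((16, 64))        # 16-bit keys over 64-d features
a = rng.standard_normal(64)
b = a + 0.01 * rng.standard_normal(64)        # near-duplicate feature
c = rng.standard_normal(64)                   # unrelated feature

sig_a, sig_b, sig_c = (lsh_signature(v, planes) for v in (a, b, c))
```

The rigidity criticised in the abstract is visible here: once `planes` is fixed, the notion of similarity is fixed too, which is what LRFMIR's multi-feature fusion and result-truncating strategy work around.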
Affiliation(s)
- Jianxin Liao: Beijing University of Posts and Telecommunications, Beijing, China
- Baoran Li: Beijing University of Posts and Telecommunications, Beijing, China
- Jingyu Wang: Beijing University of Posts and Telecommunications, Beijing, China
- Qi Qi: Beijing University of Posts and Telecommunications, Beijing, China
- Tonghong Li: Technical University of Madrid, Madrid, Spain

18. Model-based non-Gaussian interest topic distribution for user retweeting in social networks. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.04.078.

19. Lim KL, Wang H. Fast approximation of variational Bayes Dirichlet process mixture using the maximization–maximization algorithm. Int J Approx Reason 2018. DOI: 10.1016/j.ijar.2017.11.001.

20.

21.

22. Huang J, Xiao M. State of the art on road traffic sensing and learning based on mobile user network log data. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.03.096.

23. Lai Y, Ping Y, Xiao K, Hao B, Zhang X. Variational Bayesian inference for a Dirichlet process mixture of beta distributions and application. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.07.068.

24.

25.

26. Improving deep neural networks with multi-layer maxout networks and a novel initialization method. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.05.103.

27. Ma Z, Xue JH, Leijon A, Tan ZH, Yang Z, Guo J. Decorrelation of Neutral Vector Variables: Theory and Applications. IEEE Trans Neural Netw Learn Syst 2018; 29:129-143. PMID: 27834653; DOI: 10.1109/tnnls.2016.2616445.
Abstract
In this paper, we propose novel strategies for neutral vector variable decorrelation. Two fundamental invertible transformations, namely, serial nonlinear transformation and parallel nonlinear transformation, are proposed to carry out the decorrelation. For a neutral vector variable, which is not multivariate-Gaussian distributed, the conventional principal component analysis cannot yield mutually independent scalar variables. With the two proposed transformations, a highly negatively correlated neutral vector can be transformed to a set of mutually independent scalar variables with the same degrees of freedom. We also evaluate the decorrelation performances for the vectors generated from a single Dirichlet distribution and a mixture of Dirichlet distributions. The mutual independence is verified with the distance correlation measurement. The advantages of the proposed decorrelation strategies are intensively studied and demonstrated with synthesized data and practical application evaluations.
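For the Dirichlet case, the serial nonlinear transformation has a concrete, well-known form: the stick-breaking residual ratios of a Dirichlet vector are mutually independent beta variables. The sketch below verifies this numerically (it illustrates only the serial transformation for Dirichlet data; the paper also proposes a parallel transformation and treats general neutral vectors):

```python
import numpy as np

rng = np.random.default_rng(7)

# Serial (stick-breaking style) transformation of a Dirichlet vector.
# If x ~ Dir(a_1, ..., a_K), then the residual ratios
#   u_1 = x_1,   u_k = x_k / (1 - x_1 - ... - x_{k-1})
# are mutually independent Beta variables, so the strongly negatively
# correlated components of x map to independent scalars.
n, alpha = 20000, np.array([2.0, 3.0, 4.0, 5.0])
x = rng.dirichlet(alpha, size=n)

u1 = x[:, 0]
u2 = x[:, 1] / (1.0 - x[:, 0])
u3 = x[:, 2] / (1.0 - x[:, 0] - x[:, 1])

corr_x = np.corrcoef(x[:, 0], x[:, 1])[0, 1]   # negative, by construction of Dir
corr_u = np.corrcoef(u1, u2)[0, 1]             # near zero after the transform
```

This is exactly the effect the abstract contrasts with PCA: a linear decorrelation cannot make these components independent, while the nonlinear serial transform can.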

28. Wang Y, Liu F, Xia ST, Wu J. Link sign prediction by Variational Bayesian Probabilistic Matrix Factorization with Student-t Prior. Inf Sci 2017. DOI: 10.1016/j.ins.2017.04.014.

29. Zhang C, Xu W, Ma Z, Gao S, Li Q, Guo J. Construction of semantic bootstrapping models for relation extraction. Knowl Based Syst 2015. DOI: 10.1016/j.knosys.2015.03.017.

30. Ma Z, Teschendorff AE, Yu H, Taghia J, Guo J. Comparisons of non-Gaussian statistical models in DNA methylation analysis. Int J Mol Sci 2014; 15:10835-10854. PMID: 24937687; PMCID: PMC4100184; DOI: 10.3390/ijms150610835.
Abstract
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
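The bounded-support point is easy to make concrete: methylation beta-values live in (0, 1), and a beta distribution fits them naturally where a Gaussian cannot respect the support. The sketch below fits a beta by the method of moments to synthetic "hypomethylated"-like data; it is only a single-distribution illustration, whereas the reviewed work uses full non-Gaussian mixture models for clustering:

```python
import numpy as np

rng = np.random.default_rng(8)

def fit_beta_moments(x):
    """Method-of-moments fit of a Beta(alpha, beta) to data on (0, 1).

    From m = E[x] and v = Var[x]:  alpha + beta = m(1-m)/v - 1,
    then alpha = m * (alpha + beta) and beta is the remainder.
    """
    m, v = x.mean(), x.var()
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common       # (alpha, beta)

# Synthetic beta-values concentrated near 0, as for hypomethylated sites.
x = rng.beta(2.0, 8.0, size=50000)
a_hat, b_hat = fit_beta_moments(x)
```

Recovering the generating parameters from bounded data like this is the basic reason the abstract finds bounded-support (beta-type) models clustering methylation data better than Gaussian ones.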
Affiliation(s)
- Zhanyu Ma: Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Beijing 100876, China
- Andrew E Teschendorff: Computational Systems Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai 200031, China
- Hong Yu: Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Beijing 100876, China
- Jalil Taghia: Communication Theory Lab., KTH - Royal Institute of Technology, Osquldas väg 10, 10044 Stockholm, Sweden
- Jun Guo: Pattern Recognition and Intelligent System Lab., Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Beijing 100876, China