1
Ignatenko V, Surkov A, Koltcov S. Random forests with parametric entropy-based information gains for classification and regression problems. PeerJ Comput Sci 2024; 10:e1775. [PMID: 38196961 PMCID: PMC10773894 DOI: 10.7717/peerj-cs.1775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/04/2023] [Indexed: 01/11/2024]
Abstract
The random forest algorithm is one of the most popular and commonly used algorithms for classification and regression tasks. It combines the output of multiple decision trees to form a single result. Random forest algorithms demonstrate the highest accuracy on tabular data compared to other algorithms in various applications. However, random forests and, more precisely, decision trees, are usually built with the application of classic Shannon entropy. In this article, we consider the potential of deformed entropies, which are successfully used in the field of complex systems, to increase the prediction accuracy of random forest algorithms. We develop and introduce the information gains based on Renyi, Tsallis, and Sharma-Mittal entropies for classification and regression random forests. We test the proposed algorithm modifications on six benchmark datasets: three for classification and three for regression problems. For classification problems, the application of Renyi entropy allows us to improve the random forest prediction accuracy by 19-96% depending on the dataset, Tsallis entropy improves the accuracy by 20-98%, and Sharma-Mittal entropy improves accuracy by 22-111% compared to the classical algorithm. For regression problems, the application of deformed entropies improves the prediction by 2-23% in terms of R² depending on the dataset.
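The deformed entropies named in this abstract can be sketched with their textbook definitions and plugged into a split-quality score. This is a minimal illustration, not the authors' implementation: the function names (`class_probs`, `renyi`, `tsallis`, `sharma_mittal`, `information_gain`) and the "parent entropy minus weighted child entropy" form of the gain are assumptions made here, and the article's exact formulation may differ.

```python
import math
from collections import Counter


def class_probs(labels):
    """Empirical class probabilities of a list of labels."""
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]


def shannon(p):
    """Classic Shannon entropy (natural logarithm)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi(p, alpha):
    """Renyi entropy of order alpha; falls back to Shannon at alpha = 1."""
    if abs(alpha - 1.0) < 1e-12:
        return shannon(p)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


def tsallis(p, q):
    """Tsallis entropy with parameter q; falls back to Shannon at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return shannon(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def sharma_mittal(p, q, r):
    """Sharma-Mittal entropy; this sketch requires q != 1 and r != 1."""
    s = sum(x ** q for x in p if x > 0)
    return (s ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)


def information_gain(parent, children, entropy):
    """Hypothetical gain: parent entropy minus size-weighted child entropies."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(class_probs(c)) for c in children if c)
    return entropy(class_probs(parent)) - weighted


# A perfectly separating split of a balanced two-class node.
parent = ["a", "a", "b", "b"]
split = [["a", "a"], ["b", "b"]]
gain_renyi = information_gain(parent, split, lambda p: renyi(p, 2.0))
gain_tsallis = information_gain(parent, split, lambda p: tsallis(p, 2.0))
```

In this sketch the entropy parameters (`alpha`, `q`, `r`) act as extra hyperparameters of the tree, which is the degree of freedom the abstract refers to as "parametric".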
Affiliation(s)
- Vera Ignatenko
  - Social and Cognitive Informatics Laboratory, National Research University Higher School of Economics, Saint-Petersburg, Russia
- Anton Surkov
  - Social and Cognitive Informatics Laboratory, National Research University Higher School of Economics, Saint-Petersburg, Russia
- Sergei Koltcov
  - Social and Cognitive Informatics Laboratory, National Research University Higher School of Economics, Saint-Petersburg, Russia
2
Koltcov S, Surkov A, Filippov V, Ignatenko V. Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics. PeerJ Comput Sci 2024; 10:e1758. [PMID: 38196953 PMCID: PMC10773852 DOI: 10.7717/peerj-cs.1758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 11/26/2023] [Indexed: 01/11/2024]
Abstract
Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and models with word embeddings have been proposed to increase the quality of topic solutions. However, these models were not extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics (a model parameter) remains a challenging task. We aim to partially fill this gap by testing four well-known and widely available topic models: the embedded topic model (ETM), the Gaussian Softmax distribution model (GSM), Wasserstein autoencoders with Dirichlet prior (W-LDA), and Wasserstein autoencoders with Gaussian Mixture prior (WTM-GMM). We demonstrate that W-LDA, WTM-GMM, and GSM possess poor stability that complicates their application in practice. The ETM model with additionally trained embeddings demonstrates high coherence and rather good stability for large datasets, but the question of the number of topics remains unsolved for this model. We also propose a new topic model based on granulated sampling with word embeddings (GLDAW), demonstrating the highest stability and good coherence compared to other considered models. Moreover, the optimal number of topics in a dataset can be determined for this model.
Affiliation(s)
- Sergei Koltcov
  - Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, Saint-Petersburg, Russia
- Anton Surkov
  - Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, Saint-Petersburg, Russia
- Vladimir Filippov
  - Scientific Research Institute for Optoelectronic Instrument Engineering, Sosnovy Bor, Leningrad Region, Russia
- Vera Ignatenko
  - Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, Saint-Petersburg, Russia
3
Onyeaju MC, Omugbe E, Onate CA, Okon IB, Eyube ES, Okorie US, Ikot AN, Ogwu DA, Osuhor PO. Information theory and thermodynamic properties of diatomic molecules using molecular potential. J Mol Model 2023; 29:311. [PMID: 37698769 DOI: 10.1007/s00894-023-05708-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 08/25/2023] [Indexed: 09/13/2023]
Abstract
Owing to the diverse applications of molecules in industries, the bound state solution of the non-relativistic wave equation with a molecular potential function has been obtained in closed form using the Nikiforov-Uvarov method. The bound state solutions are then applied to study information-theoretic measures such as the one-dimensional Shannon and Renyi entropic densities. The expectation values for the position and momentum spaces were obtained to verify Heisenberg's uncertainty principle. Utilizing the energy spectrum equation, the thermodynamic vibrational partition function is obtained via the Poisson summation. The variations of other thermodynamic functions with absolute temperature have been obtained numerically for four diatomic molecules (H2, N2, O2, and HF) using Maple 18 software. The Shannon global entropic sum inequality has also been verified. The Renyi sum for constrained index parameters satisfies the global entropic inequality. The thermodynamic properties of the four molecules are similar and conform to works reported in the existing literature. The obtained vibrational energies are in fair agreement with the ones obtained using other forms of potential energy. The results further indicate that the lowest bounds for the Shannon, Renyi, and Heisenberg inequalities are ground-state phenomena.
Affiliation(s)
- M C Onyeaju
  - Theoretical Physics Group, Department of Physics, University of Port Harcourt, Choba, Rivers State, Nigeria
  - Department of Physics, Faculty of Basic and Applied Science, University of Africa, Toru-Orua, Bayelsa State, Nigeria
- E Omugbe
  - Department of Physics, University of Agriculture and Environmental Sciences, P.M.B. 1038, Umuagwo, Imo State, Nigeria
- C A Onate
  - Department of Physics, Kogi State University, Anyigba, Kogi State, Nigeria
- I B Okon
  - Theoretical Physics Group, Department of Physics, University of Uyo, Uyo, Nigeria
- E S Eyube
  - Department of Physics, Faculty of Physical Sciences, Modibbo Adama University, P.M.B. 2076, Yola, Adamawa State, Nigeria
- U S Okorie
  - Department of Physics, Akwa Ibom State University, P.M.B 1167, Ikot AkpodenUyo, Nigeria
- A N Ikot
  - Theoretical Physics Group, Department of Physics, University of Port Harcourt, Choba, Rivers State, Nigeria
- D A Ogwu
  - Department of Physics, Faculty of Science, University of Delta, Agbor, Delta State, Nigeria
- P O Osuhor
  - Department of Physics, Faculty of Science, University of Delta, Agbor, Delta State, Nigeria
4
Qiang C, Li Z, Deng Y. Multifractal analysis of mass function. Soft Comput 2023; 27:1-14. [PMID: 37362275 PMCID: PMC10233544 DOI: 10.1007/s00500-023-08502-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/02/2023] [Indexed: 06/28/2023]
Abstract
In order to explore the fractal characteristics of Dempster-Shafer evidence theory, a fractal dimension of mass function was recently proposed to reveal the scale invariance of belief entropy. When the mass function degenerates to a probability distribution, this fractal dimension is equivalent to the classical Renyi information dimension only for α = 1, which measures the rate of change of Shannon entropy with the size of the framework. For the Renyi dimension, different values of the parameter α represent the relationship between different entropies and the framework size. However, this compatibility is not reflected in the existing fractal dimension. Thus, in this paper, we introduce the parameter α to generalize the existing dimension. Because α can take many values, we name the new dimension the multifractal dimension of mass function. In addition, inspired by the multifractal spectrum of the Cantor set, we explore the relation between the belief degree of a focal element and the number of focal elements with the same belief degree for some special assignments. Relevant results are also presented as a spectrum. We provide a static discounting coefficient generating method that modifies the mass function to improve the accuracy of classification results. The experiment is conducted on three datasets, and the results show the effectiveness of our method.
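For orientation, the classical quantities this abstract builds on can be written as follows. This is the standard probabilistic definition only, with ε denoting the partition scale and N(ε) the number of cells (notation assumed here); the article's generalization to mass functions is not reproduced.

```latex
H_\alpha(\varepsilon) = \frac{1}{1-\alpha}\,\ln \sum_{i=1}^{N(\varepsilon)} p_i(\varepsilon)^{\alpha},
\qquad
D_\alpha = \lim_{\varepsilon \to 0} \frac{H_\alpha(\varepsilon)}{\ln(1/\varepsilon)},
\qquad
\lim_{\alpha \to 1} H_\alpha(\varepsilon) = -\sum_{i=1}^{N(\varepsilon)} p_i(\varepsilon)\,\ln p_i(\varepsilon).
```

The last limit is why the case α = 1 corresponds to the Shannon entropy and hence to the information dimension mentioned in the abstract.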
Affiliation(s)
- Chenhui Qiang
  - Institute of Fundamental and Frontier Science, University of Electronic Science and Technology of China, Chengdu, 610054 China
  - Yingcai Honors College, University of Electronic Science and Technology of China, Chengdu, 610054 China
- Zhen Li
  - China Mobile Information Technology Center, Beijing, 100029 China
- Yong Deng
  - School of Medicine, Vanderbilt University, Nashville, TN 37240 USA
5
Chakraborti S, Karmakar A, Guha R, Ngan C, Kumar Das R, Whitaker N. Induction of epithelial to mesenchymal transition in HPV16 E6/E7 oncogene transfected C33A cell line. Tissue Cell 2023; 82:102041. [PMID: 36827821 DOI: 10.1016/j.tice.2023.102041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Revised: 02/03/2023] [Accepted: 02/15/2023] [Indexed: 02/19/2023]
Abstract
This study focuses on the induction of EMT by HPV16 in the C33A cell line. Expression of β-catenin, EMT-transcription factors (EMT-TFs), and c-myc in the nuclei of HPV16 E6/E7 oncogene transfected and non-transfected C33A cells was investigated through immunofluorescence and RT-PCR. Microphotographs of β-catenin, c-myc, and DAPI-stained nuclei were processed and analyzed by Python and ImageJ respectively. Microphotographs of immunocytochemically stained transfected and control cells were then processed and analyzed with the help of ImageJ and Python programming. The intensity and the integrated density of β-catenin were computed at the cell membrane area as well as the cytoplasmic area, and the integrated density of c-myc and the Renyi entropy of DAPI-stained nuclei were quantified with ImageJ software. Python programming was implemented to determine the total percentage of white pixels depicting the presence of β-catenin in the cytoplasmic area of cells. The signal of β-catenin at the cytoplasmic area was found to be significantly higher in transfected samples, which implies the nuclear accumulation of β-catenin. The expression of the c-myc protein was found to be significantly higher in transfected cells, along with significantly higher nuclear entropy. RT-PCR results show a two-fold up-regulation of the EMT-TFs Snail1, Twist1, and Zeb2 and down-regulation of Snail2 and Twist2. The study concludes that the HPV16 E6/E7 oncogene can induce EMT.
Affiliation(s)
- Sourangshu Chakraborti
  - Centre for Biomaterials, Cellular and Molecular Theranostics, Vellore Institute of Technology, Vellore, Tamilnadu, India
- Aparajita Karmakar
  - Department of Data Science, Prasanna School of Public Health, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Riana Guha
  - School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamilnadu, India
- Christopher Ngan
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Raunak Kumar Das
  - Centre for Biomaterials, Cellular and Molecular Theranostics, Vellore Institute of Technology, Vellore, Tamilnadu, India; School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Noel Whitaker
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
6
Muhammad M, Liu L, Abba B, Muhammad I, Bouchane M, Zhang H, Musa S. A New Extension of the Topp-Leone-Family of Models with Applications to Real Data. Ann Data Sci 2022; 10:225-250. [PMID: 38625258 PMCID: PMC9579674 DOI: 10.1007/s40745-022-00456-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2022] [Revised: 07/28/2022] [Accepted: 09/26/2022] [Indexed: 10/29/2022]
Abstract
In this article, we propose a new extension of the Topp-Leone family of distributions. Some important properties of the model are developed, such as the quantile function, stochastic ordering, model series representation, moments, stress-strength reliability parameter, Renyi entropy, order statistics, and moment of residual life. A particular member called the new extended Topp-Leone exponential (NETLE) is discussed. Maximum likelihood estimation (MLE), least-square estimation (LSE), and percentile estimation (PE) are used for the model parameter estimation. Simulation studies were conducted using NETLE to assess the MLE, LSE, and PE performance by examining their bias and mean square error (MSE), and the result was satisfactory. Finally, applications of the NETLE to two real data sets are provided to illustrate the importance of the NETLG families in practice; the data sets consist of daily new deaths due to COVID-19 in California and New Jersey, USA. The new model outperformed many other existing Topp-Leone and exponential-related distributions based on the real data illustrations.
Affiliation(s)
- Mustapha Muhammad
  - Department of Mathematics, Guangdong University of Petrochemical Technology, Maoming, 525000 China
- Lixia Liu
  - School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024 People’s Republic of China
- Badamasi Abba
  - School of Mathematics and Statistics, Central South University, Changsha, China
  - Department of Mathematics, Yusuf Maitama Sule University, Kano, Nigeria
- Isyaku Muhammad
  - School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, 611731 China
- Mouna Bouchane
  - Key Laboratory of Augmented Reality College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang, China
- Hexin Zhang
  - School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024 People’s Republic of China
- Sani Musa
  - Department of Mathematics and Computer Science, Sule Lamido University, Kafin-Hausa, Jigawa Nigeria
7
Tong Q, Liu Z, Lu F, Feng Z, Wan Q. A New De-Noising Method Based on Enhanced Time-Frequency Manifold and Kurtosis-Wavelet Dictionary for Rolling Bearing Fault Vibration Signal. Sensors (Basel) 2022; 22:6108. [PMID: 36015870 PMCID: PMC9413349 DOI: 10.3390/s22166108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/11/2022] [Accepted: 08/12/2022] [Indexed: 06/15/2023]
Abstract
The transient pulses caused by local faults of rolling bearings are important measurement information for fault diagnosis. However, extracting transient pulses from complex nonstationary vibration signals with a large amount of background noise is challenging, especially in the early stage. To improve the anti-noise ability and detect incipient faults, a novel signal de-noising method based on enhanced time-frequency manifold (ETFM) and kurtosis-wavelet dictionary is proposed. First, to mine the high-dimensional features, the C-C method and Cao's method are combined to determine the embedding dimension and delay time of phase space reconstruction. Second, the input parameters of the linear local tangent space alignment (LLTSA) algorithm are determined by the grid search method based on Renyi entropy, and the dimension is reduced by manifold learning to obtain the ETFM with the highest time-frequency aggregation. Finally, a kurtosis-wavelet dictionary is constructed for selecting the best atom, eliminating the noise, and reconstructing the defective signal. Actual simulations showed that the proposed method is more effective in noise suppression than traditional algorithms and that it can accurately reproduce the amplitude and phase information of the raw signal.
Affiliation(s)
- Qingbin Tong
  - School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China
  - Beijing Rail Transit Electrical Engineering Technology Research Center, Beijing 100044, China
- Ziyu Liu
  - School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China
- Feiyu Lu
  - School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China
- Ziwei Feng
  - School of Electrical Engineering, Beijing Jiaotong University, Beijing 100044, China
- Qingzhu Wan
  - School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
8
Koltcov S, Ignatenko V, Terpilovskii M, Rosso P. Analysis and tuning of hierarchical topic models based on Renyi entropy approach. PeerJ Comput Sci 2021; 7:e608. [PMID: 34401473 PMCID: PMC8330431 DOI: 10.7717/peerj-cs.608] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/10/2021] [Accepted: 05/31/2021] [Indexed: 06/13/2023]
Abstract
Hierarchical topic modeling is a potentially powerful instrument for determining topical structures of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to the above problem. First, we introduce a Renyi entropy-based metric of quality for hierarchical models. Second, we propose a practical approach to obtaining the "correct" number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up, three of these datasets being in the English language and one in Russian. In the numerical experiments, we consider three different hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model possesses a significant level of instability and, moreover, the derived numbers of topics are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of hierarchy.
Affiliation(s)
- Sergei Koltcov
  - Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg, Russia
- Vera Ignatenko
  - Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg, Russia
- Maxim Terpilovskii
  - Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg, Russia
- Paolo Rosso
  - Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg, Russia
  - Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia, Spain
9
Sheela P, Puthankattil SD. A noise-robust sparse approach to the time-frequency representation of visual evoked potentials. Comput Biol Med 2021; 135:104561. [PMID: 34153788 DOI: 10.1016/j.compbiomed.2021.104561] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2021] [Revised: 06/06/2021] [Accepted: 06/06/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND: Visual evoked potential (VEP) offers a promising research strategy in the effort to characterise brain disorders. Pertinent signal processing techniques enable the development of potential applications of VEP. A joint time-frequency (TF) representation provides more comprehensive information about the underlying complex structures of these signals than individual time or frequency analysis. However, this representation comes at the expense of low TF resolution, increased data volume, poor energy concentration and increased computational time. Owing to the high non-stationarity and low signal-to-noise ratio of VEP, a TF representation that retains only the pertinent components is indispensable. METHOD: The objective of this study is to investigate and demonstrate the ability of various TF approaches to provide an energy-concentrated and sparse TF representation of VEP. The performance of each method has been assessed for its energy concentration and reconstruction ability on both simulated and real VEPs. Renyi entropy, computation time and correlation coefficient are chosen as the performance measures for the assessment. RESULTS: In comparison with the other state-of-the-art approaches, Synchroextracting transform (SET) exhibits the lowest Renyi entropy and the highest correlation coefficient, thereby ensuring a compact TF representation for the better characterisation of VEP signals. These results are also statistically verified through the Friedman test (p<0.001). CONCLUSION: SET assures a powerful TF framework with improved energy concentration at a faster pace while remaining invertible and preserving vital information.
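The use of Renyi entropy as an energy-concentration measure can be illustrated with a small sketch: the TF energy map is normalised to sum to one and treated like a distribution, so a lower entropy indicates a more compact representation. The function name `tf_renyi_entropy`, the base-2 logarithm, and the order `alpha = 3` are assumptions chosen for illustration; the abstract does not state which order or normalisation the study uses.

```python
import math


def tf_renyi_entropy(tfr, alpha=3.0):
    """Renyi entropy (in bits) of a non-negative time-frequency energy map.

    tfr is a 2D list of energy values; alpha must differ from 1.
    """
    total = sum(sum(row) for row in tfr)
    if total <= 0:
        raise ValueError("the TF map must contain positive energy")
    # Normalise the map so that its cells sum to one, then apply the Renyi formula.
    s = sum((v / total) ** alpha for row in tfr for v in row if v > 0)
    return math.log2(s) / (1.0 - alpha)


# All energy in a single TF cell versus energy smeared over a 4 x 4 grid.
concentrated = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
spread = [[1] * 4 for _ in range(4)]

h_concentrated = tf_renyi_entropy(concentrated)
h_spread = tf_renyi_entropy(spread)
```

In this toy case the concentrated map scores lower than the smeared one, which matches the abstract's reading of "lowest Renyi entropy" as the most compact TF representation.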
Affiliation(s)
- Priyalakshmi Sheela
  - Department of Electrical Engineering, National Institute of Technology, Calicut, 673601, Kerala, India
- Subha D Puthankattil
  - Department of Electrical Engineering, National Institute of Technology, Calicut, 673601, Kerala, India
10
Bitner A, Fialkowski M. Entropy of the Land Parcel Mosaic as a Measure of the Degree of Urbanization. Entropy (Basel) 2021; 23:543. [PMID: 33925218 DOI: 10.3390/e23050543] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/27/2021] [Revised: 04/15/2021] [Accepted: 04/26/2021] [Indexed: 11/17/2022]
Abstract
Quantifying the urbanization level is an essential yet challenging task in urban studies because of the high complexity of this phenomenon. The urbanization degree has been estimated using a variety of social, economic, and spatial measures. Among the spatial characteristics, the Shannon entropy of the landscape pattern has recently been intensively explored as one of the most effective urbanization indexes. Here, we introduce a new measure of the spatial entropy of land that characterizes its parcel mosaic, the structure resulting from the division of land into cadastral parcels. We calculate the entropies of the parcel areas’ distribution function in different portions of the urban systems. We have established that the Shannon entropy and the Renyi entropies R0 and R1/2 are most effective at differentiating the degree of spatial organization of the land. Our studies are based on 30 urban systems located in the USA, Australia, and Poland, and three desert areas from Australia. In all the cities, the entropies behave the same as functions of the distance from the center. They attain the lowest values in the city core and reach substantially higher values in suburban areas. Thus, the parcel mosaic entropies provide a spatial characterization of land to measure its urbanization level effectively.
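A rough sketch of how the entropies R0 and R1/2 of a parcel-area distribution could be computed is given below. The binning scheme, the bin edges, the sample areas, and the helper names `histogram_probs` and `renyi` are all hypothetical; the abstract does not describe how the authors estimate the distribution function.

```python
import math


def histogram_probs(areas, edges):
    """Fraction of parcels falling into each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for a in areas:
        for i in range(len(edges) - 1):
            if edges[i] <= a < edges[i + 1]:
                counts[i] += 1
                break
    n = sum(counts)
    if n == 0:
        raise ValueError("no parcel area falls inside the given bin edges")
    return [c / n for c in counts]


def renyi(p, alpha):
    """Renyi entropy of order alpha (natural log); alpha = 1 gives Shannon."""
    support = [x for x in p if x > 0]
    if alpha == 1:
        return -sum(x * math.log(x) for x in support)
    return math.log(sum(x ** alpha for x in support)) / (1.0 - alpha)


# Hypothetical parcel areas in square metres and hypothetical bin edges.
edges = [0, 200, 400, 800, 1600]
core = [100, 120, 110, 130]        # uniform small parcels, one occupied bin
suburb = [100, 300, 500, 1000]     # mixed sizes, four occupied bins

r0_core = renyi(histogram_probs(core, edges), 0)
r0_suburb = renyi(histogram_probs(suburb, edges), 0)
r_half_suburb = renyi(histogram_probs(suburb, edges), 0.5)
```

With these made-up inputs the homogeneous "core" sample has zero entropy and the mixed "suburb" sample has a higher one, mirroring the qualitative trend reported in the abstract.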
11
Abstract
This article explores a graph clustering method derived from an information-theoretic method that clusters points in R^n relying on Renyi entropy, which involves computing the usual Euclidean distance between these points. Two viewpoints are adopted: (1) the graph to be clustered is first embedded into R^d for some dimension d so as to minimize the distortion of the embedding, then the resulting points are clustered, and (2) the graph is clustered directly, using as distance the shortest path distance for undirected graphs, and a variation of the Jaccard distance for directed graphs. In both cases, a hierarchical approach is adopted, where both the initial clustering and the agglomeration steps are computed using evaluation functions derived from Renyi entropy. Numerical examples are provided to support the study, showing the consistency of both approaches (evaluated in terms of F-scores).
Affiliation(s)
- Frédérique Oggier
  - Division of Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Anwitaman Datta
  - School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
12
Lahmiri S, Bekiros S. Renyi entropy and mutual information measurement of market expectations and investor fear during the COVID-19 pandemic. Chaos Solitons Fractals 2020; 139:110084. [PMID: 32834621 PMCID: PMC7347498 DOI: 10.1016/j.chaos.2020.110084] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/27/2020] [Accepted: 07/02/2020] [Indexed: 05/22/2023]
Abstract
The COVID-19 pandemic has seriously affected world economies. In this regard, it is expected that the information level and information sharing between equity, digital currency, and energy markets have been altered due to the pandemic outbreak. Specifically, the resulting twisted risk among markets is presumed to rise during the abnormal state of the world economy. The purpose of the current study is twofold. First, by using Renyi entropy, we analyze the multiscale entropy function in the return time series of Bitcoin, S&P500, WTI, Brent, Gas, Gold, Silver, and the investor fear index represented by VIX. Second, by estimating mutual information, we analyze the information sharing between these markets. The analyses are conducted before and during the COVID-19 pandemic. The empirical results from Renyi entropy indicate that for all market indices, randomness and disorder are more concentrated in less probable events. The empirical results from mutual information showed that the information sharing network between markets has changed during the COVID-19 pandemic. From a managerial perspective, we conclude that during the pandemic (i) portfolios composed of Bitcoin and Silver, Bitcoin and WTI, Bitcoin and Gold, Bitcoin and Brent, or Bitcoin and S&P500 could be risky, (ii) diversification opportunities exist by investing in portfolios composed of Gas and Silver, Gold and Silver, Gold and Gas, Brent and Silver, Brent and Gold, or Bitcoin and Gas, and that (iii) the VIX exhibited the lowest level of information disorder at all scales before and during the pandemic. Thus, it seems that the pandemic has not influenced the expectations of investors. Our results provide insight into the response of stock, cryptocurrency, energy, and precious metal markets to the expectations of investors in the aftermath of the COVID-19 pandemic in terms of information ordering and sharing.
Affiliation(s)
- Salim Lahmiri
  - Department of Supply Chain and Business Technology Management, John Molson School of Business, Concordia University, Montreal, Canada
- Stelios Bekiros
  - Department of Economics, European University Institute, Florence, Italy
  - Rimini Centre for Economic Analysis, Wilfrid Laurier University, Waterloo, Canada
13
Koltcov S, Ignatenko V. Renormalization Analysis of Topic Models. Entropy (Basel) 2020; 22:e22050556. [PMID: 33286328 PMCID: PMC7517079 DOI: 10.3390/e22050556] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/07/2020] [Revised: 05/06/2020] [Accepted: 05/13/2020] [Indexed: 11/16/2022]
Abstract
In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows using a renormalization technique. A combination of renormalization procedure with the Renyi entropy approach allows for quick searching of the optimal number of topics. In this paper, the renormalization procedure is developed for the probabilistic Latent Semantic Analysis (pLSA), and the Latent Dirichlet Allocation model with variational Expectation-Maximization algorithm (VLDA) and the Latent Dirichlet Allocation model with granulated Gibbs sampling procedure (GLDA). The experiments were conducted on two test datasets with a known number of topics in two different languages and on one unlabeled test dataset with an unknown number of topics. The paper shows that the renormalization procedure allows for finding an approximation of the optimal number of topics at least 30 times faster than the grid search without significant loss of quality.
14
Koltcov S, Ignatenko V, Boukhers Z, Staab S. Analyzing the Influence of Hyper-parameters and Regularizers of Topic Modeling in Terms of Renyi Entropy. Entropy (Basel) 2020; 22:e22040394. [PMID: 33286169 PMCID: PMC7516868 DOI: 10.3390/e22040394] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/05/2020] [Revised: 03/24/2020] [Accepted: 03/25/2020] [Indexed: 11/16/2022]
Abstract
Topic modeling is a popular technique for clustering large collections of text documents. A variety of regularization types are implemented in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Renyi entropy, this approach is inspired by concepts from statistical physics, where an inferred topical structure of a collection can be considered an information statistical system residing in a non-equilibrium state. By testing our approach on four models, namely Probabilistic Latent Semantic Analysis (pLSA), Additive Regularization of Topic Models (BigARTM), Latent Dirichlet Allocation (LDA) with Gibbs sampling, and LDA with variational inference (VLDA), we first show that the minimum of Renyi entropy coincides with the "true" number of topics, as determined in two labelled collections. At the same time, we find that the Hierarchical Dirichlet Process (HDP) model, a well-known approach for topic number optimization, fails to detect such an optimum. Next, we demonstrate that large values of the regularization coefficient in BigARTM significantly shift the minimum of entropy away from the topic number optimum, an effect that is not observed for hyper-parameters in LDA with Gibbs sampling. We conclude that regularization may introduce unpredictable distortions into topic models that need further research.
Affiliation(s)
- Sergei Koltcov
- National Research University Higher School of Economics, Soyuza Pechatnikov Street 16, 190121 St Petersburg, Russia
- Correspondence: ; Tel.: +7-911-981-9165
- Vera Ignatenko
- National Research University Higher School of Economics, Soyuza Pechatnikov Street 16, 190121 St Petersburg, Russia
- Zeyd Boukhers
- Institute for Web Science and Technologies, Universität Koblenz-Landau, Universitätsstrasse 1, 56070 Koblenz, Germany
- Steffen Staab
- Institute for Parallel and Distributed Systems (IPVS), Universität Stuttgart, Universitätsstraße 32, 50569 Stuttgart, Germany
- Web and Internet Science Research Group, University of Southampton, University Road, Southampton SO17 1BJ, UK
|
15
|
Muhammad M, Liu L. A New Extension of the Generalized Half Logistic Distribution with Applications to Real Data. Entropy (Basel) 2019; 21:E339. [PMID: 33267053 DOI: 10.3390/e21040339] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/13/2019] [Revised: 03/18/2019] [Accepted: 03/25/2019] [Indexed: 11/17/2022]
Abstract
In this paper, we introduced a new three-parameter probability model called Poisson generalized half logistic (PoiGHL). The new model has increasing, decreasing, unimodal, or bathtub-shaped failure rates, depending on the parameters. The relationship of PoiGHL to the exponentiated Weibull Poisson (EWP), Poisson exponentiated Erlang-truncated exponential (PEETE), and Poisson generalized Gompertz (PGG) models is discussed. We also characterized the PoiGHL submodel, i.e., the half logistic Poisson (HLP), based on certain functions of a random variable by truncated moments. Several mathematical and statistical properties of the PoiGHL are investigated, such as moments, mean deviations, Bonferroni and Lorenz curves, order statistics, Shannon and Renyi entropy, Kullback-Leibler divergence, moments of residual life, and probability weighted moments. The model parameters were estimated by the maximum likelihood technique, and the estimates were assessed by simulation studies. The stress-strength analysis was discussed in detail based on maximum likelihood estimation (MLE): we derived the asymptotic confidence interval of R = P(X1 < X2) based on the MLEs and examined it by simulation studies. In three applications to real data sets, PoiGHL provided a better fit and outperformed some other popular distributions. In the stress-strength parameter estimation, the PoiGHL model proved to be a reliable choice for reliability analysis, as shown using two real data sets.
|
16
|
Zhou H, Shi T, Liao G, Xuan J, Duan J, Su L, He Z, Lai W. Weighted Kernel Entropy Component Analysis for Fault Diagnosis of Rolling Bearings. Sensors (Basel) 2017; 17:E625. [PMID: 28335480 PMCID: PMC5375911 DOI: 10.3390/s17030625] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/17/2017] [Revised: 03/13/2017] [Accepted: 03/15/2017] [Indexed: 11/03/2022]
Abstract
This paper presents a supervised feature extraction method called weighted kernel entropy component analysis (WKECA) for fault diagnosis of rolling bearings. The method is based on kernel entropy component analysis (KECA), which attempts to preserve the Renyi entropy of the data set after dimension reduction. It makes full use of the label information and introduces a weighting strategy into the feature extraction. Class-related weights are introduced to denote differences among samples from different patterns, and a genetic algorithm (GA) is used to find appropriate weights that optimize the classification results. Features based on wavelet packet decomposition are derived from the original signals. The intrinsic geometric features extracted by WKECA are then fed into a support vector machine (SVM) classifier to recognize different operating conditions of bearings, giving an overall accuracy of 97% on the experimental samples. The experimental results demonstrated the feasibility and effectiveness of the proposed method.
Affiliation(s)
- Hongdi Zhou
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- Tielin Shi
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- Guanglan Liao
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- Jianping Xuan
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- Jie Duan
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- Lei Su
- School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China.
- Zhenzhi He
- School of Mechanical & Electrical Engineering, Jiangsu Normal University, Xuzhou 221116, China.
- Wuxing Lai
- State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
|
17
|
Tripathy RK, Sharma LN, Dandapat S. Detection of Shockable Ventricular Arrhythmia using Variational Mode Decomposition. J Med Syst 2016; 40:79. [PMID: 26798076 DOI: 10.1007/s10916-016-0441-5] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 09/23/2015] [Accepted: 01/11/2016] [Indexed: 10/22/2022]
Abstract
Ventricular tachycardia (VT) and ventricular fibrillation (VF) are shockable ventricular cardiac ailments. Detection of VT/VF is one of the important steps in both automated external defibrillator (AED) and implantable cardioverter defibrillator (ICD) therapy. In this paper, we propose a new method for the detection and classification of shockable ventricular arrhythmia (VT/VF) and non-shockable ventricular arrhythmia (normal sinus rhythm, ventricular bigeminy, ventricular ectopic beats, and ventricular escape rhythm) episodes from the electrocardiogram (ECG) signal. Variational mode decomposition (VMD) is used to decompose the ECG signal into a number of modes or sub-signals. The energy, the Renyi entropy, and the permutation entropy of the first three modes are evaluated, and these values are used as diagnostic features. Mutual-information-based feature scoring is employed to select an optimal set of diagnostic features. The performance of the diagnostic features is evaluated using a random forest (RF) classifier. Experimental results reveal that the feature subset derived from mutual-information-based scoring, combined with the RF classifier, produces accuracy, sensitivity, and specificity values of 97.23%, 96.54%, and 97.97%, respectively. The proposed method is compared with some of the existing techniques for detection of shockable ventricular arrhythmia episodes from ECG.
|
18
|
Cornforth DJ, Tarvainen MP, Jelinek HF. How to Calculate Renyi Entropy from Heart Rate Variability, and Why it Matters for Detecting Cardiac Autonomic Neuropathy. Front Bioeng Biotechnol 2014; 2:34. [PMID: 25250311 PMCID: PMC4159033 DOI: 10.3389/fbioe.2014.00034] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/31/2014] [Accepted: 08/23/2014] [Indexed: 11/21/2022]
Abstract
Cardiac autonomic neuropathy (CAN) is a disease that involves nerve damage leading to abnormal control of heart rate. An open question is to what extent this condition is detectable from heart rate variability (HRV), which provides information only on successive intervals between heart beats, yet is non-invasive and easy to obtain from a three-lead ECG recording. A variety of measures may be extracted from HRV, including time domain, frequency domain, and more complex non-linear measures. Among the latter, Renyi entropy has been proposed as a suitable measure that can be used to discriminate CAN from controls. However, all entropy methods require estimation of probabilities, and there are a number of ways in which this estimation can be made. In this work, we calculate Renyi entropy using several variations of the histogram method and a density method based on sequences of RR intervals. In all, we calculate Renyi entropy using nine methods and compare their effectiveness in separating the different classes of participants. We found that the histogram method using single RR intervals yields an entropy measure that either is incapable of discriminating CAN from controls or provides little information that could not be gained from the SD of the RR intervals. In contrast, probabilities calculated using a density method based on sequences of RR intervals yield an entropy measure that provides good separation between groups of participants and provides information not available from the SD. The main contribution of this work is to show that different approaches to calculating probability may affect the success of detecting disease. Our results bring new clarity to the methods used to calculate the Renyi entropy in general, and in particular, to the successful detection of CAN.
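As a rough illustration of the histogram approach mentioned in this abstract, the following minimal Python sketch bins single RR intervals and computes the Renyi entropy of the resulting probabilities. It is not the authors' implementation: the function names `histogram_probs` and `renyi_entropy`, the fixed bin width, the millisecond units, and the sample values are all assumptions made for this example, and the density method based on sequences of RR intervals is not shown.

```python
import math
from collections import Counter


def histogram_probs(rr_intervals_ms, bin_width_ms):
    """Estimate bin probabilities from single RR intervals (hypothetical helper).

    Each interval is assigned to a fixed-width bin; empty bins are simply absent.
    """
    counts = Counter(math.floor(x / bin_width_ms) for x in rr_intervals_ms)
    total = len(rr_intervals_ms)
    return [c / total for c in counts.values()]


def renyi_entropy(probs, alpha):
    """Renyi entropy in bits: H_alpha = log2(sum_i p_i^alpha) / (1 - alpha).

    For alpha = 1 the formula is undefined, so the Shannon limit is used instead.
    """
    probs = [p for p in probs if p > 0]
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Made-up RR intervals in milliseconds, binned at an assumed 50 ms width.
rr = [800, 810, 900, 1000]
p = histogram_probs(rr, 50)
for alpha in (0.0, 1.0, 2.0):
    print(alpha, renyi_entropy(p, alpha))
```

The bin width and the order alpha are free choices in this sketch, which is consistent with the abstract's point that the way probabilities are estimated can change what the entropy measure reveals.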
Affiliation(s)
- David J. Cornforth
- Applied Informatics Research Group, Faculty of Science and IT, The University of Newcastle, Callaghan, NSW, Australia
- Mika P. Tarvainen
- University of Eastern Finland, Kuopio, Finland
- Kuopio University Hospital, Kuopio, Finland
- Herbert F. Jelinek
- Applied Informatics Research Group, Faculty of Science and IT, The University of Newcastle, Callaghan, NSW, Australia
- Charles Sturt University, Albury, NSW, Australia
|
19
|
Wang X, Jiao Y, Tang T, Wang H, Lu Z. Investigating univariate temporal patterns for intrinsic connectivity networks based on complexity and low-frequency oscillation: a test-retest reliability study. Neuroscience 2013; 254:404-26. [PMID: 24042040 DOI: 10.1016/j.neuroscience.2013.09.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 06/04/2013] [Revised: 08/18/2013] [Accepted: 09/04/2013] [Indexed: 11/25/2022]
Abstract
Intrinsic connectivity networks (ICNs) are composed of spatial components and time courses. The spatial components of ICNs have been discovered with moderate-to-high reliability. To our knowledge, few studies have focused on the reliability of the temporal patterns of ICNs based on their individual time courses. The goals of this study were twofold: to investigate the test-retest reliability of temporal patterns for ICNs, and to analyze these informative univariate metrics. Additionally, a correlation analysis was performed to enhance interpretability. Our study included three datasets: (a) short- and long-term scans, (b) multi-band echo-planar imaging (mEPI), and (c) eyes open or closed. Using dual regression, we obtained the time courses of ICNs for each subject. To produce temporal patterns for ICNs, we applied two categories of univariate metrics: network-wise complexity and network-wise low-frequency oscillation. Furthermore, we validated the test-retest reliability for each metric. The network-wise temporal patterns for most ICNs (especially for the default mode network, DMN) exhibited moderate-to-high reliability and reproducibility under different scan conditions. Network-wise complexity for the DMN exhibited fair reliability (ICC<0.5) based on eyes-closed sessions. In particular, our results supported mEPI as a useful method with high reliability and reproducibility. In addition, these temporal patterns were physiologically meaningful, and certain temporal patterns were correlated with the node strength of the corresponding ICN. Overall, network-wise temporal patterns of ICNs were reliable and informative and could be complementary to spatial patterns of ICNs for further study.
Affiliation(s)
- X Wang
- School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of Child Development and Learning Science (Ministry of Education), Southeast University, Nanjing 210096, China
|