1
Xin G, Fan P, Letaief KB. Semantic Communication: A Survey of Its Theoretical Development. Entropy (Basel) 2024; 26:102. PMID: 38392357; PMCID: PMC10888479; DOI: 10.3390/e26020102.
Abstract
In recent years, semantic communication has received significant attention from both academia and industry, driven by the growing demands for ultra-low latency and high throughput in emerging intelligent services. Nonetheless, a comprehensive and effective theoretical framework for semantic communication has yet to be established. In particular, finding the fundamental limits of semantic communication, exploring the capabilities of semantic-aware networks, and utilizing theoretical guidance for deep learning in semantic communication are important yet unresolved issues. In general, the mathematical theory of semantic communication and the mathematical representation of semantics are referred to as semantic information theory. In this paper, we review the pertinent advances in semantic information theory. Grounded in the foundational work of Claude Shannon, we present the latest developments in semantic entropy, semantic rate-distortion, and semantic channel capacity. Additionally, we analyze some open problems in semantic information measurement and semantic coding, providing a theoretical basis for the design of semantic communication systems. Furthermore, we carefully review several mathematical theories and tools and evaluate their applicability in the context of semantic communication. Finally, we shed light on the challenges encountered in both semantic communication and semantic information theory.
Affiliation(s)
- Gangtao Xin
- Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Pingyi Fan
- Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
- Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Khaled B Letaief
- Department of Electrical and Computer Engineering, Hong Kong University of Science and Technology (HKUST), Hong Kong 999077
2
Quétant G, Belousov Y, Kinakh V, Voloshynovskiy S. TURBO: The Swiss Knife of Auto-Encoders. Entropy (Basel) 2023; 25:1471. PMID: 37895592; PMCID: PMC10606332; DOI: 10.3390/e25101471.
Abstract
We present a novel information-theoretic framework, termed TURBO, designed to systematically analyse and generalise auto-encoding methods. We start by examining the principles of the information bottleneck and bottleneck-based networks in the auto-encoding setting and by identifying their inherent limitations, which become more prominent for data with multiple relevant, physics-related representations. The TURBO framework is then introduced, with a comprehensive derivation of its core concept: the maximisation of mutual information between various data representations, expressed in two directions that reflect the information flows. We illustrate that numerous prevalent neural network models are encompassed within this framework. The paper underscores the insufficiency of the information bottleneck concept for elucidating all such models, thereby establishing TURBO as a preferable theoretical reference. The introduction of TURBO contributes to a richer understanding of data representation and of the structure of neural network models, enabling more efficient and versatile applications.
Affiliation(s)
- Guillaume Quétant
- Centre Universitaire d’Informatique, Université de Genève, Route de Drize 7, CH-1227 Carouge, Switzerland
- Slava Voloshynovskiy
- Centre Universitaire d’Informatique, Université de Genève, Route de Drize 7, CH-1227 Carouge, Switzerland
3
Györfi L, Linder T, Walk H. Lossless Transformations and Excess Risk Bounds in Statistical Inference. Entropy (Basel) 2023; 25:1394. PMID: 37895515; PMCID: PMC10606681; DOI: 10.3390/e25101394.
Abstract
We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss when estimating a random variable from an observed feature vector and the minimum expected loss when estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless, and we show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a δ-lossless transformation and give sufficient conditions for a given transformation to be universally δ-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottlenecks, and deep learning are also surveyed.
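The central quantity of this entry can be restated compactly; the symbols below ($X$, $Y$, $f$, $\ell$, $g$) are chosen here for illustration and are not taken from the entry itself:

```latex
% Minimum expected loss when estimating Y from the full feature vector X,
% and from a transformation (statistic) f(X):
L^{*}(X)    = \inf_{g}\, \mathbb{E}\big[\ell\big(Y,\, g(X)\big)\big], \qquad
L^{*}(f(X)) = \inf_{g}\, \mathbb{E}\big[\ell\big(Y,\, g(f(X))\big)\big].
% The excess minimum risk of f is the (nonnegative) gap
\Delta(f) = L^{*}(f(X)) - L^{*}(X) \ge 0,
% and f is lossless when \Delta(f) = 0 for every loss function \ell.
```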
Affiliation(s)
- László Györfi
- Department of Computer Science and Information Theory, Budapest University of Technology and Economics, H-1111 Budapest, Hungary
- Tamás Linder
- Department of Mathematics and Statistics, Queen’s University, Kingston, ON K7L 3N6, Canada
- Harro Walk
- Fachbereich Mathematik, Universität Stuttgart, 70569 Stuttgart, Germany
4
Ngo DD, Vo VL, Nguyen T, Nguyen MH, Le MH. Image-Based Ship Detection Using Deep Variational Information Bottleneck. Sensors (Basel) 2023; 23:8093. PMID: 37836922; PMCID: PMC10574962; DOI: 10.3390/s23198093.
Abstract
Image-based ship detection is a critical function in maritime security. However, the lack of high-quality training datasets makes it challenging to train a robust supervised deep learning model. Conventional methods use data augmentation to increase the number of training samples. This approach is not robust because data augmentation may not represent complex backgrounds or occlusions well. This paper proposes an information bottleneck and a reparameterization trick to address the challenge. The information bottleneck learns features that focus only on the object and eliminate the background, which helps to avoid background variance. In addition, the reparameterization introduces uncertainty during the training phase, which helps to learn more robust detectors. Comprehensive experiments show that the proposed method outperforms conventional methods on the SeaShips dataset, especially when the number of training samples is small. In addition, this paper discusses how to integrate the information bottleneck and the reparameterization efficiently into well-known object detection frameworks.
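As a generic illustration of the two ingredients named above, here is a minimal plain-Python sketch of a Gaussian reparameterization step and the closed-form KL penalty that variational information bottleneck objectives typically add to the task loss. This is not the authors' implementation; all names and the scalar-latent simplification are choices made here:

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Moving the randomness into eps is the reparameterization trick:
    mu and log_var stay deterministic, so gradients can flow through
    them during training while z remains stochastic."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ): the IB compression
    penalty that discourages the latent from carrying information
    beyond what the task needs."""
    return 0.5 * (math.exp(log_var) + mu * mu - 1.0 - log_var)

# A VIB-style objective is then: task_loss + beta * kl_to_standard_normal(...),
# with a small trade-off weight beta.
```

In a real detector, `mu` and `log_var` would be vectors produced by the backbone, and the KL term would be summed over latent dimensions.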
Affiliation(s)
- Duc-Dat Ngo
- Faculty of Electrical and Electronics Engineering, University of Technology and Education, Ho Chi Minh City 7000, Vietnam
- Van-Linh Vo
- Faculty of Electrical and Electronics Engineering, University of Technology and Education, Ho Chi Minh City 7000, Vietnam
- Tri Nguyen
- Faculty of Information Technology, Industrial University of Ho Chi Minh City, Ho Chi Minh City 7000, Vietnam
- Manh-Hung Nguyen
- Faculty of Electrical and Electronics Engineering, University of Technology and Education, Ho Chi Minh City 7000, Vietnam
- My-Ha Le
- Faculty of Electrical and Electronics Engineering, University of Technology and Education, Ho Chi Minh City 7000, Vietnam
5
Charvin H, Catenacci Volpi N, Polani D. Exact and Soft Successive Refinement of the Information Bottleneck. Entropy (Basel) 2023; 25:1355. PMID: 37761653; PMCID: PMC10528077; DOI: 10.3390/e25091355.
Abstract
The information bottleneck (IB) framework formalises the essential requirement for efficient information processing systems to achieve an optimal balance between the complexity of their representation and the amount of information extracted about relevant features. However, since the representation complexity affordable by real-world systems may vary in time, the processing cost of updating the representations should also be taken into account. A crucial question is thus the extent to which adaptive systems can leverage the information content of already existing IB-optimal representations for producing new ones, which target the same relevant features but at a different granularity. We investigate the information-theoretic optimal limits of this process by studying and extending, within the IB framework, the notion of successive refinement, which describes the ideal situation where no information needs to be discarded for adapting an IB-optimal representation's granularity. Thanks in particular to a new geometric characterisation, we analytically derive the successive refinability of some specific IB problems (for binary variables, for jointly Gaussian variables, and for the relevancy variable being a deterministic function of the source variable), and provide a linear-programming-based tool to numerically investigate, in the discrete case, the successive refinement of the IB. We then soften this notion into a quantification of the loss of information optimality induced by several-stage processing through an existing measure of unique information. Simple numerical experiments suggest that this quantity is typically low, though not entirely negligible. These results could have important implications for (i) the structure and efficiency of incremental learning in biological and artificial agents, (ii) the comparison of IB-optimal observation channels in statistical decision problems, and (iii) the IB theory of deep neural networks.
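For readers less familiar with the quantities underlying this entry, the following self-contained sketch (not from the paper; distributions are plain dicts and all names are chosen here) evaluates the discrete IB objective I(X;T) - beta * I(T;Y) for a given stochastic encoder p(t|x):

```python
from math import log2

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf given as a dict {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in p_xy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in p_xy.items() if p > 0)

def ib_lagrangian(p_xy, encoder, beta):
    """Evaluate I(X;T) - beta * I(T;Y) for a stochastic encoder given as
    {x: {t: p(t|x)}}. A toy evaluation of one candidate encoder, not a
    solver for the optimal IB curve."""
    px = {}
    for (x, y), p in p_xy.items():
        px[x] = px.get(x, 0.0) + p
    # Joint of source and representation: p(x, t) = p(x) p(t|x).
    p_xt = {(x, t): px[x] * q
            for x, cond in encoder.items() for t, q in cond.items()}
    # Joint of representation and relevancy variable: p(t, y).
    p_ty = {}
    for (x, y), p in p_xy.items():
        for t, q in encoder[x].items():
            p_ty[(t, y)] = p_ty.get((t, y), 0.0) + p * q
    return mutual_information(p_xt) - beta * mutual_information(p_ty)
```

For a perfectly correlated binary pair and the identity encoder, this returns 1 - beta bits, while a constant encoder returns 0; sweeping beta over such candidates traces the compression/relevance trade-off that successive refinement studies across granularities.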
Affiliation(s)
- Hippolyte Charvin
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield AL10 9AB, UK
6
Manookin MB, Rieke F. Two Sides of the Same Coin: Efficient and Predictive Neural Coding. Annu Rev Vis Sci 2023; 9:293-311. PMID: 37220331; DOI: 10.1146/annurev-vision-112122-020941.
Abstract
Some visual properties are consistent across a wide range of environments, while other properties are more labile. The efficient coding hypothesis states that many of these regularities in the environment can be discarded from neural representations, thus allocating more of the brain's dynamic range to properties that are likely to vary. This paradigm is less clear about how the visual system prioritizes different pieces of information that vary across visual environments. One solution is to prioritize information that can be used to predict future events, particularly those that guide behavior. The relationship between the efficient coding and future prediction paradigms is an area of active investigation. In this review, we argue that these paradigms are complementary and often act on distinct components of the visual input. We also discuss how normative approaches to efficient coding and future prediction can be integrated.
Affiliation(s)
- Michael B Manookin
- Department of Ophthalmology, University of Washington, Seattle, Washington, USA
- Vision Science Center, University of Washington, Seattle, Washington, USA
- Karalis Johnson Retina Center, University of Washington, Seattle, Washington, USA
- Fred Rieke
- Department of Physiology and Biophysics, University of Washington, Seattle, Washington, USA
- Vision Science Center, University of Washington, Seattle, Washington, USA
7
Sun M, Huang W, Zhang H, Shi Y, Wang J, Gong Q, Wang X. Temporal contexts for motion tracking in ultrasound sequences with information bottleneck. Med Phys 2023; 50:5553-5567. PMID: 36866782; DOI: 10.1002/mp.16339.
Abstract
BACKGROUND: Recently, deep convolutional neural networks (CNNs) have been widely adopted for ultrasound sequence tracking and shown to perform satisfactorily. However, existing trackers ignore the rich temporal contexts that exist between consecutive frames, making it difficult for these trackers to perceive information about the motion of the target.
PURPOSE: In this paper, we propose a sophisticated method to fully utilize temporal contexts for ultrasound sequence tracking with an information bottleneck. The method exploits the temporal contexts between consecutive frames for both feature extraction and similarity graph refinement, and the information bottleneck is integrated into the feature refinement process.
METHODS: The proposed tracker combines three models. First, an online temporal adaptive convolutional neural network (TAdaCNN) is proposed to focus on feature extraction and enhance spatial features using temporal information. Second, an information bottleneck (IB) is incorporated to achieve more accurate target tracking by maximally limiting the amount of information in the network and discarding irrelevant information. Finally, we propose a temporal adaptive transformer (TA-Trans) that efficiently encodes temporal knowledge by decoding it for similarity graph refinement. The tracker was trained on the 2015 MICCAI Challenge on Liver Ultrasound Tracking (CLUST) dataset, and performance was evaluated by calculating the tracking error (TE) between the predicted landmarks and the ground-truth landmarks for each frame. The experimental results are compared with 13 state-of-the-art methods, and ablation studies are conducted.
RESULTS: On the CLUST 2015 dataset, our proposed model achieves a mean TE of 0.81 ± 0.74 mm and a maximum TE of 1.93 mm for 85 point-landmarks across 39 two-dimensional ultrasound sequences. Tracking speed ranged from 41 to 63 frames per second (fps).
CONCLUSIONS: This study demonstrates a new integrated workflow for motion tracking in ultrasound sequences. The results show that the model has excellent accuracy and robustness. Reliable and accurate motion estimation is provided for applications requiring real-time motion estimation in the context of ultrasound-guided radiation therapy.
Affiliation(s)
- Mengxue Sun
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Wenhui Huang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Huili Zhang
- Shandong Innovation and Development Research Institute, Jinan, China
- Yunfeng Shi
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Jiale Wang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Qingtao Gong
- Ulsan Ship and Ocean College, Ludong University, Yantai, China
- Xiaoyan Wang
- Department of Urology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
8
Cui Z, Wu Y, Zhang QH, Wang SG, He Y, Huang DS. MV-CVIB: a microbiome-based multi-view convolutional variational information bottleneck for predicting metastatic colorectal cancer. Front Microbiol 2023; 14:1238199. PMID: 37675425; PMCID: PMC10477591; DOI: 10.3389/fmicb.2023.1238199.
Abstract
Introduction: Imbalances in gut microbes have been implicated in many human diseases, including colorectal cancer (CRC), inflammatory bowel disease, type 2 diabetes, obesity, autism, and Alzheimer's disease. Compared with other human diseases, CRC is a gastrointestinal malignancy with high mortality and a high probability of metastasis. However, current studies mainly focus on the prediction of colorectal cancer while neglecting the more serious malignancy of metastatic colorectal cancer (mCRC). In addition, high dimensionality and small sample sizes make gut microbial data complex, which increases the difficulty for traditional machine learning models.
Methods: To address these challenges, we collected and processed 16S rRNA data and calculated abundance data from patients with non-metastatic colorectal cancer (non-mCRC) and mCRC. Departing from the traditional health-versus-disease classification strategy, we adopted a novel disease-versus-disease classification strategy and propose a microbiome-based multi-view convolutional variational information bottleneck (MV-CVIB).
Results: The experimental results show that MV-CVIB can effectively predict mCRC, achieving AUC values above 0.9 and outperforming other state-of-the-art models. Moreover, MV-CVIB also achieved satisfactory predictive performance on multiple published CRC gut microbiome datasets.
Discussion: Finally, multiple gut microbiota analyses were used to elucidate the communities and differences between mCRC and non-mCRC, and the metastatic properties of CRC were assessed by patient age and microbiota expression.
Affiliation(s)
- Zhen Cui
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Yan Wu
- College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Qin-Hu Zhang
- EIT Institute for Advanced Study, Ningbo, Zhejiang, China
- Si-Guo Wang
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
- Ying He
- Institute of Machine Learning and Systems Biology, College of Electronics and Information Engineering, Tongji University, Shanghai, China
9
Lyu Z, Aminian G, Rodrigues MRD. On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches. Entropy (Basel) 2023; 25:1063. PMID: 37510010; PMCID: PMC10377965; DOI: 10.3390/e25071063.
Abstract
It is well known that the learning process of a neural network, along with its connections to fitting, compression, and generalization, is not yet well understood. In this paper, we propose a novel approach to capturing such neural network dynamics using information-bottleneck-type techniques, which involves replacing mutual information measures (notoriously difficult to estimate in high-dimensional spaces) with more tractable ones, including (1) the minimum mean-squared error associated with the reconstruction of the network input data from some intermediate network representation and (2) the cross-entropy associated with a certain class label given some network representation. We then conducted an empirical study to ascertain how different network models, network learning algorithms, and datasets may affect the learning dynamics. Our experiments show that our proposed approach appears to be more reliable than classical information bottleneck approaches in capturing network dynamics during both the training and testing phases. They also reveal that the fitting and compression phases exist regardless of the choice of activation function. Additionally, our findings suggest that model architectures, training algorithms, and datasets that lead to better generalization tend to exhibit more pronounced fitting and compression phases.
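The two tractable surrogates named in the abstract can be written down directly. A minimal plain-Python sketch, with all function names chosen here for illustration (in practice, the reconstruction and the class posterior would come from auxiliary networks attached to an intermediate layer):

```python
from math import log

def reconstruction_mse(xs, x_hats):
    """Surrogate (1): mean-squared error when reconstructing the network
    input from an intermediate representation. Lower MSE suggests the
    layer has retained more information about the input."""
    return sum((x - xh) ** 2 for x, xh in zip(xs, x_hats)) / len(xs)

def label_cross_entropy(labels, predicted_probs):
    """Surrogate (2): average cross-entropy of the true class label under
    the class posterior predicted from a representation. Lower values
    suggest more label-relevant information has been retained."""
    return -sum(log(p[y]) for y, p in zip(labels, predicted_probs)) / len(labels)
```

Tracking these two curves over training epochs plays the role that the two mutual information coordinates play in the classical information-plane analysis.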
Affiliation(s)
- Zhaoyan Lyu
- Department of Electronic and Electrical Engineering, University College London, Gower St., London WC1E 6BT, UK
- Gholamali Aminian
- The Alan Turing Institute, British Library, 96 Euston Rd., London NW1 2DB, UK
- Miguel R D Rodrigues
- Department of Electronic and Electrical Engineering, University College London, Gower St., London WC1E 6BT, UK
10
Beck E, Bockelmann C, Dekorsy A. Semantic Information Recovery in Wireless Networks. Sensors (Basel) 2023; 23:6347. PMID: 37514641; PMCID: PMC10384469; DOI: 10.3390/s23146347.
Abstract
Motivated by the recent success of Machine Learning (ML) tools in wireless communications, Weaver's 1949 idea of semantic communication has gained attention. It breaks with Shannon's classic design paradigm by aiming to transmit the meaning of a message, i.e., its semantics, rather than its exact version, and thus enables savings in information rate. In this work, we extend the fundamental approach of Basu et al. for modeling semantics to the complete communications Markov chain. We model semantics by means of hidden random variables and define the semantic communication task as the data-reduced and reliable transmission of messages over a communication channel such that semantics is best preserved. We cast this task as an end-to-end Information Bottleneck problem, enabling compression while preserving relevant information. As a solution approach, we propose the ML-based semantic communication system SINFONY and use it in a distributed multipoint scenario: SINFONY communicates the meaning behind multiple messages that are observed at different senders to a single receiver for semantic recovery. We analyze SINFONY by processing images as example messages. Numerical results reveal a rate-normalized SNR shift of up to 20 dB compared with classically designed communication systems.
Affiliation(s)
- Edgar Beck
- Department of Communications Engineering, University of Bremen, 28359 Bremen, Germany
- Carsten Bockelmann
- Department of Communications Engineering, University of Bremen, 28359 Bremen, Germany
- Armin Dekorsy
- Department of Communications Engineering, University of Bremen, 28359 Bremen, Germany
11
Möller M, Polani D. Emergence of common concepts, symmetries and conformity in agent groups-an information-theoretic model. Interface Focus 2023; 13:20230006. PMID: 37065261; PMCID: PMC10102731; DOI: 10.1098/rsfs.2023.0006.
Abstract
The paper studies principles behind structured, especially symmetric, representations that arise through enforced inter-agent conformity. For this, we consider agents in a simple environment who extract individual representations of this environment through an information maximization principle. The representations obtained by different agents generally differ from each other to some extent, which gives rise to ambiguities in how the environment is represented by the different agents. Using a variant of the information bottleneck principle, we extract a 'common conceptualization' of the world for this group of agents. It turns out that the common conceptualization captures much higher regularities or symmetries of the environment than the individual representations. We further formalize the notion of identifying symmetries in the environment both with respect to 'extrinsic' (bird's-eye) operations on the environment and with respect to 'intrinsic' operations, i.e. subjective operations corresponding to the reconfiguration of the agent's embodiment. Remarkably, using the latter formalism, one can re-wire an agent to conform to the highly symmetric common conceptualization to a much higher degree than an unrefined agent, without having to re-optimize the agent from scratch. In other words, one can 're-educate' an agent to conform to the de-individualized 'concept' of the agent group with comparatively little effort.
Affiliation(s)
- Marco Möller
- Theory of Complex Systems Group, Institute of Solid State Physics, Technical University of Darmstadt, Germany
- Adaptive Systems Research Group, Department of Computer Science, University of Hertfordshire, Hatfield, UK
- Daniel Polani
- Adaptive Systems Research Group, Department of Computer Science, University of Hertfordshire, Hatfield, UK
12
Ullmann D, Taran O, Voloshynovskiy S. Multivariate Time Series Information Bottleneck. Entropy (Basel) 2023; 25:831. PMID: 37238586; DOI: 10.3390/e25050831.
Abstract
Time series (TS) and multiple time series (MTS) predictions have historically paved the way for distinct families of deep learning models. The temporal dimension, distinguished by its evolutionary sequential aspect, is usually modeled by decomposition into the trio of "trend, seasonality, noise", by attempts to copy the functioning of human synapses, and more recently, by transformer models with self-attention on the temporal dimension. These models may find applications in finance and e-commerce, where any performance increase of even less than 1% has large monetary repercussions; they also have potential applications in natural language processing (NLP), medicine, and physics. To the best of our knowledge, the information bottleneck (IB) framework has not received significant attention in the context of TS or MTS analyses. One can demonstrate that compression of the temporal dimension is key in the context of MTS. We propose a new approach with partial convolution, where a time sequence is encoded into a two-dimensional representation resembling an image. Accordingly, we use recent advances in image extension to predict an unseen part of an image from a given one. We show that our model compares well with traditional TS models, has information-theoretical foundations, and can be easily extended to more dimensions than time and space alone. An evaluation of our multiple time series-information bottleneck (MTS-IB) model demonstrates its efficiency on electricity production, road traffic, and astronomical data representing solar activity, as recorded by NASA's Interface Region Imaging Spectrograph (IRIS) satellite.
Affiliation(s)
- Denis Ullmann
- Faculty of Science, University of Geneva, CUI, 1227 Carouge, Switzerland
- Olga Taran
- Faculty of Science, University of Geneva, CUI, 1227 Carouge, Switzerland
13
Manh HN. Rethinking Feature Generalization in Vacant Space Detection. Sensors (Basel) 2023; 23:4776. PMID: 37430688; DOI: 10.3390/s23104776.
Abstract
Vacant space detection is critical in modern parking lots. However, deploying a detection model as a service is not an easy task. When the camera in a new parking lot is set up at a different height or viewing angle from the original parking lot where the training data were collected, the performance of the vacant space detector can degrade. Therefore, in this paper, we propose a method to learn generalized features so that the detector works better in different environments. Specifically, the features are suited to the vacant space detection task and robust to environmental change. We use a reparameterization process to model the variance coming from the environment. In addition, a variational information bottleneck is used to ensure that the learned features focus only on the appearance of a car in a specific parking space. Experimental results show that performance on a new parking lot increases significantly when only data from the source parking lot are used in the training phase.
Affiliation(s)
- Hung-Nguyen Manh
- Faculty of Electrical and Electronics Engineering, HCMC University of Technology and Education, Ho Chi Minh City 7000, Vietnam
14
Baumgartner T, Klatt S, Donath L. Revealing the Mutual Information between Body-Worn Sensors and Metabolic Cost in Running. Sensors (Basel) 2023; 23:1756. PMID: 36850354; PMCID: PMC9959695; DOI: 10.3390/s23041756.
Abstract
Running power is a popular measure for gauging objective intensity. It has recently been shown, though, that foot-worn sensors alone cannot reflect variations in the exerted energy that stem from changes in running economy. To support long-term improvement in running, these changes need to be taken into account. We propose leveraging the presence of two additional sensors worn by the most ambitious recreational runners for improved measurement: a watch and a heart rate chest strap. Using these accelerometers, which are already present and distributed over the athlete's body, carries more information about metabolic demand than a single foot-worn sensor. In this work, we demonstrate the mutual information between acceleration data and the metabolic demand of running by leveraging the information bottleneck of a constrained convolutional neural network. We performed lab measurements on 29 ambitious recreational runners (age = 28 ± 7 years, weekly running distance = 50 ± 25 km, V̇O2max = 60.3 ± 7.4 mL·min⁻¹·kg⁻¹). We show that information about the metabolic demand of running is contained in kinetic data. Additionally, we prove that the combination of three sensors (foot, torso, and lower arm) carries significantly more information than a single foot-worn sensor. We advocate for the development of running power systems that incorporate the sensors in watches and chest straps to improve the validity of running power and, thereby, long-term training planning.
Collapse
Affiliation(s)
- Tobias Baumgartner
- Institute of Exercise Training and Sport Informatics, Department of Cognitive and Team/Racket Sport Research, German Sport University Cologne, 50933 Cologne, Germany
| | - Stefanie Klatt
- Institute of Exercise Training and Sport Informatics, Department of Cognitive and Team/Racket Sport Research, German Sport University Cologne, 50933 Cologne, Germany
- School of Sport and Health Sciences, University of Brighton, Eastbourne BN20 7SR, UK
| | - Lars Donath
- Department of Intervention Research in Exercise Training, German Sport University Cologne, 50933 Cologne, Germany
| |
Collapse
|
15
|
Deng B, Jia K. Counterfactual Supervision-Based Information Bottleneck for Out-of-Distribution Generalization. Entropy (Basel) 2023; 25:193. [PMID: 36832560 PMCID: PMC9955031 DOI: 10.3390/e25020193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 12/14/2022] [Accepted: 01/16/2023] [Indexed: 06/18/2023]
Abstract
Learning invariant (causal) features for out-of-distribution (OOD) generalization has attracted extensive attention recently, and among the proposals, invariant risk minimization (IRM) is a notable solution. In spite of its theoretical promise for linear regression, challenges remain in applying IRM to linear classification problems. By introducing the information bottleneck (IB) principle into the learning of IRM, the IB-IRM approach has demonstrated its power to solve these challenges. In this paper, we further improve IB-IRM in two respects. First, we show that the key assumption of support overlap of invariant features, which IB-IRM uses to guarantee OOD generalization, can be relaxed: it is still possible to achieve the optimal solution without this assumption. Second, we illustrate two failure modes in which IB-IRM (and IRM) can fail to learn the invariant features, and to address such failures, we propose a Counterfactual Supervision-based Information Bottleneck (CSIB) learning algorithm that recovers the invariant features. By requiring counterfactual inference, CSIB works even when accessing data from a single environment. Empirical experiments on several datasets verify our theoretical results.
Collapse
Affiliation(s)
| | - Kui Jia
- Correspondence: (B.D.); (K.J.)
| |
Collapse
|
16
|
Xiang G, Dian S, Du S, Lv Z. Variational Information Bottleneck Regularized Deep Reinforcement Learning for Efficient Robotic Skill Adaptation. Sensors (Basel) 2023; 23:762. [PMID: 36679561 PMCID: PMC9864208 DOI: 10.3390/s23020762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/16/2022] [Revised: 01/03/2023] [Accepted: 01/05/2023] [Indexed: 06/17/2023]
Abstract
Deep Reinforcement Learning (DRL) algorithms have been widely studied for sequential decision-making problems, and substantial progress has been achieved, especially in autonomous robotic skill learning. However, it remains difficult to deploy DRL methods in practical safety-critical robot systems, since a gap between the training and deployment environments always exists, and this issue becomes increasingly crucial in ever-changing environments. Aiming at efficient robotic skill transfer in a dynamic environment, we present a meta-reinforcement learning algorithm based on a variational information bottleneck. More specifically, during the meta-training stage, the variational information bottleneck is first applied to infer the complete set of basic tasks for the whole task space, and the maximum-entropy-regularized reinforcement learning framework is then used to learn the basic skills corresponding to those basic tasks. Once the training stage is completed, all tasks in the task space can be obtained as nonlinear combinations of the basic tasks; thus, the skills needed to accomplish those tasks can likewise be obtained by combining the basic skills. Empirical results on several highly nonlinear, high-dimensional robotic locomotion tasks show that the proposed variational information bottleneck regularized deep reinforcement learning algorithm can improve sample efficiency by 200-5000 times on new tasks. Furthermore, the proposed algorithm achieves a substantial asymptotic performance improvement. These results indicate that the proposed meta-reinforcement learning framework takes a significant step toward deploying DRL-based algorithms in practical robot systems.
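In variational-information-bottleneck objectives of this kind, the compression term I(Z;X) is typically upper-bounded by a KL divergence between the encoder and a standard-normal prior. A minimal sketch of that regularizer, assuming a diagonal-Gaussian task encoder with hypothetical outputs (this is not the paper's implementation):

```python
import numpy as np

def vib_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) per sample, in nats.
    This is the term that upper-bounds I(Z;X) in a VIB objective."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

# Hypothetical encoder outputs for a batch of 4 inputs, 3 latent dims.
mu = np.array([[0.0, 0.0, 0.0],
               [1.0, -1.0, 0.5],
               [0.2, 0.1, -0.3],
               [2.0, 0.0, 0.0]])
logvar = np.zeros_like(mu)   # unit variance everywhere

kl = vib_kl(mu, logvar)
# A full VIB-regularized loss would then be task_loss + beta * kl.mean().
```

With unit variance the KL reduces to 0.5·‖mu‖², so a posterior matching the prior contributes zero and distant posteriors are penalized, which is what pressures the latent task variable to stay compressed.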
Collapse
Affiliation(s)
- Guofei Xiang
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
- National Key Laboratory of Special Vehicle Design and Manufacturing Integration Technology, Baotou 014031, China
| | - Songyi Dian
- College of Electrical Engineering, Sichuan University, Chengdu 610065, China
| | - Shaofeng Du
- National Key Laboratory of Special Vehicle Design and Manufacturing Integration Technology, Baotou 014031, China
| | - Zhonghui Lv
- National Key Laboratory of Special Vehicle Design and Manufacturing Integration Technology, Baotou 014031, China
| |
Collapse
|
17
|
Abstract
In recent years, deep learning has shown very competitive performance in seizure detection. However, most currently used methods either convert electroencephalogram (EEG) signals into spectral images and employ 2D-CNNs, or split the one-dimensional (1D) features of EEG signals into many segments and employ 1D-CNNs. Moreover, these investigations are further constrained by neglecting the temporal links between time-series segments or spectrogram images. Therefore, we propose a Dual-Modal Information Bottleneck (Dual-modal IB) network for EEG seizure detection. The network extracts EEG features from both the time-series and spectrogram dimensions, passing information from the two modalities through the Dual-modal IB, which requires the model to gather and condense the most pertinent information in each modality and share only what is necessary. Specifically, we make full use of the information shared between the two modality representations to obtain key information for seizure detection and to remove irrelevant features between the two modalities. In addition, to exploit the intrinsic temporal dependencies, we further introduce a bidirectional long short-term memory (BiLSTM) network into the Dual-modal IB model, which is used to model the temporal relationships between the features extracted from each modality by a convolutional neural network (CNN). On the CHB-MIT dataset, the proposed framework achieves an average segment-based sensitivity of 97.42%, specificity of 99.32%, and accuracy of 98.29%, and an average event-based sensitivity of 96.02% with a false detection rate (FDR) of 0.70/h. We release our code at https://github.com/LLLL1021/Dual-modal-IB.
Collapse
Affiliation(s)
- Jiale Wang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, P. R. China
| | - Xinting Ge
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, P. R. China
| | - Yunfeng Shi
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, P. R. China
| | - Mengxue Sun
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, P. R. China
| | - Qingtao Gong
- Ulsan Ship and Ocean College, Ludong University, Yantai 264025, P. R. China
| | - Haipeng Wang
- Institute of Information Fusion, Naval Aviation University, Yantai 264001, P. R. China
| | - Wenhui Huang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, P. R. China
| |
Collapse
|
18
|
Li X, Cui J, Song J, Jia M, Zou Z, Ding G, Zheng Y. Contextual Features and Information Bottleneck-Based Multi-Input Network for Breast Cancer Classification from Contrast-Enhanced Spectral Mammography. Diagnostics (Basel) 2022; 12:diagnostics12123133. [PMID: 36553140 PMCID: PMC9777091 DOI: 10.3390/diagnostics12123133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Revised: 12/02/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
In computer-aided diagnosis methods for breast cancer, deep learning has been shown to be an effective method to distinguish whether lesions are present in tissues. However, traditional methods only classify masses as benign or malignant, according to their presence or absence, without considering the contextual features between them and their adjacent tissues. Furthermore, for contrast-enhanced spectral mammography, the existing studies have only performed feature extraction on a single image per breast. In this paper, we propose a multi-input deep learning network for automatic breast cancer classification. Specifically, we simultaneously input four images of each breast with different feature information into the network. Then, we processed the feature maps in both horizontal and vertical directions, preserving the pixel-level contextual information within the neighborhood of the tumor during the pooling operation. Furthermore, we designed a novel loss function according to the information bottleneck theory to optimize our multi-input network and ensure that the common information in the multiple input images could be fully utilized. Our experiments on 488 images (256 benign and 232 malignant images) from 122 patients show that the method's accuracy, precision, sensitivity, specificity, and f1-score values are 0.8806, 0.8803, 0.8810, 0.8801, and 0.8806, respectively. The qualitative, quantitative, and ablation experiment results show that our method significantly improves the accuracy of breast cancer classification and reduces the false positive rate of diagnosis. It can reduce misdiagnosis rates and unnecessary biopsies, helping doctors determine accurate clinical diagnoses of breast cancer from multiple CESM images.
Collapse
Affiliation(s)
- Xinmeng Li
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
| | - Jia Cui
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
- Correspondence:
| | - Jingqi Song
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
| | - Mingyu Jia
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
| | - Zhenxing Zou
- Department of Radiology, Yantai Yuhuangding Hospital, Yantai 264001, China
| | - Guocheng Ding
- Department of Radiology, Yantai Yuhuangding Hospital, Yantai 264001, China
| | - Yuanjie Zheng
- School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China
| |
Collapse
|
19
|
Gyevnar B, Dagan G, Haley C, Guo S, Mollica F. Communicative Efficiency or Iconic Learning: Do Acquisition and Communicative Pressures Interact to Shape Colour-Naming Systems? Entropy (Basel) 2022; 24:1542. [PMID: 36359632 PMCID: PMC9689105 DOI: 10.3390/e24111542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/10/2022] [Revised: 10/19/2022] [Accepted: 10/21/2022] [Indexed: 06/16/2023]
Abstract
Language evolution is driven by pressures for simplicity and informativity; however, the timescale on which these pressures operate is debated. Over several generations, learners' biases for simple and informative systems can guide language evolution. Over repeated instances of dyadic communication, the principle of least effort dictates that speakers should bias systems towards simplicity and listeners towards informativity, similarly guiding language evolution. At the same time, it has been argued that learners only provide a bias for simplicity and, thus, language users must provide a bias for informativity. To what extent do languages evolve during acquisition versus use? We address this question by formally defining and investigating the communicative efficiency of acquisition trajectories. We illustrate our approach using colour-naming systems, replicating the communicative efficiency model of Zaslavsky, Kemp, Regier & Tishby (2018, PNAS) and the acquisition model of Beekhuizen & Stevenson (2018, Cogn. Sci.). We find that to the extent that language is iconic, learning alone is sufficient to shape language evolution. Regarding colour-naming systems specifically, we find that incorporating learning biases into communicative efficiency accounts might explain how speakers and listeners trade off communicative effort.
Collapse
|
20
|
Monsees T, Griebel O, Herrmann M, Wübben D, Dekorsy A, Wehn N. Minimum-Integer Computation Finite Alphabet Message Passing Decoder: From Theory to Decoder Implementations towards 1 Tb/s. Entropy (Basel) 2022; 24:1452. [PMID: 37420472 DOI: 10.3390/e24101452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Revised: 10/05/2022] [Accepted: 10/08/2022] [Indexed: 07/09/2023]
Abstract
In Message Passing (MP) decoding of Low-Density Parity-Check (LDPC) codes, extrinsic information is exchanged between Check Nodes (CNs) and Variable Nodes (VNs). In a practical implementation, this information exchange is limited by quantization using only a small number of bits. In recent investigations, a novel class of Finite Alphabet Message Passing (FA-MP) decoders has been designed to maximize the Mutual Information (MI) using only a small number of bits per message (e.g., 3 or 4 bits), with communication performance close to high-precision Belief Propagation (BP) decoding. In contrast to the conventional BP decoder, operations are given as discrete-input discrete-output mappings, which can be described by multidimensional lookup tables (mLUTs). A common approach to avoid an exponential increase in the size of mLUTs with the node degree is the sequential LUT (sLUT) design approach, i.e., using a sequence of two-dimensional lookup tables (LUTs) for the design, at the cost of a slight performance degradation. Recently, approaches such as Reconstruction-Computation-Quantization (RCQ) and Mutual Information-Maximizing Quantized Belief Propagation (MIM-QBP) have been proposed to avoid the complexity drawback of mLUTs by using pre-designed functions that require calculations over a computational domain. It has been shown that these calculations are able to represent the mLUT mapping exactly when executed with infinite precision over the real numbers. Based on the framework of MIM-QBP and RCQ, the Minimum-Integer Computation (MIC) decoder design generates low-bit integer computations that are derived from the Log-Likelihood Ratio (LLR) separation property of the information-maximizing quantizer to replace the mLUT mappings either exactly or approximately. We derive a novel criterion for the bit resolution required to represent the mLUT mappings exactly. Furthermore, we show that our MIC decoder matches the communication performance of the corresponding mLUT decoder exactly, but with much lower implementation complexity. We also perform an objective comparison between state-of-the-art Min-Sum (MS) and FA-MP decoder implementations for throughput towards 1 Tb/s in a state-of-the-art 28 nm Fully-Depleted Silicon-on-Insulator (FD-SOI) technology. Finally, we demonstrate that our new MIC decoder implementation outperforms previous FA-MP and MS decoders in terms of reduced routing complexity, area efficiency, and energy efficiency.
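The information-maximizing quantizer at the heart of such designs can be illustrated on a toy channel: a brute-force search for the 2-bit quantizer of a finely discretized BPSK-over-AWGN output that preserves the most mutual information. All parameters below are illustrative assumptions, not the paper's decoder design; because the LLR is monotone in the channel output here, it suffices to search over contiguous regions (the LLR-separation property mentioned above):

```python
import numpy as np
from itertools import combinations

def mi(p):
    """Mutual information (bits) of a 2D joint pmf."""
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pa * pb)[nz])))

# Discretized BPSK-over-AWGN channel: 12 fine output levels (toy setup).
levels = np.linspace(-3, 3, 12)
sigma = 0.8
lik = np.stack([np.exp(-(levels - s) ** 2 / (2 * sigma**2)) for s in (-1, 1)])
p_y_x = lik / lik.sum(axis=1, keepdims=True)   # p(y|x), rows: x in {-1, +1}
p_xy = 0.5 * p_y_x                              # equiprobable input

# 2-bit quantizer: merge the 12 levels into 4 contiguous regions and
# exhaustively search the region boundaries for the MI-maximizing choice.
best_mi, best_bounds = -1.0, None
for bounds in combinations(range(1, 12), 3):
    edges = (0,) + bounds + (12,)
    p_xz = np.stack([p_xy[:, a:b].sum(axis=1)
                     for a, b in zip(edges, edges[1:])], axis=1)
    m = mi(p_xz)
    if m > best_mi:
        best_mi, best_bounds = m, bounds

full_mi = mi(p_xy)   # MI before quantization (data-processing upper bound)
```

By the data processing inequality the quantized MI can only decrease, yet the best 2-bit quantizer retains most of it, which is why low-bit FA-MP decoders can approach high-precision BP performance.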
Collapse
Affiliation(s)
- Tobias Monsees
- Department of Communications Engineering, University of Bremen, 28359 Bremen, Germany
| | - Oliver Griebel
- Microelectronic Systems Design Research Group, TU Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Matthias Herrmann
- Microelectronic Systems Design Research Group, TU Kaiserslautern, 67663 Kaiserslautern, Germany
| | - Dirk Wübben
- Department of Communications Engineering, University of Bremen, 28359 Bremen, Germany
| | - Armin Dekorsy
- Department of Communications Engineering, University of Bremen, 28359 Bremen, Germany
| | - Norbert Wehn
- Microelectronic Systems Design Research Group, TU Kaiserslautern, 67663 Kaiserslautern, Germany
| |
Collapse
|
21
|
Dikshtein M, Ordentlich O, Shamai (Shitz) S. The Double-Sided Information Bottleneck Function. Entropy (Basel) 2022; 24:1321. [PMID: 36141207 PMCID: PMC9497801 DOI: 10.3390/e24091321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 09/11/2022] [Accepted: 09/13/2022] [Indexed: 06/16/2023]
Abstract
A double-sided variant of the information bottleneck method is considered. Let (X,Y) be a bivariate source characterized by a joint pmf PXY. The problem is to find two independent channels PU|X and PV|Y (inducing the Markov chain U→X→Y→V) that maximize I(U;V) subject to constraints on the relevant mutual information expressions I(U;X) and I(V;Y). For jointly Gaussian X and Y, we show that Gaussian channels are optimal in the low-SNR regime but not for general SNR. Similarly, it is shown that for a doubly symmetric binary source, binary symmetric channels are optimal when the correlation is low and are suboptimal for high correlations. We conjecture that Z and S channels are optimal when the correlation is 1 (i.e., X=Y) and provide supporting numerical evidence. Furthermore, we present a Blahut-Arimoto type alternating maximization algorithm and demonstrate its performance for a representative setting. This problem is closely related to the domain of biclustering.
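For the doubly symmetric binary source, the objective and its constraints are easy to evaluate numerically for fixed channels. A minimal sketch, assuming candidate binary symmetric test channels with illustrative flip probabilities (this evaluates the functional at a point; it is not the paper's alternating-maximization algorithm):

```python
import numpy as np

def mi(p):
    """Mutual information (bits) of a 2D joint pmf."""
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pa * pb)[nz])))

# Doubly symmetric binary source with crossover probability rho.
rho = 0.1
pxy = np.array([[(1 - rho) / 2, rho / 2],
                [rho / 2, (1 - rho) / 2]])

# Candidate test channels p(u|x) and p(v|y): BSCs with flip probs a and b.
a, b = 0.05, 0.05
pu_x = np.array([[1 - a, a], [a, 1 - a]])   # rows: x, cols: u
pv_y = np.array([[1 - b, b], [b, 1 - b]])   # rows: y, cols: v

# Markov chain U - X - Y - V: p(u,v) = sum_{x,y} p(u|x) p(x,y) p(v|y)
puv = pu_x.T @ pxy @ pv_y
I_uv = mi(puv)                                        # objective
I_ux = mi(pxy.sum(axis=1, keepdims=True) * pu_x)      # constraint: joint p(x,u)
I_vy = mi(pxy.sum(axis=0, keepdims=True).T * pv_y)    # constraint: joint p(y,v)
```

The data processing inequality forces I(U;V) below both constraint values, which is the tension the double-sided bottleneck trades off.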
Collapse
Affiliation(s)
- Michael Dikshtein
- Department of Electrical and Computer Engineering, Technion, Haifa 3200003, Israel
| | - Or Ordentlich
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | | |
Collapse
|
22
|
Parker AE, Dimitrov AG. Symmetry-Breaking Bifurcations of the Information Bottleneck and Related Problems. Entropy (Basel) 2022; 24:1231. [PMID: 36141117 PMCID: PMC9498195 DOI: 10.3390/e24091231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 08/22/2022] [Accepted: 08/29/2022] [Indexed: 06/16/2023]
Abstract
In this paper, we investigate the bifurcations of solutions to a class of degenerate constrained optimization problems. This study was motivated by the Information Bottleneck and Information Distortion problems, which have been used to successfully cluster data in many different applications. In the problems we discuss in this paper, the distortion function is not a linear function of the quantizer. This leads to a challenging annealing optimization problem, which we recast as a fixed-point dynamics problem of a gradient flow of a related dynamical system. The gradient system possesses an SN symmetry due to its invariance in relabeling representative classes. Its flow hence passes through a series of bifurcations with specific symmetry breaks. Here, we show that the dynamical system related to the Information Bottleneck problem has an additional spurious symmetry that requires a more challenging analysis of the symmetry-breaking bifurcation. For the Information Bottleneck, we determine that when bifurcations occur, they are only of pitchfork type, and we give conditions that determine the stability of the bifurcating branches. We relate the existence of subcritical bifurcations to the existence of first-order phase transitions in the corresponding distortion function as a function of the annealing parameter, and provide criteria with which to detect such transitions.
Collapse
Affiliation(s)
- Albert E. Parker
- Center for Biofilm Engineering, Department of Mathematical Sciences, Montana State University, Bozeman, MT 59717, USA
| | - Alexander G. Dimitrov
- Department of Mathematics and Statistics, Washington State University Vancouver, Vancouver, WA 98686, USA
| |
Collapse
|
23
|
Toledo A, Venezian E, Slonim N. Revisiting Sequential Information Bottleneck: New Implementation and Evaluation. Entropy (Basel) 2022; 24:1132. [PMID: 36010796 PMCID: PMC9407479 DOI: 10.3390/e24081132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 08/03/2022] [Accepted: 08/04/2022] [Indexed: 06/15/2023]
Abstract
We introduce a modern, optimized, and publicly available implementation of the sequential Information Bottleneck clustering algorithm, which strikes a highly competitive balance between clustering quality and speed. We describe a set of optimizations that make the algorithm computation more efficient, particularly for the common case of sparse data representation. The results are substantiated by an extensive evaluation that compares the algorithm to commonly used alternatives, focusing on the practically important use case of text clustering. The evaluation covers a range of publicly available benchmark datasets and a set of clustering setups employing modern word and sentence embeddings obtained by state-of-the-art neural models. The results show that in spite of using the more basic Term-Frequency representation, the proposed implementation provides a highly attractive trade-off between quality and speed that outperforms the alternatives considered. This new release facilitates the use of the algorithm in real-world applications of text clustering.
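The core of the sequential IB algorithm is a draw-and-merge step: each item is removed from its cluster and reassigned to the cluster with the smallest merge cost, a Jensen-Shannon-divergence-weighted distance between conditional distributions. A minimal sketch of that cost and one reassignment sweep, assuming hard clustering of hypothetical toy term distributions (the released implementation is far more optimized and iterates to convergence):

```python
import numpy as np

def kl(p, q):
    """KL divergence (bits) between two pmfs with matching support."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))

def merge_cost(w_x, p_x, w_c, p_c):
    """sIB-style merge cost: (w_x + w_c) times the weighted Jensen-Shannon
    divergence between the item's and the cluster's conditional pmfs."""
    w = w_x + w_c
    pi_x, pi_c = w_x / w, w_c / w
    m = pi_x * p_x + pi_c * p_c
    return w * (pi_x * kl(p_x, m) + pi_c * kl(p_c, m))

# Toy "documents" as term distributions p(y|x); two obvious groups.
docs = np.array([[0.8, 0.1, 0.1],
                 [0.7, 0.2, 0.1],
                 [0.1, 0.1, 0.8],
                 [0.1, 0.2, 0.7]])
w = np.full(4, 0.25)   # uniform p(x)

# Seed two clusters from docs 0 and 2, then assign the rest to whichever
# cluster is cheaper to merge with (a single sIB-style sweep).
cw = [w[0], w[2]]
cp = [docs[0].copy(), docs[2].copy()]
assign = {0: 0, 2: 1}
for i in (1, 3):
    costs = [merge_cost(w[i], docs[i], cw[k], cp[k]) for k in (0, 1)]
    k = int(np.argmin(costs))
    cp[k] = (cw[k] * cp[k] + w[i] * docs[i]) / (cw[k] + w[i])   # update centroid
    cw[k] += w[i]
    assign[i] = k
```

Each document lands in the cluster whose term distribution it most resembles, which is the mechanism behind sIB's quality on sparse Term-Frequency representations.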
Collapse
|
24
|
You Y, Chen T, Wang Z, Shen Y. Bringing Your Own View: Graph Contrastive Learning without Prefabricated Data Augmentations. Proc Int Conf Web Search Data Min 2022; 2022:1300-1309. [PMID: 35647617 PMCID: PMC9130056 DOI: 10.1145/3488560.3498416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Self-supervision is recently surging at its new frontier of graph learning. It facilitates graph representations beneficial to downstream tasks; but its success could hinge on domain knowledge for handcraft or the often expensive trials and errors. Even its state-of-the-art representative, graph contrastive learning (GraphCL), is not completely free of those needs as GraphCL uses a prefabricated prior reflected by the ad-hoc manual selection of graph data augmentations. Our work aims at advancing GraphCL by answering the following questions: How to represent the space of graph augmented views? What principle can be relied upon to learn a prior in that space? And what framework can be constructed to learn the prior in tandem with contrastive learning? Accordingly, we have extended the prefabricated discrete prior in the augmentation set, to a learnable continuous prior in the parameter space of graph generators, assuming that graph priors per se, similar to the concept of image manifolds, can be learned by data generation. Furthermore, to form contrastive views without collapsing to trivial solutions due to the prior learnability, we have leveraged both principles of information minimization (InfoMin) and information bottleneck (InfoBN) to regularize the learned priors. Eventually, contrastive learning, InfoMin, and InfoBN are incorporated organically into one framework of bi-level optimization. Our principled and automated approach has proven to be competitive against the state-of-the-art graph self-supervision methods, including GraphCL, on benchmarks of small graphs; and shown even better generalizability on large-scale graphs, without resorting to human expertise or downstream validation. Our code is publicly released at https://github.com/Shen-Lab/GraphCL_Automated.
Collapse
|
25
|
Bauer M, Petkova MD, Gregor T, Wieschaus EF, Bialek W. Trading bits in the readout from a genetic network. Proc Natl Acad Sci U S A 2021; 118:e2109011118. [PMID: 34772813 DOI: 10.1073/pnas.2109011118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/04/2021] [Indexed: 11/18/2022] Open
Abstract
In the regulation of gene expression, information of relevance to the organism is represented by the concentrations of transcription factor molecules. To extract this information the cell must effectively "measure" these concentrations, but there are physical limits to the precision of these measurements. We use the gap gene network in the early fly embryo as an example of the tradeoff between the precision of concentration measurements and the transmission of relevant information. For thresholded measurements we find that lower thresholds are more important, and fine tuning is not required for near-optimal information transmission. We then consider general sensors, constrained only by a limit on their information capacity, and find that thresholded sensors can approach true information theoretic optima. The information theoretic approach allows us to identify the optimal sensor for the entire gap gene network and to argue that the physical limitations of sensing necessitate the observed multiplicity of enhancer elements, with sensitivities to combinations rather than single transcription factors.
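The tradeoff for a single thresholded sensor can be sketched in a toy model: position is uniform, a morphogen concentration decays with position, and a binary sensor reports whether the noisy concentration exceeds a threshold. All parameters below are hypothetical illustrations, not the gap-gene data:

```python
import numpy as np
from math import erf, sqrt

def mi(p):
    """Mutual information (bits) of a 2D joint pmf."""
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pa * pb)[nz])))

# Position x uniform on [0, 1]; concentration decays exponentially with x
# and is read out with Gaussian noise (illustrative values).
positions = np.linspace(0.0, 1.0, 50)
conc = np.exp(-3.0 * positions)
noise = 0.05

def info_at_threshold(theta):
    """Positional information (bits) carried by a binary readout c > theta."""
    p_on = np.array([0.5 * (1 + erf((c - theta) / (noise * sqrt(2))))
                     for c in conc])
    joint = np.stack([p_on, 1 - p_on], axis=1) / len(positions)
    return mi(joint)

I_mid = info_at_threshold(0.35)   # threshold inside the dynamic range
I_ext = info_at_threshold(0.99)   # threshold at the edge of the range
```

A threshold placed inside the concentration's dynamic range transmits far more positional information than one at its edge, echoing the abstract's point that threshold placement, not fine tuning, dominates what a sensor can convey.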
Collapse
|
26
|
Abstract
In this study, the information bottleneck method is proposed as an optimisation method for steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs). The information bottleneck is an information-theoretic optimisation method for solving problems with a trade-off between preserving meaningful information and compression. Its main practical application in machine learning is in representation learning and feature extraction. In this study, we instead use the information bottleneck to find an optimal classification rule for a BCI, which is a novel application of the method. This approach is particularly suitable for BCIs, since the information bottleneck optimises the amount of information transferred by the BCI. SSVEP-based BCIs often classify targets using very simple rules, such as choosing the class corresponding to the largest feature value; we call this the argmax classifier. It is unlikely that this approach is optimal, and in this study we propose a classification method specifically designed to optimise the performance measure of BCIs. This gives it an advantage over standard machine learning methods, which aim to optimise different measures. The performance of the proposed algorithm is tested on two publicly available datasets in offline experiments. We use the standard power spectral density analysis (PSDA) and canonical correlation analysis (CCA) feature extraction methods on one dataset and show that our approach outperforms most related studies on this dataset. On the second dataset, we use the task-related component analysis (TRCA) method and demonstrate that the proposed method outperforms the standard argmax classification rule in terms of information transfer rate when using a small number of classes. To our knowledge, this is the first time the information bottleneck has been used in the context of SSVEP-based BCIs. The approach is unique in that the optimisation is performed over the space of classification functions; it potentially improves the performance of BCIs and makes it easier to calibrate the system for different subjects.
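A sketch of the baseline argmax rule and of the standard Wolpaw information-transfer-rate measure commonly used to score BCIs (hypothetical feature values; this is the baseline the study improves upon, not its IB-based classifier):

```python
import numpy as np

def argmax_classifier(features):
    """Baseline SSVEP rule: pick the class whose feature value
    (e.g., CCA correlation with that stimulus frequency) is largest."""
    return int(np.argmax(features))

def itr_bits_per_trial(n_classes, accuracy):
    """Wolpaw information transfer rate in bits per trial."""
    if accuracy <= 1.0 / n_classes:
        return 0.0
    if accuracy >= 1.0:
        return float(np.log2(n_classes))
    p = accuracy
    return float(np.log2(n_classes) + p * np.log2(p)
                 + (1 - p) * np.log2((1 - p) / (n_classes - 1)))

# Hypothetical CCA correlations for 4 stimulus frequencies.
scores = np.array([0.21, 0.64, 0.30, 0.18])
predicted = argmax_classifier(scores)
```

Because ITR is a per-trial information quantity, a classifier optimised for it (as proposed above) can beat an accuracy-oriented rule even when their error rates are similar.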
Collapse
Affiliation(s)
- Anti Ingel
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Raul Vicente
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| |
Collapse
|
27
|
Tezuka T, Namekawa S. Information Bottleneck Analysis by a Conditional Mutual Information Bound. Entropy (Basel) 2021; 23:e23080974. [PMID: 34441114 PMCID: PMC8391358 DOI: 10.3390/e23080974] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/31/2021] [Revised: 07/23/2021] [Accepted: 07/25/2021] [Indexed: 12/03/2022]
Abstract
Task-nuisance decomposition describes why the information bottleneck loss I(z;x)−βI(z;y) is a suitable objective for supervised learning. The true category y is predicted for input x using latent variables z. When n is a nuisance independent from y, I(z;n) can be decreased by reducing I(z;x) since the latter upper bounds the former. We extend this framework by demonstrating that conditional mutual information I(z;x|y) provides an alternative upper bound for I(z;n). This bound is applicable even if z is not a sufficient representation of x, that is, I(z;y)≠I(x;y). We used mutual information neural estimation (MINE) to estimate I(z;x|y). Experiments demonstrated that I(z;x|y) is smaller than I(z;x) for layers closer to the input, matching the claim that the former is a tighter bound than the latter. Because of this difference, the information plane differs when I(z;x|y) is used instead of I(z;x).
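The relationship between the two bounds can be checked exactly in a discrete toy model. A minimal sketch, assuming a binary Markov chain y → x → z with hypothetical flip probabilities, in which I(z;x|y) = I(z;x) − I(z;y) and is therefore the smaller (tighter) quantity:

```python
import numpy as np

def mi(p):
    """Mutual information (bits) of a 2D joint pmf."""
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pa * pb)[nz])))

def bsc(eps):
    """Binary symmetric channel transition matrix with flip probability eps."""
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

# Toy Markov chain y -> x -> z (illustrative flip probabilities).
p_y = np.array([0.5, 0.5])
p_x_given_y = bsc(0.2)
p_z_given_x = bsc(0.1)

# Full joint p(y, x, z) by broadcasting the chain factorization.
p = p_y[:, None, None] * p_x_given_y[:, :, None] * p_z_given_x[None, :, :]

I_zx = mi(p.sum(axis=0))   # I(z; x)
I_zy = mi(p.sum(axis=1))   # I(z; y)
# I(z; x | y) = sum_y p(y) * MI of the conditional joint p(x, z | y)
I_zx_given_y = sum(p[y].sum() * mi(p[y] / p[y].sum()) for y in range(2))
```

Here the conditional mutual information comes out strictly below I(z;x), matching the claim that I(z;x|y) is the tighter upper bound on nuisance information.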
Collapse
Affiliation(s)
- Taro Tezuka
- Faculty of Library, Information and Media Science, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
- Correspondence:
| | - Shizuma Namekawa
- Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan;
| |
Collapse
|
28
|
Abstract
The information bottleneck (IB) framework, proposed in [...].
Collapse
Affiliation(s)
| | - Gernot Kubin
- Signal Processing and Speech Communication Laboratory, Graz University of Technology, Inffeldgasse 16c, 8010 Graz, Austria;
| |
Collapse
|
29
|
Asoodeh S, Calmon FP. Bottleneck Problems: An Information and Estimation-Theoretic View. Entropy (Basel) 2020; 22:E1325. [PMID: 33287090 PMCID: PMC7712227 DOI: 10.3390/e22111325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/15/2020] [Revised: 11/17/2020] [Accepted: 11/17/2020] [Indexed: 06/12/2023]
Abstract
Information bottleneck (IB) and privacy funnel (PF) are two closely related optimization problems which have found applications in machine learning, design of privacy algorithms, capacity problems (e.g., Mrs. Gerber's Lemma), and strong data processing inequalities, among others. In this work, we first investigate the functional properties of IB and PF through a unified theoretical framework. We then connect them to three information-theoretic coding problems, namely hypothesis testing against independence, noisy source coding, and dependence dilution. Leveraging these connections, we prove a new cardinality bound on the auxiliary variable in IB, making its computation more tractable for discrete random variables. In the second part, we introduce a general family of optimization problems, termed "bottleneck problems", by replacing mutual information in IB and PF with other notions of mutual information, namely f-information and Arimoto's mutual information. We then argue that, unlike IB and PF, these problems lead to easily interpretable guarantees in a variety of inference tasks with statistical constraints on accuracy and privacy. While the underlying optimization problems are non-convex, we develop a technique to evaluate bottleneck problems in closed form by equivalently expressing them in terms of lower convex or upper concave envelope of certain functions. By applying this technique to a binary case, we derive closed form expressions for several bottleneck problems.
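For a fixed mechanism, both the IB and PF objectives reduce to mutual-information evaluations on a joint pmf. A minimal sketch, assuming a toy binary source and a randomized-erasure mechanism P(U|X) (illustrative numbers, not the paper's envelope technique), computing the utility I(X;U) and the leakage I(Y;U):

```python
import numpy as np

def mi(p):
    """Mutual information (bits) of a 2D joint pmf."""
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (pa * pb)[nz])))

# Joint pmf of (X, Y): X is the useful data, Y the private attribute.
pxy = np.array([[0.35, 0.15],
                [0.10, 0.40]])
px = pxy.sum(axis=1)
py_x = pxy / px[:, None]            # p(y|x)

def objectives(pu_x):
    """For a channel p(u|x) with Markov chain Y - X - U,
    return (I(X;U), I(Y;U))."""
    pxu = px[:, None] * pu_x         # joint p(x, u)
    pyu = py_x.T @ pxu               # p(y, u) = sum_x p(y|x) p(x, u)
    return mi(pxu), mi(pyu)

# Erasure mechanism: reveal X w.p. 1 - delta, output a "?" symbol otherwise.
def erasure(delta):
    return np.array([[1 - delta, 0.0, delta],
                     [0.0, 1 - delta, delta]])

I_xu_low, I_yu_low = objectives(erasure(0.8))   # strong privacy, low utility
I_xu_hi,  I_yu_hi  = objectives(erasure(0.1))   # weak privacy, high utility
```

Sweeping the erasure probability traces a utility-leakage curve: more disclosure raises both I(X;U) and I(Y;U), with the leakage always capped by the utility via data processing, which is exactly the tension the bottleneck problems formalize.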
Affiliation(s)
- Shahab Asoodeh
- School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA

30
Geiger BC, Fischer IS. A Comparison of Variational Bounds for the Information Bottleneck Functional. Entropy (Basel) 2020; 22:e22111229. [PMID: 33286997] [PMCID: PMC7712881] [DOI: 10.3390/e22111229]
Abstract
In this short note, we relate the variational bounds proposed in Alemi et al. (2017) and Fischer (2020) for the information bottleneck (IB) and the conditional entropy bottleneck (CEB) functional, respectively. Although the two functionals were shown to be equivalent, it was empirically observed that optimizing bounds on the CEB functional achieves better generalization performance and adversarial robustness than optimizing those on the IB functional. This work tries to shed light on this issue by showing that, in the most general setting, no ordering can be established between these variational bounds, while such an ordering can be enforced by restricting the feasible sets over which the optimizations take place. The absence of such an ordering in the general setup suggests that the variational bound on the CEB functional is either more amenable to optimization or a relevant cost function for optimization in its own regard, i.e., without justification from the IB or CEB functionals.
31
Catenacci Volpi N, Polani D. Space Emerges from What We Know-Spatial Categorisations Induced by Information Constraints. Entropy (Basel) 2020; 22:e22101179. [PMID: 33286947] [PMCID: PMC7597350] [DOI: 10.3390/e22101179]
Abstract
Seeking goals carried out by agents with a level of competency requires an “understanding” of the structure of their world. While abstract formal descriptions of a world structure in terms of geometric axioms can be formulated in principle, it is not likely that this is the representation that is actually employed by biological organisms or that should be used by biologically plausible models. Instead, we operate by the assumption that biological organisms are constrained in their information processing capacities, which in the past has led to a number of insightful hypotheses and models for biologically plausible behaviour generation. Here we use this approach to study various types of spatial categorizations that emerge through such informational constraints imposed on embodied agents. We will see that geometrically-rich spatial representations emerge when agents employ a trade-off between the minimisation of the Shannon information used to describe locations within the environment and the reduction of the location error generated by the resulting approximate spatial description. In addition, agents do not always need to construct these representations from the ground up, but they can obtain them by refining less precise spatial descriptions constructed previously. Importantly, we find that these can be optimal at both steps of refinement, as guaranteed by the successive refinement principle from information theory. Finally, clusters induced by these spatial representations via the information bottleneck method are able to reflect the environment’s topology without relying on an explicit geometric description of the environment’s structure. Our findings suggest that the fundamental geometric notions possessed by natural agents do not need to be part of their a priori knowledge but could emerge as a byproduct of the pressure to process information parsimoniously.
32
Abstract
Intuitively, one way to make classifiers more robust to their input is to have them depend less sensitively on their input. The Information Bottleneck (IB) tries to learn compressed representations of input that are still predictive. Scaling up IB approaches to large scale image classification tasks has proved difficult. We demonstrate that the Conditional Entropy Bottleneck (CEB) can not only scale up to large scale image classification tasks, but can additionally improve model robustness. CEB is an easy strategy to implement and works in tandem with data augmentation procedures. We report results of a large scale adversarial robustness study on CIFAR-10, as well as the ImageNet-C Common Corruptions Benchmark, ImageNet-A, and PGD attacks.
Affiliation(s)
- Ian Fischer
- Google Research, Mountain View, CA 94043, USA

33
Fischer I. The Conditional Entropy Bottleneck. Entropy (Basel) 2020; 22:E999. [PMID: 33286768] [DOI: 10.3390/e22090999]
Abstract
Much of the field of Machine Learning exhibits a prominent set of failure modes, including vulnerability to adversarial examples, poor out-of-distribution (OoD) detection, miscalibration, and willingness to memorize random labelings of datasets. We characterize these as failures of robust generalization, which extends the traditional measure of generalization as accuracy or related metrics on a held-out set. We hypothesize that these failures to robustly generalize are due to the learning systems retaining too much information about the training data. To test this hypothesis, we propose the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model. In order to train models that perform well with respect to the MNI criterion, we present a new objective function, the Conditional Entropy Bottleneck (CEB), which is closely related to the Information Bottleneck (IB). We experimentally test our hypothesis by comparing the performance of CEB models with deterministic models and Variational Information Bottleneck (VIB) models on a variety of different datasets and robustness challenges. We find strong empirical evidence supporting our hypothesis that MNI models improve on these problems of robust generalization.
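The close relation between CEB and IB rests on a standard identity: under the Markov chain Z-X-Y (Z computed from X alone), I(X;Z|Y) = I(X;Z) - I(Y;Z). The sketch below checks this numerically on an invented toy chain; the distributions are illustrative, not from the paper:

```python
import math
from itertools import product

# Toy chain Z <- X -> Y, so Z - X - Y is a Markov chain (hypothetical numbers)
px = {0: 0.5, 1: 0.5}
py_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # noisy label channel
pz_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # stochastic encoder

joint = {(x, y, z): px[x] * py_x[x][y] * pz_x[x][z]
         for x, y, z in product([0, 1], repeat=3)}

def marg(idx):
    """Marginal over the coordinates listed in idx (0=X, 1=Y, 2=Z)."""
    out = {}
    for k, v in joint.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out

def I(a, b):
    """Mutual information (bits) between coordinate groups a and b."""
    pab, pa, pb = marg(a + b), marg(a), marg(b)
    return sum(v * math.log2(v / (pa[k[:len(a)]] * pb[k[len(a):]]))
               for k, v in pab.items() if v > 0)

# Chain rule: I(X;Z|Y) = I(XY;Z) - I(Y;Z); under the Markov chain this
# equals I(X;Z) - I(Y;Z), which is what ties CEB to the IB terms.
ixz_y = I([0, 1], [2]) - I([1], [2])
assert abs(ixz_y - (I([0], [2]) - I([1], [2]))) < 1e-12
```
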
34
Jónsson H, Cherubini G, Eleftheriou E. Convergence Behavior of DNNs with Mutual-Information-Based Regularization. Entropy (Basel) 2020; 22:E727. [PMID: 33286499] [DOI: 10.3390/e22070727]
Abstract
Information theory concepts are leveraged with the goal of better understanding and improving Deep Neural Networks (DNNs). The information plane of neural networks describes the behavior during training of the mutual information at various depths between input/output and hidden-layer variables. Previous analysis revealed that most of the training epochs are spent on compressing the input, in some networks where finiteness of the mutual information can be established. However, the estimation of mutual information is nontrivial for high-dimensional continuous random variables. Therefore, the computation of the mutual information for DNNs and its visualization on the information plane mostly focused on low-complexity fully connected networks. In fact, even the existence of the compression phase in complex DNNs has been questioned and viewed as an open problem. In this paper, we present the convergence of mutual information on the information plane for a high-dimensional VGG-16 Convolutional Neural Network (CNN) by resorting to Mutual Information Neural Estimation (MINE), thus confirming and extending the results obtained with low-dimensional fully connected networks. Furthermore, we demonstrate the benefits of regularizing a network, especially for a large number of training epochs, by adopting mutual information estimates as additional terms in the loss function characteristic of the network. Experimental results show that the regularization stabilizes the test accuracy and significantly reduces its variance.
35
Parbhoo S, Wieser M, Wieczorek A, Roth V. Information Bottleneck for Estimating Treatment Effects with Systematically Missing Covariates. Entropy (Basel) 2020; 22:E389. [PMID: 33286163] [PMCID: PMC7516862] [DOI: 10.3390/e22040389]
Abstract
Estimating the effects of an intervention from high-dimensional observational data is a challenging problem due to the existence of confounding. The task is often further complicated in healthcare applications where a set of observations may be entirely missing for certain patients at test time, thereby prohibiting accurate inference. In this paper, we address this issue using an approach based on the information bottleneck to reason about the effects of interventions. To this end, we first train an information bottleneck to perform a low-dimensional compression of covariates by explicitly considering the relevance of information for treatment effects. As a second step, we subsequently use the compressed covariates to perform a transfer of relevant information to cases where data are missing during testing. In doing so, we can reliably and accurately estimate treatment effects even in the absence of a full set of covariate information at test time. Our results on two causal inference benchmarks and a real application for treating sepsis show that our method achieves state-of-the-art performance, without compromising interpretability.
Affiliation(s)
- Sonali Parbhoo
- Department of Mathematics and Computer Science, University of Basel, Basel CH 4051, Switzerland
- Mario Wieser
- Department of Mathematics and Computer Science, University of Basel, Basel CH 4051, Switzerland
- Aleksander Wieczorek
- Department of Mathematics and Computer Science, University of Basel, Basel CH 4051, Switzerland
- Volker Roth
- Department of Mathematics and Computer Science, University of Basel, Basel CH 4051, Switzerland

36
Uğur Y, Arvanitakis G, Zaidi A. Variational Information Bottleneck for Unsupervised Clustering: Deep Gaussian Mixture Embedding. Entropy (Basel) 2020; 22:e22020213. [PMID: 33285988] [PMCID: PMC7516645] [DOI: 10.3390/e22020213]
Abstract
In this paper, we develop an unsupervised generative clustering framework that combines the variational information bottleneck and the Gaussian mixture model. Specifically, in our approach, we use the variational information bottleneck method and model the latent space as a mixture of Gaussians. We derive a bound on the cost function of our model that generalizes the Evidence Lower Bound (ELBO) and provide a variational inference type algorithm that allows computing it. In the algorithm, the coders’ mappings are parametrized using neural networks, and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on real datasets are provided to support the efficiency of our method.
Affiliation(s)
- Yiğit Uğur
- Laboratoire d’informatique Gaspard-Monge, Université Paris-Est, 77454 Champs-sur-Marne, France
- Mathematical and Algorithmic Sciences Lab, Paris Research Center, Huawei Technologies, 92100 Boulogne-Billancourt, France
- George Arvanitakis
- Mathematical and Algorithmic Sciences Lab, Paris Research Center, Huawei Technologies, 92100 Boulogne-Billancourt, France
- Abdellatif Zaidi
- Laboratoire d’informatique Gaspard-Monge, Université Paris-Est, 77454 Champs-sur-Marne, France

37
Painsky A, Feder M, Tishby N. Nonlinear Canonical Correlation Analysis: A Compressed Representation Approach. Entropy (Basel) 2020; 22:e22020208. [PMID: 33285982] [PMCID: PMC7516638] [DOI: 10.3390/e22020208]
Abstract
Canonical Correlation Analysis (CCA) is a linear representation learning method that seeks maximally correlated variables in multi-view data. Nonlinear CCA extends this notion to a broader family of transformations, which are more powerful in many real-world applications. Given the joint probability, the Alternating Conditional Expectation (ACE) algorithm provides an optimal solution to the nonlinear CCA problem. However, it suffers from limited performance and an increasing computational burden when only a finite number of samples is available. In this work, we introduce an information-theoretic compressed representation framework for the nonlinear CCA problem (CRCCA), which extends the classical ACE approach. Our suggested framework seeks compact representations of the data that allow a maximal level of correlation. This way, we control the trade-off between the flexibility and the complexity of the model. CRCCA provides theoretical bounds and optimality conditions, as we establish fundamental connections to rate-distortion theory, the information bottleneck and remote source coding. In addition, it allows a soft dimensionality reduction, as the compression level is determined by the mutual information between the original noisy data and the extracted signals. Finally, we introduce a simple implementation of the CRCCA framework, based on lattice quantization.
Affiliation(s)
- Amichai Painsky
- The Industrial Engineering Department, Tel Aviv University, Tel Aviv 6997801, Israel
- Meir Feder
- The School of Electrical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel
- Naftali Tishby
- The School of Computer Science and Engineering and the Interdisciplinary Center for Neural Computation, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel

38
Zaidi A, Estella-Aguerri I, Shamai (Shitz) S. On the Information Bottleneck Problems: Models, Connections, Applications and Information Theoretic Views. Entropy (Basel) 2020; 22:E151. [PMID: 33285926] [PMCID: PMC7516564] [DOI: 10.3390/e22020151]
Abstract
This tutorial paper focuses on the variants of the bottleneck problem taking an information theoretic perspective and discusses practical methods to solve it, as well as its connection to coding and learning aspects. The intimate connections of this setting to remote source-coding under logarithmic loss distortion measure, information combining, common reconstruction, the Wyner-Ahlswede-Korner problem, the efficiency of investment information, as well as, generalization, variational inference, representation learning, autoencoders, and others are highlighted. We discuss its extension to the distributed information bottleneck problem with emphasis on the Gaussian model and highlight the basic connections to the uplink Cloud Radio Access Networks (CRAN) with oblivious processing. For this model, the optimal trade-offs between relevance (i.e., information) and complexity (i.e., rates) in the discrete and vector Gaussian frameworks is determined. In the concluding outlook, some interesting problems are mentioned such as the characterization of the optimal inputs ("features") distributions under power limitations maximizing the "relevance" for the Gaussian information bottleneck, under "complexity" constraints.
Affiliation(s)
- Abdellatif Zaidi
- Institut d’Électronique et d’Informatique Gaspard-Monge, Université Paris-Est, 77454 Champs-sur-Marne, France
- Mathematics and Algorithmic Sciences Lab, Paris Research Center, Huawei Technologies France, 92100 Boulogne-Billancourt, France
- Iñaki Estella-Aguerri
- Mathematics and Algorithmic Sciences Lab, Paris Research Center, Huawei Technologies France, 92100 Boulogne-Billancourt, France

39
Wieczorek A, Roth V. On the Difference between the Information Bottleneck and the Deep Information Bottleneck. Entropy (Basel) 2020; 22:e22020131. [PMID: 33285906] [PMCID: PMC7516540] [DOI: 10.3390/e22020131]
Abstract
Combining the information bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proven successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the deep variational information bottleneck and the assumptions needed for its derivation. The two assumed properties of the data, X and Y, and their latent representation T, take the form of two Markov chains T−X−Y and X−T−Y. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions P(X,Y,T). We, therefore, show how to circumvent this limitation by optimising a lower bound for the mutual information between T and Y: I(T;Y), for which only the latter Markov chain has to be satisfied. The mutual information I(T;Y) can be split into two non-negative parts. The first part is the lower bound for I(T;Y), which is optimised in deep variational information bottleneck (DVIB) and cognate models in practice. The second part consists of two terms that measure how much the former requirement T−X−Y is violated. Finally, we propose interpreting the family of information bottleneck models as directed graphical models, and show that in this framework, the original and deep information bottlenecks are special cases of a fundamental IB model.
40
Rodríguez Gálvez B, Thobaben R, Skoglund M. The Convex Information Bottleneck Lagrangian. Entropy (Basel) 2020; 22:e22010098. [PMID: 33285873] [PMCID: PMC7516537] [DOI: 10.3390/e22010098]
Abstract
The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations T of some random variable X for the task of predicting Y. It is defined as a constrained optimization problem that maximizes the information the representation has about the task, I(T;Y), while ensuring that a certain level of compression r is achieved (i.e., I(X;T)≤r). For practical reasons, the problem is usually solved by maximizing the IB Lagrangian (i.e., LIB(T;β)=I(T;Y)−βI(X;T)) for many values of β∈[0,1]. Then, the curve of maximal I(T;Y) for a given I(X;T) is drawn and a representation with the desired predictability and compression is selected. It is known when Y is a deterministic function of X, the IB curve cannot be explored and another Lagrangian has been proposed to tackle this problem: the squared IB Lagrangian: Lsq−IB(T;βsq)=I(T;Y)−βsqI(X;T)2. In this paper, we (i) present a general family of Lagrangians which allow for the exploration of the IB curve in all scenarios; (ii) provide the exact one-to-one mapping between the Lagrange multiplier and the desired compression rate r for known IB curve shapes; and (iii) show we can approximately obtain a specific compression level with the convex IB Lagrangian for both known and unknown IB curve shapes. This eliminates the burden of solving the optimization problem for many values of the Lagrange multiplier. That is, we prove that we can solve the original constrained problem with a single optimization.
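The Lagrangians in this abstract can be evaluated directly for any fixed stochastic encoder on a small discrete source. The sketch below computes LIB(T;β)=I(T;Y)−βI(X;T) and the squared variant; the joint distribution and encoder are our toy numbers, not the paper's:

```python
import math

def mi(pab):
    """Mutual information in bits for a joint distribution as a nested list."""
    pa = [sum(r) for r in pab]
    pb = [sum(c) for c in zip(*pab)]
    return sum(p * math.log2(p / (pa[i] * pb[j]))
               for i, r in enumerate(pab) for j, p in enumerate(r) if p > 0)

# Hypothetical 2x2 source p(x,y) and stochastic encoder p(t|x)
pxy = [[0.4, 0.1],
       [0.1, 0.4]]
pt_x = [[0.85, 0.15],      # p(t|x=0)
        [0.15, 0.85]]      # p(t|x=1)

px = [sum(r) for r in pxy]
pxt = [[px[x] * pt_x[x][t] for t in range(2)] for x in range(2)]
# p(t,y) = sum_x p(x,y) p(t|x), using the Markov chain T - X - Y
pty = [[sum(pxy[x][y] * pt_x[x][t] for x in range(2)) for y in range(2)]
       for t in range(2)]

ixt, ity = mi(pxt), mi(pty)    # I(X;T) ≈ 0.390, I(T;Y) ≈ 0.131 bits
beta = 0.5
L_ib = ity - beta * ixt        # IB Lagrangian
L_sq = ity - beta * ixt ** 2   # squared-IB Lagrangian
```

Sweeping beta and re-maximizing over encoders would trace out the IB curve the abstract describes; here we only evaluate one fixed point on it.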
41
Franzese G, Visintin M. Probabilistic Ensemble of Deep Information Networks. Entropy (Basel) 2020; 22:e22010100. [PMID: 33285874] [PMCID: PMC7516404] [DOI: 10.3390/e22010100]
Abstract
We describe a classifier made of an ensemble of decision trees, designed using information theory concepts. In contrast to algorithms C4.5 or ID3, the tree is built from the leaves instead of the root. Each tree is made of nodes trained independently of the others, to minimize a local cost function (information bottleneck). The trained tree outputs the estimated probabilities of the classes given the input datum, and the outputs of many trees are combined to decide the class. We show that the system is able to provide results comparable to those of the tree classifier in terms of accuracy, while it shows many advantages in terms of modularity, reduced complexity, and memory requirements.
42
Wu T, Fischer I, Chuang IL, Tegmark M. Learnability for the Information Bottleneck. Entropy (Basel) 2019; 21:924. [PMCID: PMC7514257] [DOI: 10.3390/e21100924]
Abstract
The Information Bottleneck (IB) method provides an insightful and principled approach for balancing compression and prediction for representation learning. The IB objective I(X;Z)−βI(Y;Z) employs a Lagrange multiplier β to tune this trade-off. However, in practice, not only is β chosen empirically without theoretical guidance, there is also a lack of theoretical understanding between β, learnability, the intrinsic nature of the dataset and model capacity. In this paper, we show that if β is improperly chosen, learning cannot happen—the trivial representation P(Z|X)=P(Z) becomes the global minimum of the IB objective. We show how this can be avoided, by identifying a sharp phase transition between the unlearnable and the learnable which arises as β is varied. This phase transition defines the concept of IB-Learnability. We prove several sufficient conditions for IB-Learnability, which provides theoretical guidance for choosing a good β. We further show that IB-learnability is determined by the largest confident, typical and imbalanced subset of the examples (the conspicuous subset), and discuss its relation with model capacity. We give practical algorithms to estimate the minimum β for a given dataset. We also empirically demonstrate our theoretical conditions with analyses of synthetic datasets, MNIST and CIFAR10.
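The phase transition can be illustrated with a single candidate encoder: the trivial representation P(Z|X)=P(Z) always scores 0 on the objective I(X;Z)−βI(Y;Z), and the copy encoder Z=X beats it exactly when β>H(X)/I(X;Y), which upper-bounds the learnability threshold. A minimal sketch with invented numbers (the paper's actual conditions are sharper than this single-encoder bound):

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical task: X ~ Bernoulli(1/2), Y is X flipped with probability 0.1
HX = 1.0               # I(X;Z) for the copy encoder Z = X
IXY = 1.0 - h(0.1)     # I(Y;Z) for the copy encoder

def ib_objective(ixz, iyz, beta):
    # Minimized IB objective; the trivial encoder P(Z|X) = P(Z) scores 0
    return ixz - beta * iyz

threshold = HX / IXY   # ≈ 1.88: above this beta, the copy encoder wins
for beta in (1.0, 1.5, 2.0, 2.5):
    print(beta, ib_objective(HX, IXY, beta) < 0)
```
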
Affiliation(s)
- Tailin Wu
- Department of Physics, MIT, 77 Massachusetts Ave, Cambridge, MA 02139, USA
- Ian Fischer
- Google Research, 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
- Isaac L. Chuang
- Department of Physics, MIT, 77 Massachusetts Ave, Cambridge, MA 02139, USA
- Max Tegmark
- Department of Physics, MIT, 77 Massachusetts Ave, Cambridge, MA 02139, USA

43
Shalev Y, Ben-Gal I. Context Based Predictive Information. Entropy (Basel) 2019; 21:e21070645. [PMID: 33267359] [PMCID: PMC7515138] [DOI: 10.3390/e21070645]
Abstract
We propose a new algorithm called the context-based predictive information (CBPI) for estimating the predictive information (PI) between time series, by utilizing a lossy compression algorithm. The advantage of this approach over existing methods resides in the case of sparse predictive information (SPI) conditions, where the ratio between the number of informative sequences to uninformative sequences is small. It is shown that the CBPI achieves a better PI estimation than benchmark methods by ignoring uninformative sequences while improving explainability by identifying the informative sequences. We also provide an implementation of the CBPI algorithm on a real dataset of large banks' stock prices in the U.S. In the last part of this paper, we show how the CBPI algorithm is related to the well-known information bottleneck in its deterministic version.
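CBPI itself is built around a particular lossy compressor, but the underlying idea of reading shared information off compressed lengths can be loosely illustrated with a lossless compressor, in the spirit of the normalized compression distance. Everything below (names, data) is our illustration, not the paper's algorithm:

```python
import hashlib
import zlib

def clen(b: bytes) -> int:
    """Length of the zlib-compressed representation: a crude complexity proxy."""
    return len(zlib.compress(b, 9))

def shared_info_proxy(a: bytes, b: bytes) -> int:
    """C(a) + C(b) - C(a+b): how much compressing a and b together saves."""
    return clen(a) + clen(b) - clen(a + b)

# Deterministic byte strings standing in for time series
x = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(20))
y = x                                                            # identical series
z = b"".join(hashlib.sha256(bytes([i + 100])).digest() for i in range(20))  # unrelated

# Compressing x together with an identical series saves far more than with
# an unrelated one, so the proxy ranks the informative pairing first.
assert shared_info_proxy(x, y) > shared_info_proxy(x, z)
```
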
44
Hahn M, Futrell R. Estimating Predictive Rate-Distortion Curves via Neural Variational Inference. Entropy (Basel) 2019; 21:E640. [PMID: 33267354] [DOI: 10.3390/e21070640]
Abstract
The Predictive Rate–Distortion curve quantifies the trade-off between compressing information about the past of a stochastic process and predicting its future accurately. Existing estimation methods for this curve work by clustering finite sequences of observations or by utilizing analytically known causal states. Neither type of approach scales to processes such as natural languages, which have large alphabets and long dependencies, and where the causal states are not known analytically. We describe Neural Predictive Rate–Distortion (NPRD), an estimation method that scales to such processes, leveraging the universal approximation capabilities of neural networks. Taking only time series data as input, the method computes a variational bound on the Predictive Rate–Distortion curve. We validate the method on processes where Predictive Rate–Distortion is analytically known. As an application, we provide bounds on the Predictive Rate–Distortion of natural language, improving on bounds provided by clustering sequences. Based on the results, we argue that the Predictive Rate–Distortion curve is more useful than the usual notion of statistical complexity for characterizing highly complex processes such as natural language.
45
Cheng H, Lian D, Gao S, Geng Y. Utilizing Information Bottleneck to Evaluate the Capability of Deep Neural Networks for Image Classification. Entropy (Basel) 2019; 21:E456. [PMID: 33267170] [DOI: 10.3390/e21050456]
Abstract
Inspired by the pioneering work of the information bottleneck (IB) principle for Deep Neural Networks' (DNNs) analysis, we thoroughly study the relationship among the model accuracy, I(X;T) and I(T;Y), where I(X;T) and I(T;Y) are the mutual information of DNN's output T with input X and label Y. Then, we design an information plane-based framework to evaluate the capability of DNNs (including CNNs) for image classification. Instead of each hidden layer's output, our framework focuses on the model output T. We successfully apply our framework to many application scenarios arising in deep learning and image classification problems, such as image classification with unbalanced data distribution, model selection, and transfer learning. The experimental results verify the effectiveness of the information plane-based framework: Our framework may facilitate a quick model selection and determine the number of samples needed for each class in the unbalanced classification problem. Furthermore, the framework explains the efficiency of transfer learning in the deep learning area.
46
Guo Y, Xu Q, Sbert M. IBVis: Interactive Visual Analytics for Information Bottleneck Based Trajectory Clustering. Entropy (Basel) 2018; 20:e20030159. [PMID: 33265250] [PMCID: PMC7512675] [DOI: 10.3390/e20030159]
Abstract
Analyzing trajectory data plays an important role in practical applications, and clustering is one of the most widely used techniques for this task. The clustering approach based on information bottleneck (IB) principle has shown its effectiveness for trajectory data, in which a predefined number of the clusters and an explicit distance measure between trajectories are not required. However, presenting directly the final results of IB clustering gives no clear idea of both trajectory data and clustering process. Visual analytics actually provides a powerful methodology to address this issue. In this paper, we present an interactive visual analytics prototype called IBVis to supply an expressive investigation of IB-based trajectory clustering. IBVis provides various views to graphically present the key components of IB and the current clustering results. Rich user interactions drive different views work together, so as to monitor and steer the clustering procedure and to refine the results. In this way, insights on how to make better use of IB for different featured trajectory data can be gained for users, leading to better analyzing and understanding trajectory data. The applicability of IBVis has been evidenced in usage scenarios. In addition, the conducted user study shows IBVis is well designed and helpful for users.
Collapse
Affiliation(s)
- Yuejun Guo
- Department of Informàtica i Matemàtica Aplicada, University of Girona, 17071 Girona, Spain
- Qing Xu
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
- Correspondence: (Q.X.); (M.S.); Tel.: +86-22-2740-6538 (Q.X.); +34-972-418419 (M.S.)
- Mateu Sbert
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
- Department of Informàtica i Matemàtica Aplicada, University of Girona, 17071 Girona, Spain
Collapse
|
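The clustering principle the IBVis abstract refers to can be sketched with the agglomerative information bottleneck: repeatedly merge the pair of clusters whose merger loses the least relevance information I(T;Y). The following is a minimal illustration with a made-up 4×2 joint distribution over "trajectories" and "relevant features"; it is not code from the IBVis paper.

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) in nats for a joint distribution matrix pxy."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def agglomerative_ib(pxy, n_clusters):
    """Greedily merge rows (clusters of x) so that each step loses the
    least relevance information I(T;Y)."""
    clusters = [[i] for i in range(pxy.shape[0])]
    joint = pxy.copy()
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = np.delete(joint, b, axis=0)
                merged[a] = joint[a] + joint[b]  # pool the two clusters
                loss = mutual_information(joint) - mutual_information(merged)
                if best is None or loss < best[0]:
                    best = (loss, a, b, merged)
        _, a, b, joint = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy joint distribution: rows 0,1 favor y=0; rows 2,3 favor y=1.
pxy = np.array([[0.20, 0.05],
                [0.18, 0.07],
                [0.05, 0.20],
                [0.07, 0.18]])
print(agglomerative_ib(pxy, 2))  # → [[0, 1], [2, 3]]
```

Note that no cluster count has to be fixed in advance: one can keep merging and inspect the information loss at each step, which is exactly the kind of intermediate state a tool like IBVis visualizes.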
47
|
Abstract
Even the simplest of animals exhibit behavioral sequences with complex temporal dynamics. Prominent among the proposed organizing principles for these dynamics has been the idea of a hierarchy, wherein the movements an animal makes can be understood as a set of nested subclusters. Although this type of organization holds potential advantages in terms of motion control and neural circuitry, measurements demonstrating this for an animal's entire behavioral repertoire have been limited in scope and temporal complexity. Here, we use a recently developed unsupervised technique to discover and track the occurrence of all stereotyped behaviors performed by fruit flies moving in a shallow arena. Calculating the optimally predictive representation of the fly's future behaviors, we show that fly behavior exhibits multiple time scales and is organized into a hierarchical structure that is indicative of its underlying behavioral programs and its changing internal states.
Collapse
Affiliation(s)
- Gordon J Berman
- Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ 08544; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- William Bialek
- Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ 08544; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Joshua W Shaevitz
- Joseph Henry Laboratories of Physics, Princeton University, Princeton, NJ 08544; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
Collapse
|
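The "optimally predictive representation" in this abstract rests on predictive information: the mutual information between present and future states of the behavioral sequence. A minimal sketch for a two-state Markov chain follows; the transition matrices are illustrative toy values, not data from the paper.

```python
import numpy as np

def predictive_information(T):
    """I(X_t; X_{t+1}) in nats for a Markov chain with transition matrix T."""
    # Stationary distribution: left eigenvector of T for eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    joint = pi[:, None] * T          # p(x_t, x_{t+1})
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# Two "behavioral modes": sticky dynamics make the present informative
# about the future; memoryless dynamics carry no predictive information.
sticky = np.array([[0.95, 0.05], [0.05, 0.95]])
memoryless = np.array([[0.5, 0.5], [0.5, 0.5]])
print(predictive_information(sticky), predictive_information(memoryless))
```

A hierarchy of time scales, as reported for fly behavior, shows up as predictive information that decays slowly as the gap between past and future grows.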
49
|
Abstract
Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik nearly a decade ago, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is, however, unclear if these heuristics can be derived from a more general principle, facilitating generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph. For fast-mixing graphs, we show that the cost functions introduced by Shi and Malik can be well approximated as the rate of loss of predictive information about the location of random walkers on the graph. For graphs drawn from a generative model designed to describe community structure, the optimal information-theoretic partition and the optimal min-cut partition are shown to be the same with high probability.
Collapse
Affiliation(s)
- Anil Raj
- Department of Applied Physics and Applied Mathematics, Columbia University, 200 S. W. Mudd Building, MC 4701, 500 W. 120th Street, New York, NY 10027, USA.
Collapse
|
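The Shi–Malik normalized-cut cost that this abstract connects to the Information Bottleneck can be computed directly. The two-triangle graph and its weights below are a made-up toy example, not from the paper: the cut through the weak bridge scores far lower than an arbitrary split.

```python
import numpy as np

def ncut(W, part):
    """Shi-Malik normalized cut of a 2-way partition.
    W: symmetric adjacency matrix; part: boolean mask for side A."""
    A, B = part, ~part
    cut = W[np.ix_(A, B)].sum()          # weight crossing the partition
    assoc_A = W[A].sum()                 # total degree of side A
    assoc_B = W[B].sum()                 # total degree of side B
    return cut / assoc_A + cut / assoc_B

# Two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                  # bridge between the communities

good = np.array([True] * 3 + [False] * 3)  # split at the bridge
bad = np.array([True, False] * 3)          # arbitrary split
print(ncut(W, good), ncut(W, bad))
```

In the random-walk picture the paper develops, the low-Ncut partition is exactly the one whose blocks a walker leaves slowly, i.e. the one that best preserves predictive information about the walker's location.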
50
|
Abstract
Although it has been widely applied to identify genes responsible for biomedically, economically, or even evolutionarily important complex and quantitative traits, the traditional candidate gene approach is largely limited by its reliance on a priori knowledge about the physiological, biochemical, or functional aspects of possible candidates. This limitation creates a severe information bottleneck, which has become an obstacle to further applications of the traditional candidate gene approach on many occasions. While identifying candidate genes involved in genetic traits of specific interest remains a challenge, significant progress on this subject has been achieved in the last few years, and several strategies have been developed, or are being developed, to break through this information bottleneck. Recently, the digital candidate gene approach (DigiCGA), a newly developed variant of the candidate gene approach, has emerged and has been applied to identify potential candidate genes in several studies. This review summarizes the progress, application software, online tools, and challenges related to this approach.
Collapse
Affiliation(s)
- Mengjin Zhu
- Key Laboratory of Agricultural Animal Genetics, Breeding, Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, PR China
Collapse
|