1
Wolpert DH, Kipper J. Memory Systems, the Epistemic Arrow of Time, and the Second Law. Entropy (Basel) 2024; 26:170. PMID: 38392425; PMCID: PMC10888154; DOI: 10.3390/e26020170.
Abstract
The epistemic arrow of time is the fact that our knowledge of the past seems to be both of a different kind and more detailed than our knowledge of the future. As with the other arrows of time, it has often been speculated that the epistemic arrow arises from the second law of thermodynamics. In this paper, we investigate the epistemic arrow of time using a fully formal framework. We begin by defining a memory system as any physical system whose present state can provide information about the state of the external world at some time other than the present. We then identify two types of memory systems in our universe, along with an important special case of the first type, which we distinguish as a third type of memory system. We show that two of these types of memory systems are time-symmetric, able to provide knowledge about both the past and the future. However, the third type of memory system exploits the second law of thermodynamics, at least in all of its instances in our universe that we are aware of. The result is that in our universe, this type of memory system only ever provides information about the past. We also argue that human memory is of this third type, completing the argument. We end by scrutinizing the basis of the second law itself. This uncovers a previously unappreciated formal problem for common arguments that try to derive the second law from the "Past Hypothesis", i.e., from the claim that the very early universe was in a state of extremely low entropy. Our analysis is indebted to prior work by one of us but expands and improves upon that work in several respects.
Affiliation(s)
- David H Wolpert
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Jens Kipper
- Philosophy Department, University of Rochester, Rochester, NY 14627, USA
2
Tasnim F, Wolpert DH. Stochastic Thermodynamics of Multiple Co-Evolving Systems-Beyond Multipartite Processes. Entropy (Basel) 2023; 25:1078. PMID: 37510025; PMCID: PMC10378096; DOI: 10.3390/e25071078.
Abstract
Many dynamical systems consist of multiple, co-evolving subsystems (i.e., they have multiple degrees of freedom). Often, the dynamics of one or more of these subsystems will not directly depend on the state of some other subsystems, resulting in a network of dependencies governing the dynamics. How does this dependency network affect the full system's thermodynamics? Prior studies on the stochastic thermodynamics of multipartite processes have addressed this question by assuming that, in addition to the constraints of the dependency network, only one subsystem is allowed to change state at a time. However, in many real systems, such as chemical reaction networks or electronic circuits, multiple subsystems can, or must, change state together. Here, we investigate the thermodynamics of such composite processes, in which multiple subsystems are allowed to change state simultaneously. We first present new, strictly positive lower bounds on entropy production in composite processes. We then present thermodynamic uncertainty relations for information flows in composite processes. We end with strengthened speed limits for composite processes.
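As background for the bounds described above, the baseline quantity they strengthen can be computed directly. The sketch below evaluates the standard (Schnakenberg) entropy-production rate of a master equation for an invented 3-state rate matrix; it illustrates the textbook non-negativity that the paper's composite-process bounds tighten, not the paper's own results.

```python
import numpy as np

# Hypothetical 3-state rate matrix: R[i, j] = rate of jump j -> i; columns sum to 0.
R = np.array([[-3.0,  1.0,  2.0],
              [ 2.0, -2.0,  1.0],
              [ 1.0,  1.0, -3.0]])

p = np.array([0.5, 0.3, 0.2])  # current distribution over the 3 states


def ep_rate(R, p):
    """Schnakenberg EP rate: 0.5 * sum_{i!=j} (R_ij p_j - R_ji p_i) ln(R_ij p_j / (R_ji p_i))."""
    sigma = 0.0
    n = len(p)
    for i in range(n):
        for j in range(n):
            if i != j:
                flux_ij = R[i, j] * p[j]   # probability current j -> i
                flux_ji = R[j, i] * p[i]   # reverse current i -> j
                sigma += 0.5 * (flux_ij - flux_ji) * np.log(flux_ij / flux_ji)
    return sigma


sigma = ep_rate(R, p)
# Each summand has the form (x - y) * ln(x / y) >= 0, so the EP rate is non-negative,
# and it vanishes exactly when every pair of currents balances (detailed balance).
```
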
Affiliation(s)
- Farita Tasnim
- Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- David H Wolpert
- Santa Fe Institute, Santa Fe, NM 87501, USA
- Complexity Science Hub, Josefstädter Straße 39, 1080 Vienna, Austria
- Center for Bio-Social Complex Systems, Arizona State University, Tempe, AZ 85287, USA
- International Center for Theoretical Physics, 34151 Trieste, Italy
3
Wolpert DH. Strengthened second law for multi-dimensional systems coupled to multiple thermodynamic reservoirs. Philos Trans A Math Phys Eng Sci 2022; 380:20200428. PMID: 35599569; PMCID: PMC9125225; DOI: 10.1098/rsta.2020.0428.
Abstract
The second law of thermodynamics can be formulated as a restriction on the evolution of the entropy of any system undergoing Markovian dynamics. Here I show that this form of the second law is strengthened for multi-dimensional, complex systems, coupled to multiple thermodynamic reservoirs, if we have a set of a priori constraints restricting how the dynamics of each coordinate can depend on the other coordinates. As an example, this strengthened second law (SSL) applies to complex systems composed of multiple physically separated, co-evolving subsystems, each identified as a coordinate of the overall system. In this example, the constraints concern how the dynamics of some subsystems are allowed to depend on the states of the other subsystems. Importantly, the SSL applies to such complex systems even if some of their subsystems can change state simultaneously, which is prohibited in a multipartite process. The SSL also strengthens previously derived bounds on how much work can be extracted from a system using feedback control, if the system is multi-dimensional. Importantly, the SSL does not require local detailed balance. So it potentially applies to complex systems ranging from interacting economic agents to co-evolving biological species. This article is part of the theme issue 'Emergent phenomena in complex physical and socio-technical systems: from cells to societies'.
Affiliation(s)
- David H. Wolpert
- Santa Fe Institute, Santa Fe, NM, USA
- Complexity Science Hub, Vienna, Austria
- Arizona State University, Tempe, AZ, USA
- International Center for Theoretical Physics, Trieste, Italy
5
Kolchinsky A, Wolpert DH. Dependence of integrated, instantaneous, and fluctuating entropy production on the initial state in quantum and classical processes. Phys Rev E 2021; 104:054107. PMID: 34942730; DOI: 10.1103/physreve.104.054107.
Abstract
We consider the additional entropy production (EP) incurred by a fixed quantum or classical process on some initial state ρ, above the minimum EP incurred by the same process on any initial state. We show that this additional EP, which we term the "mismatch cost of ρ," has a universal information-theoretic form: it is given by the contraction of the relative entropy between ρ and the least-dissipative initial state φ over time. We derive versions of this result for integrated EP incurred over the course of a process, for trajectory-level fluctuating EP, and for instantaneous EP rate. We also show that mismatch cost for fluctuating EP obeys an integral fluctuation theorem. Our results demonstrate a fundamental relationship between thermodynamic irreversibility (generation of EP) and logical irreversibility (inability to know the initial state corresponding to a given final state). We use this relationship to derive quantitative bounds on the thermodynamics of quantum error correction and to propose a thermodynamically operationalized measure of the logical irreversibility of a quantum channel. Our results hold for both finite- and infinite-dimensional systems, and generalize beyond EP to many other thermodynamic costs, including nonadiabatic EP, free-energy loss, and entropy gain.
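For the classical, discrete case, the mismatch-cost statement above has a simple closed form: the extra EP equals the drop in relative entropy between the actual initial distribution and the least-dissipative one under the process's transition matrix. A minimal sketch, with an invented stochastic matrix M and a stand-in least-dissipative state phi:

```python
import numpy as np


def kl(p, q):
    """Relative entropy D(p || q) between two strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))


# Hypothetical column-stochastic map M (columns sum to 1): final = M @ initial.
M = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])

phi = np.array([0.2, 0.5, 0.3])  # stand-in for the least-dissipative initial state
p = np.array([0.6, 0.3, 0.1])    # actual initial state

# Classical mismatch cost: extra EP incurred by running the process on p rather
# than on phi, equal to the contraction of relative entropy over the process.
mismatch_cost = kl(p, phi) - kl(M @ p, M @ phi)
# Non-negative by the data-processing inequality for relative entropy.
```
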
Affiliation(s)
- Artemy Kolchinsky
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
- David H Wolpert
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
6
Abstract
We consider the problem of driving a finite-state classical system from some initial distribution p to some final distribution p′ with vanishing entropy production (EP), under the constraint that the driving protocols can only use some limited set of energy functions E. Assuming no other constraints on the driving protocol, we derive a simple condition that guarantees that such a transformation can be carried out, which is stated in terms of the smallest probabilities in {p, p′} and a graph-theoretic property defined in terms of E. Our results imply that a surprisingly small amount of control over the energy function is sufficient (in particular, any transformation p→p′ can be carried out as soon as one can control some one-dimensional parameter of the energy function, e.g., the energy of a single state). We also derive a lower bound on the EP under more general constraints on the transition rates, which is formulated in terms of a convex optimization problem.
7
Wolpert DH. Uncertainty Relations and Fluctuation Theorems for Bayes Nets. Phys Rev Lett 2020; 125:200602. PMID: 33258647; DOI: 10.1103/physrevlett.125.200602.
Abstract
Recent research has considered the stochastic thermodynamics of multiple interacting systems, representing the overall system as a Bayes net. I derive fluctuation theorems governing the entropy production (EP) of arbitrary sets of the systems in such a Bayes net. I also derive "conditional" fluctuation theorems, governing the distribution of EP in one set of systems conditioned on the EP of a different set of systems. I then derive thermodynamic uncertainty relations relating the EP of the overall system to the precisions of probability currents within the individual systems.
Affiliation(s)
- David H Wolpert
- Santa Fe Institute, Santa Fe, New Mexico 87501, USA
- Complexity Science Hub, Vienna, Austria
- Arizona State University, Tempe, Arizona, USA
8
Shin J, Price MH, Wolpert DH, Shimao H, Tracey B, Kohler TA. Scale and information-processing thresholds in Holocene social evolution. Nat Commun 2020; 11:2394. PMID: 32409638; PMCID: PMC7224170; DOI: 10.1038/s41467-020-16035-9.
Abstract
Throughout the Holocene, societies developed additional layers of administration and more information-rich instruments for managing and recording transactions and events as they grew in population and territory. Yet, while such increases seem inevitable, they are not. Here we use the Seshat database to investigate the development of hundreds of polities, from multiple continents, over thousands of years. We find that sociopolitical development is dominated first by growth in polity scale, then by improvements in information processing and economic systems, and then by further increases in scale. We thus define a Scale Threshold for societies, beyond which growth in information processing becomes paramount, and an Information Threshold, which once crossed facilitates additional growth in scale. Polities diverge in socio-political features below the Information Threshold, but reconverge beyond it. We suggest an explanation for the evolutionary divergence between Old and New World polities based on phased growth in scale and information processing. We also suggest a mechanism to help explain social collapses with no evident external causes.

Editor's summary: The Seshat database has made it possible to reveal large-scale patterns in human cultural evolution. Here, Shin et al. investigate transitions in social complexity and find alternating thresholds of polity size and information processing required for further sociopolitical development.
Affiliation(s)
- Jaeweon Shin
- Department of Mathematics, Rice University, 6100 Main St, Houston, TX, 77005, USA
- David H Wolpert
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM, 87501, USA
- Center for Biosocial Complex Systems, Arizona State University, Tempe, AZ, 85281, USA
- Hajime Shimao
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM, 87501, USA
- Brendan Tracey
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM, 87501, USA
- Timothy A Kohler
- Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM, 87501, USA
- Department of Anthropology, Washington State University, Pullman, WA, 99164-4910, USA
- Crow Canyon Archaeological Center, 23390 C R K, Cortez, CO, 81321, USA
- Research Institute for Humanity and Nature, 457-4 Kamigamo Motoyama, Kita-ku, Kyoto, 603-8047, Japan
9
Wolpert DH, Kolchinsky A, Owen JA. Publisher Correction: A space–time tradeoff for implementing a function with master equation dynamics. Nat Commun 2019; 10:2072. PMID: 31043614; PMCID: PMC6494806; DOI: 10.1038/s41467-019-10162-8.
10
Abstract
Information bottleneck (IB) is a technique for extracting information in one random variable X that is relevant for predicting another random variable Y. IB works by encoding X in a compressed “bottleneck” random variable M from which Y can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete X and Y with small state spaces, and continuous X and Y with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily distributed discrete and/or continuous X and Y, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently proposed “variational IB” method on several real-world datasets.
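For the first of the two previously tractable cases mentioned above (discrete X and Y with small state spaces), the optimal encoder can be found with the classic self-consistent IB iteration. The sketch below runs that baseline on an invented toy joint distribution; it is not the paper's neural-network method, and the distribution and parameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution p(x, y) over 4 x-values and 2 y-values (invented).
pxy = rng.random((4, 2))
pxy /= pxy.sum()
px = pxy.sum(axis=1)
py_x = pxy / px[:, None]                      # p(y|x); rows sum to 1

beta, n_m = 5.0, 2                            # trade-off parameter, bottleneck size
qm_x = rng.random((4, n_m))
qm_x /= qm_x.sum(axis=1, keepdims=True)       # random initial encoder q(m|x)

for _ in range(200):
    qm = qm_x.T @ px                          # marginal q(m)
    qy_m = (qm_x * px[:, None]).T @ py_x / qm[:, None]   # decoder q(y|m)
    # KL(p(y|x) || q(y|m)) for every (x, m) pair
    kl = np.array([[np.sum(py_x[x] * np.log(py_x[x] / qy_m[m]))
                    for m in range(n_m)] for x in range(4)])
    # Self-consistent update: q(m|x) proportional to q(m) * exp(-beta * KL)
    qm_x = qm * np.exp(-beta * kl)
    qm_x = np.maximum(qm_x, 1e-12)            # numerical floor to avoid dead clusters
    qm_x /= qm_x.sum(axis=1, keepdims=True)
```
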
Affiliation(s)
- Artemy Kolchinsky
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Brendan D. Tracey
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Department of Aeronautics & Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- David H. Wolpert
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
- Complexity Science Hub, 1080 Vienna, Austria
- Center for Bio-Social Complex Systems, Arizona State University, Tempe, AZ 85281, USA
11
Wolpert DH, Kolchinsky A, Owen JA. A space-time tradeoff for implementing a function with master equation dynamics. Nat Commun 2019; 10:1727. PMID: 30988296; PMCID: PMC6465315; DOI: 10.1038/s41467-019-09542-x.
Abstract
Master equations are commonly used to model the dynamics of physical systems, including systems that implement single-valued functions like a computer’s update step. However, many such functions cannot be implemented by any master equation, even approximately, which raises the question of how they can occur in the real world. Here we show how any function over some “visible” states can be implemented with master equation dynamics—if the dynamics exploits additional, “hidden” states at intermediate times. We also show that any master equation implementing a function can be decomposed into a sequence of “hidden” timesteps, demarcated by changes in what state-to-state transitions have nonzero probability. In many real-world situations there is a cost both for more hidden states and for more hidden timesteps. Accordingly, we derive a “space–time” tradeoff between the number of hidden states and the number of hidden timesteps needed to implement any given function.

Editor's summary: Deterministic maps from initial to final states can always be modelled using the master equation formalism, provided additional “hidden” states are available. Here, the authors demonstrate a tradeoff between the required number of such states and the number of required, suitably defined “hidden time steps”.
Affiliation(s)
- David H Wolpert
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA
- Arizona State University, Tempe, AZ, 85281, USA
- Jeremy A Owen
- Physics of Living Systems Group, Department of Physics, Massachusetts Institute of Technology, 400 Tech Square, Cambridge, MA, 02139, USA
12
Abstract
Shannon information theory provides various measures of so-called syntactic information, which reflect the amount of statistical correlation between systems. By contrast, the concept of 'semantic information' refers to those correlations which carry significance or 'meaning' for a given system. Semantic information plays an important role in many fields, including biology, cognitive science and philosophy, and there has been a long-standing interest in formulating a broadly applicable and formal theory of semantic information. In this paper, we introduce such a theory. We define semantic information as the syntactic information that a physical system has about its environment which is causally necessary for the system to maintain its own existence. 'Causal necessity' is defined in terms of counter-factual interventions which scramble correlations between the system and its environment, while 'maintaining existence' is defined in terms of the system's ability to keep itself in a low entropy state. We also use recent results in non-equilibrium statistical physics to analyse semantic information from a thermodynamic point of view. Our framework is grounded in the intrinsic dynamics of a system coupled to an environment, and is applicable to any physical system, living or otherwise. It leads to formal definitions of several concepts that have been intuitively understood to be related to semantic information, including 'value of information', 'semantic content' and 'agency'.
Affiliation(s)
- David H. Wolpert
- Santa Fe Institute, Santa Fe, NM 87501, USA
- Massachusetts Institute of Technology, Cambridge, MA, USA
- Arizona State University, Tempe, AZ, USA
14
Wolpert DH, Harré M, Olbrich E, Bertschinger N, Jost J. Hysteresis effects of changing the parameters of noncooperative games. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 85:036102. PMID: 22587144; DOI: 10.1103/physreve.85.036102.
Abstract
We adapt the method used by Jaynes to derive the equilibria of statistical physics to instead derive equilibria of bounded rational game theory. We analyze the dependence of these equilibria on the parameters of the underlying game, focusing on hysteresis effects. In particular, we show that by gradually imposing individual-specific tax rates on the players of the game, and then gradually removing those taxes, the players move from a poor equilibrium to one that is better for all of them.
Affiliation(s)
- David H Wolpert
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA.
15
Wolpert DH. Computational capabilities of physical systems. Phys Rev E Stat Nonlin Soft Matter Phys 2002; 65:016128. PMID: 11800757; DOI: 10.1103/physreve.65.016128.
Abstract
In this paper strong limits on the accuracy of real-world physical computation are established. To derive these results a non-Turing machine formulation of physical computation is used. First it is proven that there cannot be a physical computer C to which one can pose any and all computational tasks concerning the physical universe. Next it is proven that no physical computer C can correctly carry out every computational task in the subset of such tasks that could potentially be posed to C. This means in particular that there cannot be a physical computer that can be assured of correctly "processing information faster than the universe does." Because this result holds independent of how or if the computer is physically coupled to the rest of the universe, it also means that there cannot exist an infallible, general-purpose observation apparatus, nor an infallible, general-purpose control apparatus. These results do not rely on systems that are infinite, and/or nonclassical, and/or obey chaotic dynamics. They also hold even if one could use an infinitely fast, infinitely dense computer, with computational powers greater than that of a Turing machine (TM). After deriving these results, analogs of the TM Halting theorem are derived for the novel kind of computer considered in this paper, as are results concerning the (im)possibility of certain kinds of error-correcting codes. In addition, an analog of algorithmic information complexity, "prediction complexity," is elaborated. A task-independent bound is derived on how much the prediction complexity of a computational task can differ for two different reference universal physical computers used to solve that task. This is analogous to the "encoding" bound governing how much the algorithmic information complexity of a TM calculation can differ for two reference universal TMs. It is proven that either the Hamiltonian of our universe proscribes a certain type of computation, or prediction complexity is unique (unlike algorithmic information complexity). Finally, the implications of this analysis for the issue of whether the universe "is" a computer are briefly discussed.
Affiliation(s)
- David H Wolpert
- NASA Ames Research Center, N269-1, Moffett Field, California 94035, USA.
17
Abstract
This article presents several additive corrections to the conventional quadratic loss bias-plus-variance formula. One of these corrections is appropriate when both the target is not fixed (as in Bayesian analysis) and training sets are averaged over (as in the conventional bias-plus-variance formula). Another additive correction casts conventional fixed-training-set Bayesian analysis directly in terms of bias plus variance. Another correction is appropriate for measuring full generalization error over a test set rather than (as with conventional bias plus variance) error at a single point. Yet another correction can help explain the recent counterintuitive bias-variance decomposition of Friedman for zero-one loss. After presenting these corrections, this article discusses some other loss-function-specific aspects of supervised learning. In particular, there is a discussion of the fact that if the loss function is a metric (e.g., zero-one loss), then there is a bound on the change in generalization error accompanying changing the algorithm's guess from h1 to h2, a bound that depends only on h1 and h2 and not on the target. This article ends by presenting versions of the bias-plus-variance formula appropriate for logarithmic and quadratic scoring, and then all the additive corrections appropriate to those formulas. All the correction terms presented are a covariance, between the learning algorithm and the posterior distribution over targets. Accordingly, in the (very common) contexts in which those terms apply, there is not a “bias-variance trade-off” or a “bias-variance dilemma,” as one often hears. Rather there is a bias-variance-covariance trade-off.
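The conventional fixed-target, quadratic-loss decomposition that these corrections extend can be checked numerically. The sketch below uses an invented one-point estimation task with a deliberately biased learner; because the target here is fixed and independent of the learner's guess, the covariance correction discussed above vanishes and the plain noise-plus-bias-squared-plus-variance formula is exact.

```python
import numpy as np

rng = np.random.default_rng(1)

f, sigma, n = 2.0, 0.5, 10        # true target value, noise std, training-set size
trials = 100_000                  # number of independent training sets

# Learner: shrink the training-sample mean toward 0 (deliberately biased).
train = rng.normal(f, sigma, size=(trials, n))
h = 0.8 * train.mean(axis=1)      # one guess per training set

# Independent noisy test outcomes at the same point.
y = rng.normal(f, sigma, size=trials)

mc_loss = np.mean((y - h) ** 2)   # Monte Carlo expected quadratic loss

noise = sigma ** 2                # irreducible noise variance
bias_sq = (h.mean() - f) ** 2     # squared bias of the learner
variance = h.var()                # variance of the learner's guess
decomposed = noise + bias_sq + variance
# mc_loss and decomposed should agree up to Monte Carlo error.
```
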
18
Wolpert DH, Wolf DR. Erratum: Estimating functions of probability distributions from a finite set of samples. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 1996; 54:6973. PMID: 9965935; DOI: 10.1103/physreve.54.6973.2.
19
Abstract
This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are “as many” targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is “anti-cross-validation” (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small. Other implications for “membership queries” algorithms and “punting” algorithms are also discussed.
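The "as many targets for A as for B" claim can be checked exhaustively on a tiny input space. The sketch below (an invented four-point domain and two invented algorithms) averages zero-one OTS error uniformly over all binary targets consistent with a fixed training set; both algorithms come out identical, even though one is the contrarian of the other.

```python
import numpy as np
from itertools import product

X = [0, 1, 2, 3]
train_x, train_y = [0, 1], [0, 1]   # a fixed training set (inputs and labels)
ots = [2, 3]                        # off-training-set inputs


def algo_a(x):
    # "Learn" by always predicting the most recent training label.
    return train_y[-1]


def algo_b(x):
    # The contrarian algorithm: predict the opposite.
    return 1 - train_y[-1]


def avg_ots_error(algo):
    """Zero-one OTS error, averaged over all targets consistent with the data."""
    errs = []
    for target in product([0, 1], repeat=len(X)):   # all 16 binary targets on X
        if [target[x] for x in train_x] != train_y:
            continue                                 # keep only consistent targets
        errs.append(np.mean([algo(x) != target[x] for x in ots]))
    return float(np.mean(errs))


avg_a = avg_ots_error(algo_a)
avg_b = avg_ots_error(algo_b)
# Both averages equal 0.5: uniformly over targets, neither algorithm wins.
```
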
Affiliation(s)
- David H. Wolpert
- The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
20
Abstract
This is the second of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discusses a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms from those used in the first paper. In particular, this second paper discusses the associated a priori distinctions that do exist between learning algorithms. In this second paper it is shown, loosely speaking, that for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However, even for such loss functions, it is shown here that any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that (for example) cross-validation has better head-to-head minimax properties than “anti-cross-validation” (choose the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” would not be relevant. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors over targets. Accordingly they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
Affiliation(s)
- David H. Wolpert
- The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM, 87501, USA
21
Wolpert DH, Wolf DR. Estimating functions of probability distributions from a finite set of samples. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 1995; 52:6841-6854. PMID: 9964199; DOI: 10.1103/physreve.52.6841.
22
Korber BT, Farber RM, Wolpert DH, Lapedes AS. Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis. Proc Natl Acad Sci U S A 1993; 90:7176-80. PMID: 8346232; PMCID: PMC47099; DOI: 10.1073/pnas.90.15.7176.
Abstract
The V3 loop of the human immunodeficiency virus type 1 (HIV-1) envelope protein is a highly variable region that is both functionally and immunologically important. Using available amino acid sequences from the V3 region, we have used an information theoretic quantity called mutual information, a measure of covariation, to quantify dependence between mutations in the loop. Certain pairs of sites, including non-contiguous sites along the sequence, do not have independent mutations but display considerable, statistically significant, covarying mutations as measured by mutual information. For the pairs of sites with the highest mutual information, specific amino acids were identified that were highly predictive of amino acids in the linked site. The observed interdependence between variable sites may have implications for structural or functional relationships; separate experimental evidence indicates functional linkage between some of the pairs of sites with high mutual information. Further specific mutational studies of the V3 loop's role in determining viral phenotype are suggested by our analyses. Also, the implications of our results may be important to consider for V3 peptide vaccine design. The methods used here are generally applicable to the study of variable proteins.
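The covariation statistic used here, mutual information between alignment positions, is straightforward to compute from counts. A minimal sketch on invented toy sequences (not actual V3 data):

```python
import numpy as np
from collections import Counter

# Toy "alignment": six invented sequences of equal length. Columns 1 and 3
# covary (T pairs with P, S pairs with G, most of the time); columns 0 and 4
# are constant, so their mutual information is exactly zero.
seqs = ["CTRPNN", "CTRGNN", "CSRPNK", "CSRGNK", "CTRPNN", "CSRGNK"]


def column_mi(seqs, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(seqs)
    pi = Counter(s[i] for s in seqs)            # marginal counts at column i
    pj = Counter(s[j] for s in seqs)            # marginal counts at column j
    pij = Counter((s[i], s[j]) for s in seqs)   # joint counts
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


mi_13 = column_mi(seqs, 1, 3)   # covarying columns: positive MI
mi_04 = column_mi(seqs, 0, 4)   # constant columns: MI = 0
```
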