351
Nielsen JT, Eghbalnia HR, Nielsen NC. Chemical shift prediction for protein structure calculation and quality assessment using an optimally parameterized force field. Progress in Nuclear Magnetic Resonance Spectroscopy 2012; 60:1-28. [PMID: 22293396] [PMCID: PMC3270304] [DOI: 10.1016/j.pnmrs.2011.05.002]
Abstract
The exquisite sensitivity of chemical shifts as reporters of structural information, and the ability to measure them routinely and accurately, gives great import to formulations that elucidate the structure-chemical-shift relationship. Here we present a new and highly accurate, precise, and robust formulation for the prediction of NMR chemical shifts from protein structures. Our approach, shAIC (shift prediction guided by Akaike's Information Criterion), capitalizes on mathematical ideas and an information-theoretic principle to represent the functional form of the relationship between structure and chemical shift as a parsimonious sum of smooth analytical potentials. It optimally takes into account short-, medium-, and long-range parameters in a nuclei-specific manner to capture potential chemical shift perturbations caused by distant nuclei. shAIC outperforms the state-of-the-art methods that use analytical formulations. Moreover, for structures derived by NMR or structures with novel folds, shAIC delivers better overall results, even when compared to sophisticated machine learning approaches. shAIC provides for a computationally lightweight implementation that is unimpeded by molecular size, making it ideal for use as a force field.
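The criterion in the method's name can be made concrete. A minimal sketch of AIC-guided model selection for an ordinary least-squares fit (the synthetic data and candidate polynomial degrees are illustrative; this is not the shAIC parameterization itself): for Gaussian residuals, AIC reduces to n·ln(RSS/n) + 2k up to an additive constant, so each extra parameter must buy a real drop in residual error.

```python
import numpy as np

def aic_gaussian(n, rss, k):
    """AIC for a least-squares fit with Gaussian residuals:
    n * ln(RSS / n) + 2k (additive constants dropped)."""
    return n * np.log(rss / n) + 2 * k

# synthetic data from a quadratic with a little noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, x.size)

scores = {}
for degree in range(1, 6):
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    scores[degree] = aic_gaussian(x.size, rss, degree + 1)

best = min(scores, key=scores.get)   # degree with the lowest AIC
print(best, scores[best])
```

Degree 1 underfits and is heavily penalized through its residuals, while higher degrees must justify every extra coefficient through the 2k term.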
Affiliation(s)
- Jakob T. Nielsen
- Center for Insoluble Protein Structures (inSPIN), Interdisciplinary Nanoscience Center (iNANO) and Department of Chemistry, Aarhus University, DK-8000 Aarhus C, Denmark
- Hamid R. Eghbalnia
- Department of Molecular and Cellular Physiology, University of Cincinnati, 231 Albert B. Sabin Way, Cincinnati, OH 45267-0576, United States
- Niels Chr. Nielsen
- Center for Insoluble Protein Structures (inSPIN), Interdisciplinary Nanoscience Center (iNANO) and Department of Chemistry, Aarhus University, DK-8000 Aarhus C, Denmark
352
353
Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behav Brain Sci 2011; 34:169-88; discussion 188-231. [DOI: 10.1017/s0140525x10003134]
Abstract
The prominence of Bayesian modeling of cognition has increased recently largely because of mathematical advances in specifying and deriving predictions from complex probabilistic models. Much of this research aims to demonstrate that cognitive behavior can be explained from rational principles alone, without recourse to psychological or neurological processes and representations. We note commonalities between this rational approach and other movements in psychology – namely, Behaviorism and evolutionary psychology – that set aside mechanistic explanations or make use of optimality assumptions. Through these comparisons, we identify a number of challenges that limit the rational program's potential contribution to psychological theory. Specifically, rational Bayesian models are significantly unconstrained, both because they are uninformed by a wide range of process-level data and because their assumptions about the environment are generally not grounded in empirical measurement. The psychological implications of most Bayesian models are also unclear. Bayesian inference itself is conceptually trivial, but strong assumptions are often embedded in the hypothesis sets and the approximation algorithms used to derive model predictions, without a clear delineation between psychological commitments and implementational details. Comparing multiple Bayesian models of the same task is rare, as is the realization that many Bayesian models recapitulate existing (mechanistic level) theories. Despite the expressive power of current Bayesian models, we argue they must be developed in conjunction with mechanistic considerations to offer substantive explanations of cognition. We lay out several means for such an integration, which take into account the representations on which Bayesian inference operates, as well as the algorithms and heuristics that carry it out.
We argue this unification will better facilitate lasting contributions to psychological theory, avoiding the pitfalls that have plagued previous theoretical movements.
354
Olmo JL, Romero JR, Ventura S. Using Ant Programming Guided by Grammar for Building Rule-Based Classifiers. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2011; 41:1585-99. [PMID: 21724517] [DOI: 10.1109/tsmcb.2011.2157681]
Abstract
The extraction of comprehensible knowledge is one of the major challenges in many domains. In this paper, an ant programming (AP) framework, which is capable of mining classification rules easily comprehensible by humans, and therefore capable of supporting expert-domain decisions, is presented. The proposed algorithm, called grammar-based ant programming (GBAP), is the first AP algorithm developed for the extraction of classification rules, and it is guided by a context-free grammar that ensures the creation of new valid individuals. To compute the transition probability of each available movement, this new model introduces the use of two complementary heuristic functions, instead of just one, as typical ant-based algorithms do. The selection of a consequent for each rule mined and the selection of the rules that make up the classifier are based on a niching approach. The performance of GBAP is compared against other classification techniques on 18 varied data sets. Experimental results show that our approach produces comprehensible rules and achieves accuracy competitive with or better than that of the other classification algorithms considered.
355
Kovacs T, Egginton R. On the analysis and design of software for reinforcement learning, with a survey of existing systems. Mach Learn 2011. [DOI: 10.1007/s10994-011-5237-8]
356
Abstract
We discuss the no-free-lunch (NFL) theorem for supervised learning as a logical paradox: a counterintuitive result that is correctly proven from apparently incontestable assumptions. We show that the uniform prior used in the proof of the theorem has a number of unpalatable consequences besides the NFL theorem, and propose a simple definition of determination (by a learning set of given size) that casts additional suspicion on the utility of this assumption for the prior. Whereas others have suggested that the assumptions of the NFL theorem are not practically realistic, we show these assumptions to be at odds with supervised learning in principle. This analysis suggests a route toward the establishment of a more realistic prior probability for use in the extended Bayesian framework.
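The role of the uniform prior is easy to demonstrate by brute force. In this toy sketch (3-bit inputs, an arbitrary train/test split, and two deliberately different learners — all hypothetical choices), averaging over every Boolean target function with equal weight forces every learner's off-training-set accuracy to exactly 1/2:

```python
from itertools import product

inputs = list(product([0, 1], repeat=3))   # all 8 possible 3-bit inputs
train, test = inputs[:4], inputs[4:]       # an arbitrary split

def learner_const(x, data):                # always predicts 0
    return 0

def learner_majority(x, data):             # predicts the majority training label
    labels = [y for _, y in data]
    return int(sum(labels) * 2 >= len(labels))

def avg_ots_accuracy(learner):
    """Off-training-set accuracy averaged under a uniform prior over
    every Boolean function f: {0,1}^3 -> {0,1} (2^8 = 256 of them)."""
    total = 0.0
    for outputs in product([0, 1], repeat=len(inputs)):
        f = dict(zip(inputs, outputs))
        data = [(x, f[x]) for x in train]
        correct = sum(learner(x, data) == f[x] for x in test)
        total += correct / len(test)
    return total / 2 ** len(inputs)

print(avg_ots_accuracy(learner_const), avg_ots_accuracy(learner_majority))
```

Swapping in any other learner leaves the average unchanged, which is the NFL result; the abstract's point is that this uniform weighting, not the proof, is where the suspicion belongs.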
Affiliation(s)
- Etienne Barnard
- Multilingual Speech Technologies Group, North-West University, Vanderbijlpark, South Africa.
357
Bengio Y, Delalleau O. On the Expressive Power of Deep Architectures. Lecture Notes in Computer Science 2011. [DOI: 10.1007/978-3-642-24412-4_3]
358
Abstract
The contributions to this special issue on cognitive development collectively propose ways in which learning involves developing constraints that shape subsequent learning. A learning system must be constrained to learn efficiently, but some of these constraints are themselves learnable. To know how something will behave, a learner must know what kind of thing it is. Although this has led previous researchers to argue for domain-specific constraints that are tied to different kinds/domains, an exciting possibility is that kinds/domains themselves can be learned. General cognitive constraints, when combined with rich inputs, can establish domains, rather than these domains necessarily preexisting prior to learning. Knowledge is structured and richly differentiated, but its "skeleton" need not always be preestablished. Instead, the skeleton may be adapted to fit patterns of co-occurrence, task requirements, and goals. Finally, we argue that for models of development to demonstrate genuine cognitive novelty, it will be helpful for them to move beyond highly preprocessed and symbolic encodings that limit flexibility. We consider two physical models that learn to make tone discriminations. They are mechanistic models that preserve rich spatial, perceptual, dynamic, and concrete information, allowing them to form surprising new classes of hypotheses and encodings.
Affiliation(s)
- Robert L Goldstone
- Department of Psychological and Brain Sciences, Indiana University; Department of Psychology, University of Richmond
359
360
Espejo P, Ventura S, Herrera F. A Survey on the Application of Genetic Programming to Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2010. [DOI: 10.1109/tsmcc.2009.2033566]
361
Lughofer E, Smith J, Tahir M, Caleb-Solly P, Eitzinger C, Sannen D, Nuttin M. Human–Machine Interaction Issues in Quality Control Based on Online Image Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part A (Systems and Humans) 2009. [DOI: 10.1109/tsmca.2009.2025025]
362
Oztop E. Sign-representation of Boolean functions using a small number of monomials. Neural Netw 2009; 22:938-48. [PMID: 19423284] [DOI: 10.1016/j.neunet.2009.03.016]
Abstract
This paper presents a deterministic algorithm that can construct a higher-order neuron representation for an arbitrary n-variable Boolean function with a fan-in less than 0.75 × 2^n, and provides related theoretical results. When the logic constants True and False are identified by +1 and -1, an n-variable Boolean function is identified by a unique dichotomy of the n-dimensional hypercube. With this equivalence, all n-variable Boolean functions can be uniquely represented by linear combinations of monomials, the products of input variables. A polynomial function whose sign matches the truth table of a given Boolean function is said to sign-represent that Boolean function. The artificial neural units that implement this sign-representation scheme are often called higher-order neurons or polynomial threshold units. This paper investigates the freedom provided by the sign-representation framework in terms of the fan-in of these artificial neural units. In particular, we look for sign-representations with a small number of monomials. Although there are methods developed for finding a reduced set of monomials to represent Boolean functions, there are no deterministic algorithms for computing non-trivial solutions with guarantees on the number of monomials in the found sign-representations. This work fills this gap by providing deterministic algorithms which are guaranteed to find solutions with fewer than 0.75 × 2^n monomials for n-variable Boolean functions. Although the algorithms presented here are computationally costly, it is expected that several research directions can be spawned from the current study, such as reducing the 0.75 × 2^n bound and devising efficient algorithms for finding sign-representations with a small number of monomials.
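A toy instance of sign-representation (illustrative, not the paper's construction): with inputs and outputs in {-1, +1}, the 3-variable majority function is sign-represented by the polynomial x1 + x2 + x3, i.e., three monomials out of the 2^3 = 8 available:

```python
from itertools import product

def majority(x):
    """Target Boolean function with values in {-1, +1}."""
    return 1 if sum(x) > 0 else -1

def poly(x):
    """Candidate sign-representation: the three linear monomials x1 + x2 + x3.
    For odd n the sum of +/-1 entries is never zero, so the sign is defined."""
    return x[0] + x[1] + x[2]

# poly sign-represents majority iff sign(poly(x)) == majority(x) on every
# vertex of the {-1, +1}^3 hypercube
ok = all((poly(x) > 0) == (majority(x) > 0)
         for x in product([-1, 1], repeat=3))
print(ok)
```

Parity, by contrast, is the classic function that forces the full product monomial, which is why the count of required monomials is the interesting quantity.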
Affiliation(s)
- Erhan Oztop
- JST-ICORP Computational Brain Project, 4-1-8 Honcho Kawaguchi, Saitama, Japan.
363
Rocke DM, Ideker T, Troyanskaya O, Quackenbush J, Dopazo J. Papers on normalization, variable selection, classification or clustering of microarray data. Bioinformatics 2009. [DOI: 10.1093/bioinformatics/btp038]
364
Enhancing the generalization ability of neural networks through controlling the hidden layers. Appl Soft Comput 2009. [DOI: 10.1016/j.asoc.2008.01.013]
365
Orriols-Puig A, Bernadó-Mansilla E. Evolutionary rule-based systems for imbalanced data sets. Soft Comput 2008. [DOI: 10.1007/s00500-008-0319-7]
366
367
Hamdi MS. MASACAD: A multi-agent approach to information customization for the purpose of academic advising of students. Appl Soft Comput 2007. [DOI: 10.1016/j.asoc.2006.02.001]
368
Abstract
A common goal of computational neuroscience and of artificial intelligence research based on statistical learning algorithms is the discovery and understanding of computational principles that could explain what we consider adaptive intelligence, in animals as well as in machines. This chapter focuses on what is required for the learning of complex behaviors. We believe it involves the learning of highly varying functions, in a mathematical sense. We bring forward two types of arguments which convey the message that many currently popular machine learning approaches to learning flexible functions have fundamental limitations that render them inappropriate for learning highly varying functions. The first issue concerns the representation of such functions with what we call shallow model architectures. We discuss limitations of shallow architectures, such as so-called kernel machines, boosting algorithms, and one-hidden-layer artificial neural networks. The second issue is more focused and concerns kernel machines with a local kernel (the type used most often in practice) that act like a collection of template-matching units. We present mathematical results on such computational architectures showing that they have a limitation similar to those already proved for older non-parametric methods, and connected to the so-called curse of dimensionality. Though it has long been believed that efficient learning in deep architectures is difficult, recently proposed computational principles for learning in deep architectures may offer a breakthrough.
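The template-matching limitation of local kernels is easy to exhibit on a highly varying target such as parity, where every nearest neighbour of an input has the opposite label. The following sketch (a plain Gaussian-kernel smoother on 8-bit parity; the split and kernel width are arbitrary choices, not taken from the chapter) shows off-training-set accuracy falling below chance:

```python
from itertools import product
import numpy as np

def parity(x):
    return int(sum(x)) % 2

d = 8
X = np.array(list(product([0, 1], repeat=d)), float)   # all 256 inputs
y = np.array([parity(x) for x in X])

rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
train, test = idx[:200], idx[200:]

def local_kernel_predict(x, Xtr, ytr, width=0.5):
    # Gaussian (local) kernel: the prediction is dominated by the
    # nearest stored templates
    w = np.exp(-np.sum((Xtr - x) ** 2, axis=1) / (2 * width ** 2))
    return int(w @ ytr / w.sum() > 0.5)

preds = [local_kernel_predict(X[i], X[train], y[train]) for i in test]
acc = float(np.mean([p == y[i] for p, i in zip(preds, test)]))
print(acc)
```

Local weighting makes each prediction follow the nearest templates, which for parity carry systematically wrong labels; a learner able to represent the global regularity would not pay this price.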
Affiliation(s)
- Yoshua Bengio
- Department IRO, Université de Montréal, P.O. Box 6128, Downtown Branch, Montreal, QC, H3C 3J7, Canada.
369
van Dinther R, Patterson RD. Perception of acoustic scale and size in musical instrument sounds. The Journal of the Acoustical Society of America 2006; 120:2158-76. [PMID: 17069313] [PMCID: PMC2821800] [DOI: 10.1121/1.2338295]
Abstract
There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception.
Affiliation(s)
- Ralph van Dinther
- Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG UK.
370
Abstract
Motivation: Machine learning methods such as neural networks, support vector machines, and other classification and regression methods rely on iterative optimization of the model quality in the space of the parameters of the method. Model quality measures (accuracies, correlations, etc.) are frequently overly optimistic because the training sets are dominated by particular families and subfamilies. To overcome the bias, the dataset is usually reduced by filtering out closely related objects. However, such filtering uses fixed similarity thresholds and ignores a part of the training information.
Results: We suggested a novel approach to calculate prediction model quality based on assigning to each data point inverse density weights derived from the postulated distance metric. We demonstrated that our new weighted measures estimate the model generalization better and are consistent with the machine learning theory. The Vapnik-Chervonenkis theorem was reformulated and applied to derive the space-uniform error estimates. Two examples were used to illustrate the advantages of the inverse density weighting. First, we demonstrated on a set with a built-in bias that the unweighted cross-validation procedure leads to an overly optimistic quality estimate, while the density-weighted quality estimates are more realistic. Second, an analytical equation for weighted quality estimates was used to derive an SVM model for signal peptide prediction using a full set of known signal peptides, instead of the usual filtered subset.
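The weighting idea can be sketched in a few lines (a made-up 1-D dataset and a naive radius-based density estimate stand in for the paper's postulated distance metric): points from a dense family share their influence, so the weighted quality estimate is no longer dominated by that family.

```python
import numpy as np

def inverse_density_weights(X, radius=0.5):
    """weight_i = 1 / (# points within `radius` of x_i): members of a
    dense family share one vote instead of dominating the estimate."""
    d = np.abs(X[:, None] - X[None, :])
    counts = (d <= radius).sum(axis=1)
    return 1.0 / counts

# 8 near-duplicate points (one big "family") plus 2 isolated points;
# the model is correct only on the big family
X = np.array([0.0] * 8 + [5.0, 9.0])
correct = np.array([1] * 8 + [0, 0])

w = inverse_density_weights(X)
plain = correct.mean()                      # unweighted accuracy
weighted = np.average(correct, weights=w)   # density-weighted accuracy
print(plain, weighted)
```

The unweighted estimate (0.8) mostly reports that the model memorized the dominant family; the weighted one (1/3) counts the family as roughly a single independent observation.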
Affiliation(s)
- Levon Budagyan
- Molsoft LLC, 3366 North Torrey Pines Court Suite 300, San Diego, CA 92037, USA.
371
Bryson JJ, Leong JCS. Primate errors in transitive ‘inference’: a two-tier learning model. Anim Cogn 2006; 10:1-15. [PMID: 16810495] [DOI: 10.1007/s10071-006-0024-9]
Abstract
Transitive performance (TP) is a learning-based behaviour exhibited by a wide range of species, where if a subject has been taught to prefer A when presented with the pair AB but to prefer B when presented with the pair BC, then the subject will also prefer A when presented with the novel pair AC. Most explanations of TP assume that subjects recognize and learn an underlying sequence from observing the training pairs. However, data from squirrel monkeys (Saimiri sciureus) and young children contradict this, showing that when three different items (a triad) are drawn from the sequence, subjects' performance degrades systematically (McGonigle and Chalmers, Nature 267:694-696, 1977; Chalmers and McGonigle, Journal of Experimental Child Psychology 37:355-377, 1984; Harris and McGonigle, The Quarterly Journal of Experimental Psychology 47B:319-348, 1994). We present here the two-tier model, the first learning model of TP which accounts for this systematic performance degradation. Our model assumes primate TP is based on a general-purpose task learning system rather than a special-purpose sequence-learning system. It supports the hypothesis of Heckers et al. (Hippocampus 14:153-162, 2004) that TP is an expression of two separate general learning elements: one for associating actions and contexts, another for prioritising associations when more than one context is present. The two-tier model also provides explanations for why phased training is important for helping subjects learn the initial training pairs and why some subjects fail to do so. It also supports the Harris and McGonigle (The Quarterly Journal of Experimental Psychology 47B:319-348, 1994) explanation of why, once the training pairs have been acquired, subjects perform transitive choice automatically on two-item diads, but not when exposed to triads from the same sequence.
Affiliation(s)
- Joanna J Bryson
- Artificial Models of Natural Intelligence, University of Bath, Bath, BA2 7AY, UK.
372
373
Smith DRR, Patterson RD, Turner R, Kawahara H, Irino T. The processing and perception of size information in speech sounds. The Journal of the Acoustical Society of America 2005; 117:305-18. [PMID: 15704423] [PMCID: PMC2346562] [DOI: 10.1121/1.1828637]
Abstract
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
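The size-formant relationship described above can be caricatured with the textbook quarter-wavelength model of the vocal tract as a uniform tube closed at one end (an idealization; real formants also depend on tract shape): scaling the tract length divides every formant by the same factor, so formant ratios carry the size-invariant shape information the auditory system is hypothesized to extract.

```python
def formants(length_m, n=3, c=343.0):
    """Odd-quarter-wavelength resonances of a uniform tube closed at
    one end: F_k = (2k - 1) * c / (4L), a crude vocal-tract model."""
    return [(2 * k - 1) * c / (4 * length_m) for k in range(1, n + 1)]

adult = formants(0.17)          # ~17 cm adult vocal tract
child = formants(0.17 / 1.5)    # same shape, 1.5x shorter tract

# absolute frequencies shift with size...
print(adult[0], child[0])
# ...but the formant ratios (the "shape" information) are unchanged
ratios_adult = [f / adult[0] for f in adult]
ratios_child = [f / child[0] for f in child]
print(ratios_adult, ratios_child)
```

A scale transform on the frequency axis removes exactly this multiplicative factor, which is the normalization the paper's experiments probe.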
Affiliation(s)
- David R R Smith
- Centre for Neural Basis of Hearing, Department of Physiology, University of Cambridge, Cambridge CB2 3EG, United Kingdom
374
375
Butz MV, Sigaud O, Gérard P. Internal Models and Anticipations in Adaptive Learning Systems. Anticipatory Behavior in Adaptive Learning Systems 2003. [DOI: 10.1007/978-3-540-45002-3_6]
376
Bensusan H, Kalousis A. Estimating the Predictive Accuracy of a Classifier. Machine Learning: ECML 2001. [DOI: 10.1007/3-540-44795-4_3]
377
378
Abstract
This review provides a comprehensive understanding of regularization theory from different perspectives, emphasizing smoothness and simplicity principles. Using the tools of operator theory and Fourier analysis, it is shown that the solution of the classical Tikhonov regularization problem can be derived from the regularized functional defined by a linear differential (integral) operator in the spatial (Fourier) domain. State-of-the-art research relevant to the regularization theory is reviewed, covering Occam's razor, minimum description length, Bayesian theory, pruning algorithms, informational (entropy) theory, statistical learning theory, and equivalent regularization. The universal principle of regularization in terms of Kolmogorov complexity is discussed. Finally, some prospective studies on regularization theory and beyond are suggested.
Affiliation(s)
- Zhe Chen
- Adaptive Systems Lab, Communications Research Laboratory, McMaster University, Hamilton, Ontario, Canada L8S 4K1.
379
Vehtari A, Lampinen J. Bayesian model assessment and comparison using cross-validation predictive densities. Neural Comput 2002; 14:2439-68. [PMID: 12396570] [DOI: 10.1162/08997660260293292]
Abstract
In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important to obtain the distribution of the expected utility estimate because it describes the uncertainty in the estimate. The distributions of the expected utility estimates can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. We propose an approach using cross-validation predictive densities to obtain expected utility estimates and Bayesian bootstrap to obtain samples from their distributions. We also discuss the probabilistic assumptions made and properties of two practical cross-validation methods, importance sampling and k-fold cross-validation. As illustrative examples, we use multilayer perceptron neural networks and gaussian processes with Markov chain Monte Carlo sampling in one toy problem and two challenging real-world problems.
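The recipe carries over to simpler settings than the paper's. A rough sketch with ordinary least squares standing in for the MCMC-fitted models (synthetic data and a squared-error utility; all specifics are illustrative): k-fold cross-validation yields pointwise utilities, and the Bayesian bootstrap turns them into a distribution for the expected utility rather than a single point estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 60)
y = 2.0 * x + rng.normal(0, 0.2, x.size)

# pointwise utilities (negative squared error) from 5-fold cross-validation
folds = np.array_split(rng.permutation(x.size), 5)
utils = np.empty(x.size)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(x.size), test_idx)
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)
    utils[test_idx] = -(y[test_idx] - (slope * x[test_idx] + intercept)) ** 2

# Bayesian bootstrap: Dirichlet(1, ..., 1) weights give samples from the
# distribution of the expected utility, not just its point estimate
weights = rng.dirichlet(np.ones(x.size), size=4000)
samples = weights @ utils
lo, hi = np.quantile(samples, [0.05, 0.95])
print(utils.mean(), lo, hi)
```

Comparing two models then amounts to comparing two such distributions, e.g., the probability that one model's sampled expected utility exceeds the other's.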
Affiliation(s)
- Aki Vehtari
- Laboratory of Computational Engineering, Helsinki University of Technology, FIN-02015, HUT, Finland.
380
Peng Y, Flach PA, Soares C, Brazdil P. Improved Dataset Characterisation for Meta-learning. Discovery Science 2002. [DOI: 10.1007/3-540-36182-0_14]
381
O'Reilly RC. Generalization in interactive networks: the benefits of inhibitory competition and Hebbian learning. Neural Comput 2001; 13:1199-241. [PMID: 11387044] [DOI: 10.1162/08997660152002834]
Abstract
Computational models in cognitive neuroscience should ideally use biological properties and powerful computational principles to produce behavior consistent with psychological findings. Error-driven backpropagation is computationally powerful and has proven useful for modeling a range of psychological data but is not biologically plausible. Several approaches to implementing backpropagation in a biologically plausible fashion converge on the idea of using bidirectional activation propagation in interactive networks to convey error signals. This article demonstrates two main points about these error-driven interactive networks: (1) they generalize poorly due to attractor dynamics that interfere with the network's ability to produce novel combinatorial representations systematically in response to novel inputs, and (2) this generalization problem can be remedied by adding two widely used mechanistic principles, inhibitory competition and Hebbian learning, that can be independently motivated for a variety of biological, psychological, and computational reasons. Simulations using the Leabra algorithm, which combines the generalized recirculation (GeneRec), biologically plausible, error-driven learning algorithm with inhibitory competition and Hebbian learning, show that these mechanisms can result in good generalization in interactive networks. These results support the general conclusion that cognitive neuroscience models that incorporate the core mechanistic principles of interactivity, inhibitory competition, and error-driven and Hebbian learning satisfy a wider range of biological, psychological, and computational constraints than models employing a subset of these principles.
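Both mechanisms fit in a few lines. This toy network (a k-winners-take-all layer with a CPCA-style Hebbian rule and hand-picked initial weights; a sketch, not the Leabra algorithm) shows competition assigning distinct units to distinct input patterns:

```python
import numpy as np

def kwta(h, k):
    """Inhibitory competition: only the k most active units stay active."""
    out = np.zeros_like(h)
    out[np.argsort(h)[-k:]] = 1.0
    return out

def hebbian_step(W, x, k=1, lr=0.2):
    """CPCA-style Hebbian rule: each winning unit moves its weight
    vector toward the current input, dW_j = lr * y_j * (x - W_j)."""
    y = kwta(W @ x, k)
    W += lr * y[:, None] * (x[None, :] - W)
    return y

# deterministic toy weights: unit 0 slightly prefers pattern a, unit 1 pattern b
W = np.array([[.05, .04, .06, .01, .02, .01],
              [.01, .02, .01, .06, .05, .04],
              [.03, .03, .03, .03, .03, .03],
              [.02, .01, .02, .02, .01, .02]])
a = np.array([1., 1., 1., 0., 0., 0.])
b = np.array([0., 0., 0., 1., 1., 1.])

for _ in range(50):            # interleave the two patterns
    hebbian_step(W, a)
    hebbian_step(W, b)

wa = int((W @ a).argmax())     # unit that comes to represent pattern a
wb = int((W @ b).argmax())     # unit that comes to represent pattern b
print(wa, wb)
```

Because only winners learn, the two patterns end up with separate, non-interfering representations, the ingredient the article argues rescues generalization in interactive networks.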
Affiliation(s)
- R C O'Reilly
- Department of Psychology, University of Colorado at Boulder, Boulder, CO 80309, USA
382
Abstract
We give a short review on the Bayesian approach for neural network learning and demonstrate the advantages of the approach in three real applications. We discuss the Bayesian approach with emphasis on the role of prior knowledge in Bayesian models and in classical error minimization approaches. The generalization capability of a statistical model, classical or Bayesian, is ultimately based on the prior assumptions. The Bayesian approach permits propagation of uncertainty in quantities which are unknown to other assumptions in the model, which may be more generally valid or easier to guess in the problem. The case problems studied in this paper include a regression, a classification, and an inverse problem. In the most thoroughly analyzed regression problem, the best models were those with less restrictive priors. This emphasizes the major advantage of the Bayesian approach, that we are not forced to guess attributes that are unknown, such as the number of degrees of freedom in the model, non-linearity of the model with respect to each input variable, or the exact form for the distribution of the model residuals.
Affiliation(s)
- J Lampinen
- Laboratory of Computational Engineering, Helsinki University of Technology, Espoo, Finland.
383
Kalousis A, Hilario M. Feature Selection for Meta-learning. Advances in Knowledge Discovery and Data Mining 2001. [DOI: 10.1007/3-540-45357-1_26]
384
Architectures and Idioms: Making Progress in Agent Design. Lecture Notes in Computer Science 2001. [DOI: 10.1007/3-540-44631-1_6]
385
|
Abstract
No-free-lunch theorems have shown that learning algorithms cannot be universally good. We show that no free lunch exists for noise prediction as well. We show that when the noise is additive and the prior over target functions is uniform, a prior on the noise distribution cannot be updated, in the Bayesian sense, from any finite data set. We emphasize the importance of a prior over the target function in order to justify superior performance for learning systems.
Collapse
|
386
|
Grünwald P. Model Selection Based on Minimum Description Length. JOURNAL OF MATHEMATICAL PSYCHOLOGY 2000; 44:133-152. [PMID: 10733861 DOI: 10.1006/jmps.1999.1280] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
We introduce the minimum description length (MDL) principle, a general principle for inductive inference based on the idea that regularities (laws) underlying data can always be used to compress data. We introduce the fundamental concept of MDL, called the stochastic complexity, and we show how it can be used for model selection. We briefly compare MDL-based model selection to other approaches and we informally explain why we may expect MDL to give good results in practical applications. Copyright 2000 Academic Press.
Collapse
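To make the two-part code idea in the abstract above concrete, here is a minimal sketch (all names and the toy data are illustrative, not from the paper) of MDL model selection for a binary sequence: the total description length is L(M) + L(D|M) in bits, and MDL prefers the model whose total is shortest.

```python
import math

def codelength_bits(data, p, param_bits):
    # L(D|M): Shannon code length of the data under Bernoulli(p),
    # plus L(M): bits needed to transmit the model's parameter.
    # (p must lie strictly between 0 and 1 for this sketch.)
    nll = -sum(math.log2(p if x == 1 else 1 - p) for x in data)
    return param_bits + nll

data = [1] * 45 + [0] * 5          # a heavily biased 50-bit sequence
n = len(data)

# Model 0: fair coin, no free parameter, so L(M) = 0.
l_fair = codelength_bits(data, 0.5, param_bits=0)

# Model 1: Bernoulli with fitted p, parameter encoded in (1/2) log2 n bits
# (the standard asymptotic cost per real-valued parameter).
p_hat = sum(data) / n
l_fit = codelength_bits(data, p_hat, param_bits=0.5 * math.log2(n))

# MDL selects the model with the shorter total code length.
best = "fitted" if l_fit < l_fair else "fair"
```

For this biased sequence the fitted model compresses the data enough to pay for its parameter, so MDL selects it; on a near-fair sequence the parameter cost would tip the balance the other way.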
|
387
|
Forster MR. Key Concepts in Model Selection: Performance and Generalizability. JOURNAL OF MATHEMATICAL PSYCHOLOGY 2000; 44:205-231. [PMID: 10733865 DOI: 10.1006/jmps.1999.1284] [Citation(s) in RCA: 146] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
What is model selection? What are the goals of model selection? What are the methods of model selection and how do they work? Which methods perform better than others and in what circumstances? These questions rest on a number of key concepts in a relatively underdeveloped field. The aim of this paper is to explain some background concepts, to highlight some of the results in this special issue, and to add my own. The standard methods of model selection include classical hypothesis testing, maximum likelihood, Bayes method, minimum description length, cross-validation, and Akaike's information criterion. They all provide an implementation of Occam's razor, in which parsimony or simplicity is balanced against goodness-of-fit. These methods primarily take account of the sampling errors in parameter estimation, although their relative success at this task depends on the circumstances. However, the aim of model selection should also include the ability of a model to generalize to predictions in a different domain. Errors of extrapolation, or generalization, are different from errors of parameter estimation. So, it seems that simplicity and parsimony may be an additional factor in managing these errors, in which case the standard methods of model selection are incomplete implementations of Occam's razor. Copyright 2000 Academic Press.
Collapse
|
388
|
Marcus GF. Language acquisition in the absence of explicit negative evidence: can simple recurrent networks obviate the need for domain-specific learning devices? Cognition 1999; 73:293-6. [PMID: 10585518 DOI: 10.1016/s0010-0277(99)00054-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- G F Marcus
- Department of Psychology, New York University, New York, NY 10012, USA.
Collapse
|
389
|
|
390
|
|
391
|
Abstract
We show that with a uniform prior on models having the same training error, early stopping at some fixed training error above the training error minimum results in an increase in the expected generalization error.
Collapse
Affiliation(s)
- Z Cataltepe
- Bell Labs, Lucent Technologies, Room 2C-265, 600 Mountain Avenue, Murray Hill, NJ 07974, USA.
Collapse
|
392
|
Sharkey AJC. Linear and Order Statistics Combiners for Pattern Classification. PERSPECTIVES IN NEURAL COMPUTING 1999. [DOI: 10.1007/978-1-4471-0793-4_6] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/10/2023]
|
393
|
Abstract
For any discrete-state sequence prediction algorithm A, it is always possible, using an algorithm B no more complicated than A, to generate a sequence for which A's prediction is always wrong. For any prediction algorithm A and sequence x, there exists a sequence y no more complicated than x, such that if A performs better than random on x, then it will perform worse than random on y by the same margin. An example of a simple neural network predicting a bit sequence is used to illustrate this very general but not widely recognized phenomenon. This implies that any predictor with good performance must rely on some (usually implicitly) assumed prior distributions of the problem.
Collapse
Affiliation(s)
- H Zhu
- The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA.
Collapse
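The adversarial construction described in the abstract above can be sketched in a few lines (the function names are illustrative): for any deterministic bit predictor A, an adversary B that is no more complicated than A simply runs A and emits the opposite bit, producing a sequence on which A is wrong at every step.

```python
def predictor_a(history):
    # Any deterministic prediction rule will do; here, predict the
    # majority bit seen so far (0 on an empty history).
    if not history:
        return 0
    return int(sum(history) > len(history) / 2)

def adversarial_sequence(predict, length):
    # B runs A on the history so far and outputs the negated prediction.
    seq = []
    for _ in range(length):
        seq.append(1 - predict(seq))
    return seq

seq = adversarial_sequence(predictor_a, 20)
# By construction, A mispredicts every bit of the adversarial sequence.
errors = sum(predictor_a(seq[:i]) != seq[i] for i in range(len(seq)))
```

The same construction works against any deterministic predictor passed in place of `predictor_a`, which is the sense in which good predictive performance must rest on prior assumptions about the source.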
|
394
|
Abstract
This article presents several additive corrections to the conventional quadratic loss bias-plus-variance formula. One of these corrections is appropriate when both the target is not fixed (as in Bayesian analysis) and training sets are averaged over (as in the conventional bias-plus-variance formula). Another additive correction casts conventional fixed-training-set Bayesian analysis directly in terms of bias plus variance. Another correction is appropriate for measuring full generalization error over a test set rather than (as with conventional bias plus variance) error at a single point. Yet another correction can help explain the recent counterintuitive bias-variance decomposition of Friedman for zero-one loss. After presenting these corrections, this article discusses some other loss-function-specific aspects of supervised learning. In particular, there is a discussion of the fact that if the loss function is a metric (e.g., zero-one loss), then there is a bound on the change in generalization error accompanying changing the algorithm's guess from h1 to h2, a bound that depends only on h1 and h2 and not on the target. This article ends by presenting versions of the bias-plus-variance formula appropriate for logarithmic and quadratic scoring, and then all the additive corrections appropriate to those formulas. Each of the correction terms presented is a covariance between the learning algorithm and the posterior distribution over targets. Accordingly, in the (very common) contexts in which those terms apply, there is not a “bias-variance trade-off” or a “bias-variance dilemma,” as one often hears. Rather, there is a bias-variance-covariance trade-off.
Collapse
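For reference, the conventional quadratic-loss bias-plus-variance formula that the abstract's corrections are added to can be written (in standard notation, which may differ from the paper's own) as:

```latex
% Conventional decomposition of expected quadratic loss at a point x,
% for a fixed target f, noise variance \sigma^2, and a hypothesis h_D
% learned from a random training set D:
\mathbb{E}_{D}\!\left[(y - h_D(x))^2\right]
  = \sigma^2
  + \underbrace{\bigl(\mathbb{E}_D[h_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(h_D(x) - \mathbb{E}_D[h_D(x)]\bigr)^2\right]}_{\text{variance}}
```

The corrections discussed in the abstract are additive terms on top of this formula; when the target is itself random, the fixed f(x) above is replaced by a posterior over targets and a covariance term between the algorithm's guess and that posterior appears, which is the source of the bias-variance-covariance trade-off.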
|
395
|
A computer scientist's view of life, the universe, and everything. FOUNDATIONS OF COMPUTER SCIENCE 1997. [DOI: 10.1007/bfb0052088] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
|