1. The Keys to the Future? An Examination of Statistical Versus Discriminative Accounts of Serial Pattern Learning. Cogn Sci 2024; 48:e13404. PMID: 38294059; DOI: 10.1111/cogs.13404.
Abstract
Sequence learning is fundamental to a wide range of cognitive functions. Explaining how sequences, and the relations between the elements they comprise, are learned is a central challenge for cognitive science. Although hundreds of articles addressing this question are published each year, the learning mechanisms actually involved are rarely investigated directly. We present three experiments that examine these mechanisms during a typing task. Experiments 1 and 2 tested learning while participants typed a single letter on each trial; Experiment 3 tested for "chunking" of these letters into "words." The results were used to evaluate which mechanisms could best account for them, with a focus on two particular proposals: statistical transitional probability learning and discriminative error-driven learning. Experiments 1 and 2 showed that error-driven learning predicted response latencies better than either n-gram frequencies or transitional probabilities. No evidence for chunking was found in Experiment 3, probably because visual cues were interspersed with the motor responses. In addition, learning occurred across a greater distance in Experiment 1 than in Experiment 2, suggesting that the greater predictability that comes with increased structure leads to greater learnability. These results shed new light on the mechanisms responsible for sequence learning. Despite the widely held assumption that transitional probability learning is essential to this process, the present results suggest instead that sequences are learned through a process of discriminative learning, involving prediction and feedback from prediction error.
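For illustration, the two accounts being compared can each be reduced to a few lines of code. The sketch below is a hypothetical reconstruction with toy letter stimuli, not the authors' analysis code; it contrasts transitional probabilities with Rescorla-Wagner-style error-driven association strengths.

```python
import numpy as np
from collections import Counter

seq = list("abcabcabdabc")  # toy letter sequence (hypothetical stimuli)
letters = sorted(set(seq))
idx = {c: i for i, c in enumerate(letters)}

# Statistical account: transitional probability P(next | current).
bigrams = Counter(zip(seq, seq[1:]))
unigrams = Counter(seq[:-1])
tp = {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}

# Discriminative account: Rescorla-Wagner error-driven updates.
# W[i, j] is the association from current letter i to next letter j.
W = np.zeros((len(letters), len(letters)))
lr = 0.1  # combined salience / learning rate
for cue, outcome in zip(seq, seq[1:]):
    target = np.zeros(len(letters))
    target[idx[outcome]] = 1.0           # lambda = 1 for the letter that occurs
    pred = W[idx[cue]]                   # prediction from the current cue
    W[idx[cue]] += lr * (target - pred)  # all outcomes updated by prediction error

print("P(b|a) =", tp[("a", "b")])
print("RW strength a->b =", round(W[idx["a"], idx["b"]], 2))
```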
2. Discriminative, Restorative, and Adversarial Learning: Stepwise Incremental Pretraining. In: Domain Adaptation and Representation Transfer: 4th MICCAI Workshop, DART 2022, Held in Conjunction with MICCAI 2022, Singapore, September 22, 2022, Proceedings. 2022; 13542:66-76. PMID: 36507899; PMCID: PMC9728134; DOI: 10.1007/978-3-031-16852-9_7.
Abstract
Uniting three self-supervised learning (SSL) ingredients (discriminative, restorative, and adversarial learning) enables collaborative representation learning and yields three transferable components: a discriminative encoder, a restorative decoder, and an adversary encoder. To leverage this advantage, we have redesigned five prominent SSL methods (Rotation, Jigsaw, Rubik's Cube, Deep Clustering, and TransVW) and formulated each in a United framework for 3D medical imaging. However, such a United framework increases model complexity and pretraining difficulty. To overcome this difficulty, we develop a stepwise incremental pretraining strategy: a discriminative encoder is first trained via discriminative learning; the pretrained discriminative encoder is then attached to a restorative decoder, forming a skip-connected encoder-decoder, for further joint discriminative and restorative learning; and finally, the pretrained encoder-decoder is associated with an adversarial encoder for full discriminative, restorative, and adversarial learning. Our extensive experiments demonstrate that stepwise incremental pretraining stabilizes the training of United models, resulting in significant performance gains and annotation cost reduction via transfer learning for five target tasks, encompassing both classification and segmentation, across diseases, organs, datasets, and modalities. This performance is attributed to the synergy of the three SSL ingredients in our United framework, unleashed via stepwise incremental pretraining. All code and pretrained models are available at GitHub.com/JLiangLab/StepwisePretraining.
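The stepwise schedule described above can be sketched as three successive optimization stages. The PyTorch skeleton below is a simplified illustration only (toy module sizes, no skip connections, critic updates omitted); the released code at GitHub.com/JLiangLab/StepwisePretraining is the authoritative implementation.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the three transferable components (assumed shapes).
encoder = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool3d(1), nn.Flatten())
cls_head = nn.Linear(8, 4)                      # discriminative head (e.g., rotation class)
decoder = nn.Sequential(nn.Linear(8, 16 ** 3),  # restorative decoder
                        nn.Unflatten(1, (1, 16, 16, 16)))
adversary = nn.Sequential(nn.Flatten(), nn.Linear(16 ** 3, 1))  # real/fake critic

x = torch.randn(2, 1, 16, 16, 16)       # toy 3D patches
rot_labels = torch.randint(0, 4, (2,))  # pretext-task labels
ce, mse, bce = nn.CrossEntropyLoss(), nn.MSELoss(), nn.BCEWithLogitsLoss()

# Stage 1: discriminative learning only.
opt = torch.optim.Adam(list(encoder.parameters()) + list(cls_head.parameters()))
loss = ce(cls_head(encoder(x)), rot_labels)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: attach the decoder; joint discriminative + restorative learning.
opt = torch.optim.Adam(list(encoder.parameters()) + list(cls_head.parameters())
                       + list(decoder.parameters()))
z = encoder(x)
loss = ce(cls_head(z), rot_labels) + mse(decoder(z), x)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 3: add the adversary for full three-ingredient training
# (the critic is held fixed here; its own update is omitted for brevity).
opt = torch.optim.Adam(list(encoder.parameters()) + list(cls_head.parameters())
                       + list(decoder.parameters()))
z = encoder(x)
recon = decoder(z)
loss = (ce(cls_head(z), rot_labels) + mse(recon, x)
        + bce(adversary(recon), torch.ones(2, 1)))  # try to fool the critic
opt.zero_grad(); loss.backward(); opt.step()
```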
3. An exploration of error-driven learning in simple two-layer networks from a discriminative learning perspective. Behav Res Methods 2022; 54:2221-2251. PMID: 35032022; PMCID: PMC9579095; DOI: 10.3758/s13428-021-01711-5.
Abstract
Error-driven learning algorithms, which iteratively adjust expectations based on prediction error, are the basis for a vast array of computational models in the brain and cognitive sciences that often differ widely in their precise form and application: they range from simple models in psychology and cybernetics to the complex deep learning models now dominating discussions in machine learning and artificial intelligence. However, despite the ubiquity of this mechanism, detailed analyses of its basic workings uninfluenced by existing theories or specific research goals are rare in the literature. To address this, we present an exposition of error-driven learning, focusing on its simplest form for clarity, and relate this to the historical development of error-driven learning models in the cognitive sciences. Although error-driven models have historically been thought of as associative, such that learning combines preexisting elemental representations, our analysis highlights the discriminative nature of learning in these models and its implications for how learning is conceptualized. We complement our theoretical introduction to error-driven learning with a practical guide to the application of simple error-driven learning models, in which we discuss a number of example simulations that are also presented in detail in an accompanying tutorial.
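In its simplest two-layer form, the error-driven mechanism discussed here is the Rescorla-Wagner / delta rule. Below is a minimal sketch with invented cues and outcomes (an illustration, not the tutorial code accompanying the article): the prediction is the sum of associations from all present cues, and the resulting error is shared among them.

```python
import numpy as np

cues = ["light", "tone"]
outcomes = ["food", "no_food"]
# Each trial: the set of present cues and the outcome that occurs.
trials = [({"light"}, "food"), ({"light", "tone"}, "food"),
          ({"tone"}, "no_food")] * 50

W = np.zeros((len(cues), len(outcomes)))  # association matrix V
lr = 0.05                                  # combined salience / learning rate
ci = {c: i for i, c in enumerate(cues)}
oi = {o: i for i, o in enumerate(outcomes)}

for present, outcome in trials:
    rows = [ci[c] for c in present]
    pred = W[rows].sum(axis=0)            # summed prediction from present cues
    target = np.zeros(len(outcomes))
    target[oi[outcome]] = 1.0             # lambda for the outcome that occurred
    W[rows] += lr * (target - pred)       # prediction error drives the change

print(np.round(W, 2))
```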
4. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput Sci 2021; 2:420. PMID: 34426802; PMCID: PMC8372231; DOI: 10.1007/s42979-021-00815-1.
Abstract
Deep learning (DL), a branch of machine learning (ML) and artificial intelligence (AI), is now considered a core technology of the Fourth Industrial Revolution (4IR, or Industry 4.0). Owing to its capacity to learn from data, DL, which originated from artificial neural networks (ANNs), has become a hot topic in computing and is widely applied in areas such as healthcare, visual recognition, text analytics, cybersecurity, and many more. However, building an appropriate DL model is challenging because of the dynamic nature of, and variation in, real-world problems and data. Moreover, a lack of core understanding turns DL methods into black boxes that hinder development at a standard level. This article presents a structured and comprehensive view of DL techniques, including a taxonomy that considers various types of real-world tasks, such as supervised and unsupervised ones. The taxonomy covers deep networks for supervised or discriminative learning and for unsupervised or generative learning, as well as hybrid learning and other relevant approaches. We also summarize real-world application areas where deep learning techniques can be used. Finally, we point out ten potential aspects of future-generation DL modeling, with research directions. Overall, this article aims to draw a big picture of DL modeling that can serve as a reference guide for both academic and industry professionals.
5. Prediction and error in early infant speech learning: A speech acquisition model. Cognition 2021; 212:104697. PMID: 33798952; PMCID: PMC8173624; DOI: 10.1016/j.cognition.2021.104697.
Abstract
In the last two decades, statistical clustering models have emerged as a dominant account of how infants learn the sounds of their language. However, recent empirical and computational evidence suggests that purely statistical clustering methods may not be sufficient to explain speech sound acquisition. To model the early development of speech perception, the present study used a two-layer network trained with the Rescorla-Wagner learning equations, an implementation of discriminative, error-driven learning. The model contained no a priori linguistic units, such as phonemes or phonetic features. Instead, expectations about the upcoming acoustic speech signal were learned from the surrounding speech signal, with spectral components extracted from an audio recording of child-directed speech serving as both inputs and outputs of the model. To evaluate model performance, we simulated infant responses in the high-amplitude sucking paradigm using vowel and fricative pairs and continua. The simulations discriminated vowel and consonant pairs and predicted the infant speech perception data. The model also showed the greatest amount of discrimination in the expected spectral frequencies. These results suggest that discriminative error-driven learning may provide a viable approach to modelling early infant speech sound acquisition.
6. Order Matters! Influences of Linear Order on Linguistic Category Learning. Cogn Sci 2020; 44:e12910. PMID: 33124103; PMCID: PMC7685149; DOI: 10.1111/cogs.12910.
Abstract
Linguistic category learning has been shown to be highly sensitive to linear order, and depending on the task, differentially sensitive to the information provided by preceding category markers (premarkers, e.g., gendered articles) or succeeding category markers (postmarkers, e.g., gendered suffixes). Given that numerous systems for marking grammatical categories exist in natural languages, it follows that a better understanding of these findings can shed light on the factors underlying this diversity. In two discriminative learning simulations and an artificial language learning experiment, we identify two factors that modulate linear order effects in linguistic category learning: category structure and the level of abstraction in a category hierarchy. Regarding category structure, we find that postmarking brings an advantage for learning category diagnostic stimulus dimensions, an effect not present when categories are non-confusable. Regarding levels of abstraction, we find that premarking of super-ordinate categories (e.g., noun class) facilitates learning of subordinate categories (e.g., nouns). We present detailed simulations using a plausible candidate mechanism for the observed effects, along with a comprehensive analysis of linear order effects within an expectation-based account of learning. Our findings indicate that linguistic category learning is differentially guided by pre- and postmarking, and that the influence of each is modulated by the specific characteristics of a given category system.
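The pre- versus postmarking contrast can be reproduced mechanically by flipping what counts as cue and what as outcome in an error-driven learner. The sketch below uses invented nouns and markers and illustrates only the mechanism, not the article's actual simulations: under postmarking, noun features compete as cues for the marker, so the shared diagnostic feature wins; under premarking, the marker predicts all features without cue competition.

```python
import numpy as np

def rw_train(trials, cue_names, out_names, lr=0.05, epochs=200):
    """Generic Rescorla-Wagner trainer; trials are (cue set, outcome set) pairs."""
    W = np.zeros((len(cue_names), len(out_names)))
    ci = {c: i for i, c in enumerate(cue_names)}
    oi = {o: i for i, o in enumerate(out_names)}
    for _ in range(epochs):
        for cs, os_ in trials:
            rows = [ci[c] for c in cs]
            target = np.zeros(len(out_names))
            for o in os_:
                target[oi[o]] = 1.0
            W[rows] += lr * (target - W[rows].sum(axis=0))
    return W, ci, oi

# Two nouns of one class: both carry the diagnostic feature 'diag';
# each has its own idiosyncratic feature. 'ka' is the class marker.
features = [{"diag", "idio1"}, {"diag", "idio2"}]
marker = "ka"

# Postmarking: features are cues, the marker is the outcome (cue competition).
post = [(f, {marker}) for f in features]
W, ci, oi = rw_train(post, ["diag", "idio1", "idio2"], [marker])
print("postmarking diag->ka:", round(W[ci["diag"], oi[marker]], 2))   # ~0.67
print("postmarking idio1->ka:", round(W[ci["idio1"], oi[marker]], 2))  # ~0.33

# Premarking: the marker is the cue, features are outcomes (no competition).
pre = [({marker}, f) for f in features]
W2, ci2, oi2 = rw_train(pre, [marker], ["diag", "idio1", "idio2"])
print("premarking ka->diag:", round(W2[ci2[marker], oi2["diag"]], 2))   # ~1.0
print("premarking ka->idio1:", round(W2[ci2[marker], oi2["idio1"]], 2)) # ~0.5
```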
7. Learning about things that never happened: A critique and refinement of the Rescorla-Wagner update rule when many outcomes are possible. Mem Cognit 2020; 47:1415-1430. PMID: 31152383; DOI: 10.3758/s13421-019-00942-4.
Abstract
A vector-based model of discriminative learning is presented. It is demonstrated to learn association strengths identical to those of the Rescorla-Wagner model under certain parameter settings (Rescorla & Wagner, 1972, Classical Conditioning II: Current Research and Theory, 2, 64-99), and to approximate them under other settings. I argue that the Rescorla-Wagner model has conceptual details that exclude it as an algorithmically plausible model of learning, whereas the vector learning model does not suffer from the same conceptual issues. Finally, it is demonstrated that the vector learning model provides insight into how animals might learn the semantics of stimuli rather than just their associations. Results from simulations of language processing experiments are reported.
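The titular problem can be seen in the standard Rescorla-Wagner rule itself: every outcome absent on a trial receives lambda = 0, so every outcome must be updated on every trial, and present cues can even acquire negative associations to events that never co-occurred with them. A toy demonstration of that behavior (classic conditioned inhibition; the article's vector-based refinement is not reproduced here):

```python
import numpy as np

outcomes = ["rain", "snow"]           # many more in realistic applications
W = np.zeros((2, len(outcomes)))      # cue x outcome associations (cues A, B)
lr = 0.1

# Phase 1: cue A alone is always followed by rain.
phase1 = [([0], [1.0, 0.0])] * 100
# Phase 2: compound A+B is followed by nothing (lambda = 0 for everything).
phase2 = [([0, 1], [0.0, 0.0])] * 100

for rows, lam in phase1 + phase2:
    target = np.array(lam)
    # Every outcome column is updated on every trial, present or not.
    W[rows] += lr * (target - W[rows].sum(axis=0))

print(np.round(W, 2))
# B ends with a negative association to rain (~ -0.5) even though rain
# never occurred in B's presence: the rule "learns about things that
# never happened", which is the behavior the article scrutinizes.
```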
8. Discriminative learning and associative memory under the differential outcomes procedure is modulated by cognitive load. Acta Psychol (Amst) 2020; 208:103103. PMID: 32569877; DOI: 10.1016/j.actpsy.2020.103103.
Abstract
Working memory (WM) has been thought to underlie associative memory deficits in older adults. Previous research has demonstrated the benefits of a discriminative learning procedure, the differential outcomes procedure (DOP), for ameliorating such associative-memory maintenance deficits in situations that simulate adherence to medical prescriptions in both healthy and pathological ageing. Specifically, the DOP rewards each correct response to each stimulus-stimulus association with a distinct and unique outcome (reinforcer). The aim of the present study was to explore the limits of this procedure by testing the cognitive load at which the DOP improves discriminative learning and associative memory in a task simulating adherence to medical treatment in undergraduate students. During the training phase, participants were asked to learn three pill/name associations (low-load condition) or four pill/name associations (high-load condition) under the DOP, in comparison with a control condition (the non-differential outcomes condition, NOP). Long-term retention of the learned associations was tested 1 h and 1 week after completion of the training phase. Participants showed better accuracy and long-term retention of the learned associations when the DOP was used, but only in the high-load condition. These results suggest that when WM is overtaxed, the DOP plays a fundamental role in the long-term maintenance of learned stimulus-stimulus associations, rendering it a useful technique for enhancing discriminative learning and associative memory.
9. Of mice and men: Speech sound acquisition as discriminative learning from prediction error, not just statistical tracking. Cognition 2020; 197:104081. PMID: 31901874; PMCID: PMC7033563; DOI: 10.1016/j.cognition.2019.104081.
Abstract
Despite burgeoning evidence that listeners are highly sensitive to statistical distributions of speech cues, the mechanism underlying learning may not be purely statistical tracking. Decades of research in animal learning suggest that learning results from prediction and prediction error. Two artificial language learning experiments test two predictions that distinguish error-driven from purely statistical models: cue competition, specifically Kamin's (1968) "blocking" effect (Experiment 1), and the predictive structure of learning events (Experiment 2). In Experiment 1, prior knowledge of an informative cue blocked learning of a second cue. This finding may help explain second language learners' difficulty in acquiring native-level perception of non-native speech cues. In Experiment 2, learning was better with a discriminative (cue-outcome) order than with a non-discriminative (outcome-cue) order, suggesting that learning speech cues, including reversing the effects of blocking, depends on (un)learning from prediction error and on the temporal order of auditory cues versus semantic outcomes. Together, these results show that (a) existing knowledge of acoustic cues can block later learning of new cues, and (b) speech sound acquisition depends on the predictive structure of learning events. When feedback from prediction error is available, it drives learners to ignore salient non-discriminative cues and to learn to use the target cue dimensions effectively. These findings may have considerable implications for the field of speech acquisition.
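The blocking prediction of Experiment 1 falls out of the error-driven update directly. A minimal simulation with abstract cues (illustrative only, not the experimental stimuli):

```python
import numpy as np

W = np.zeros(2)   # associations from cues A (index 0) and B (index 1)
lr = 0.1

# Phase 1: cue A alone predicts the outcome; its association approaches 1.
for _ in range(200):
    W[0] += lr * (1.0 - W[0])

# Phase 2: compound A+B predicts the same outcome. A already predicts it,
# so the shared prediction error is near zero and B learns almost nothing.
for _ in range(200):
    err = 1.0 - (W[0] + W[1])
    W += lr * err   # both present cues share the (near-zero) error

print(np.round(W, 2))   # A ~ 1.0, B ~ 0.0: learning of B is "blocked"
```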
10. Semi-supervised distributed representations of documents for sentiment analysis. Neural Netw 2019; 119:139-150. PMID: 31425854; DOI: 10.1016/j.neunet.2019.08.001.
Abstract
Learning document representations is important when applying machine learning algorithms to sentiment analysis. Distributed representation models of words and documents, a family of neural language models, have overcome some limits of vector-space models such as the bag-of-words model and have been utilized successfully in many natural language processing tasks, including sentiment analysis. However, because such models learn embeddings only with a context-based objective, it is hard for the embeddings to reflect the sentiment of texts. In this research, we address this problem by introducing a semi-supervised sentiment-discriminative objective that uses partial sentiment information about documents. Our method not only reflects the partial sentiment information, but also preserves the local structures induced by the original distributed representation learning objectives, by considering only sentiment relationships between neighboring documents. Using real-world datasets, the proposed method is validated with sentiment visualization and classification tasks. The visualization results on Amazon review datasets demonstrate improved sentiment class separation when document representations from our method are compared to those of other methods. Sentiment prediction from our representations is also consistently superior to the alternatives on both the Amazon and Yelp datasets. This work can be extended to develop effective document embeddings for other discriminative tasks.
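The combined objective has the general shape of a context-prediction loss plus a sentiment-discriminative term computed only on the labeled subset. A schematic PyTorch sketch (the weighting term, model sizes, and data are assumptions, not the paper's exact formulation, and the neighbor-based structure preservation is omitted):

```python
import torch
import torch.nn as nn

vocab, dim, n_docs = 1000, 32, 64
doc_emb = nn.Embedding(n_docs, dim)   # one vector per document
word_out = nn.Linear(dim, vocab)      # predict observed words from the doc vector
sent_head = nn.Linear(dim, 2)         # sentiment classifier (pos/neg)
ce = nn.CrossEntropyLoss()

doc_ids = torch.randint(0, n_docs, (16,))
word_targets = torch.randint(0, vocab, (16,))  # words observed in context
labeled = torch.rand(16) < 0.25                # only some docs carry labels
sent_labels = torch.randint(0, 2, (16,))

opt = torch.optim.Adam(list(doc_emb.parameters()) + list(word_out.parameters())
                       + list(sent_head.parameters()))
v = doc_emb(doc_ids)
context_loss = ce(word_out(v), word_targets)   # unsupervised, all documents
if labeled.any():                              # supervised, labeled documents only
    sent_loss = ce(sent_head(v[labeled]), sent_labels[labeled])
else:
    sent_loss = torch.tensor(0.0)
loss = context_loss + 0.5 * sent_loss          # lambda = 0.5 (assumed weighting)
opt.zero_grad(); loss.backward(); opt.step()
```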
11. Discriminative confidence estimation for probabilistic multi-atlas label fusion. Med Image Anal 2017; 42:274-287. PMID: 28888171; DOI: 10.1016/j.media.2017.08.008.
Abstract
Quantitative neuroimaging analyses often rely on the accurate segmentation of anatomical brain structures. In contrast to manual segmentation, automatic methods offer reproducible outputs and scale to large databases. Among existing approaches, multi-atlas segmentation has recently been shown to yield state-of-the-art performance in automatic segmentation of brain images. It consists of propagating the labelmaps from a set of atlases to the anatomy of a target image using image registration, and then fusing these multiple warped labelmaps into a consensus segmentation of the target image. Accurately estimating the contribution of each atlas labelmap to the final segmentation is a critical step in multi-atlas segmentation. Common approaches to label fusion rely on local patch similarity, probabilistic statistical frameworks, or a combination of both. In this work, we propose a probabilistic label fusion framework based on atlas label confidences computed at each voxel of the structure of interest. Maximum likelihood atlas confidences are estimated using a supervised approach that explicitly models the relationship between local image appearance and the segmentation errors produced by each atlas. We evaluate different spatial pooling strategies for modeling local segmentation errors, and present a novel type of label-dependent appearance feature, based on atlas labelmaps, that is used during confidence estimation to increase the accuracy of the label fusion. Our approach is evaluated on the segmentation of seven subcortical brain structures from the MICCAI 2013 SATA Challenge dataset and of the hippocampi from the ADNI dataset. Overall, our results indicate that the proposed label fusion framework achieves performance superior to state-of-the-art approaches in the majority of the evaluated brain structures and shows more robustness to registration errors.
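At its core, confidence-based fusion is a voxelwise weighted vote over the warped atlas labelmaps. A simplified numpy sketch with binary labels and invented random confidences (the paper's supervised confidence estimation itself is not reproduced here):

```python
import numpy as np

n_atlases, shape = 4, (8, 8, 8)
rng = np.random.default_rng(0)

labels = rng.integers(0, 2, size=(n_atlases, *shape))  # warped atlas labelmaps
conf = rng.random(size=(n_atlases, *shape))            # per-voxel atlas confidences

# Weighted vote: the probability of the foreground label at each voxel is
# the confidence-weighted fraction of atlases voting for it.
p_fg = (conf * labels).sum(axis=0) / conf.sum(axis=0)
fused = (p_fg > 0.5).astype(int)                       # consensus segmentation

print(fused.shape, fused.mean())
```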
12. Longitudinal investigation on learned helplessness tested under negative and positive reinforcement involving stimulus control. Behav Processes 2014; 106:160-167. PMID: 24814908; DOI: 10.1016/j.beproc.2014.03.009.
Abstract
In this study, we investigated whether (a) animals demonstrating the learned helplessness effect during an escape contingency also show learning deficits under positive reinforcement contingencies involving stimulus control, and (b) exposure to positive reinforcement contingencies eliminates the learned helplessness effect under an escape contingency. Rats were initially exposed to controllable (C), uncontrollable (U), or no (N) shocks. After 24 h, they were exposed to 60 escapable shocks delivered in a shuttlebox. In the following phase, we selected from each group the four subjects that presented the most typical group pattern: no escape learning (the learned helplessness effect) in Group U, and escape learning in Groups C and N. All subjects were then exposed to two phases: (1) positive reinforcement of lever pressing under a multiple FR/extinction schedule, and (2) a re-test under negative reinforcement (escape). A fourth group (n=4) was exposed only to the positive reinforcement sessions. All subjects showed discrimination learning under the multiple schedule. In the escape re-test, the learned helplessness effect was maintained in three of the animals in Group U. These results suggest that the learned helplessness effect did not extend to discriminative behavior that is positively reinforced, and that it did not revert for most subjects after exposure to positive reinforcement. We discuss theoretical implications relating to learned helplessness as an effect restricted to aversive contingencies and to the absence of reversal after positive reinforcement.
13. Structured Set Intra Prediction With Discriminative Learning in a Max-Margin Markov Network for High Efficiency Video Coding. IEEE Trans Circuits Syst Video Technol 2013; 23:1941-1956. PMID: 25505829; PMCID: PMC4260422; DOI: 10.1109/tcsvt.2013.2269776.
Abstract
This paper proposes a novel model of intra coding for High Efficiency Video Coding (HEVC) that simultaneously predicts blocks of pixels with optimal rate distortion. It utilizes spatial statistical correlation for optimal prediction based on 2-D contexts, in addition to formulating data-driven structural interdependences to make the prediction error coherent with the probability distribution, which is desirable for successful transform and coding. The structured set prediction model incorporates a max-margin Markov network (M3N) to regulate and optimize multiple block predictions. The model parameters are learned by discriminating the actual pixel value from other possible estimates to maximize the margin (i.e., the decision boundary bandwidth). Compared to existing methods that focus on minimizing prediction error, the M3N-based model adaptively maintains coherence for a set of predictions. Specifically, the proposed model concurrently optimizes a set of predictions by associating the loss for individual blocks with the joint distribution of succeeding discrete cosine transform coefficients. As the sample size grows, the prediction error is asymptotically upper-bounded by the training error under a decomposable loss function. As an internal step, we optimize the underlying Markov network structure to find states that achieve the maximal energy using expectation propagation. For validation, we integrate the proposed model into HEVC for optimal mode selection in rate-distortion optimization. The proposed prediction model obtains up to a 2.85% bit rate reduction and achieves better visual quality in comparison to HEVC intra coding.
14. Stochastic margin-based structure learning of Bayesian network classifiers. Pattern Recognit 2013; 46:464-471. PMID: 24511159; PMCID: PMC3914412; DOI: 10.1016/j.patcog.2012.08.007.
Abstract
The margin criterion for parameter learning in graphical models has gained significant traction in recent years. We use the maximum margin score to discriminatively optimize the structure of Bayesian network classifiers, applying greedy hill-climbing and simulated annealing search heuristics to determine the classifier structures. In our experiments, we demonstrate the advantages of maximum-margin-optimized Bayesian network structures in terms of classification performance compared with traditionally used discriminative structure learning methods. Stochastic simulated annealing requires fewer score evaluations than greedy heuristics. Additionally, we compare generative and discriminative parameter learning on both generatively and discriminatively structured Bayesian network classifiers. Margin-optimized Bayesian network classifiers achieve classification performance similar to support vector machines. Moreover, missing feature values during classification can be handled by discriminatively optimized Bayesian network classifiers, a case in which purely discriminative classifiers usually require mechanisms to complete unknown feature values in the data first.
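The annealing search itself is standard; only the scoring function is specific to the paper. A sketch with the maximum-margin score stubbed out (the stub and the edge-flip proposal below are placeholders for illustration, not the paper's criterion, and the acyclicity check is omitted):

```python
import math
import random

def margin_score(structure):
    """Placeholder: in the paper this would be the maximum margin score of a
    Bayesian network classifier with the given edge set."""
    return -len(structure)  # stub: prefers sparser graphs, for illustration

def propose(structure, nodes):
    """Propose a neighbor by adding or removing a single directed edge."""
    cand = set(structure)
    i, j = random.sample(nodes, 2)
    edge = (i, j)
    if edge in cand:
        cand.remove(edge)
    else:
        cand.add(edge)
    return frozenset(cand)

nodes = list(range(5))
current = frozenset()
best, best_s = current, margin_score(current)
T = 1.0
for step in range(1000):
    nxt = propose(current, nodes)
    delta = margin_score(nxt) - margin_score(current)
    # Accept improvements always; accept worse structures with
    # probability exp(delta / T), which shrinks as T cools.
    if delta > 0 or random.random() < math.exp(delta / T):
        current = nxt
    if margin_score(current) > best_s:
        best, best_s = current, margin_score(current)
    T *= 0.995  # geometric cooling schedule

print(len(best), best_s)
```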