1
Duan T, Wang Z, Li F, Doretto G, Adjeroh DA, Yin Y, Tao C. Online continual decoding of streaming EEG signal with a balanced and informative memory buffer. Neural Netw 2024; 176:106338. [PMID: 38692190 DOI: 10.1016/j.neunet.2024.106338] [Received: 04/24/2023] [Revised: 03/20/2024] [Accepted: 04/23/2024] [Indexed: 05/03/2024]
Abstract
Electroencephalography (EEG) based Brain Computer Interface (BCI) systems play a significant role in helping individuals with neurological impairments interact effectively with their environment. In real-world applications of BCI systems for clinical assistance and rehabilitation training, the EEG classifier often needs to learn from sequentially arriving subjects in an online manner. Because EEG signal patterns can differ significantly across subjects, a classifier decoding a stream online can easily erase knowledge of previously learned subjects after learning later ones, a phenomenon known as catastrophic forgetting. In this work, we tackle this problem with a memory-based approach that considers the following conditions: (1) subjects arrive sequentially in an online manner, with no large-scale dataset available for joint training beforehand, (2) data volume from different subjects may be imbalanced, (3) the decoding difficulty of the sequential streaming signal varies, and (4) continual classification over a long period is required. This online sequential EEG decoding problem is more challenging than classic cross-subject EEG decoding because no large-scale training data from the different subjects is available beforehand. The proposed model keeps a small, balanced memory buffer during sequential learning, with memory data dynamically selected based on joint consideration of data volume and informativeness. Furthermore, for the more general scenario in which subject identity is unknown to the EEG decoder, i.e., the subject-agnostic scenario, we propose a kernel-based subject shift detection method that identifies underlying subject changes on the fly in a computationally efficient manner. We develop challenging benchmarks of streaming EEG data from sequentially arriving subjects with both balanced and imbalanced data volumes, and perform extensive experiments with a detailed ablation study on the proposed model.
The results show the effectiveness of our proposed approach, enabling the decoder to maintain performance on all previously seen subjects over a long period of sequential decoding. The model demonstrates the potential for real-world applications.
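The abstract does not say which kernel statistic its subject shift detector uses. As a generic illustration only (an assumption, not the authors' method), a standard kernel two-sample statistic is the maximum mean discrepancy (MMD). The sketch below compares a window of recent feature vectors against a reference window using an RBF kernel; the names, the `gamma` value, and the fixed `threshold` are hypothetical.

```python
import math
from typing import List, Sequence

# Illustrative sketch (assumption): generic MMD shift test, not the paper's detector.
Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs: List[Vector], ys: List[Vector], gamma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: List[Vector], ys: List[Vector], gamma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, gamma) + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))


def shift_detected(reference: List[Vector], window: List[Vector],
                   threshold: float = 0.1, gamma: float = 1.0) -> bool:
    """Flag a possible subject change when MMD^2 exceeds a hypothetical fixed threshold."""
    return mmd_squared(reference, window, gamma) > threshold


# Usage: features of the current subject vs. new windows from the stream.
reference = [[0.0], [0.1], [0.2]]
same_subject = [[0.0], [0.1], [0.2]]
new_subject = [[5.0], [5.1], [5.2]]
print(shift_detected(reference, same_subject))  # False
print(shift_detected(reference, new_subject))   # True
```

Each MMD evaluation costs time quadratic in the window size, so a practical detector would likely need small windows or a cheaper approximation; how the paper achieves its stated efficiency is not described here.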
Affiliation(s)
- Tiehang Duan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, 32246 United States
- Zhenyi Wang
- Department of Computer Science, University of Maryland, College Park, MD, 20742, United States
- Fang Li
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, 32246 United States
- Gianfranco Doretto
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, 26506, United States
- Donald A Adjeroh
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, 26506, United States.
- Yiyi Yin
- Meta AI, Seattle, WA, 98005, United States
- Cui Tao
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, 32246 United States.
2
van Diggelen F, Cambier N, Ferrante E, Eiben AE. A model-free method to learn multiple skills in parallel on modular robots. Nat Commun 2024; 15:6267. [PMID: 39048541 DOI: 10.1038/s41467-024-50131-4] [Received: 02/27/2024] [Accepted: 07/02/2024] [Indexed: 07/27/2024]
Abstract
Legged robots are well suited for deployment in unstructured environments but require a control scheme specific to their design. As controllers optimised in simulation do not transfer well to the real world (the infamous sim-to-real gap), methods enabling quick learning in the real world, without any assumptions on the specific robot model and its dynamics, are necessary. In this paper, we present a generic method based on Central Pattern Generators that enables the acquisition of basic locomotion skills in parallel through very few trials. The novelty of our approach, underpinned by a mathematical analysis of the controller model, is to search for good initial states instead of optimising connection weights. Empirical validation on six different robot morphologies demonstrates that our method enables robots to learn primary locomotion skills in less than 15 minutes in the real world. Finally, we showcase the learned skills in a targeted locomotion experiment.
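Why searching over initial states can replace weight optimisation is easiest to see in a single linear oscillator, the basic building block of many CPG models. The sketch below is an assumption for illustration (one uncoupled two-state unit with a fixed weight `omega`, not the paper's coupled network). With the weight held fixed, the initial state alone sets the amplitude and phase of the rhythmic output.

```python
import math
from typing import List

# Illustrative sketch (assumption): one uncoupled linear oscillator, not the paper's CPG network.


def simulate_oscillator(x0: float, y0: float, omega: float = 2.0 * math.pi,
                        dt: float = 0.001, steps: int = 2000) -> List[float]:
    """Integrate dx/dt = omega * y, dy/dt = -omega * x and return the x trajectory.

    Semi-implicit (symplectic) Euler keeps the oscillation amplitude nearly
    constant, so the rhythm is set by the fixed weight omega and the amplitude
    and phase by the initial state (x0, y0).
    """
    x, y = x0, y0
    trajectory = []
    for _ in range(steps):
        y -= omega * x * dt  # update y first ...
        x += omega * y * dt  # ... then x using the new y (symplectic step)
        trajectory.append(x)
    return trajectory


# Same weight, different initial states -> different output amplitudes.
small = simulate_oscillator(0.5, 0.0)
large = simulate_oscillator(1.0, 0.0)
print(round(max(map(abs, small)), 2), round(max(map(abs, large)), 2))
```

A semi-implicit step is used because plain explicit Euler would make the amplitude grow over time, which would blur the point that the initial state determines it.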
Affiliation(s)
- Fuda van Diggelen
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, the Netherlands.
- Nicolas Cambier
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, the Netherlands
- Eliseo Ferrante
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, the Netherlands
- A E Eiben
- Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, the Netherlands
3
Zeng X, Guo Y, Li L, Liu Y. Continual medical image denoising based on triplet neural networks collaboration. Comput Biol Med 2024; 179:108914. [PMID: 39053331 DOI: 10.1016/j.compbiomed.2024.108914] [Received: 11/21/2023] [Revised: 07/14/2024] [Accepted: 07/15/2024] [Indexed: 07/27/2024]
Abstract
BACKGROUND When multiple tasks are learned consecutively, the old model parameters may be overwritten by the new data, so that the new task is learned while the old task is forgotten, a phenomenon known as catastrophic forgetting. Moreover, continual learning has no mature solution for image denoising tasks. METHODS To solve the problem of catastrophic forgetting caused by learning multiple denoising tasks, we propose a Triplet Neural-networks Collaboration-continuity DeNoising (TNCDN) model, in which three neural networks update one another cooperatively. The knowledge from two denoising networks that maintain continual learning capability is transferred to the main-denoising network, which thereby acquires new knowledge while consolidating old knowledge. A co-training mechanism is designed: the main-denoising network updates the other two denoising networks with different thresholds to maintain memory reinforcement capability and knowledge extension capability. RESULTS The experimental results show that our method effectively alleviates catastrophic forgetting. On the GS, CT and ADNI datasets, compared with ANCL, the TNCDN(PromptIR) method reduced the average degree of forgetting by 2.38 (39%) on the PSNR evaluation index and by 1.63 (55%) on RMSE. CONCLUSION This study aims to solve the problem of catastrophic forgetting caused by learning multiple denoising tasks. Although the experimental results are promising, extending the basic denoising model to more datasets and tasks will broaden its application. Nevertheless, this study is a starting point that can provide reference and support for the further development of continual learning for image denoising.
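The abstract reports reductions in the "average degree of forgetting" on PSNR and RMSE without defining it. One common definition in the continual-learning literature (assumed here, not confirmed as the one used for TNCDN) averages, over earlier tasks, the drop from each task's best score before the final task to its score after the final task. A minimal sketch for a higher-is-better metric, together with the usual PSNR formula; the score matrix is made-up example data.

```python
import math
from typing import List, Sequence

# Illustrative sketch (assumption): a common forgetting measure, not necessarily the paper's.


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_forgetting(scores: List[List[float]]) -> float:
    """Average forgetting for a higher-is-better metric such as PSNR.

    scores[i][j] is the score on task j measured after training on task i
    (entries for tasks not yet seen are ignored). For each earlier task j,
    forgetting is its best score before the final task minus its score after
    the final task; the result is the mean over j. For a lower-is-better
    metric such as RMSE, negate the scores first.
    """
    num_tasks = len(scores)
    if num_tasks < 2:
        return 0.0
    final = scores[-1]
    drops = []
    for j in range(num_tasks - 1):
        best = max(scores[i][j] for i in range(j, num_tasks - 1))
        drops.append(best - final[j])
    return sum(drops) / len(drops)


# Hypothetical PSNR values (dB) for three sequential denoising tasks.
psnr_matrix = [[30.0, 0.0, 0.0], [28.0, 32.0, 0.0], [27.0, 29.0, 31.0]]
print(average_forgetting(psnr_matrix))  # 3.0
```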
Affiliation(s)
- Xianhua Zeng
- School of Computer Science and Technology/School of Artificial Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
- Yongli Guo
- School of Computer Science and Technology/School of Artificial Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
- Laquan Li
- School of Science, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
- Yuhang Liu
- School of Computer Science and Technology/School of Artificial Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
4
Lu Q, Nguyen TT, Zhang Q, Hasson U, Griffiths TL, Zacks JM, Gershman SJ, Norman KA. Reconciling shared versus context-specific information in a neural network model of latent causes. Sci Rep 2024; 14:16782. [PMID: 39039131 PMCID: PMC11263346 DOI: 10.1038/s41598-024-64272-5] [Received: 12/05/2023] [Accepted: 06/06/2024] [Indexed: 07/24/2024]
Abstract
It has been proposed that, when processing a stream of events, humans divide their experiences in terms of inferred latent causes (LCs) to support context-dependent learning. However, when shared structure is present across contexts, it is still unclear how the "splitting" of LCs and learning of shared structure can be simultaneously achieved. Here, we present the Latent Cause Network (LCNet), a neural network model of LC inference. Through learning, it naturally stores structure that is shared across tasks in the network weights. Additionally, it represents context-specific structure using a context module, controlled by a Bayesian nonparametric inference algorithm, which assigns a unique context vector for each inferred LC. Across three simulations, we found that LCNet could (1) extract shared structure across LCs in a function learning task while avoiding catastrophic interference, (2) capture human data on curriculum effects in schema learning, and (3) infer the underlying event structure when processing naturalistic videos of daily events. Overall, these results demonstrate a computationally feasible approach to reconciling shared structure and context-specific structure in a model of LCs that is scalable from laboratory experiment settings to naturalistic settings.
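The abstract says LCNet's context module is controlled by a Bayesian nonparametric inference algorithm but does not give its prior. As an assumption for illustration, a standard choice in latent-cause models is a Chinese restaurant process (CRP) prior: an observation joins existing cause k with probability proportional to its count n_k, or starts a new cause with probability proportional to a concentration parameter alpha. The sketch combines that prior with hypothetical per-cause likelihoods to pick a cause; it is not LCNet's actual inference procedure.

```python
from typing import List, Sequence, Tuple

# Illustrative sketch (assumption): CRP prior + likelihood -> MAP latent cause.


def crp_prior(counts: Sequence[int], alpha: float) -> List[float]:
    """Chinese restaurant process prior over [existing causes..., new cause].

    An existing cause k gets probability n_k / (N + alpha); a brand-new cause
    gets alpha / (N + alpha), where N is the number of past observations.
    """
    total = sum(counts) + alpha
    return [n / total for n in counts] + [alpha / total]


def infer_cause(counts: Sequence[int], likelihoods: Sequence[float],
                alpha: float) -> Tuple[int, List[float]]:
    """Combine the prior with per-cause likelihoods and pick the MAP cause.

    likelihoods has one entry per existing cause plus a final entry for a new
    cause. An index equal to len(counts) means "start a new latent cause".
    """
    prior = crp_prior(counts, alpha)
    if len(likelihoods) != len(prior):
        raise ValueError("need one likelihood per existing cause plus one for a new cause")
    unnormalized = [p * lik for p, lik in zip(prior, likelihoods)]
    z = sum(unnormalized)
    posterior = [u / z for u in unnormalized]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior


# Two existing causes seen 3 and 1 times; the new observation fits neither well.
best, posterior = infer_cause([3, 1], [0.1, 0.1, 0.9], alpha=1.0)
print(best)  # 2 -> a new latent cause is inferred
```

In a model like the one described, each newly inferred cause would then be given its own context vector, while the shared network weights keep the structure common to all causes.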
Affiliation(s)
- Qihong Lu
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA.
- Tan T Nguyen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, USA
- Qiong Zhang
- Department of Psychology and Department of Computer Science, Rutgers University, New Brunswick, USA
- Uri Hasson
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA
- Thomas L Griffiths
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA
- Department of Computer Science, Princeton University, Princeton, USA
- Jeffrey M Zacks
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, USA
- Samuel J Gershman
- Department of Psychology and Center for Brain Science, Harvard University, Cambridge, USA
- Kenneth A Norman
- Department of Psychology and Princeton Neuroscience Institute, Princeton University, Princeton, USA
5
Gao Z, Xu K, Zhuang H, Liu L, Mao X, Ding B, Feng D, Wang H. Less confidence, less forgetting: Learning with a humbler teacher in exemplar-free Class-Incremental learning. Neural Netw 2024; 179:106513. [PMID: 39018945 DOI: 10.1016/j.neunet.2024.106513] [Received: 11/27/2023] [Revised: 06/26/2024] [Accepted: 07/04/2024] [Indexed: 07/19/2024]
Abstract
Class-incremental learning (CIL) is challenging due to catastrophic forgetting (CF), which escalates in exemplar-free scenarios. To mitigate CF, knowledge distillation (KD), which uses old models as teacher models, has been widely employed in CIL. However, based on a case study, our investigation reveals that the teacher model exhibits over-confidence on unseen new samples. In this article, we conduct empirical experiments and provide theoretical analysis to investigate the over-confidence phenomenon and the impact of KD in exemplar-free CIL, where old samples are inaccessible. Building on our analysis, we propose a novel approach, Learning with Humbler Teacher (LwHT), which systematically selects an appropriate checkpoint model as a humbler teacher to mitigate CF. Furthermore, we explore using the nuclear norm to obtain an appropriate temporal ensemble that enhances model stability. Notably, LwHT outperforms the state-of-the-art approach by significant margins of 10.41%, 6.56%, and 4.31% in various settings while demonstrating superior model plasticity.
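Knowledge distillation from an old "teacher" model is commonly implemented as the temperature-softened KL divergence of Hinton et al. The sketch below shows that standard loss only (not LwHT's checkpoint selection or nuclear-norm ensembling). The logits and the temperature of 2.0 are arbitrary assumptions. It illustrates that, for the same student, a nearly one-hot (over-confident) teacher produces a much larger distillation loss, and therefore a stronger pull, than a softer one.

```python
import math
from typing import List, Sequence

# Illustrative sketch (assumption): standard KD loss, not the LwHT method itself.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """Distillation loss: T^2 * KL(teacher || student) on softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0.0)
    return temperature ** 2 * kl


student = [1.0, 0.5, 0.0]
humble_teacher = [1.2, 0.6, 0.1]          # soft, moderately confident
overconfident_teacher = [9.0, 0.0, 0.0]   # nearly one-hot
print(kd_loss(student, humble_teacher) < kd_loss(student, overconfident_teacher))  # True
```

The T^2 factor is the usual convention for keeping gradient magnitudes comparable across temperatures.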
Affiliation(s)
- Zijian Gao
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
- Kele Xu
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China.
- Huiping Zhuang
- South China University of Technology, Guangzhou 510000, China
- Li Liu
- National University of Defense Technology, Changsha 410000, China; University of Oulu, 02150 Oulu, Finland
- Xinjun Mao
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
- Bo Ding
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
- Dawei Feng
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
- Huaimin Wang
- National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China
6
Thandiackal K, Piccinelli L, Gupta R, Pati P, Goksel O. Multi-Scale Feature Alignment for Continual Learning of Unlabeled Domains. IEEE Transactions on Medical Imaging 2024; 43:2599-2609. [PMID: 38381642 DOI: 10.1109/tmi.2024.3368365] [Indexed: 02/23/2024]
Abstract
Methods for unsupervised domain adaptation (UDA) help to improve the performance of deep neural networks on unseen domains without any labeled data. Especially in medical disciplines such as histopathology, this is crucial since large datasets with detailed annotations are scarce. While the majority of existing UDA methods focus on the adaptation from a labeled source to a single unlabeled target domain, many real-world applications with a long life cycle involve more than one target domain. Thus, the ability to sequentially adapt to multiple target domains becomes essential. In settings where the data from previously seen domains cannot be stored, e.g., due to data protection regulations, the above becomes a challenging continual learning problem. To this end, we propose to use generative feature-driven image replay in conjunction with a dual-purpose discriminator that not only enables the generation of images with realistic features for replay, but also promotes feature alignment during domain adaptation. We evaluate our approach extensively on a sequence of three histopathological datasets for tissue-type classification, achieving state-of-the-art results. We present detailed ablation experiments studying our proposed method components and demonstrate a possible use-case of our continual UDA method for an unsupervised patch-based segmentation task given high-resolution tissue images. Our code is available at: https://github.com/histocartography/multi-scale-feature-alignment.
7
Wu Z, Weng Z, Peng W, Yang X, Li A, Davis LS, Jiang YG. Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:4747-4762. [PMID: 38261478 DOI: 10.1109/tpami.2024.3357503] [Indexed: 01/25/2024]
Abstract
Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made to explore its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP into a strong zero-shot video classifier capable of identifying novel actions and events during testing. Open-VCLIP++ minimally modifies CLIP to capture spatial-temporal relationships in videos, thereby creating a specialized video classifier while striving for generalization. We formally demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data. To address this problem, we introduce Interpolated Weight Optimization, a technique that leverages the advantages of weight interpolation during both training and testing. Furthermore, we use large language models to produce fine-grained video descriptions. These detailed descriptions are further aligned with video features, facilitating a better transfer of CLIP to the video domain. Our approach is evaluated on three widely used action recognition datasets, following a variety of zero-shot evaluation protocols. The results demonstrate that our method surpasses existing state-of-the-art techniques by significant margins. Specifically, we achieve zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on UCF, HMDB, and Kinetics-600 datasets respectively, outpacing the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. We also evaluate our approach on the MSR-VTT video-text retrieval dataset, where it delivers competitive video-to-text and text-to-video retrieval performance while using substantially less fine-tuning data than other methods.
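The abstract does not spell out Interpolated Weight Optimization, but its basic building block, weight interpolation, is a convex combination of two parameter sets (for example, original and fine-tuned weights). A minimal sketch of that building block only, assuming parameters stored as a dict of flat float lists. The parameter names and the mixing coefficient `alpha` are hypothetical, and the training-time part of the paper's procedure is not shown.

```python
from typing import Dict, List

# Illustrative sketch (assumption): plain weight interpolation, not the full paper procedure.
Params = Dict[str, List[float]]


def interpolate_weights(theta_a: Params, theta_b: Params, alpha: float) -> Params:
    """Return the convex combination (1 - alpha) * theta_a + alpha * theta_b.

    theta_a could be, e.g., the original CLIP weights and theta_b the weights
    after video fine-tuning; alpha = 0 recovers theta_a, alpha = 1 recovers theta_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if theta_a.keys() != theta_b.keys():
        raise ValueError("parameter sets must have the same names")
    mixed: Params = {}
    for name, values_a in theta_a.items():
        values_b = theta_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        mixed[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(values_a, values_b)]
    return mixed


zero_shot = {"proj": [0.0, 2.0]}
fine_tuned = {"proj": [1.0, 4.0]}
print(interpolate_weights(zero_shot, fine_tuned, 0.5))  # {'proj': [0.5, 3.0]}
```

Interpolating in weight space, rather than ensembling predictions, keeps inference cost equal to that of a single model; the trade-off between zero-shot generality and fine-tuned accuracy is controlled by `alpha`.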
8
Blankemeier L, Cohen JP, Kumar A, Van Veen D, Gardezi SJS, Paschali M, Chen Z, Delbrouck JB, Reis E, Truyts C, Bluethgen C, Jensen MEK, Ostmeier S, Varma M, Valanarasu JMJ, Fang Z, Huo Z, Nabulsi Z, Ardila D, Weng WH, Amaro E, Ahuja N, Fries J, Shah NH, Johnston A, Boutin RD, Wentland A, Langlotz CP, Hom J, Gatidis S, Chaudhari AS. Merlin: A Vision Language Foundation Model for 3D Computed Tomography. Research Square 2024:rs.3.rs-4546309. [PMID: 38978576 PMCID: PMC11230513 DOI: 10.21203/rs.3.rs-4546309/v1] [Indexed: 07/10/2024]
Abstract
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current shortage of both general and specialized radiologists, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies while simultaneously using the images to extract novel physiological insights. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs) that use both the image and the corresponding textual radiology reports. However, current medical VLMs are generally limited to 2D images and short reports. To overcome these shortcomings for abdominal CT interpretation, we introduce Merlin, a 3D VLM that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining without requiring additional manual annotations. We train Merlin using a high-quality clinical dataset of paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We comprehensively evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model-adapted tasks include 5-year chronic disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically relevant evaluations, we assess the efficacy of various network architectures and training strategies and show that Merlin performs favorably compared with existing task-specific baselines.
We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU. This computationally efficient design can help democratize foundation model training, especially for health systems with compute constraints. We plan to release our trained models, code, and dataset, pending manual removal of all protected health information.
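The abstract does not give the functional form of Merlin's data scaling laws. Scaling curves are often (though not always) modeled as power laws, value ≈ a·N^b, which become straight lines in log-log space. A minimal sketch, assuming that form, that fits a and b by ordinary least squares on logarithms and extrapolates to a new dataset size. The data below are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple

# Illustrative sketch (assumption): power-law fit in log-log space, not the paper's analysis.


def fit_power_law(sizes: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Fit values ~= a * sizes**b by least squares on (log size, log value).

    Both inputs must be positive; at least two distinct sizes are required.
    """
    if len(sizes) != len(values) or len(sizes) < 2:
        raise ValueError("need at least two (size, value) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted curve to a new training-set size."""
    return a * size ** b


# Synthetic points that follow 2 * N**0.5 exactly.
a, b = fit_power_law([1, 4, 16, 64], [2.0, 4.0, 8.0, 16.0])
print(round(a, 3), round(b, 3))  # 2.0 0.5
```

Fitting in log space weights relative errors equally across dataset sizes, which is usually what one wants when extrapolating how much data a target performance would require.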
Affiliation(s)
- Louis Blankemeier
- Department of Electrical Engineering, Stanford University
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Joseph Paul Cohen
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Ashwin Kumar
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Dave Van Veen
- Department of Electrical Engineering, Stanford University
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Magdalini Paschali
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Zhihong Chen
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Jean-Benoit Delbrouck
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Eduardo Reis
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Cesar Truyts
- Department of Radiology, Hospital Israelita Albert Einstein
- Christian Bluethgen
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, University Hospital Zurich
- Malte Engmann Kjeldskov Jensen
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Sophie Ostmeier
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Maya Varma
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Department of Computer Science, Stanford University
- Jeya Maria Jose Valanarasu
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Department of Computer Science, Stanford University
- Zepeng Huo
- Department of Biomedical Data Science, Stanford University
- Zaid Nabulsi
- Department of Electrical Engineering, Stanford University
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Department of Radiology, University of Wisconsin-Madison
- Department of Radiology, Hospital Israelita Albert Einstein
- Department of Radiology, University Hospital Zurich
- Department of Computer Science, Stanford University
- Department of Biomedical Data Science, Stanford University
- Department of Medicine, Stanford University
- Diego Ardila
- Department of Electrical Engineering, Stanford University
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Department of Radiology, University of Wisconsin-Madison
- Department of Radiology, Hospital Israelita Albert Einstein
- Department of Radiology, University Hospital Zurich
- Department of Computer Science, Stanford University
- Department of Biomedical Data Science, Stanford University
- Department of Medicine, Stanford University
- Wei-Hung Weng
- Department of Electrical Engineering, Stanford University
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Department of Radiology, University of Wisconsin-Madison
- Department of Radiology, Hospital Israelita Albert Einstein
- Department of Radiology, University Hospital Zurich
- Department of Computer Science, Stanford University
- Department of Biomedical Data Science, Stanford University
- Department of Medicine, Stanford University
- Edson Amaro
- Department of Radiology, Hospital Israelita Albert Einstein
- Jason Fries
- Department of Computer Science, Stanford University
- Department of Biomedical Data Science, Stanford University
- Nigam H Shah
- Department of Radiology, Stanford University
- Department of Biomedical Data Science, Stanford University
- Curtis P Langlotz
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Jason Hom
- Department of Medicine, Stanford University
- Akshay S Chaudhari
- Stanford Center for Artificial Intelligence in Medicine and Imaging, Stanford University
- Department of Radiology, Stanford University
- Department of Biomedical Data Science, Stanford University
9
Wei Q, Zhang W. Class-incremental learning with Balanced Embedding Discrimination Maximization. Neural Netw 2024; 179:106487. [PMID: 38986188 DOI: 10.1016/j.neunet.2024.106487] [Received: 10/26/2023] [Revised: 04/20/2024] [Accepted: 06/20/2024] [Indexed: 07/12/2024]
Abstract
Class-incremental learning aims to solve representation learning and classification tasks while avoiding catastrophic forgetting in scenarios where the number of categories increases. In this work, a unified method named Balanced Embedding Discrimination Maximization (BEDM) is developed to make the intermediate embeddings more discriminative. Specifically, we use an orthogonality constraint based on a doubly-blocked Toeplitz matrix to minimize the correlation of convolution kernels, and we introduce an algorithm for similarity visualization. Furthermore, uneven sample sizes and distribution shift between old and new tasks result in strongly biased classifiers. To mitigate this imbalance, we propose an adaptive balance weighting in softmax that dynamically compensates for under-represented categories. In addition, hybrid embedding learning is introduced to preserve knowledge from old models; it involves fewer hyperparameters than conventional knowledge distillation. Our proposed method outperforms existing approaches on three mainstream benchmark datasets. Moreover, visualizations show that our method produces a more uniform similarity histogram and a more stable spectrum. Grad-CAM and t-SNE visualizations further confirm its effectiveness. Code is available at https://github.com/wqzh/BEDM.
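The abstract does not give the formula for BEDM's adaptive balance weighting. As an assumed stand-in for the same problem (old classes with many samples, new classes with few, or the reverse), a widely used static baseline is inverse-frequency class weighting of the softmax cross-entropy. The sketch shows that baseline only, not BEDM's adaptive scheme; the label counts are made up.

```python
import math
from collections import Counter
from typing import Dict, Sequence

# Illustrative sketch (assumption): inverse-frequency weighting, not BEDM's adaptive weights.


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n / (k * n_c), so rare classes get larger weights.

    n is the number of samples, k the number of classes present, and n_c the
    count of class c; the weights average to 1 over the samples.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(logits: Sequence[float], label: int,
                           weights: Dict[int, float]) -> float:
    """Class-weighted softmax cross-entropy for one sample (log-sum-exp for stability)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return weights[label] * (log_sum_exp - logits[label])


# 8 samples of an old class (0) and 2 of a new class (1).
weights = inverse_frequency_weights([0] * 8 + [1] * 2)
print(weights)  # {0: 0.625, 1: 2.5}
```

Because the weights average to 1, the overall loss scale stays comparable to unweighted cross-entropy while each minority-class sample counts four times as much as a majority-class sample here.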
Affiliation(s)
- Qinglai Wei
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Systems Engineering, Macau University of Science and Technology, 999078, Macao Special Administrative Region of China.
- Weiqin Zhang
- State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
10
Ye S, Filippova A, Lauer J, Schneider S, Vidal M, Qiu T, Mathis A, Mathis MW. SuperAnimal pretrained pose estimation models for behavioral analysis. Nat Commun 2024; 15:5165. [PMID: 38906853 PMCID: PMC11192880 DOI: 10.1038/s41467-024-48792-2] [Received: 03/17/2023] [Accepted: 04/26/2024] [Indexed: 06/23/2024]
Abstract
Quantification of behavior is critical in diverse applications ranging from neuroscience and veterinary medicine to animal conservation. A common key step in behavioral analysis is first extracting relevant keypoints on animals, known as pose estimation. However, reliable inference of poses currently requires domain knowledge and manual labeling effort to build supervised models. We present SuperAnimal, a method to develop unified foundation models that can be used on over 45 species without additional manual labels. These models show excellent performance across six pose estimation benchmarks. We demonstrate how to fine-tune the models (if needed) on differently labeled data and provide tooling for unsupervised video adaptation to boost performance and decrease jitter across frames. If fine-tuned, SuperAnimal models are 10-100× more data efficient than prior transfer-learning-based approaches. We illustrate the utility of our models in behavioral classification and kinematic analysis. Collectively, we present a data-efficient solution for animal pose estimation.
Affiliation(s)
- Shaokai Ye
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland
- Anastasiia Filippova
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland
- Jessy Lauer
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland
- Steffen Schneider
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland
- Maxime Vidal
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland
- Tian Qiu
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland
- Alexander Mathis
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland
- Mackenzie Weygandt Mathis
- École Polytechnique Fédérale de Lausanne (EPFL), Brain Mind Institute & Neuro-X Institute, Geneva, Switzerland.
11
Maharjan J, Garikipati A, Singh NP, Cyrus L, Sharma M, Ciobanu M, Barnes G, Thapa R, Mao Q, Das R. OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models. Sci Rep 2024; 14:14156. [PMID: 38898116 PMCID: PMC11187169 DOI: 10.1038/s41598-024-64827-6] [Received: 02/27/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024]
Abstract
LLMs can accomplish specialized medical knowledge tasks; however, equitable access is hindered by extensive fine-tuning, the need for specialized medical data, and limited access to proprietary models. Open-source (OS) medical LLMs show performance improvements and provide the transparency and compliance required in healthcare. We present OpenMedLM, a prompting platform delivering state-of-the-art (SOTA) performance for OS LLMs on medical benchmarks. We evaluated OS foundation LLMs (7B-70B) on medical benchmarks (MedQA, MedMCQA, PubMedQA, MMLU medical-subset) and selected Yi-34B for developing OpenMedLM. Prompting strategies included zero-shot, few-shot, chain-of-thought, and ensemble/self-consistency voting. OpenMedLM delivered OS SOTA results on three medical LLM benchmarks, surpassing previous best-performing OS models that relied on costly and extensive fine-tuning. OpenMedLM provides the first results to date demonstrating that OS foundation models can reach optimized performance without specialized fine-tuning. The model achieved 72.6% accuracy on MedQA, outperforming the previous SOTA by 2.4%, and 81.7% accuracy on the MMLU medical subset, making it the first OS LLM to surpass 80% accuracy on this benchmark. Our results highlight medical-specific emergent properties in OS LLMs not documented elsewhere to date, validate the ability of OS models to accomplish healthcare tasks, and underline the benefits of prompt engineering for improving the performance of accessible LLMs in medical applications.
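The ensemble/self-consistency voting strategy listed in the abstract can be illustrated with a minimal majority-vote sketch. It assumes that several chain-of-thought completions have already been sampled for one multiple-choice question and that an option letter has already been extracted from each; the function name `self_consistency_vote` is hypothetical, and this is not OpenMedLM's code.

```python
from collections import Counter
from typing import Sequence


def self_consistency_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer among sampled completions (hypothetical helper).

    Ties go to the answer encountered first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize so that formatting variants such as "b" and "B " count as "B".
    normalized = [a.strip().upper() for a in answers]
    return Counter(normalized).most_common(1)[0][0]
```

For example, `self_consistency_vote(["B", "C", "b", "B ", "A"])` returns `"B"`; normalizing before counting keeps formatting differences from splitting the vote.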
Affiliation(s)
- Jenish Maharjan: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Anurag Garikipati: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Navan Preet Singh: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Leo Cyrus: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Mayank Sharma: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Madalina Ciobanu: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Gina Barnes: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Rahul Thapa: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Qingqing Mao: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
- Ritankar Das: Montera, Inc. Dba Forta, 548 Market St., PMB 89605, San Francisco, CA, 94104-5401, USA
12
Ayromlou S, Tsang T, Abolmaesumi P, Li X. CCSI: Continual Class-Specific Impression for data-free class incremental learning. Med Image Anal 2024; 97:103239. [PMID: 38936223 DOI: 10.1016/j.media.2024.103239] [Received: 08/03/2023] [Revised: 06/02/2024] [Accepted: 06/06/2024] [Indexed: 06/29/2024]
Abstract
In real-world clinical settings, traditional deep learning-based classification methods struggle to diagnose newly introduced disease types because they require samples from all disease classes for offline training. Class incremental learning offers a promising solution by adapting a deep network trained on specific disease classes to handle new diseases. However, catastrophic forgetting occurs: adapting the model to new data degrades performance on earlier classes. Previously proposed methods for overcoming this require perpetual storage of previous samples, which raises practical concerns regarding privacy and storage regulations in healthcare. To this end, we propose a novel data-free class incremental learning framework that synthesizes data for learned classes instead of storing data from previous classes. Our key contributions include acquiring synthetic data, termed Continual Class-Specific Impression (CCSI), for previously trained classes that are no longer accessible, and presenting a methodology to effectively use these data for updating networks when new classes are introduced. We obtain CCSI by data inversion over the gradients of the classification model trained on previous classes, starting from the mean image of each class (inspired by common landmarks shared among medical images) and using the statistics of continual normalization layers as a regularizer in this pixel-wise optimization process. Subsequently, we update the network by combining the synthesized data with new class data and incorporating several losses: an intra-domain contrastive loss to generalize the deep network trained on synthesized data to real data, a margin loss to increase separation between previous and new classes, and a cosine-normalized cross-entropy loss to alleviate the adverse effects of imbalanced distributions in training data.
Extensive experiments show that the proposed framework achieves state-of-the-art performance on four of the public MedMNIST datasets and in-house echocardiography cine series, with an improvement in classification accuracy of up to 51% compared to baseline data-free methods. Our code is available at https://github.com/ubc-tea/Continual-Impression-CCSI.
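The cosine-normalized cross-entropy loss mentioned in the CCSI abstract can be sketched as follows, assuming the formulation commonly used in class-incremental learning: each logit is a fixed scale times the cosine similarity between the feature vector and a class weight vector, followed by ordinary softmax cross-entropy. The scale of 16.0 and all function names are illustrative assumptions, not values or code from the paper.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_logits(feature: Sequence[float],
                  class_weights: Sequence[Sequence[float]],
                  scale: float = 16.0) -> List[float]:
    """Logit for class c is scale * cos(feature, w_c); weight norms are ignored."""
    f = _unit(feature)
    return [scale * sum(a * b for a, b in zip(f, _unit(w))) for w in class_weights]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]
```

Because a class weight cannot win simply by growing in norm, classes that dominate the training mix gain less of an advantage: `cosine_logits([1.0, 0.0], [[1.0, 0.0], [10.0, 0.0]])` gives the same logit (16.0) to both classes even though the second weight is ten times longer.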
Affiliation(s)
- Sana Ayromlou: Electrical and Computer Engineering Department, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada; Vector Institute, Toronto, ON M5G 0C6, Canada
- Teresa Tsang: Vancouver General Hospital, Vancouver, BC V5Z 1M9, Canada
- Purang Abolmaesumi: Electrical and Computer Engineering Department, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Xiaoxiao Li: Electrical and Computer Engineering Department, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada; Vector Institute, Toronto, ON M5G 0C6, Canada
13
Malepathirana T, Senanayake D, Gautam V, Engel M, Balez R, Lovelace MD, Sundaram G, Heng B, Chow S, Marquis C, Guillemin GJ, Brew B, Jagadish C, Ooi L, Halgamuge S. Visualization of incrementally learned projection trajectories for longitudinal data. Sci Rep 2024; 14:13558. [PMID: 38866809 PMCID: PMC11169470 DOI: 10.1038/s41598-024-63511-z] [Received: 09/27/2023] [Accepted: 05/29/2024] [Indexed: 06/14/2024]
Abstract
Longitudinal studies that continuously generate data capture temporal variations in experimentally observed parameters, enabling results to be interpreted in a time-aware manner. We propose IL-VIS (incrementally learned visualizer), a new machine learning pipeline that incrementally learns and visualizes a progression trajectory representing the changes observed over the course of a longitudinal study. At each sampling time point in an experiment, IL-VIS generates a snapshot of the longitudinal process from the data observed thus far, a capability beyond the reach of classical static models. We first verify the utility and correctness of IL-VIS using simulated data, for which the true progression trajectories are known, and find that it accurately captures and visualizes the trends and (dis)similarities between high-dimensional progression trajectories. We then apply IL-VIS to longitudinal multi-electrode array data from brain cortical organoids exposed to different levels of quinolinic acid, a metabolite contributing to many neuroinflammatory diseases including Alzheimer's disease, and to its blocking antibody. We uncover valuable insights into the organoids' electrophysiological maturation and response patterns over time under these conditions.
Affiliation(s)
- Tamasha Malepathirana: Department of Mechanical Engineering, University of Melbourne, Melbourne, 3010, VIC, Australia
- Damith Senanayake: Department of Mechanical Engineering, University of Melbourne, Melbourne, 3010, VIC, Australia
- Vini Gautam: School of Chemical and Biomedical Engineering, University of Melbourne, Melbourne, 3010, VIC, Australia; Centre for Nano Science and Engineering, Indian Institute of Science, Bangalore, 560012, India
- Martin Engel: Molecular Horizons and School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, 2522, NSW, Australia
- Rachelle Balez: Molecular Horizons and School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, 2522, NSW, Australia
- Michael D Lovelace: Applied Neurosciences Program, Peter Duncan Neurosciences Research Unit, St. Vincent's Centre for Applied Medical Research, 405 Liverpool St., Darlinghurst, Sydney, 2010, NSW, Australia; School of Clinical Medicine, UNSW Medicine and Health, St. Vincent's Healthcare Clinical Campus, Faculty of Medicine and Health, UNSW Sydney, Sydney, 2010, NSW, Australia
- Benjamin Heng: Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, 2109, NSW, Australia
- Sharron Chow: Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, 2109, NSW, Australia
- Christopher Marquis: School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, 2052, NSW, Australia
- Gilles J Guillemin: Applied Neurosciences Program, Peter Duncan Neurosciences Research Unit, St. Vincent's Centre for Applied Medical Research, 405 Liverpool St., Darlinghurst, Sydney, 2010, NSW, Australia; IPB University, Bogor, Indonesia
- Bruce Brew: Applied Neurosciences Program, Peter Duncan Neurosciences Research Unit, St. Vincent's Centre for Applied Medical Research, 405 Liverpool St., Darlinghurst, Sydney, 2010, NSW, Australia; School of Clinical Medicine, UNSW Medicine and Health, St. Vincent's Healthcare Clinical Campus, Faculty of Medicine and Health, UNSW Sydney, Sydney, 2010, NSW, Australia; Departments of Neurology and Immunology, St. Vincent's Hospital, Sydney, 2010, NSW, Australia
- Chennupati Jagadish: Research School of Physics, Australian National University, Canberra, 2601, ACT, Australia
- Lezanne Ooi: Molecular Horizons and School of Chemistry and Molecular Bioscience, University of Wollongong, Wollongong, 2522, NSW, Australia
- Saman Halgamuge: Department of Mechanical Engineering, University of Melbourne, Melbourne, 3010, VIC, Australia
14
Kong LW, Brewer GA, Lai YC. Reservoir-computing based associative memory and itinerancy for complex dynamical attractors. Nat Commun 2024; 15:4840. [PMID: 38844437 PMCID: PMC11156990 DOI: 10.1038/s41467-024-49190-4] [Received: 10/16/2023] [Accepted: 05/24/2024] [Indexed: 06/09/2024]
Abstract
Traditional neural network models of associative memory have been used to store and retrieve static patterns. We develop reservoir-computing based memories for complex dynamical attractors under two common recall scenarios in neuropsychology: location-addressable, with an index channel, and content-addressable, without such a channel. We demonstrate that, for location-addressable retrieval, a single reservoir computing machine can memorize a large number of periodic and chaotic attractors, each retrievable with a specific index value. We articulate control strategies to achieve successful switching among the attractors, unveil the mechanism behind failed switching, and uncover various scaling behaviors between the number of stored attractors and the reservoir network size. For content-addressable retrieval, we exploit multistability with cue signals, where the stored attractors coexist in the high-dimensional phase space of the reservoir network. As the length of the cue signal increases past a critical value, a high success rate can be achieved. The work provides foundational insights into developing long-term memories and itinerancy for complex dynamical patterns.
Affiliation(s)
- Ling-Wei Kong: Department of Computational Biology, Cornell University, Ithaca, New York, USA; School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, Arizona, USA
- Gene A Brewer: Department of Psychology, Arizona State University, Tempe, Arizona, USA
- Ying-Cheng Lai: School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, Arizona, USA; Department of Physics, Arizona State University, Tempe, Arizona, USA
15
Zhang L, Abdeen N, Lang J. A novel center-based deep contrastive metric learning method for the detection of polymicrogyria in pediatric brain MRI. Comput Med Imaging Graph 2024; 114:102373. [PMID: 38522222 DOI: 10.1016/j.compmedimag.2024.102373] [Received: 05/04/2023] [Revised: 03/03/2024] [Accepted: 03/18/2024] [Indexed: 03/26/2024]
Abstract
Polymicrogyria (PMG) is a disorder of cortical organization, mainly seen in children, that can be associated with seizures, developmental delay, and motor weakness. PMG is typically diagnosed on magnetic resonance imaging (MRI), but some cases can be challenging to detect even for experienced radiologists. In this study, we create an open pediatric MRI dataset (PPMR) containing both PMG and control cases from the Children's Hospital of Eastern Ontario (CHEO), Ottawa, Canada. The differences between PMG and control MRIs are subtle, and the true distribution of the disease's features is unknown, which makes automatic detection of potential PMG cases in MRI difficult. To enable such detection, we propose an anomaly detection method based on a novel center-based deep contrastive metric learning loss function (cDCM). Despite working with a small and imbalanced dataset, our method achieves 88.07% recall at 71.86% precision. This can support a computer-aided tool that helps radiologists select potential PMG MRIs. To the best of our knowledge, our research is the first to apply machine learning techniques to identify PMG solely from MRI. Our code is available at: https://github.com/RichardChangCA/Deep-Contrastive-Metric-Learning-Method-to-Detect-Polymicrogyria-in-Pediatric-Brain-MRI. Our pediatric MRI dataset is available at: https://www.kaggle.com/datasets/lingfengzhang/pediatric-polymicrogyria-mri-dataset.
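The general idea behind center-based anomaly detection, and how a recall/precision operating point such as the one reported above follows from a decision threshold, can be sketched as below. This does not reproduce the cDCM loss or the embedding network: it assumes embeddings have already been produced by a trained model, takes the center as the mean of control embeddings, and scores each case by Euclidean distance to that center. All function names and the threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def class_center(embeddings: Sequence[Vector]) -> List[float]:
    """Mean embedding of the control (normal) cases."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def anomaly_scores(embeddings: Sequence[Vector], center: Vector) -> List[float]:
    """Euclidean distance to the control center; larger means more anomalous."""
    return [math.dist(e, center) for e in embeddings]


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Flag cases with score > threshold; labels use 1 = PMG, 0 = control."""
    flagged = [s > threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising the threshold flags fewer cases, which typically trades recall for precision; on a small, imbalanced dataset the choice of operating point matters as much as the score itself.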
Affiliation(s)
- Lingfeng Zhang: School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, K1N 6N5, Canada
- Nishard Abdeen: Department of Radiology, University of Ottawa, Ottawa, K1N 6N5, Canada; Department of Medical Imaging, Children's Hospital of Eastern Ontario, Ottawa, K1H 8L1, Canada
- Jochen Lang: School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, K1N 6N5, Canada
16
Fang Y, Yap PT, Lin W, Zhu H, Liu M. Source-free unsupervised domain adaptation: A survey. Neural Netw 2024; 174:106230. [PMID: 38490115 PMCID: PMC11015964 DOI: 10.1016/j.neunet.2024.106230] [Received: 10/31/2023] [Revised: 01/14/2024] [Accepted: 03/07/2024] [Indexed: 03/17/2024]
Abstract
Unsupervised domain adaptation (UDA) via deep learning has attracted considerable attention for tackling domain-shift problems caused by distribution discrepancy across domains. Existing UDA approaches depend heavily on access to source domain data, which is often limited in practice due to privacy protection, data storage and transmission costs, and computational burden. To tackle this issue, many source-free unsupervised domain adaptation (SFUDA) methods have recently been proposed; they transfer knowledge from a pre-trained source model to the unlabeled target domain without access to source data. A comprehensive review of these SFUDA works is therefore of great significance. In this paper, we provide a timely and systematic literature review of existing SFUDA approaches from a technical perspective. Specifically, we categorize current SFUDA studies into two groups, i.e., white-box SFUDA and black-box SFUDA, and further divide them into finer subcategories based on the learning strategies they use. We also investigate the challenges of methods in each subcategory, discuss the advantages and disadvantages of white-box and black-box SFUDA methods, compile the commonly used benchmark datasets, and summarize popular techniques for improving the generalizability of models learned without source data. We finally discuss several promising future directions in this field.
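To make "knowledge transfer without source data" concrete, the sketch below shows one objective widely used by white-box SFUDA methods: information maximization over the model's softmax predictions on unlabeled target samples. It is an illustrative assumption, not a technique taken from this survey's text, and it operates on plain lists of class probabilities rather than a real model.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; terms with p = 0 contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def information_maximization_loss(probs: Sequence[Sequence[float]]) -> float:
    """Mean per-sample entropy minus the entropy of the mean prediction.

    Minimizing it makes each target prediction confident (first term) while
    keeping predictions spread across classes (second term), which discourages
    collapsing every unlabeled target sample onto a single class.
    """
    n = len(probs)
    conditional = sum(entropy(p) for p in probs) / n
    marginal: List[float] = [sum(column) / n for column in zip(*probs)]
    return conditional - entropy(marginal)
```

Confident and diverse predictions such as `[[1.0, 0.0], [0.0, 1.0]]` give a loss of about -0.693 (that is, -ln 2), whereas collapsed predictions (`[[1.0, 0.0], [1.0, 0.0]]`) or uniformly uncertain ones both give 0.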
Affiliation(s)
- Yuqi Fang: Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Pew-Thian Yap: Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Weili Lin: Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Hongtu Zhu: Department of Biostatistics and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Mingxia Liu: Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
17
Proca AM, Rosas FE, Luppi AI, Bor D, Crosby M, Mediano PAM. Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks. PLoS Comput Biol 2024; 20:e1012178. [PMID: 38829900 PMCID: PMC11175422 DOI: 10.1371/journal.pcbi.1012178] [Received: 11/07/2023] [Revised: 06/13/2024] [Accepted: 05/18/2024] [Indexed: 06/05/2024]
Abstract
Striking progress has been made in understanding cognition by analyzing how the brain is engaged in different modes of information processing. For instance, so-called synergistic information (information encoded by a set of neurons but not by any subset) plays a key role in areas of the human brain linked with complex cognition. However, two questions remain unanswered: (a) how and why a cognitive system can become highly synergistic; and (b) how informational states map onto artificial neural networks in various learning modes. Here we employ an information-decomposition framework to investigate neural networks performing cognitive tasks. Our results show that synergy increases as networks learn multiple diverse tasks, and that in tasks requiring integration of multiple sources, performance critically relies on synergistic neurons. Overall, our results suggest that synergy is used to combine information from multiple modalities and, more generally, for flexible and efficient learning. These findings reveal new ways of investigating how and why learning systems employ specific information-processing strategies, and support the principle that the capacity for general-purpose learning critically relies on the system's information dynamics.
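The definition of synergy given above (information held by a set of neurons but by no subset) is classically illustrated with an XOR target. The sketch below computes mutual information from an equally weighted list of outcomes to show the effect; it assumes this textbook example and does not implement the specific information-decomposition measure used in the paper.

```python
import math
from collections import Counter
from itertools import product
from typing import Hashable, List, Tuple


def mutual_information(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """I(A;B) in bits, treating each (a, b) pair as an equally likely outcome."""
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        total += p_ab * math.log2(p_ab / ((marg_a[a] / n) * (marg_b[b] / n)))
    return total


# Target Y = X1 XOR X2 with independent, uniformly random binary inputs.
samples = [((x1, x2), x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
i_x1 = mutual_information([(x[0], y) for x, y in samples])  # 0 bits
i_x2 = mutual_information([(x[1], y) for x, y in samples])  # 0 bits
i_joint = mutual_information(samples)                       # 1 bit
```

Neither input alone says anything about Y, yet together they determine it completely. In a partial information decomposition with nonnegative terms, each single-source mutual information is the sum of a redundant and a unique term, so both must be zero here and the full bit is synergistic.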
Affiliation(s)
- Alexandra M. Proca: Department of Computing, Imperial College London, London, United Kingdom
- Fernando E. Rosas: Department of Informatics, University of Sussex, Brighton, United Kingdom; Sussex Centre for Consciousness Science and Sussex AI, University of Sussex, Brighton, United Kingdom; Centre for Psychedelic Research and Centre for Complexity Science, Department of Brain Sciences, Imperial College London, London, United Kingdom; Centre for Eudaimonia and Human Flourishing, University of Oxford, Oxford, United Kingdom
- Andrea I. Luppi: Department of Clinical Neurosciences and Division of Anaesthesia, University of Cambridge, Cambridge, United Kingdom; Leverhulme Centre for the Future of Intelligence, University of Cambridge, Cambridge, United Kingdom; Montreal Neurological Institute, McGill University, Montreal, Canada
- Daniel Bor: Department of Psychology, University of Cambridge, Cambridge, United Kingdom; Department of Psychology, Queen Mary University of London, London, United Kingdom
- Matthew Crosby: Department of Computing, Imperial College London, London, United Kingdom
- Pedro A. M. Mediano: Department of Computing, Imperial College London, London, United Kingdom; Department of Psychology, University of Cambridge, Cambridge, United Kingdom
18
Meseguer P, Del Amor R, Naranjo V. MICIL: Multiple-Instance Class-Incremental Learning for skin cancer whole slide images. Artif Intell Med 2024; 152:102870. [PMID: 38663270 DOI: 10.1016/j.artmed.2024.102870] [Received: 10/04/2023] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 05/15/2024]
Abstract
Artificial intelligence (AI) agents encounter the problem of catastrophic forgetting when they are trained sequentially on new data batches. This issue poses a barrier to deploying AI-based models in tasks that evolve over time, such as cancer prediction. Moreover, whole slide images (WSIs) play a crucial role in cancer management, and their automated analysis has become increasingly popular for assisting pathologists during diagnosis. Incremental learning (IL) techniques aim to develop algorithms capable of retaining previously acquired information while also acquiring new insights to predict future data. Deep IL techniques need to address the challenges posed by the gigapixel scale of WSIs, which often necessitates multiple instance learning (MIL) frameworks. In this paper, we introduce an IL algorithm tailored for analyzing WSIs within a MIL paradigm. The proposed Multiple Instance Class-Incremental Learning (MICIL) algorithm combines MIL with class-IL for the first time, allowing incremental prediction of multiple skin cancer subtypes from WSIs within a class-IL scenario. Our framework incorporates knowledge distillation and data rehearsal, along with a novel embedding-level distillation that aims to preserve the latent space at the aggregated WSI level. Results demonstrate the algorithm's effectiveness in balancing IL-specific metrics, such as intransigence and forgetting, and in addressing the plasticity-stability dilemma.
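The abstract names intransigence and forgetting without defining them. The sketch below assumes the definitions commonly attributed to Chaudhry et al. (2018): forgetting on an earlier step is the best accuracy previously reached on it minus the accuracy after the final step, averaged over earlier steps; intransigence is the accuracy of a reference model trained jointly on all data seen so far minus the incremental model's accuracy on the newest step. In a class-IL setting a "step" is the group of classes introduced together; whether MICIL computes the metrics exactly this way is an assumption.

```python
from typing import Sequence


def average_forgetting(acc: Sequence[Sequence[float]]) -> float:
    """Average forgetting after the last training step.

    acc[l][j] is the accuracy on the classes of step j measured after
    training step l (defined for j <= l, so row l has l + 1 entries).
    """
    k = len(acc) - 1  # index of the final training step
    if k == 0:
        return 0.0  # nothing was learned earlier, so nothing is forgotten
    drops = []
    for j in range(k):  # every step learned before the final one
        best_before = max(acc[l][j] for l in range(j, k))
        drops.append(best_before - acc[k][j])
    return sum(drops) / len(drops)


def intransigence(reference_acc: float, acc_on_newest: float) -> float:
    """Gap to a reference model trained jointly on all data seen so far."""
    return reference_acc - acc_on_newest
```

For the accuracy history `[[0.9], [0.7, 0.8], [0.6, 0.5, 0.85]]`, both earlier steps dropped by 0.3 from their best, so average forgetting is 0.3. High forgetting signals lost stability and high intransigence signals lost plasticity, which is the trade-off the abstract refers to.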
Affiliation(s)
- Pablo Meseguer: Instituto Universitario de Investigación e Innovación en Tecnología Centrada en el Ser Humano, HUMAN-tech, Universitat Politècnica de València, Valencia, Spain; valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence, Valencia, Spain
- Rocío Del Amor: Instituto Universitario de Investigación e Innovación en Tecnología Centrada en el Ser Humano, HUMAN-tech, Universitat Politècnica de València, Valencia, Spain
- Valery Naranjo: Instituto Universitario de Investigación e Innovación en Tecnología Centrada en el Ser Humano, HUMAN-tech, Universitat Politècnica de València, Valencia, Spain; valgrAI - Valencian Graduate School and Research Network of Artificial Intelligence, Valencia, Spain
19
Barai P, Leroy G, Bisht P, Rothman JM, Lee S, Andrews J, Rice SA, Ahmed A. Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare. AMIA Jt Summits Transl Sci Proc 2024; 2024:75-84. [PMID: 38827063 PMCID: PMC11141838] [Indexed: 06/04/2024]
Abstract
Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19% compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.
Affiliation(s)
- Gondy Leroy: The University of Arizona, Tucson 85721, USA
- Sumi Lee: The University of Arizona, Tucson 85721, USA
- Arif Ahmed: The University of Arizona, Tucson 85721, USA
20
Lu J, Sun S. PAMK: Prototype Augmented Multi-Teacher Knowledge Transfer Network for Continual Zero-Shot Learning. IEEE Trans Image Process 2024; 33:3353-3368. [PMID: 38787667 DOI: 10.1109/tip.2024.3403053] [Indexed: 05/26/2024]
Abstract
Continual zero-shot learning (CZSL) aims to develop a model that accumulates historical knowledge to recognize unseen tasks while eliminating catastrophic forgetting of seen tasks when learning new tasks. However, existing CZSL methods, while mitigating catastrophic forgetting for old tasks, often cause negative transfer to new tasks by over-focusing on accumulating old knowledge and neglecting the model's plasticity for learning new tasks. To tackle these problems, we propose PAMK, a prototype augmented multi-teacher knowledge transfer network that strikes a trade-off between recognition stability for old tasks and generalization plasticity for new tasks. PAMK consists of a prototype augmented contrastive generation (PACG) module and a multi-teacher knowledge transfer (MKT) module. To reduce the cumulative semantic decay of the class representation embedding and mitigate catastrophic forgetting, we propose a continual prototype augmentation strategy based on relevance scores in PACG. Furthermore, by introducing the prototype augmented semantic-visual contrastive loss, PACG promotes intra-class compactness for all classes across all tasks. MKT effectively accumulates semantic knowledge learned from old tasks to recognize new tasks via the proposed multi-teacher knowledge transfer, eliminating the negative transfer problem. Extensive experiments on various CZSL settings demonstrate the superior performance of PAMK compared to state-of-the-art methods. In particular, in the practical task-free CZSL setting, PAMK achieves gains of 3.28%, 3.09%, and 3.71% in mean harmonic accuracy on the CUB, AWA1, and AWA2 datasets, respectively.
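The gains above are reported in mean harmonic accuracy. The sketch below assumes the standard generalized zero-shot definition, H = 2·S·U / (S + U) for seen-class accuracy S and unseen-class accuracy U, averaged over incremental steps; the exact averaging protocol used for PAMK may differ, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def harmonic_accuracy(seen: float, unseen: float) -> float:
    """Harmonic mean of seen-class and unseen-class accuracy."""
    if seen + unseen == 0:
        return 0.0
    return 2 * seen * unseen / (seen + unseen)


def mean_harmonic_accuracy(per_step: Sequence[Tuple[float, float]]) -> float:
    """Average H over incremental steps given (seen, unseen) accuracy pairs."""
    if not per_step:
        raise ValueError("need at least one step")
    return sum(harmonic_accuracy(s, u) for s, u in per_step) / len(per_step)
```

The harmonic mean punishes imbalance: a model with 0.9 seen and 0.1 unseen accuracy scores H = 0.18, whereas the arithmetic mean would be 0.5. The metric therefore rewards balancing stability on old classes with plasticity toward new ones.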
21
Du J, Li W, Liu P, Vong CM, You Y, Lei B, Wang T. Federated learning using model projection for multi-center disease diagnosis with non-IID data. Neural Netw 2024; 178:106409. [PMID: 38823069 DOI: 10.1016/j.neunet.2024.106409] [Received: 09/21/2023] [Revised: 04/28/2024] [Accepted: 05/23/2024] [Indexed: 06/03/2024]
Abstract
Multi-center disease diagnosis aims to build a global model for all participating medical centers. Due to privacy concerns, it is infeasible to collect data from multiple centers for training (i.e., centralized learning). Federated Learning (FL) is a decentralized framework that enables multiple clients (e.g., medical centers) to collaboratively train a global model while retaining patient data locally for privacy. However, in practice, data across medical centers are not independently and identically distributed (non-IID), causing two challenging issues: (1) catastrophic forgetting at clients, i.e., the local model forgets the knowledge received from the global model after local training, reducing performance; and (2) invalid aggregation at the server, i.e., the aggregated global model may not be favorable to some clients, slowing convergence. To mitigate these issues, we propose Federated learning using Model Projection (FedMoP), which guarantees that (1) the loss of the local model on global data does not increase after local training, without accessing the global data, so that performance does not degrade; and (2) the loss of the global model on local data does not increase after aggregation, without accessing local data, so that the convergence rate can be improved. Extensive experimental results show that FedMoP outperforms state-of-the-art FL methods in terms of accuracy, convergence rate, and communication cost. In particular, FedMoP achieves accuracy comparable to, or even higher than, centralized learning. Thus, FedMoP can ensure privacy protection while matching or exceeding centralized learning in accuracy and communication cost.
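The abstract does not detail how FedMoP's projection works. A related and well-known idea is A-GEM-style gradient projection (Chaudhry et al., 2019): if a local gradient conflicts with a reference gradient (negative dot product), remove the conflicting component so a small descent step does not increase the reference loss to first order. The sketch below assumes a reference gradient vector is available, which FedMoP explicitly claims to avoid needing, so it illustrates the general principle only and is not FedMoP's method. Function names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_gradient(g: Sequence[float], g_ref: Sequence[float]) -> List[float]:
    """Project g so a descent step does not raise the reference loss to first order.

    A step theta <- theta - lr * g changes the reference loss by about
    -lr * dot(g, g_ref). If dot(g, g_ref) < 0 that change is positive, so the
    component of g along g_ref is removed, leaving dot(result, g_ref) == 0.
    """
    d = dot(g, g_ref)
    ref_sq = dot(g_ref, g_ref)
    if d >= 0 or ref_sq == 0:
        return list(g)  # no conflict: keep the local gradient unchanged
    coef = d / ref_sq
    return [x - coef * r for x, r in zip(g, g_ref)]
```

For example, `project_gradient([1.0, -1.0], [0.0, 1.0])` returns `[1.0, 0.0]`. The projected vector is the closest vector to `g` in Euclidean distance that does not conflict with `g_ref`, so it changes the local update as little as possible.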
Affiliation(s)
- Jie Du: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen 518060, Guangdong, China
- Wei Li: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen 518060, Guangdong, China
- Peng Liu: Artificial Intelligence Industrial Innovation Research Center, Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, China
- Chi-Man Vong: Department of Computer and Information Science, University of Macau, Macau SAR, 999078, China
- Yongke You: Department of Nephrology, Shenzhen University General Hospital, Shenzhen University, Shenzhen 518060, Guangdong, China
- Baiying Lei: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen 518060, Guangdong, China
- Tianfu Wang: National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen 518060, Guangdong, China
22
Chen X, Guo W, Lin C, Jiang N, Su J. Cross-Subject Lifelong Learning for Continuous Estimation From Surface Electromyographic Signal. IEEE Trans Neural Syst Rehabil Eng 2024; 32:1965-1973. [PMID: 38739518 DOI: 10.1109/tnsre.2024.3400535] [Indexed: 05/16/2024]
Abstract
The employment of surface electromyographic (sEMG) signals in the estimation of hand kinematics represents a promising non-invasive methodology for the advancement of human-machine interfaces. However, the limitations of existing subject-specific methods are obvious as they confine the application to individual models that are custom-tailored for specific subjects, thereby reducing the potential for broader applicability. In addition, current cross-subject methods are challenged in their ability to simultaneously cater to the needs of both new and existing users effectively. To overcome these challenges, we propose the Cross-Subject Lifelong Network (CSLN). CSLN incorporates a novel lifelong learning approach, maintaining the patterns of sEMG signals across a varied user population and across different temporal scales. Our method enhances the generalization of acquired patterns, making it applicable to various individuals and temporal contexts. Our experimental investigations, encompassing both joint and sequential training approaches, demonstrate that the CSLN model not only attains enhanced performance in cross-subject scenarios but also effectively addresses the issue of catastrophic forgetting, thereby augmenting training efficacy.
23
Togo T, Togo R, Maeda K, Ogawa T, Haseyama M. Analysis of Continual Learning Techniques for Image Generative Models with Learned Class Information Management. Sensors (Basel) 2024; 24:3087. [PMID: 38793943 PMCID: PMC11125277 DOI: 10.3390/s24103087] [Received: 04/01/2024] [Revised: 05/06/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024]
Abstract
The advancements in deep learning have significantly enhanced the capability of image generation models to produce images aligned with human intentions. However, training and adapting these models to new data and tasks remain challenging because of their complexity and the risk of catastrophic forgetting. This study proposes a method for addressing these challenges involving the application of class-replacement techniques within a continual learning framework. This method utilizes selective amnesia (SA) to efficiently replace existing classes with new ones while retaining crucial information. This approach improves the model's adaptability to evolving data environments while preventing the loss of past information. We conducted a detailed evaluation of class-replacement techniques, examining their impact on the "class incremental learning" performance of models and exploring their applicability in various scenarios. The experimental results demonstrated that our proposed method could enhance the learning efficiency and long-term performance of image generation models. This study broadens the application scope of image generation technology and supports the continual improvement and adaptability of corresponding models.
Affiliation(s)
- Taro Togo
- Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Ren Togo
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Keisuke Maeda
- Data-Driven Interdisciplinary Research Emergence Department, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Takahiro Ogawa
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
- Miki Haseyama
- Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
24
Alonso N, Krichmar JL. A sparse quantized hopfield network for online-continual memory. Nat Commun 2024; 15:3722. [PMID: 38697981 PMCID: PMC11065890 DOI: 10.1038/s41467-024-46976-4] [Received: 08/21/2023] [Accepted: 03/13/2024] [Indexed: 05/05/2024]
Abstract
An important difference between brains and deep neural networks is the way they learn. Nervous systems learn online, where a stream of noisy data points is presented in a non-independent, identically distributed way. Further, synaptic plasticity in the brain depends only on information local to synapses. Deep networks, on the other hand, typically use non-local learning algorithms and are trained in an offline, non-noisy, independent, identically distributed setting. Understanding how neural networks learn under the same constraints as the brain is an open problem for neuroscience and neuromorphic computing, and a standard approach to this problem has yet to be established. In this paper, we propose that discrete graphical models that learn via an online maximum a posteriori learning algorithm could provide such an approach. We implement this kind of model in a neural network called the Sparse Quantized Hopfield Network. We show that our model outperforms state-of-the-art neural networks on associative memory tasks, outperforms these networks in online, continual settings, learns efficiently with noisy inputs, and is better than baselines on an episodic memory task.
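For readers unfamiliar with the associative-memory setting, the classic binary Hopfield network is the simplest reference point: patterns are stored with a Hebbian outer-product rule, where each weight update uses only the activity of the two units it connects (a local rule), and a corrupted cue is cleaned up by repeated sign updates. This is a minimal sketch of that classic model, not of the Sparse Quantized Hopfield Network, which the abstract describes as a discrete graphical model trained with an online maximum a posteriori algorithm. The ±1 encoding and the function names are assumptions for illustration.

```python
def train_hebbian(patterns):
    """Store +/-1 patterns with the Hebbian outer-product rule (zero diagonal).

    Patterns are added one at a time, and each update to w[i][j] uses only the
    states of units i and j, i.e. the rule is online and local.
    """
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, cue, max_sweeps=10):
    """Asynchronous sign updates from a (possibly corrupted) cue until stable."""
    s = list(cue)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new_state = 1 if h >= 0 else -1
            if new_state != s[i]:
                s[i] = new_state
                changed = True
        if not changed:
            break
    return s


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]  # orthogonal to p1
w = train_hebbian([p1, p2])
probe = [-1] + p1[1:]              # p1 with its first unit flipped
print(recall(w, probe) == p1)
```

Capacity of this classic network is limited (roughly proportional to the number of units), which is one reason newer associative-memory models exist.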
Affiliation(s)
- Nicholas Alonso
- Department of Cognitive Science, University of California, Irvine, CA, USA.
- Jeffrey L Krichmar
- Department of Cognitive Science, University of California, Irvine, CA, USA
- Department of Computer Science, University of California, Irvine, CA, USA
25
Zhang X, Dong S, Chen J, Tian Q, Gong Y, Hong X. Deep Class-Incremental Learning From Decentralized Data. IEEE Trans Neural Netw Learn Syst 2024; 35:7190-7203. [PMID: 36315536 DOI: 10.1109/tnnls.2022.3214573] [Indexed: 06/16/2023]
Abstract
In this article, we focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed and the data are stored in multiple repositories. We initiate the study of data-decentralized class-incremental learning (DCIL) by making the following contributions. First, we formulate the DCIL problem and develop the experimental protocol. Second, we introduce a paradigm to create a basic decentralized counterpart of typical (centralized) CIL approaches, and as a result, establish a benchmark for the DCIL study. Third, we further propose a decentralized composite knowledge incremental distillation (DCID) framework to transfer knowledge from historical models and multiple local sites to the general model continually. DCID consists of three main components, namely, local CIL, collaborated knowledge distillation (KD) among local models, and aggregated KD from local models to the general one. We comprehensively investigate our DCID framework using different implementations of the three components. Extensive experimental results demonstrate the effectiveness of our DCID framework. The source code of the baseline methods and the proposed DCIL is available at https://github.com/Vision-Intelligence-and-Robots-Group/DCIL.
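Aggregated distillation from several local models into one general model is commonly implemented as a temperature-softened KL divergence. The sketch below assumes the teacher distribution is the plain average of the local models' softened outputs and applies the usual t² scaling from standard knowledge distillation. The abstract does not say how DCID weights or combines the local models, so this is a hypothetical illustration, not the paper's loss; the logits in the usage lines are made up.

```python
import math


def softmax(logits, t=1.0):
    """Temperature-softened softmax, shifted by the max logit for stability."""
    m = max(logits)
    exps = [math.exp((z - m) / t) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregated_kd_loss(student_logits, teacher_logits_list, t=2.0):
    """t**2 * KL(teacher || student), where the teacher distribution is the
    mean of the softened outputs of several local (teacher) models."""
    teachers = [softmax(logits, t) for logits in teacher_logits_list]
    n_classes = len(student_logits)
    teacher = [sum(p[c] for p in teachers) / len(teachers) for c in range(n_classes)]
    student = softmax(student_logits, t)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return (t ** 2) * kl


local_a = [2.0, 0.5, -1.0]               # hypothetical logits from local model A
local_b = [0.0, 1.5, -0.5]               # hypothetical logits from local model B
general_logits = [1.0, 1.0, -1.0]        # hypothetical logits from the general model
print(aggregated_kd_loss(general_logits, [local_a, local_b]))
```

Averaging probabilities (rather than logits) is one design choice among several; per-site weights, for example by local data volume, would be a natural variant.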
26
Chanra V, Chudzinska A, Braniewska N, Silski B, Holst B, Sauvigny T, Stodieck S, Pelzl S, House PM. Development and prospective clinical validation of a convolutional neural network for automated detection and segmentation of focal cortical dysplasias. Epilepsy Res 2024; 202:107357. [PMID: 38582073 DOI: 10.1016/j.eplepsyres.2024.107357] [Received: 11/13/2023] [Revised: 02/28/2024] [Accepted: 04/01/2024] [Indexed: 04/08/2024]
Abstract
PURPOSE Focal cortical dysplasias (FCDs) are a leading cause of drug-resistant epilepsy. Early detection and resection of FCDs have favorable prognostic implications for postoperative seizure freedom. Despite advancements in imaging methods, FCD detection remains challenging. House et al. (2021) introduced a convolutional neural network (CNN) for automated FCD detection and segmentation, achieving a sensitivity of 77.8%. However, its clinical applicability was limited due to a low specificity of 5.5%. The objective of this study was to improve the CNN's performance through data-driven training and algorithm optimization, followed by a prospective validation on daily-routine MRIs. MATERIAL AND METHODS A dataset of 300 3 T MRIs from daily clinical practice, including 3D T1 and FLAIR sequences, was prospectively compiled. The MRIs were visually evaluated by two neuroradiologists and underwent morphometric assessment by two epileptologists. The dataset included 30 FCD cases (11 female, mean age: 28.1 ± 10.1 years) and a control group of 150 normal cases (97 female, mean age: 32.8 ± 14.9 years), along with 120 non-FCD pathological cases (64 female, mean age: 38.4 ± 18.4 years). The dataset was divided into three subsets, each analyzed by the CNN. Subsequently, the CNN underwent a two-phase training process, incorporating subset MRIs and expert-labeled FCD maps. This training employed both classical and continual learning techniques. The CNN's performance was validated by comparing the baseline model with the trained models at two training levels. RESULTS In prospective validation, the best model trained using continual learning achieved a sensitivity of 90.0%, specificity of 70.0%, and accuracy of 72.0%, with an average of 0.41 false positive clusters detected per MRI. For FCD segmentation, an average Dice coefficient of 0.56 was attained. The model's performance improved in each training phase while maintaining a high level of sensitivity. Continual learning outperformed classical learning in this regard. CONCLUSIONS Our study presents a promising CNN for FCD detection and segmentation, exhibiting both high sensitivity and specificity. Furthermore, the model demonstrates continuous improvement with the inclusion of more clinical MRI data. We consider our CNN a valuable tool for automated, examiner-independent FCD detection in daily clinical practice, potentially addressing the underutilization of epilepsy surgery in drug-resistant focal epilepsy and thereby improving patient outcomes.
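The reported sensitivity, specificity and accuracy follow the standard confusion-matrix definitions, and the Dice coefficient measures voxel overlap between a predicted and an expert-labeled lesion. The sketch below computes both. The counts in the usage line are hypothetical and are not the study's confusion matrix, which the abstract does not give; they split 300 scans into 30 positives and 270 negatives, as in the dataset described, only to show how 90% sensitivity and 70% specificity combine into 72% accuracy. The function names are illustrative.

```python
def detection_metrics(tp, fn, tn, fp):
    """Patient-level sensitivity, specificity and accuracy from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of FCD cases detected
    specificity = tn / (tn + fp)  # fraction of non-FCD cases correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def dice(pred_voxels, true_voxels):
    """Dice coefficient 2 * |A & B| / (|A| + |B|) for two sets of voxel indices."""
    a, b = set(pred_voxels), set(true_voxels)
    if not a and not b:
        return 1.0  # both empty: treated as perfect agreement (a convention)
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical counts: 27 of 30 FCD cases detected, 189 of 270 non-FCD cases cleared.
print(detection_metrics(tp=27, fn=3, tn=189, fp=81))
print(dice({1, 2, 3, 4}, {3, 4, 5, 6}))
```

Because accuracy weights each class by its size, a 30/270 split lets specificity dominate: accuracy lands much closer to 70% than to 90%.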
Affiliation(s)
- Vicky Chanra
- Hamburg Epilepsy Center, Protestant Hospital Alsterdorf, Department of Neurology and Epileptology, Hamburg, Germany
- Brigitte Holst
- University Hospital Hamburg-Eppendorf, Department of Neuroradiology, Hamburg, Germany
- Thomas Sauvigny
- University Hospital Hamburg-Eppendorf, Department of Neurosurgery, Hamburg, Germany
- Stefan Stodieck
- Hamburg Epilepsy Center, Protestant Hospital Alsterdorf, Department of Neurology and Epileptology, Hamburg, Germany
- Patrick M House
- Hamburg Epilepsy Center, Protestant Hospital Alsterdorf, Department of Neurology and Epileptology, Hamburg, Germany; theBlue.ai GmbH, Hamburg, Germany; Epileptologicum Hamburg, Specialist's Practice for Epileptology, Hamburg, Germany.
27
Rong Y, Chen Q, Fu Y, Yang X, Al-Hallaq HA, Wu QJ, Yuan L, Xiao Y, Cai B, Latifi K, Benedict SH, Buchsbaum JC, Qi XS. NRG Oncology Assessment of Artificial Intelligence Deep Learning-Based Auto-segmentation for Radiation Therapy: Current Developments, Clinical Considerations, and Future Directions. Int J Radiat Oncol Biol Phys 2024; 119:261-280. [PMID: 37972715 PMCID: PMC11023777 DOI: 10.1016/j.ijrobp.2023.10.033] [Received: 02/08/2023] [Revised: 09/16/2023] [Accepted: 10/14/2023] [Indexed: 11/19/2023]
Abstract
Deep learning neural networks (DLNN) in artificial intelligence (AI) have been extensively explored for automatic segmentation in radiotherapy (RT). In contrast to traditional model-based methods, data-driven AI-based models for auto-segmentation have shown high accuracy in early studies in research settings and controlled environments (single institution). Vendor-provided commercial AI models are made available as part of the integrated treatment planning system (TPS) or as a stand-alone tool that provides a streamlined workflow interacting with the main TPS. These commercial tools have drawn clinics' attention thanks to their significant benefit in reducing the workload of manual contouring and shortening the duration of treatment planning. However, challenges arise when applying these commercial AI-based segmentation models to diverse clinical scenarios, particularly in uncontrolled environments. Contouring nomenclature and guideline standardization has been the main task undertaken by NRG Oncology. AI auto-segmentation holds the potential to help clinical trial participants reduce interobserver variations, nomenclature non-compliance, and contouring guideline deviations. Meanwhile, trial reviewers could use AI tools to verify the contour accuracy and compliance of submitted datasets. Recognizing the growing clinical utilization and potential of these commercial AI auto-segmentation tools, NRG Oncology has formed a working group to evaluate them. The group will assess in-house and commercially available AI models, evaluation metrics, clinical challenges, and limitations, as well as future developments in addressing these challenges. General recommendations are made for the implementation of these commercial AI models, as well as precautions in recognizing their challenges and limitations.
Affiliation(s)
- Yi Rong
- Mayo Clinic Arizona, Phoenix, AZ
- Quan Chen
- City of Hope Comprehensive Cancer Center, Duarte, CA
- Yabo Fu
- Memorial Sloan Kettering Cancer Center, Commack, NY
- Lulin Yuan
- Virginia Commonwealth University, Richmond, VA
- Ying Xiao
- University of Pennsylvania/Abramson Cancer Center, Philadelphia, PA
- Bin Cai
- The University of Texas Southwestern Medical Center, Dallas, TX
- Stanley H Benedict
- University of California Davis Comprehensive Cancer Center, Sacramento, CA
- X Sharon Qi
- University of California Los Angeles, Los Angeles, CA
28
Sun G, Ji B, Liang L, Chen M. CeCR: Cross-entropy contrastive replay for online class-incremental continual learning. Neural Netw 2024; 173:106163. [PMID: 38430638 DOI: 10.1016/j.neunet.2024.106163] [Received: 08/01/2022] [Revised: 12/07/2023] [Accepted: 02/02/2024] [Indexed: 03/05/2024]
Abstract
Replay-based methods have shown superior potential for learning continually from an online data stream. Their main challenge is selecting the representative samples that are stored in the buffer and replayed. In this paper, we propose the Cross-entropy Contrastive Replay (CeCR) method for the online class-incremental setting. First, we present the Class-focused Memory Retrieval method, which performs class-level sampling without replacement. Second, we put forward the class-mean approximation memory update method, which selectively replaces mistakenly classified training samples with samples from the current input batch. In addition, the Cross-entropy Contrastive Loss is proposed to train the model so that it acquires more solid knowledge and learns effectively. Experiments show that CeCR achieves comparable or improved performance on two benchmark datasets compared with state-of-the-art methods.
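One plausible reading of class-level sampling without replacement is a retrieval routine that cycles over the classes present in the buffer, taking one stored sample per class per round and never returning the same sample twice in a batch. The sketch below implements that reading only; the function name, the `(x, label)` tuple layout, and the round-robin order are assumptions, not details taken from the CeCR paper.

```python
import random
from collections import defaultdict


def class_balanced_retrieve(buffer, n, rng=None):
    """Draw up to n distinct (x, label) items from buffer, one per class per round."""
    if rng is None:
        rng = random.Random()
    by_class = defaultdict(list)
    for item in buffer:
        by_class[item[1]].append(item)
    for items in by_class.values():
        rng.shuffle(items)   # random order within each class (buffer itself is untouched)
    classes = list(by_class)
    rng.shuffle(classes)     # random class order for the rounds
    batch = []
    while len(batch) < n and any(by_class[c] for c in classes):
        for c in classes:
            if by_class[c] and len(batch) < n:
                batch.append(by_class[c].pop())  # pop: sampling without replacement
    return batch


# Imbalanced buffer: 10 samples of class 0, 2 samples of class 1.
memory = [(f"x{i}", 0) for i in range(10)] + [("y0", 1), ("y1", 1)]
batch = class_balanced_retrieve(memory, 4, rng=random.Random(0))
print(sorted(label for _, label in batch))
```

Compared with uniform sampling from the buffer, which would return mostly class-0 items here, the round-robin keeps minority classes represented in each replay batch until they run out.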
Affiliation(s)
- Guanglu Sun
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China.
- Baolun Ji
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China.
- Lili Liang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China.
- Minghui Chen
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China.
29
Zhu Z, Ma X, Wang W, Dong S, Wang K, Wu L, Luo G, Wang G, Li S. Boosting knowledge diversity, accuracy, and stability via tri-enhanced distillation for domain continual medical image segmentation. Med Image Anal 2024; 94:103112. [PMID: 38401270 DOI: 10.1016/j.media.2024.103112] [Received: 08/23/2023] [Revised: 01/10/2024] [Accepted: 02/20/2024] [Indexed: 02/26/2024]
Abstract
Domain continual medical image segmentation plays a crucial role in clinical settings. This approach enables segmentation models to continually learn from a sequential data stream across multiple domains. However, it faces the challenge of catastrophic forgetting. Existing methods based on knowledge distillation show potential to address this challenge via a three-stage process: distillation, transfer, and fusion. Yet, each stage presents its unique issues that, collectively, amplify the problem of catastrophic forgetting. To address these issues at each stage, we propose a tri-enhanced distillation framework. (1) Stochastic Knowledge Augmentation reduces redundancy in knowledge, thereby increasing both the diversity and volume of knowledge derived from the old network. (2) Adaptive Knowledge Transfer selectively captures critical information from the old knowledge, facilitating a more accurate knowledge transfer. (3) Global Uncertainty-Guided Fusion introduces a global uncertainty view of the dataset to fuse the old and new knowledge with reduced bias, promoting a more stable knowledge fusion. Our experimental results not only validate the feasibility of our approach, but also demonstrate its superior performance compared to state-of-the-art methods. We suggest that our innovative tri-enhanced distillation framework may establish a robust benchmark for domain continual medical image segmentation.
Affiliation(s)
- Zhanshi Zhu
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Xinghua Ma
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
- Wei Wang
- Faculty of Computing, Harbin Institute of Technology, Shenzhen, China.
- Suyu Dong
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, China
- Kuanquan Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China.
- Lianming Wu
- Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Gongning Luo
- Faculty of Computing, Harbin Institute of Technology, Harbin, China.
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin, China
- Shuo Li
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
30
Gong L, Pasqualetti F, Papouin T, Ching S. Astrocytes as a mechanism for contextually-guided network dynamics and function. PLoS Comput Biol 2024; 20:e1012186. [PMID: 38820533 PMCID: PMC11168681 DOI: 10.1371/journal.pcbi.1012186] [Received: 12/18/2023] [Revised: 06/12/2024] [Accepted: 05/21/2024] [Indexed: 06/02/2024]
Abstract
Astrocytes are a ubiquitous and enigmatic type of non-neuronal cell and are found in the brain of all vertebrates. Although astrocytes have traditionally been viewed as supporting neurons, it is increasingly recognized that they play a more direct and active role in brain function and neural computation. On account of their sensitivity to a host of physiological covariates and their ability to modulate neuronal activity and connectivity on slower time scales, astrocytes may be particularly well poised to modulate the dynamics of neural circuits in functionally salient ways. In the current paper, we seek to capture these features via actionable abstractions within computational models of neuron-astrocyte interaction. Specifically, we examine how nested feedback loops of neuron-astrocyte interaction, acting over separated time-scales, may endow astrocytes with the capability to enable learning in context-dependent settings, where fluctuations in task parameters may occur much more slowly than within-task requirements. We pose a general model of neuron-synapse-astrocyte interaction and use formal analysis to characterize how astrocytic modulation may constitute a form of meta-plasticity, altering the ways in which synapses and neurons adapt as a function of time. We then embed this model in a bandit-based reinforcement learning task environment, and show how the presence of time-scale separated astrocytic modulation enables learning over multiple fluctuating contexts. Indeed, these networks learn far more reliably than dynamically homogeneous networks and conventional non-network-based bandit algorithms. Our results fuel the notion that neuron-astrocyte interactions in the brain benefit learning over different time-scales and the conveyance of task-relevant contextual information onto circuit dynamics.
Affiliation(s)
- Lulu Gong
- Department of Electrical and Systems Engineering, Washington University, St. Louis, Missouri, United States of America
- Fabio Pasqualetti
- Department of Mechanical Engineering, University of California, Riverside, California, United States of America
- Thomas Papouin
- Department of Neuroscience, Washington University School of Medicine, St. Louis, Missouri, United States of America
- ShiNung Ching
- Department of Electrical and Systems Engineering, Washington University, St. Louis, Missouri, United States of America
31
Sparrow R, Hatherley J, Oakley J, Bain C. Should the Use of Adaptive Machine Learning Systems in Medicine be Classified as Research? Am J Bioeth 2024:1-12. [PMID: 38662360 DOI: 10.1080/15265161.2024.2337429] [Indexed: 04/26/2024]
Abstract
A novel advantage of the use of machine learning (ML) systems in medicine is their potential to continue learning from new data after implementation in clinical practice. To date, considerations of the ethical questions raised by the design and use of adaptive machine learning systems in medicine have, for the most part, been confined to discussion of the so-called "update problem," which concerns how regulators should approach systems whose performance and parameters continue to change even after they have received regulatory approval. In this paper, we draw attention to a prior ethical question: should the continuous learning that will occur in such systems after their initial deployment be classified, and regulated, as medical research? We argue that there is a strong prima facie case that the use of continuous learning in medical ML systems should be categorized, and regulated, as research and that individuals whose treatment involves such systems should be treated as research subjects.
32
Duan Z, Hossain AF, He J, Zhu F. Balancing the Encoder and Decoder Complexity in Image Compression for Classification. Res Sq 2024:rs.3.rs-4002168. [PMID: 38746384 PMCID: PMC11092870 DOI: 10.21203/rs.3.rs-4002168/v1] [Indexed: 05/16/2024]
Abstract
This paper presents a study on the computational complexity of coding for machines, with a focus on image coding for classification. We first conduct a comprehensive set of experiments to analyze the size of the encoder (which encodes images to bitstreams), the size of the decoder (which decodes bitstreams and predicts class labels), and their impact on the rate-accuracy trade-off in compression for classification. Through empirical investigation, we demonstrate a complementary relationship between the encoder size and the decoder size, i.e., it is better to employ a large encoder with a small decoder and vice versa. Motivated by this relationship, we introduce a feature compression-based method for efficient image compression for classification. By compressing features at various layers of a neural network-based image classification model, our method achieves adjustable rate, accuracy, and encoder (or decoder) size using a single model. Experimental results on ImageNet classification show that our method achieves competitive results with existing methods while being much more flexible. The code will be made publicly available.
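The encoder/decoder trade-off described above can be made concrete by choosing a split layer in a single classification network: layers before the split run on the sender (encoder), their output features are quantized for transmission, and the remaining layers run on the receiver (decoder). The per-layer costs, the plain uniform quantizer, and the function names below are hypothetical stand-ins; the abstract does not describe the paper's actual feature-compression scheme.

```python
def split_costs(layer_costs, k):
    """Encoder runs layers[:k], decoder runs layers[k:]; return (encoder, decoder) cost."""
    if not 0 <= k <= len(layer_costs):
        raise ValueError("split index out of range")
    return sum(layer_costs[:k]), sum(layer_costs[k:])


def quantize(features, step):
    """Uniform scalar quantization of the split-layer features to integer symbols."""
    return [round(f / step) for f in features]


def dequantize(symbols, step):
    """Reconstruct approximate features on the decoder side."""
    return [s * step for s in symbols]


costs = [4, 3, 2, 1]  # hypothetical per-layer costs of one network
for k in range(1, len(costs)):
    print(k, split_costs(costs, k))  # moving the split shifts cost between the two sides

symbols = quantize([0.26, -0.74, 1.0], step=0.5)
print(symbols, dequantize(symbols, step=0.5))
```

A larger `step` yields fewer distinct symbols (lower rate) at the price of coarser features, which is the rate-accuracy knob; the split index is the separate encoder-size knob.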
Affiliation(s)
- Zhihao Duan
- Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, 47907, IN, U.S.A
- Adnan Faisal Hossain
- Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, 47907, IN, U.S.A
- Jiangpeng He
- Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, 47907, IN, U.S.A
- Fengqing Zhu
- Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, 47907, IN, U.S.A
33
Yuan Q, Duren Z. Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data. Nat Biotechnol 2024:10.1038/s41587-024-02182-7. [PMID: 38609714 DOI: 10.1038/s41587-024-02182-7] [Received: 08/04/2023] [Accepted: 02/26/2024] [Indexed: 04/14/2024]
Abstract
Existing methods for gene regulatory network (GRN) inference rely on gene expression data alone or on lower resolution bulk data. Despite the recent integration of chromatin accessibility and RNA sequencing data, learning complex mechanisms from limited independent data points still presents a daunting challenge. Here we present LINGER (Lifelong neural network for gene regulation), a machine-learning method to infer GRNs from single-cell paired gene expression and chromatin accessibility data. LINGER incorporates atlas-scale external bulk data across diverse cellular contexts and prior knowledge of transcription factor motifs as a manifold regularization. LINGER achieves a fourfold to sevenfold relative increase in accuracy over existing methods and reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Following the GRN inference from reference single-cell multiome data, LINGER enables the estimation of transcription factor activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies.
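Manifold (graph-Laplacian) regularization penalizes differences between coefficients that a prior graph marks as related. The sketch below is a generic version, assuming a linear model and a hypothetical weighted edge list (for example, edges derived from transcription-factor motif evidence); it illustrates the kind of penalty named above, not LINGER's actual objective. It relies on the identity w^T L w = sum over edges (i, j) of a_ij (w_i - w_j)^2 for an undirected graph with Laplacian L = D - A, each edge listed once.

```python
def laplacian_penalty(w, edges):
    """w^T L w for L = D - A of an undirected graph, computed edge-wise.

    edges: list of (i, j, a_ij) with each undirected edge listed once.
    Equals the sum of a_ij * (w_i - w_j)**2, so it is zero when linked weights agree.
    """
    return sum(a * (w[i] - w[j]) ** 2 for i, j, a in edges)


def regularized_loss(w, X, y, edges, lam):
    """Squared-error loss of a linear model plus lam * w^T L w."""
    residual = 0.0
    for xi, yi in zip(X, y):
        pred = sum(wj * xj for wj, xj in zip(w, xi))
        residual += (yi - pred) ** 2
    return residual + lam * laplacian_penalty(w, edges)


# Hypothetical prior graph: coefficients 0-1 strongly linked, 1-2 weakly linked.
edges = [(0, 1, 1.0), (1, 2, 0.5)]
print(laplacian_penalty([1.0, 1.0, 3.0], edges))  # only the 1-2 edge contributes
```

The larger `lam` is, the more the fitted coefficients are pulled toward agreement along prior edges, which is how external knowledge can compensate for few independent data points.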
Affiliation(s)
- Qiuyue Yuan
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA
- Zhana Duren
- Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC, USA.
34
Zhu D, Bu Q, Zhu Z, Zhang Y, Wang Z. Advancing autonomy through lifelong learning: a survey of autonomous intelligent systems. Front Neurorobot 2024; 18:1385778. [PMID: 38644905 PMCID: PMC11027131 DOI: 10.3389/fnbot.2024.1385778] [Received: 02/13/2024] [Accepted: 03/25/2024] [Indexed: 04/23/2024]
Abstract
The combination of lifelong learning algorithms with autonomous intelligent systems (AIS) is gaining popularity due to its ability to enhance AIS performance, but existing summaries in related fields are insufficient. It is therefore necessary to systematically analyze research on lifelong learning algorithms in autonomous intelligent systems to better understand the current progress in this field. This paper presents a thorough review and analysis of work on integrating lifelong learning algorithms with autonomous intelligent systems. Specifically, we investigate the diverse applications of lifelong learning algorithms in AIS domains such as autonomous driving, anomaly detection, robotics, and emergency management, while assessing their impact on AIS performance and reliability. Based on this literature review, we summarize the challenging problems encountered in lifelong learning for AIS. Finally, advanced and innovative developments in lifelong learning algorithms for autonomous intelligent systems are discussed to offer insights and guidance to researchers in this rapidly evolving field.
Affiliation(s)
- Dekang Zhu
- College of Electronic and Information Engineering, Tongji University, Shanghai, China
- Qianyi Bu
- College of Science and Engineering, University of Glasgow, Glasgow, United Kingdom
- Zhongpan Zhu
- College of Electronic and Information Engineering, Tongji University, Shanghai, China
- College of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Yujie Zhang
- College of Electronic and Information Engineering, Tongji University, Shanghai, China
- Zhipeng Wang
- College of Electronic and Information Engineering, Tongji University, Shanghai, China
35
Pagkalos M, Makarov R, Poirazi P. Leveraging dendritic properties to advance machine learning and neuro-inspired computing. Curr Opin Neurobiol 2024; 85:102853. [PMID: 38394956 DOI: 10.1016/j.conb.2024.102853] [Received: 06/01/2023] [Revised: 02/04/2024] [Accepted: 02/05/2024] [Indexed: 02/25/2024]
Abstract
The brain is a remarkably capable and efficient system. It can process and store huge amounts of noisy and unstructured information, using minimal energy. In contrast, current artificial intelligence (AI) systems require vast resources for training while still struggling to compete in tasks that are trivial for biological agents. Thus, brain-inspired engineering has emerged as a promising new avenue for designing sustainable, next-generation AI systems. Here, we describe how dendritic mechanisms of biological neurons have inspired innovative solutions for significant AI problems, including credit assignment in multi-layer networks, catastrophic forgetting, and high power consumption. These findings provide exciting alternatives to existing architectures, showing how dendritic research can pave the way for building more powerful and energy-efficient artificial learning systems.
Affiliation(s)
- Michalis Pagkalos
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology Hellas (FORTH), Heraklion, 70013, Greece; Department of Biology, University of Crete, Heraklion, 70013, Greece. https://twitter.com/MPagkalos
- Roman Makarov
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology Hellas (FORTH), Heraklion, 70013, Greece; Department of Biology, University of Crete, Heraklion, 70013, Greece. https://twitter.com/_RomanMakarov
- Panayiota Poirazi
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology Hellas (FORTH), Heraklion, 70013, Greece.
36
Jang BK, Park YR. Development and Validation of Adaptable Skin Cancer Classification System Using Dynamically Expandable Representation. Healthc Inform Res 2024; 30:140-146. [PMID: 38755104 PMCID: PMC11098764 DOI: 10.4258/hir.2024.30.2.140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 04/24/2024] [Accepted: 04/24/2024] [Indexed: 05/18/2024]
Abstract
OBJECTIVES Skin cancer is a prevalent type of malignancy, necessitating efficient diagnostic tools. This study aimed to develop an automated skin lesion classification model using the dynamically expandable representation (DER) incremental learning algorithm. This algorithm adapts to new data and expands its classification capabilities, with the goal of creating a scalable and efficient system for diagnosing skin cancer. METHODS The DER model with incremental learning was applied to the HAM10000 and ISIC 2019 datasets. Validation involved two steps: initially, training and evaluating the model on the HAM10000 dataset against a fixed ResNet-50; subsequently, performing external validation of the trained model using the ISIC 2019 dataset. The model's performance was assessed using precision, recall, the F1-score, and area under the precision-recall curve. RESULTS The developed skin lesion classification model demonstrated high accuracy and reliability across various types of skin lesions, achieving a weighted-average precision, recall, and F1-score of 0.918, 0.808, and 0.847, respectively. The model's discrimination performance was reflected in an average area under the curve (AUC) value of 0.943. Further external validation with the ISIC 2019 dataset confirmed the model's effectiveness, as shown by an AUC of 0.911. CONCLUSIONS This study presents an optimized skin lesion classification model based on the DER algorithm, which shows high performance in disease classification with the potential to expand its classification range. The model demonstrated robust results in external validation, indicating its adaptability to new disease classes.
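Weighted-average precision, recall and F1-score, as reported above, are commonly computed by averaging per-class scores with weights proportional to each class's support (its number of true samples). The sketch below shows that common convention in plain NumPy; it is an illustrative assumption rather than the study's evaluation code, and `weighted_prf` is a hypothetical helper name.

```python
import numpy as np


def weighted_prf(y_true, y_pred, labels):
    """Support-weighted precision, recall and F1 (hypothetical helper).

    Per-class scores are computed one-vs-rest, then averaged with weights
    proportional to each class's number of true samples.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precisions, recalls, f1s, supports = [], [], [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
        supports.append(tp + fn)  # number of true samples of class c
    weights = np.asarray(supports, dtype=float) / np.sum(supports)
    return (float(weights @ np.asarray(precisions, dtype=float)),
            float(weights @ np.asarray(recalls, dtype=float)),
            float(weights @ np.asarray(f1s, dtype=float)))


# Toy example: three true samples of class 0, one of class 1.
p, r, f = weighted_prf([0, 0, 0, 1], [0, 0, 1, 1], labels=[0, 1])
print(f"weighted P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Because the weighted F1-score averages per-class F1 values, it generally differs from the harmonic mean of the weighted precision and weighted recall. scikit-learn's `precision_recall_fscore_support` with `average="weighted"` follows the same support-weighting convention.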
Affiliation(s)
- Bong Kyung Jang
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
- Yu Rang Park
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
37
Kumar N, Srivastava R. Deep learning in structural bioinformatics: current applications and future perspectives. Brief Bioinform 2024; 25:bbae042. [PMID: 38701422 PMCID: PMC11066934 DOI: 10.1093/bib/bbae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 01/05/2024] [Accepted: 01/18/2024] [Indexed: 05/05/2024]
Abstract
In this review article, we explore the transformative impact of deep learning (DL) on structural bioinformatics, emphasizing its pivotal role in a scientific revolution driven by extensive data, accessible toolkits and robust computing resources. As big data continue to advance, DL is poised to become an integral component in healthcare and biology, revolutionizing analytical processes. Our comprehensive review provides detailed insights into DL, featuring specific demonstrations of its notable applications in bioinformatics. We address challenges specific to DL, spotlight recent successes in structural bioinformatics and present a clear exposition of DL, from basic shallow neural networks to advanced models such as convolution, recurrent, artificial and transformer neural networks. This paper discusses the emerging use of DL for understanding biomolecular structures, anticipating ongoing developments and applications in the realm of structural bioinformatics.
Affiliation(s)
- Niranjan Kumar
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Rakesh Srivastava
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
38
Kim J, Lim MH, Kim K, Yoon HJ. Continual learning framework for a multicenter study with an application to electrocardiogram. BMC Med Inform Decis Mak 2024; 24:67. [PMID: 38448921 DOI: 10.1186/s12911-024-02464-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 02/21/2024] [Indexed: 03/08/2024]
Abstract
Deep learning has been increasingly utilized in the medical field and has achieved many goals. Since the size of the data dominates the performance of deep learning, several medical institutions are conducting joint research to obtain as much data as possible. However, sharing data is usually prohibited owing to the risk of privacy invasion. Federated learning is a reasonable way to train on distributed multicenter data without direct access; however, it needs a central server to merge and distribute models, which is expensive and rarely approved under various legal regulations. This paper proposes a continual learning framework for a multicenter study, which does not require a central server and can prevent catastrophic forgetting of previously trained knowledge. The proposed framework contains a continual learning method selection process, assuming that no single method is optimal for all involved datasets in a real-world setting and that a suitable method can be selected for specific data. We used synthetic data generated by a generative adversarial network to evaluate methods prospectively, not ex post facto. We used four independent electrocardiogram datasets for a multicenter study and trained an arrhythmia detection model. Our proposed framework was evaluated against supervised and federated learning methods, as well as fine-tuning approaches that do not include any regularization to preserve previous knowledge. Even without a central server and access to the past data, our framework achieved stable performance (AUROC 0.897) across all involved datasets, comparable to federated learning (AUROC 0.901).
Affiliation(s)
- Junmo Kim
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
- Min Hyuk Lim
- Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, Republic of Korea
- Graduate School of Health Science and Technology, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
- Kwangsoo Kim
- Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, Republic of Korea
- Department of Medicine, College of Medicine, Seoul National University, Seoul, Republic of Korea
- Hyung-Jin Yoon
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea.
- Medical Bigdata Research Center, Seoul National University College of Medicine, 101, Daehak-ro, Jongno-gu, Seoul, Republic of Korea.
39
Gao J, Lu Y, Qi X, Kou Y, Li B, Li L, Yu S, Hu W. Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1881-1897. [PMID: 35254973 DOI: 10.1109/tpami.2022.3156977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Tracking visual objects from a single initial exemplar in the testing phase has been broadly cast as a one-/few-shot problem, i.e., one-shot learning for initial adaptation and few-shot learning for online adaptation. The recent few-shot online adaptation methods incorporate the prior knowledge from large amounts of annotated training data via complex meta-learning optimization in the offline phase. This helps the online deep trackers to achieve fast adaptation and reduce overfitting risk in tracking. In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training. It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before, and thus the seen data can be safely removed from training. This also bears certain similarities to the emerging continual learning field in preventing catastrophic forgetting. This mechanism enables us to unveil the power of modern online deep trackers without incurring too much extra computational cost. We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP. The consistent improvements on several challenging tracking benchmarks demonstrate its effectiveness and efficiency.
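The claim that seen data "can be safely removed from training" is a general property of recursive least squares: the estimate `w` and the matrix `P` (an estimate of the inverse input-correlation matrix) summarize every sample seen so far. The sketch below is a generic textbook RLS update for a linear model, not the authors' RT-MDNet or DiMP integration; the class name `RLS` and the parameters `lam` (forgetting factor) and `delta` (initial scale of `P`) are illustrative assumptions.

```python
import numpy as np


class RLS:
    """Generic recursive least-squares estimator for y ≈ w · x.

    Past samples are summarized by (w, P), so each sample can be
    discarded right after its update.
    """

    def __init__(self, dim, lam=1.0, delta=1e3):
        self.w = np.zeros(dim)        # current weight estimate
        self.P = np.eye(dim) * delta  # inverse correlation matrix estimate
        self.lam = lam                # forgetting factor (1.0 = no forgetting)

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)  # gain vector
        err = y - self.w @ x          # a-priori prediction error
        self.w = self.w + k * err
        # P is symmetric, so k x^T P equals outer(k, P x).
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err


# Toy stream: learn y = w·x one sample at a time without storing samples.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
est = RLS(dim=3)
for _ in range(200):
    x = rng.normal(size=3)
    est.update(x, true_w @ x)
print(np.round(est.w, 3))
```

With `lam=1.0` the recursion reproduces batch least squares with a small ridge term of size `1/delta`, which is why no replay buffer is needed; `lam < 1` would instead down-weight older samples.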
40
Willes J, Harrison J, Harakeh A, Finn C, Pavone M, Waslander SL. Bayesian Embeddings for Few-Shot Open World Recognition. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1513-1529. [PMID: 36063507 DOI: 10.1109/tpami.2022.3201541] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world and few-shot setting in which agents continuously learn new classes from small amounts of information. This stands in stark contrast to modern machine learning systems that are typically designed with a known set of classes and a large number of examples for each class. In this work we extend embedding-based few-shot learning algorithms to the open-world recognition setting. We combine Bayesian non-parametric class priors with an embedding-based pre-training scheme to yield a highly flexible framework which we refer to as few-shot learning for open world recognition (FLOWR). We benchmark our framework on open-world extensions of the common MiniImageNet and TieredImageNet few-shot learning datasets. Our results show, compared to prior methods, strong classification accuracy performance and up to a 12% improvement in H-measure (a measure of novel class detection) from our non-parametric open-world few-shot learning scheme.
41
Liu Z, Miao Z, Zhan X, Wang J, Gong B, Yu SX. Open Long-Tailed Recognition in a Dynamic World. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1836-1851. [PMID: 35984801 DOI: 10.1109/tpami.2022.3200091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Real-world data often exhibit a long-tailed and open-ended (i.e., with unseen classes) distribution. A practical recognition system must balance between majority (head) and minority (tail) classes, generalize across the distribution, and acknowledge novelty upon the instances of unseen classes (open classes). We define Open Long-Tailed Recognition++ (OLTR++) as learning from such naturally distributed data and optimizing for the classification accuracy over a balanced test set which includes both known and open classes. OLTR++ handles imbalanced classification, few-shot learning, open-set recognition, and active learning in one integrated algorithm, whereas existing classification approaches often focus only on one or two aspects and perform poorly across the entire spectrum. The key challenges are: 1) how to share visual knowledge between head and tail classes, 2) how to reduce confusion between tail and open classes, and 3) how to actively explore open classes with learned knowledge. Our algorithm, OLTR++, maps images to a feature space such that visual concepts can relate to each other through a memory association mechanism and a learned metric (dynamic meta-embedding) that both respects the closed world classification of seen classes and acknowledges the novelty of open classes. Additionally, we propose an active learning scheme based on visual memory, which learns to recognize open classes in a data-efficient manner for future expansions. On three large-scale open long-tailed datasets we curated from ImageNet (object-centric), Places (scene-centric), and MS1M (face-centric) data, as well as three standard benchmarks (CIFAR-10-LT, CIFAR-100-LT, and iNaturalist-18), our approach, as a unified framework, consistently demonstrates competitive performance. Notably, our approach also shows strong potential for the active exploration of open classes and the fairness analysis of minority groups.
42
Mazumder P, Singh P, Rai P, Namboodiri VP. Rectification-Based Knowledge Retention for Task Incremental Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1561-1575. [PMID: 36449592 DOI: 10.1109/tpami.2022.3225310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
In the task incremental learning problem, deep learning models suffer from catastrophic forgetting of previously seen classes/tasks as they are trained on new classes/tasks. This problem becomes even harder when some of the test classes do not belong to the training class set, i.e., the task incremental generalized zero-shot learning problem. We propose a novel approach to address the task incremental learning problem for both the non zero-shot and zero-shot settings. Our proposed approach, called Rectification-based Knowledge Retention (RKR), applies weight rectifications and affine transformations for adapting the model to any task. During testing, our approach can use the task label information (task-aware) to quickly adapt the network to that task. We also extend our approach to make it task-agnostic so that it can work even when the task label information is not available during testing. Specifically, given a continuum of test data, our approach predicts the task and quickly adapts the network to the predicted task. We experimentally show that our proposed approach achieves state-of-the-art results on several benchmark datasets for both non zero-shot and zero-shot task incremental learning.
43
Zhao H, Fu Y, Kang M, Tian Q, Wu F, Li X. MgSvF: Multi-Grained Slow versus Fast Framework for Few-Shot Class-Incremental Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:1576-1588. [PMID: 34882547 DOI: 10.1109/tpami.2021.3133897] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
As a challenging problem, few-shot class-incremental learning (FSCIL) continually learns a sequence of tasks, confronting the dilemma between slow forgetting of old knowledge and fast adaptation to new knowledge. In this paper, we concentrate on this "slow versus fast" (SvF) dilemma to determine which knowledge components should be updated slowly and which quickly, and thereby balance old-knowledge preservation and new-knowledge adaptation. We propose a multi-grained SvF learning strategy to cope with the SvF dilemma from two different grains: intra-space (within the same feature space) and inter-space (between two different feature spaces). The proposed strategy designs a novel frequency-aware regularization to boost the intra-space SvF capability, and meanwhile develops a new feature space composition operation to enhance the inter-space SvF learning performance. With the multi-grained SvF learning strategy, our method outperforms the state-of-the-art approaches by a large margin.
44
Souza R, Stanley EAM, Camacho M, Camicioli R, Monchi O, Ismail Z, Wilms M, Forkert ND. A multi-center distributed learning approach for Parkinson's disease classification using the traveling model paradigm. Front Artif Intell 2024; 7:1301997. [PMID: 38384277 PMCID: PMC10879577 DOI: 10.3389/frai.2024.1301997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 01/23/2024] [Indexed: 02/23/2024]
Abstract
Distributed learning is a promising alternative to central learning for machine learning (ML) model training, overcoming data-sharing problems in healthcare. Previous studies exploring federated learning (FL) or the traveling model (TM) setup for medical image-based disease classification often relied on large databases with a limited number of centers or simulated artificial centers, raising doubts about real-world applicability. This study develops and evaluates a convolutional neural network (CNN) for Parkinson's disease classification using data acquired by 83 diverse real centers around the world, mostly contributing small training samples. Our approach specifically makes use of the TM setup, which has proven effective in scenarios with limited data availability but has never been used for image-based disease classification. Our findings reveal that TM is effective for training CNN models, even in complex real-world scenarios with variable data distributions. After sufficient training cycles, the TM-trained CNN matches or slightly surpasses the performance of the centrally trained counterpart (AUROC of 83% vs. 80%). Our study highlights, for the first time, the effectiveness of TM in 3D medical image classification, especially in scenarios with limited training samples and heterogeneous distributed data. These insights are relevant for situations where ML models are supposed to be trained using data from small or remote medical centers, and rare diseases with sparse cases. The simplicity of this approach enables a broad application to many deep learning tasks, enhancing its clinical utility across various contexts and medical facilities.
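In the traveling model setup, a single model is passed from center to center and trained on each center's local data in turn, over several cycles, so only weights move and no central server aggregates updates. The sketch below illustrates that loop with a toy logistic-regression model on synthetic data; the helper names `local_train` and `traveling_model`, the fixed visiting order, and all hyperparameters are assumptions for illustration, not the study's CNN training pipeline.

```python
import numpy as np


def local_train(w, X, y, lr=0.1, epochs=5):
    """Gradient-descent logistic regression on one center's local data."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        w = w - lr * X.T @ (p - y) / len(y)  # mean log-loss gradient step
    return w


def traveling_model(centers, dim, cycles=10):
    """Send one model through every center in turn, for several cycles.

    Only the weights travel; raw data stay at each center. A fixed
    visiting order is assumed here.
    """
    w = np.zeros(dim)
    for _ in range(cycles):
        for X, y in centers:
            w = local_train(w, X, y)
    return w


# Toy setup: five centers of unequal, mostly small size (synthetic data).
rng = np.random.default_rng(1)
true_w = np.array([1.5, -2.0])
centers = []
for n in [10, 25, 8, 40, 15]:
    X = rng.normal(size=(n, 2))
    y = (X @ true_w > 0).astype(float)
    centers.append((X, y))

w = traveling_model(centers, dim=2, cycles=20)
X_all = np.vstack([X for X, _ in centers])
y_all = np.concatenate([y for _, y in centers])
accuracy = np.mean(((X_all @ w) > 0) == (y_all == 1))
print(f"pooled training accuracy: {accuracy:.2f}")
```

Unlike federated averaging, nothing is merged: the model simply continues training wherever it arrives, which is why the number of training cycles matters.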
Affiliation(s)
- Raissa Souza
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Graduate Program, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Emma A. M. Stanley
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Graduate Program, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Milton Camacho
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Biomedical Engineering Graduate Program, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Richard Camicioli
- Department of Medicine (Neurology), Neuroscience and Mental Health Institute, University of Alberta, Edmonton, AB, Canada
- Oury Monchi
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Radiology, Radio-oncology and Nuclear Medicine, Université de Montréal, Montréal, QC, Canada
- Centre de Recherche, Institut Universitaire de Gériatrie de Montréal, Montréal, QC, Canada
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Zahinoor Ismail
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Department of Psychiatry, University of Calgary, Calgary, AB, Canada
- Clinical and Biomedical Sciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
- Matthias Wilms
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Department of Pediatrics, University of Calgary, Calgary, AB, Canada
- Department of Community Health Sciences, University of Calgary, Calgary, AB, Canada
- Nils D. Forkert
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
45
Barry MLLR, Gerstner W. Fast adaptation to rule switching using neuronal surprise. PLoS Comput Biol 2024; 20:e1011839. [PMID: 38377112 PMCID: PMC10906910 DOI: 10.1371/journal.pcbi.1011839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 03/01/2024] [Accepted: 01/18/2024] [Indexed: 02/22/2024]
Abstract
In humans and animals, surprise is a physiological reaction to an unexpected event, but how surprise can be linked to plausible models of neuronal activity is an open problem. We propose a self-supervised spiking neural network model where a surprise signal is extracted from an increase in neural activity after an imbalance of excitation and inhibition. The surprise signal modulates synaptic plasticity via a three-factor learning rule which increases plasticity at moments of surprise. The surprise signal remains small when transitions between sensory events follow a previously learned rule but increases immediately after rule switching. In a spiking network with several modules, previously learned rules are protected against overwriting, as long as the number of modules is larger than the total number of rules, making a step towards solving the stability-plasticity dilemma in neuroscience. Our model relates the subjective notion of surprise to specific predictions on the circuit level.
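A three-factor learning rule gates the usual Hebbian product of pre- and postsynaptic activity with a third, modulatory signal, here the surprise. The sketch below is a simplified rate-based illustration, not the paper's spiking model: `surprise_signal` is a hypothetical stand-in that measures rectified excess population activity over a baseline, and `eta` and all toy values are assumptions.

```python
import numpy as np


def surprise_signal(activity, baseline):
    """Hypothetical stand-in for the surprise factor: rectified excess of
    mean population activity over its expected (baseline) level."""
    return max(0.0, float(np.mean(activity)) - baseline)


def three_factor_update(w, pre, post, surprise, eta=0.01):
    """Generic three-factor rule: dW = eta * M * outer(post, pre).

    The Hebbian term changes the weights only in proportion to the third
    factor M (the surprise), so plasticity stays near zero while events
    follow the learned rule and is boosted after a rule switch.
    """
    return w + eta * surprise * np.outer(post, pre)


w0 = np.zeros((2, 3))
pre = np.array([1.0, 0.0, 1.0])   # presynaptic activity
post = np.array([1.0, 1.0])       # postsynaptic activity
expected = surprise_signal(np.array([1.0, 1.0]), baseline=1.0)    # no excess activity
unexpected = surprise_signal(np.array([6.0, 6.0]), baseline=1.0)  # burst after a switch
w_expected = three_factor_update(w0, pre, post, expected)
w_unexpected = three_factor_update(w0, pre, post, unexpected)
print(w_expected)
print(w_unexpected)
```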
Affiliation(s)
- Martin L. L. R. Barry
- School of Computer and Communication Sciences and School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Wulfram Gerstner
- School of Computer and Communication Sciences and School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
46
Fan L, Gong X, Zheng C, Li J. Data pyramid structure for optimizing EUS-based GISTs diagnosis in multi-center analysis with missing label. Comput Biol Med 2024; 169:107897. [PMID: 38171262 DOI: 10.1016/j.compbiomed.2023.107897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 12/04/2023] [Accepted: 12/23/2023] [Indexed: 01/05/2024]
Abstract
This study introduces the Data Pyramid Structure (DPS) to address data sparsity and missing labels in medical image analysis. The DPS optimizes multi-task learning and enables sustainable expansion of multi-center data analysis. Specifically, it facilitates attribute prediction and malignant tumor diagnosis tasks by implementing a segmentation and aggregation strategy on data with absent attribute labels. To leverage multi-center data, we propose the Unified Ensemble Learning Framework (UELF) and the Unified Federated Learning Framework (UFLF), which incorporate strategies for data transfer and incremental learning in scenarios with missing labels. The proposed method was evaluated on a challenging EUS patient dataset from five centers, achieving promising diagnostic performance. The average accuracy was 0.984 with an AUC of 0.927 for multi-center analysis, surpassing state-of-the-art approaches. The interpretability of the predictions further highlights the potential clinical relevance of our method.
Affiliation(s)
- Lin Fan
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, China
- Xun Gong
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, China.
- Cenyang Zheng
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; National Engineering Laboratory of Integrated Transportation Big Data Application Technology, China
- Jiao Li
- Department of Gastroenterology, The Third People's Hospital of Chengdu, Affiliated Hospital of Southwest Jiaotong University, Chengdu 610031, China
47
Huo F, Liu Z, Guo J, Xu W, Guo S. UTDNet: A unified triplet decoder network for multimodal salient object detection. Neural Netw 2024; 170:521-534. [PMID: 38043372 DOI: 10.1016/j.neunet.2023.11.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 10/11/2023] [Accepted: 11/22/2023] [Indexed: 12/05/2023]
Abstract
Image Salient Object Detection (SOD) is a fundamental research topic in the area of computer vision. Recently, the multimodal information in RGB, Depth (D), and Thermal (T) modalities has been proven to be beneficial to the SOD. However, existing methods are only designed for RGB-D or RGB-T SOD, which may limit the utilization in various modalities, or just finetuned on specific datasets, which may bring about extra computation overhead. These defects can hinder the practical deployment of SOD in real-world applications. In this paper, we propose an end-to-end Unified Triplet Decoder Network, dubbed UTDNet, for both RGB-T and RGB-D SOD tasks. The intractable challenges for the unified multimodal SOD are mainly two-fold, i.e., (1) accurately detecting and segmenting salient objects, and (2) preferably via a single network that fits both RGB-T and RGB-D SOD. First, to deal with the former challenge, we propose the multi-scale feature extraction unit to enrich the discriminative contextual information, and the efficient fusion module to explore cross-modality complementary information. Then, the multimodal features are fed to the triplet decoder, where the hierarchical deep supervision loss further enables the network to capture distinctive saliency cues. Second, as to the latter challenge, we propose a simple yet effective continual learning method to unify multimodal SOD. Concretely, we sequentially train multimodal SOD tasks by applying Elastic Weight Consolidation (EWC) regularization with the hierarchical loss function to avoid catastrophic forgetting without inducing more parameters. Critically, the triplet decoder separates task-specific and task-invariant information, making the network easily adaptable to multimodal SOD tasks. Extensive comparisons with 26 recently proposed RGB-T and RGB-D SOD methods demonstrate the superiority of the proposed UTDNet.
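EWC, named above, adds a quadratic penalty that anchors each parameter to its value after the previous task, weighted by an estimate of that parameter's importance (the diagonal of the Fisher information). The NumPy sketch below shows the standard penalty only, not UTDNet's hierarchical loss; how the per-sample gradients are obtained is left open, and the function names and `lam` are illustrative assumptions.

```python
import numpy as np


def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher information: mean squared per-sample
    gradient of the log-likelihood, one value per parameter."""
    g = np.asarray(per_sample_grads, dtype=float)
    return np.mean(g ** 2, axis=0)


def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """(lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)


def ewc_penalty_grad(theta, theta_old, fisher, lam=1.0):
    """Gradient of the EWC penalty with respect to theta."""
    return lam * fisher * (theta - theta_old)


def ewc_total_loss(new_task_loss, theta, theta_old, fisher, lam=1.0):
    """Loss on the current task plus the EWC anchor to the previous task."""
    return new_task_loss + ewc_penalty(theta, theta_old, fisher, lam)


# Toy numbers: two parameters, two per-sample gradients from the old task.
fisher = diagonal_fisher([[1.0, 0.0], [-1.0, 2.0]])
theta_old = np.zeros(2)        # parameters after the previous task
theta = np.array([1.0, 1.0])   # parameters while training the current task
total = ewc_total_loss(0.5, theta, theta_old, fisher, lam=2.0)
print(fisher, total)
```

Parameters with a large Fisher value are held close to their old values, while unimportant ones remain free to adapt, and no extra parameters are introduced.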
Affiliation(s)
- Fushuo Huo
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
- Ziming Liu
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
- Jingcai Guo
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China.
- Wenchao Xu
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China.
- Song Guo
- Department of Computing, The Hong Kong Polytechnic University, Hong Kong Special Administrative Region of China
48
Huber F, Inderka A, Steinhage V. Leveraging Remote Sensing Data for Yield Prediction with Deep Transfer Learning. SENSORS (BASEL, SWITZERLAND) 2024; 24:770. [PMID: 38339487 PMCID: PMC10857376 DOI: 10.3390/s24030770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 01/10/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
Remote sensing data represent one of the most important sources for automated yield prediction. High temporal and spatial resolution, historical record availability, reliability, and low cost are key factors in predicting yields around the world. Yield prediction as a machine learning task is challenging, as reliable ground truth data are difficult to obtain, especially since new data points can only be acquired once a year during harvest. Factors that influence annual yields are plentiful, and data acquisition can be expensive, as crop-related data often need to be captured by experts or specialized sensors. A solution to both problems can be provided by deep transfer learning based on remote sensing data. Satellite images are free of charge, and transfer learning allows recognition of yield-related patterns within countries where data are plentiful and transfers the knowledge to other domains, thus limiting the number of ground truth observations needed. Within this study, we examine the use of transfer learning for yield prediction, where the preprocessing of the data into histograms is unique. We present a deep transfer learning framework for yield prediction and demonstrate its successful application to transfer knowledge gained from US soybean yield prediction to soybean yield prediction within Argentina. We perform a temporal alignment of the two domains and improve transfer learning by applying several transfer learning techniques, such as L2-SP, BSS, and layer freezing, to overcome catastrophic forgetting and negative transfer problems. Lastly, we exploit spatio-temporal patterns within the data by applying a Gaussian process. We are able to improve the performance of soybean yield prediction in Argentina by a total of 19% in terms of RMSE and 39% in terms of R² compared to predictions without transfer learning and Gaussian processes.
This proof of concept for advanced transfer learning techniques for yield prediction and remote sensing data in the form of histograms can enable successful yield prediction, especially in emerging and developing countries, where reliable data are usually limited.
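As a rough illustration of two ingredients named in this abstract, the sketch below shows (a) turning one band's pixel values into a normalized histogram and (b) the usual L2-SP regularizer, Ω(w) = α/2 ‖w_S − w_S⁰‖² + β/2 ‖w_S̄‖², which pulls parameters shared with the source model toward their pretrained values w⁰ and applies plain weight decay to target-only parameters. The function names, bin count, value range, and α, β values are illustrative assumptions; the abstract does not give the authors' settings, and this is not their implementation.

```python
from typing import Dict, List


def band_histogram(pixels: List[float], bins: int, lo: float, hi: float) -> List[float]:
    """Bin one band's pixel values into equal-width bins over [lo, hi], normalized to sum to 1.

    Values outside [lo, hi] are ignored (an assumption of this sketch).
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in pixels:
        if lo <= v <= hi:
            # Clamp v == hi into the last bin instead of overflowing.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def l2_sp_penalty(params: Dict[str, List[float]], source: Dict[str, List[float]],
                  alpha: float, beta: float) -> float:
    """L2-SP regularizer on flattened parameter tensors keyed by layer name.

    Layers present in `source` (the pretrained model) are pulled toward their source
    values (alpha term); layers only in the target model, e.g. a new regression head,
    get ordinary L2 decay (beta term).
    """
    shared, new = 0.0, 0.0
    for name, values in params.items():
        if name in source:
            ref = source[name]
            if len(ref) != len(values):
                raise ValueError(f"shape mismatch for layer {name!r}")
            shared += sum((w - w0) ** 2 for w, w0 in zip(values, ref))
        else:
            new += sum(w * w for w in values)
    return 0.5 * alpha * shared + 0.5 * beta * new


if __name__ == "__main__":
    print(band_histogram([0.0, 0.1, 0.6, 1.0], bins=2, lo=0.0, hi=1.0))  # [0.5, 0.5]
    target = {"conv": [1.0, 2.0], "head": [0.5]}  # hypothetical layer names
    pretrained = {"conv": [1.0, 1.0]}
    print(l2_sp_penalty(target, pretrained, alpha=0.1, beta=0.2))
```

In training, this penalty would be added to the task loss; layer freezing corresponds to the limiting case where shared layers are not updated at all.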
Affiliation(s)
- Volker Steinhage
- Department of Computer Science IV, University of Bonn, 53121 Bonn, Germany
49
Katiyar AK, Hoang AT, Xu D, Hong J, Kim BJ, Ji S, Ahn JH. 2D Materials in Flexible Electronics: Recent Advances and Future Prospectives. Chem Rev 2024; 124:318-419. [PMID: 38055207 DOI: 10.1021/acs.chemrev.3c00302] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/07/2023]
Abstract
Flexible electronics have recently gained considerable attention due to their potential to provide new and innovative solutions to a wide range of challenges in various electronic fields. These electronics require specific material properties and performance because they need to be integrated into a variety of surfaces or folded and rolled for new electronic form factors. Two-dimensional (2D) materials have emerged as promising candidates for flexible electronics due to their unique mechanical, electrical, and optical properties, as well as their compatibility with other materials, enabling the creation of various flexible electronic devices. This article provides a comprehensive review of the progress made in developing flexible electronic devices using 2D materials. In addition, it highlights the key aspects of materials, scalable material production, and device fabrication processes for flexible applications, along with important demonstrations that achieved breakthroughs in various flexible and wearable electronic applications. Finally, we discuss the opportunities, current challenges, potential solutions, and future research directions in this field.
Affiliation(s)
- Ajit Kumar Katiyar
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Anh Tuan Hoang
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Duo Xu
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Juyeong Hong
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Beom Jin Kim
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Seunghyeon Ji
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
- Jong-Hyun Ahn
- School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
50
Gai S, Lyu S, Zhang H, Wang D. Continual Reinforcement Learning for Quadruped Robot Locomotion. ENTROPY (BASEL, SWITZERLAND) 2024; 26:93. [PMID: 38275501 PMCID: PMC11154561 DOI: 10.3390/e26010093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/12/2024] [Accepted: 01/15/2024] [Indexed: 01/27/2024]
Abstract
The ability to learn continuously is crucial for a robot to achieve a high level of intelligence and autonomy. In this paper, we consider continual reinforcement learning (RL) for quadruped robots, which requires both the ability to keep learning subsequent tasks (plasticity) and the ability to maintain performance on previous tasks (stability). The policy obtained by the proposed method enables robots to learn multiple tasks sequentially while overcoming both catastrophic forgetting and loss of plasticity. At the same time, it achieves these goals with as little modification to the original RL learning process as possible. The proposed method uses the Piggyback algorithm to select protected parameters for each task and reinitializes the unused parameters to increase plasticity. Meanwhile, we encourage the policy network to explore by encouraging higher entropy in the soft network of the policy network. Our experiments show that traditional continual learning algorithms do not perform well on robot locomotion problems, whereas our algorithm is more stable and less disruptive to the RL training process. Several robot locomotion experiments validate the effectiveness of our method.
Affiliation(s)
- Sibo Gai
- School of Computer Science, Fudan University, Shanghai 200433, China
- School of Engineering, Westlake University, Hangzhou 310030, China
- Shangke Lyu
- School of Engineering, Westlake University, Hangzhou 310030, China
- Hongyin Zhang
- School of Engineering, Westlake University, Hangzhou 310030, China
- Donglin Wang
- School of Engineering, Westlake University, Hangzhou 310030, China