1. Ross M, Berberian N, Nikolla A, Chartier S. Dynamic multilayer growth: Parallel vs. sequential approaches. PLoS One 2024;19:e0301513. PMID: 38722934; PMCID: PMC11081283; DOI: 10.1371/journal.pone.0301513.
Abstract
The decision of when to add a new hidden unit or layer is a fundamental challenge for constructive algorithms, and it becomes even more complex in the context of multiple hidden layers. Growing both network width and depth offers a robust framework for capturing more information from the data and modeling more complex representations. With multiple hidden layers, should units be grown sequentially, in only one layer at a time, or in parallel, across multiple layers simultaneously? The effects of growing sequentially or in parallel are investigated using a population dynamics-inspired growing algorithm in a multilayer context. A modified version of the constructive growing algorithm capable of growing in parallel is presented. Sequential and parallel growth methodologies are compared in a three-hidden-layer multilayer perceptron on several benchmark classification tasks. Several variants of these approaches are developed for a more in-depth comparison based on the type of hidden layer initialization and the weight update methods employed. Comparisons are then made to another sequential growing approach, Dynamic Node Creation. Growing hidden layers in parallel resulted in comparable or higher performance than sequential approaches and promoted narrower deep architectures tailored to the task. Dynamic growth inspired by population dynamics offers the potential to grow the width and depth of deeper neural networks in either a sequential or parallel fashion.
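As background for the constructive step discussed above, a minimal sketch of the generic "grow one hidden unit" operation on an MLP's weight matrices; the paper's population-dynamics growth trigger and its sequential/parallel scheduling are not reproduced, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def grow_unit(W_in, W_out, scale=0.01):
    """Add one hidden unit to a layer: a new row of incoming weights on
    W_in and a matching new column of outgoing weights on W_out."""
    new_in = rng.normal(0.0, scale, size=(1, W_in.shape[1]))
    new_out = rng.normal(0.0, scale, size=(W_out.shape[0], 1))
    return np.vstack([W_in, new_in]), np.hstack([W_out, new_out])

# Three hidden layers; W1..W4 map input -> h1 -> h2 -> h3 -> output.
W1, W2, W3, W4 = (rng.normal(size=s) for s in
                  [(4, 8), (5, 4), (3, 5), (2, 3)])
# Parallel growth: one unit added to every hidden layer in the same step.
W1, W2 = grow_unit(W1, W2)   # grow hidden layer 1
W2, W3 = grow_unit(W2, W3)   # grow hidden layer 2
W3, W4 = grow_unit(W3, W4)   # grow hidden layer 3
```

In a sequential variant, only one of the three `grow_unit` calls would fire per growth event, with the target layer chosen by the growth criterion.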
Affiliation(s)
- Matt Ross, Nareg Berberian, Albino Nikolla, Sylvain Chartier: Laboratory for Computational Neurodynamics and Cognition, School of Psychology, University of Ottawa, Ottawa, ON, Canada
2. Kang T, Ding W, Chen P. CRESPR: Modular sparsification of DNNs to improve pruning performance and model interpretability. Neural Netw 2024;172:106067. PMID: 38199151; DOI: 10.1016/j.neunet.2023.12.021.
Abstract
Modern DNNs often include a huge number of parameters that are expensive in both computation and memory. Pruning can significantly reduce model complexity and lessen resource demands, and less complex models can also be easier to explain and interpret. In this paper, we propose a novel pruning algorithm, Cluster-Restricted Extreme Sparsity Pruning of Redundancy (CRESPR), to prune a neural network into modular units and achieve better pruning efficiency. Using the Hessian matrix, we provide an analytic explanation of why modular structures in a sparse DNN can better maintain performance, especially at extremely high pruning ratios. In CRESPR, each modular unit contains mostly internal connections, which clearly shows how subgroups of input features are processed through a DNN and eventually contribute to classification decisions. Such process-level insight into internal working mechanisms leads to better interpretability of an otherwise black-box DNN model. Extensive experiments were conducted with multiple DNN architectures and datasets, and CRESPR achieves higher pruning performance than current state-of-the-art methods at high and extremely high pruning ratios. Additionally, we show how CRESPR improves model interpretability through a concrete example.
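To make the modular-sparsity idea concrete, a hedged sketch (not the CRESPR algorithm itself): neurons are assumed to be pre-assigned to clusters, cross-cluster weights are removed, and only the largest-magnitude intra-cluster weights survive.

```python
import numpy as np

def modular_prune(W, row_clusters, col_clusters, keep_ratio=0.1):
    """Zero every cross-cluster weight, then keep only the top
    `keep_ratio` fraction of entries by magnitude inside clusters."""
    same_cluster = row_clusters[:, None] == col_clusters[None, :]
    magnitudes = np.where(same_cluster, np.abs(W), 0.0)
    k = max(1, int(keep_ratio * W.size))
    keep = np.argsort(magnitudes.ravel())[-k:]       # k largest magnitudes
    mask = np.zeros(W.size, dtype=bool)
    mask[keep] = True
    return np.where(mask.reshape(W.shape), W, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
rows = np.repeat([0, 1], 4)                          # output-neuron clusters
cols = np.repeat([0, 1], 4)                          # input-neuron clusters
W_sparse = modular_prune(W, rows, cols, keep_ratio=0.2)
```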
Affiliation(s)
- Tianyu Kang, Wei Ding, Ping Chen: University of Massachusetts Boston, United States of America
3. Lim H, Joo Y, Ha E, Song Y, Yoon S, Shin T. Brain Age Prediction Using Multi-Hop Graph Attention Combined with Convolutional Neural Network. Bioengineering (Basel) 2024;11:265. PMID: 38534539; DOI: 10.3390/bioengineering11030265.
Abstract
Convolutional neural networks (CNNs) have been used widely to predict biological brain age based on brain magnetic resonance (MR) images. However, CNNs focus mainly on spatially local features and their aggregates and barely on the connective information between distant regions. To overcome this issue, we propose a novel multi-hop graph attention (MGA) module that exploits both the local and global connections of image features when combined with CNNs. After insertion between convolutional layers, MGA first converts the convolution-derived feature map into graph-structured data by using patch embedding and embedding-distance-based scoring. Multi-hop connections between the graph nodes are modeled using a Markov chain process. After performing multi-hop graph attention, MGA re-converts the graph into an updated feature map and transfers it to the next convolutional layer. We combined the MGA module with sSE (spatial squeeze and excitation)-ResNet18 for our final prediction model (MGA-sSE-ResNet18) and performed various hyperparameter evaluations to identify the optimal parameter combination. With 2788 three-dimensional T1-weighted MR images of healthy subjects, we verified the effectiveness of MGA-sSE-ResNet18 via comparisons with four established, general-purpose CNNs and two representative brain age prediction models. The proposed model yielded optimal performance, with a mean absolute error of 2.822 years and a Pearson's correlation coefficient (PCC) of 0.968, demonstrating the potential of the MGA module to improve the accuracy of brain age prediction.
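A rough illustration of the multi-hop idea only (patch embedding, the re-conversion to a feature map, and the learned attention weights are omitted): k-hop relations between nodes can be modeled with powers of a row-stochastic one-hop matrix, in the manner of a Markov chain. The `decay` hop-weighting parameter is an assumption, not the paper's formulation.

```python
import numpy as np

def multi_hop_attention(scores, num_hops=3, decay=0.5):
    """Combine 1..num_hops-step transition probabilities derived from
    pairwise node scores into a single attention matrix."""
    P = np.exp(scores - scores.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)        # one-hop (row-stochastic)
    hop = np.eye(P.shape[0])
    out = np.zeros_like(P)
    for k in range(1, num_hops + 1):
        hop = hop @ P                        # k-hop transition probabilities
        out += decay ** k * hop
    return out / out.sum(axis=1, keepdims=True)

scores = np.random.default_rng(0).normal(size=(6, 6))  # toy pairwise scores
A = multi_hop_attention(scores)              # rows again sum to 1
```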
Affiliation(s)
- Heejoo Lim: Division of Mechanical and Biomedical Engineering, Ewha W. University, Seoul 03760, Republic of Korea; Graduate Program in Smart Factory, Ewha W. University, Seoul 03760, Republic of Korea
- Yoonji Joo: Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea
- Eunji Ha: Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea
- Yumi Song: Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea; Department of Brain and Cognitive Sciences, Ewha W. University, Seoul 03760, Republic of Korea
- Sujung Yoon: Ewha Brain Institute, Ewha W. University, Seoul 03760, Republic of Korea; Department of Brain and Cognitive Sciences, Ewha W. University, Seoul 03760, Republic of Korea
- Taehoon Shin: Division of Mechanical and Biomedical Engineering, Ewha W. University, Seoul 03760, Republic of Korea; Graduate Program in Smart Factory, Ewha W. University, Seoul 03760, Republic of Korea
4. Li G, Yang P, Qian C, Hong R, Tang K. Stage-Wise Magnitude-Based Pruning for Recurrent Neural Networks. IEEE Trans Neural Netw Learn Syst 2024;35:1666-1680. PMID: 35759588; DOI: 10.1109/tnnls.2022.3184730.
Abstract
Recurrent neural networks (RNNs) have shown powerful performance in tackling various natural language processing (NLP) tasks, resulting in numerous powerful models containing both recurrent and feedforward neurons. On the other hand, the deep structure of RNNs has heavily restricted their implementation on mobile devices, where quite a few applications involve NLP tasks. Magnitude-based pruning (MP) is a promising way to address such a challenge. However, existing MP methods are mostly designed for feedforward neural networks that do not involve a recurrent structure and have thus performed less satisfactorily when pruning models containing RNN layers. In this article, a novel stage-wise MP method is proposed by explicitly taking the featured recurrent structure of RNNs into account; it can effectively prune feedforward layers and RNN layers simultaneously. The connections of the neural network are first grouped into three types according to how they intersect with recurrent neurons. Then, an optimization-based pruning method is applied to compress each group of connections. Empirical studies show that the proposed method performs significantly better than commonly used RNN pruning methods; up to 96.84% of connections are pruned with little or even no degradation of precision indicators on the testing datasets.
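The sketch below shows only the generic grouped magnitude-pruning step implied above: an RNN's weights are split by whether they run into, within, or out of the recurrent neurons, and each group is pruned to its own sparsity. The group names, shapes, and per-group ratios are illustrative assumptions; the paper's optimization-based per-group compression is not reproduced.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero the `sparsity` fraction of smallest-magnitude entries of W."""
    k = int(sparsity * W.size)
    if k == 0:
        return W
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) > thresh, W, 0.0)

rng = np.random.default_rng(0)
groups = {
    "input_to_recurrent":     (rng.normal(size=(64, 32)), 0.90),
    "recurrent_to_recurrent": (rng.normal(size=(64, 64)), 0.80),
    "recurrent_to_output":    (rng.normal(size=(10, 64)), 0.90),
}
pruned = {name: magnitude_prune(W, s) for name, (W, s) in groups.items()}
```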
5. Zhen C, Zhang W, Mo J, Ji M, Zhou H, Zhu J. RASP: Regularization-based Amplitude Saliency Pruning. Neural Netw 2023;168:1-13. PMID: 37734135; DOI: 10.1016/j.neunet.2023.09.002.
Abstract
Because existing pruning criteria are predominantly data-dependent, data-independent norm criteria play a crucial role in filter pruning, offering promising prospects for deploying deep neural networks on resource-constrained devices. However, norm criteria based on amplitude measurements have long posed challenges in terms of theoretical feasibility. Existing methods rely on data-derived information, such as derivatives, to establish reasonable pruning standards. Nonetheless, achieving a quantitative analysis of the "smaller-norm-less-important" notion remains elusive within the norm-criterion context. To address the need for data independence and theoretical feasibility, we conducted a saliency analysis on filters and propose a regularization-based amplitude saliency pruning criterion (RASP). This amplitude saliency not only attains data independence but also establishes usage guidelines for norm criteria. We further investigated the amplitude saliency, addressing the issues of data dependency in model evaluation and inter-class filter selection, and introduced model saliency and an adaptive parameter group lasso (AGL) regularization approach that is sensitive to different layers. Theoretically, we thoroughly analyzed the feasibility of amplitude saliency and employed quantitative saliency analysis to validate the advantages of our method over previous approaches. Experimentally, on the CIFAR-10 and ImageNet image classification benchmarks, we extensively validated the improved performance of our method compared to previous methods. Even when the pruned model has the same or a smaller number of FLOPs, our method can achieve equivalent or higher model accuracy. Notably, in our ImageNet experiment, RASP achieved a 51.9% reduction in FLOPs while maintaining an accuracy of 76.19% on ResNet-50.
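For context, a plain group-lasso penalty over convolutional filters, the regularization family RASP builds on; the adaptive per-layer weighting (AGL) and the amplitude-saliency criterion themselves are not reproduced, and `lam` is an assumed coefficient.

```python
import torch

def filter_group_lasso(conv_weight, lam=1e-4):
    """Group-lasso penalty with one group per filter: the sum of the
    L2 norms of each (in_channels, kH, kW) filter tensor. Driving a
    filter's group norm to zero marks that filter prunable."""
    return lam * conv_weight.flatten(1).norm(dim=1).sum()

w = torch.randn(16, 8, 3, 3, requires_grad=True)   # toy conv layer weights
penalty = filter_group_lasso(w)                    # add to the task loss
```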
Affiliation(s)
- Chenghui Zhen: College of Information Science and Engineering, Huaqiao University, Xiamen 361021, Fujian, China
- Weiwei Zhang: College of Engineering, Huaqiao University, Quanzhou 362021, Fujian, China
- Jian Mo: College of Engineering, Huaqiao University, Quanzhou 362021, Fujian, China
- Ming Ji: College of Engineering, Huaqiao University, Quanzhou 362021, Fujian, China
- Hongbo Zhou: College of Engineering, Huaqiao University, Quanzhou 362021, Fujian, China; Intelligent Software Research Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
- Jianqing Zhu: College of Engineering, Huaqiao University, Quanzhou 362021, Fujian, China
6. Ioannides G, Jadhav A, Sharma A, Navali S, Black AW. Compressed models for co-reference resolution: enhancing efficiency with debiased word embeddings. Sci Rep 2023;13:18510. PMID: 37898713; PMCID: PMC10613201; DOI: 10.1038/s41598-023-45677-0.
Abstract
This work presents a comprehensive approach to reduce bias in word embedding vectors and evaluate the impact on various Natural Language Processing (NLP) tasks. Two GloVe variations (840B and 50) are debiased by identifying the gender direction in the word embedding space and then removing or reducing the gender component from the embeddings of target words, while preserving useful semantic information. Their gender bias is assessed through the Word Embedding Association Test. The performance of co-reference resolution and text classification models trained on both original and debiased embeddings is evaluated in terms of accuracy. A compressed co-reference resolution model is examined to gauge the effectiveness of debiasing techniques on resource-efficient models. To the best of the authors' knowledge, this is the first attempt to apply compression techniques to debiased models. By analyzing the context preservation of debiased embeddings using a Twitter misinformation dataset, this study contributes valuable insights into the practical implications of debiasing methods for real-world applications such as person profiling.
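The projection step described above follows the familiar hard-debiasing recipe; the sketch below assumes toy 50-dimensional vectors and a gender direction estimated from a couple of definitional pairs, not the paper's exact pipeline or word lists.

```python
import numpy as np

def debias(vectors, direction, targets):
    """Remove the component of each target word vector that lies along
    the (normalized) bias direction, preserving the rest."""
    g = direction / np.linalg.norm(direction)
    for w in targets:
        vectors[w] = vectors[w] - (vectors[w] @ g) * g
    return vectors

rng = np.random.default_rng(0)
words = ["he", "she", "man", "woman", "doctor", "nurse"]
vecs = {w: rng.normal(size=50) for w in words}     # stand-ins for GloVe rows
pairs = [("he", "she"), ("man", "woman")]          # definitional pairs
g = np.mean([vecs[a] - vecs[b] for a, b in pairs], axis=0)
vecs = debias(vecs, g, targets=["doctor", "nurse"])
```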
Affiliation(s)
- Georgios Ioannides: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA; James Silberrad Brown Center for Artificial Intelligence, San Diego State University, San Diego, CA 92182, USA
- Aishwarya Jadhav: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Aditi Sharma: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Samarth Navali: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alan W Black: Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
7. Shah SM, Lau VKN. Model Compression for Communication Efficient Federated Learning. IEEE Trans Neural Netw Learn Syst 2023;34:5937-5951. PMID: 34936557; DOI: 10.1109/tnnls.2021.3131614.
Abstract
Despite the many advantages of using deep neural networks over shallow networks in various machine learning tasks, their effectiveness is compromised in a federated learning setting due to large storage sizes and high computational resource requirements for training. A large model size can require infeasible amounts of data to be transmitted between the server and clients during training. To address these issues, we investigate traditional and novel compression techniques to construct sparse models from dense networks whose storage and bandwidth requirements are significantly lower. We do this by separately considering compression techniques for the server model, to address downstream communication, and for the client models, to address upstream communication. Both play a crucial role in developing and maintaining sparsity across communication cycles. We empirically demonstrate the efficacy of the proposed schemes by testing their performance on standard datasets and verify that they outperform various state-of-the-art baseline schemes in terms of accuracy and communication volume.
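As one common ingredient of communication-efficient federated learning (not necessarily the authors' scheme), a client can upload only the largest entries of its model update; the sketch below shows such top-k sparsification with an assumed `keep_ratio`.

```python
import numpy as np

def sparsify_update(delta, keep_ratio=0.01):
    """Return indices and values of the largest-magnitude entries of a
    flattened model update; only these are transmitted."""
    flat = delta.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, vals, shape):
    """Server side: rebuild a dense update from the sparse payload."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = vals
    return out.reshape(shape)

delta = np.random.default_rng(0).normal(size=(256, 128))  # toy update
idx, vals = sparsify_update(delta)
recovered = densify(idx, vals, delta.shape)
```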
8. Choi B, Olberg S, Park JC, Kim JS, Shrestha DK, Yaddanapudi S, Furutani KM, Beltran CJ. Technical note: Progressive deep learning: An accelerated training strategy for medical image segmentation. Med Phys 2023;50:5075-5087. PMID: 36763566; DOI: 10.1002/mp.16267.
Abstract
BACKGROUND: Recent advancements in Deep Learning (DL) methodologies have led to state-of-the-art performance in a wide range of applications, especially in object recognition, classification, and segmentation of medical images. However, training modern DL models requires a large amount of computation and long training times due to the complex nature of network structures and the large number of training datasets involved. Moreover, selecting the optimized configuration of hyperparameters for a given DL network is an intensive, repetitive manual process.
PURPOSE: In this study, we present a novel approach to accelerate the training time of DL models via the progressive feeding of training datasets based on similarity measures for medical image segmentation. We term this approach Progressive Deep Learning (PDL).
METHODS: The two-stage PDL approach was tested on the auto-segmentation task for two imaging modalities: CT and MRI. The training datasets were ranked according to similarity measures between each sample based on Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Universal Quality Image Index (UQI) values. At the start of the training process, a relatively coarse sampling of training datasets with higher ranks was used to optimize the hyperparameters of the DL network. Following this, the samples with higher ranks were used in step 1 to yield accelerated loss minimization in early training epochs, and the total dataset was added in step 2 for the remainder of training.
RESULTS: Our results demonstrate that the PDL approach can reduce the training time by nearly half (~49%) and can predict segmentations (CT U-net/DenseNet dice coefficient: 0.9506/0.9508; MR U-net/DenseNet dice coefficient: 0.9508/0.9510) without major statistical difference (Wilcoxon signed-rank test) compared to the conventional DL approach. The total training times with a fixed cutoff at 0.95 DSC for the CT dataset using DenseNet and U-Net architectures, respectively, were 17 h 20 min and 4 h 45 min in the conventional case compared to 8 h 45 min and 2 h 20 min with PDL. For the MRI dataset, the total training times using the same architectures were 2 h 54 min and 52 min in the conventional case and 1 h 14 min and 25 min with PDL.
CONCLUSION: The proposed PDL training approach substantially reduces the training time for medical image segmentation while maintaining the performance achieved in the conventional case.
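The ranking step in METHODS can be pictured as follows: score each training image by its average structural similarity to the others and feed the top-ranked subset first. This sketch uses scikit-image's SSIM on toy 2-D arrays; the paper also ranks by MSE, PSNR, and UQI, and its exact two-stage schedule is not reproduced.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def rank_by_similarity(images):
    """Rank images by mean SSIM against the rest (O(n^2) comparisons)."""
    n = len(images)
    scores = np.array([np.mean([ssim(images[i], images[j], data_range=1.0)
                                for j in range(n) if j != i])
                       for i in range(n)])
    return np.argsort(scores)[::-1]          # most representative first

images = [np.random.default_rng(s).random((64, 64)) for s in range(10)]
order = rank_by_similarity(images)
stage1 = [images[i] for i in order[:5]]      # step 1: high-rank samples
stage2 = images                              # step 2: the full dataset
```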
Affiliation(s)
- Byongsu Choi: Department of Radiation Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, South Korea; Medical Physics and Biomedical Engineering Lab (MPBEL), Yonsei University College of Medicine, Seoul, South Korea
- Sven Olberg: Department of Radiation Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Justin C Park: Department of Radiation Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, South Korea; Department of Radiation Oncology, Mayo Clinic, Jacksonville, Florida, USA
- Jin Sung Kim: Department of Radiation Oncology, Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, South Korea; Medical Physics and Biomedical Engineering Lab (MPBEL), Yonsei University College of Medicine, Seoul, South Korea; Oncosoft Inc., Seoul, South Korea
- Deepak K Shrestha: Department of Radiation Oncology, Mayo Clinic, Jacksonville, Florida, USA
- Keith M Furutani: Department of Radiation Oncology, Mayo Clinic, Jacksonville, Florida, USA
- Chris J Beltran: Department of Radiation Oncology, Mayo Clinic, Jacksonville, Florida, USA
9. Kim D, Kim MS, Shim H, Lee J. Your lottery ticket is damaged: Towards all-alive pruning for extremely sparse networks. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2023.03.122.
10. Shargh AK, Abdolrahim N. An interpretable deep learning approach for designing nanoporous silicon nitride membranes with tunable mechanical properties. npj Comput Mater 2023;9:82. PMID: 37273663; PMCID: PMC10221757; DOI: 10.1038/s41524-023-01037-0.
Abstract
The high permeability and strong selectivity of nanoporous silicon nitride (NPN) membranes make them attractive in a broad range of applications. Despite their growing use, the strength of NPN membranes needs to be improved to further extend their biomedical applications. In this work, we implement a deep learning framework to design NPN membranes with improved or prescribed strength values. We examine the predictions of our framework using physics-based simulations. Our results confirm that the proposed framework is not only able to predict the strength of NPN membranes with a wide range of microstructures but can also design NPN membranes with prescribed or improved strength. Our simulations further demonstrate that the microstructural heterogeneity that our framework suggests for the optimized design lowers the stress concentration around the pores and leads to the strength improvement of NPN membranes as compared to conventional membranes with homogeneous microstructures.
Affiliation(s)
- Ali K. Shargh: Department of Mechanical Engineering, University of Rochester, Rochester, NY 14627, USA
- Niaz Abdolrahim: Department of Mechanical Engineering, University of Rochester, Rochester, NY 14627, USA; Materials Science Program, University of Rochester, Rochester, NY 14627, USA; Laboratory for Laser Energetics, University of Rochester, Rochester, NY 14627, USA
11. Li K, Yang C, Wang W, Qiao J. An improved stochastic configuration network for concentration prediction in wastewater treatment process. Inf Sci (N Y) 2023. DOI: 10.1016/j.ins.2022.11.134.
12. A pruning feedforward small-world neural network by dynamic sparse regularization with smoothing l1/2 norm for nonlinear system modeling. Appl Soft Comput 2023. DOI: 10.1016/j.asoc.2023.110133.
13. Drgas S. A Survey on Low-Latency DNN-Based Speech Enhancement. Sensors (Basel) 2023;23:1380. PMID: 36772421; PMCID: PMC9921748; DOI: 10.3390/s23031380.
Abstract
This paper presents recent advances in low-latency, single-channel, deep neural network-based speech enhancement systems. The sources of latency and their acceptable values in different applications are described, followed by an analysis of the constraints imposed on neural network architectures. Specifically, the causal units used in deep neural networks are presented and discussed in the context of their properties, such as the number of parameters, the receptive field, and computational complexity. Techniques used to reduce the computational complexity and memory requirements of the neural networks used in this task are then discussed. Finally, the techniques used by the winners of the latest speech enhancement challenges (DNS, Clarity) are shown and compared.
Affiliation(s)
- Szymon Drgas: Institute of Automatic Control and Robotics, Poznan University of Technology, Piotrowo 3A Street, 60-965 Poznan, Poland
14. Buche C, Lasson F, Kerdelo S. Conditional autoencoder pre-training and optimization algorithms for personalized care of hemophiliac patients. Front Artif Intell 2023;6:1048010. PMID: 36762254; PMCID: PMC9905812; DOI: 10.3389/frai.2023.1048010.
Abstract
This paper presents the use of a deep conditional autoencoder to predict the effect of treatments for patients suffering from hemophilic disorders. A conditional autoencoder is a semi-supervised model that learns an abstract representation of the data and provides conditional reconstruction capabilities. Such models are suited to problems with limited and/or partially observable data, a common situation for data in medicine. Deep conditional autoencoders allow the representation of highly non-linear functions, which makes them promising candidates. However, the optimization of parameters and hyperparameters is particularly complex. For parameter optimization, the classical approach of randomly initializing weight matrices works well for simple architectures but is not feasible for deep architectures. For hyperparameter optimization of deep architectures, the classical cross-validation method is costly. In this article, we propose solutions using a conditional pre-training algorithm and incremental optimization strategies. Such solutions reduce the variance of the estimation process and enhance convergence of the learning algorithm. Our proposal is applied to the personalized care of hemophiliac patients. Results show better performance than generative adversarial networks (the baseline) and highlight the benefits of our contribution in predicting the effect of treatments.
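For reference, a minimal conditional autoencoder in PyTorch: the condition vector is concatenated to both the encoder input and the decoder input. All layer sizes are illustrative assumptions; the paper's pre-training and incremental optimization strategies are not shown.

```python
import torch
import torch.nn as nn

class ConditionalAE(nn.Module):
    def __init__(self, x_dim=16, c_dim=4, z_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 32), nn.ReLU(),
                                 nn.Linear(32, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, 32), nn.ReLU(),
                                 nn.Linear(32, x_dim))

    def forward(self, x, c):
        z = self.enc(torch.cat([x, c], dim=-1))     # condition the encoder
        return self.dec(torch.cat([z, c], dim=-1))  # ...and the decoder

model = ConditionalAE()
x, c = torch.randn(5, 16), torch.randn(5, 4)        # toy data + conditions
loss = nn.functional.mse_loss(model(x, c), x)       # reconstruction loss
```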
Affiliation(s)
- Cédric Buche: ENIB, Brest, France; IRL 2010, CNRS, Adelaide, SA, Australia
15. Barioul R, Kanoun O. k-Tournament Grasshopper Extreme Learner for FMG-Based Gesture Recognition. Sensors (Basel) 2023;23:1096. PMID: 36772136; PMCID: PMC9920645; DOI: 10.3390/s23031096.
Abstract
The recognition of hand signs is essential for several applications. Due to the variation of possible signals and the complexity of sensor-based systems for hand gesture recognition, a new artificial neural network algorithm providing high accuracy with a reduced architecture and automatic feature selection is needed. In this paper, a novel classification method based on an extreme learning machine (ELM), supported by an improved grasshopper optimization algorithm (GOA) as the core of a weight-pruning process, is proposed. The k-tournament grasshopper optimization algorithm was implemented to select and prune the ELM weights, resulting in the proposed k-tournament grasshopper extreme learner (KTGEL) classifier. Myographic methods, such as force myography (FMG), deliver interesting signals that can form the basis for hand sign recognition. FMG was investigated to limit the number of sensors at suitable positions and provide adequate signal processing algorithms for prospective implementation in wearable embedded systems. Based on the proposed KTGEL, the number of sensors and the effect of the number of subjects were investigated in the first stage. It was shown that, as the number of subjects participating in the data collection increases, eight is the minimal number of sensors needed for acceptable sign recognition performance. Moreover, implemented with 3000 hidden nodes, after the feature selection wrapper, the ELM had both a microaverage precision and a microaverage sensitivity of 97% for the recognition of a set of gestures including a middle ambiguity level. KTGEL reduced the hidden nodes to only 1000, reaching the same total sensitivity with a reduction in total precision of only 1%, without needing an additional feature selection method.
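For orientation, a bare extreme learning machine: hidden weights are random and fixed, and only the output weights are solved in closed form. The k-tournament grasshopper selection and pruning of hidden weights is not reproduced; sizes and the tanh activation are assumptions.

```python
import numpy as np

def elm_fit(X, Y, hidden=1000, seed=0):
    """Random hidden projection + least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))
    H = np.tanh(X @ W)
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W, beta

def elm_predict(X, W, beta):
    return np.tanh(X @ W) @ beta

rng = np.random.default_rng(1)
X = rng.random((200, 8))                         # toy FMG feature vectors
Y = np.eye(3)[rng.integers(0, 3, 200)]           # one-hot gesture labels
W, beta = elm_fit(X, Y, hidden=100)
pred = elm_predict(X, W, beta).argmax(axis=1)
```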
16. Tang H, Ling X, Li L, Xiong L, Yao Y, Huang X. One-shot pruning of gated recurrent unit neural network by sensitivity for time-series prediction. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.026.
17. Schwarz Schuler JP, Also SR, Puig D, Rashwan H, Abdel-Nasser M. An Enhanced Scheme for Reducing the Complexity of Pointwise Convolutions in CNNs for Image Classification Based on Interleaved Grouped Filters without Divisibility Constraints. Entropy (Basel) 2022;24:1264. PMID: 36141151; PMCID: PMC9497893; DOI: 10.3390/e24091264.
Abstract
In image classification with Deep Convolutional Neural Networks (DCNNs), the number of parameters in pointwise convolutions grows rapidly due to the multiplication of the number of filters by the number of input channels coming from the previous layer. Existing studies demonstrated that a subnetwork can replace pointwise convolutional layers with significantly fewer parameters and fewer floating-point computations while maintaining the learning capacity. In this paper, we propose an improved scheme for reducing the complexity of pointwise convolutions in DCNNs for image classification, based on interleaved grouped filters without divisibility constraints. The proposed scheme utilizes grouped pointwise convolutions, in which each group processes a fraction of the input channels, and requires the number of channels per group as a hyperparameter, Ch. The subnetwork of the proposed scheme contains two consecutive convolutional layers, K and L, connected by an interleaving layer in the middle and summed at the end. The numbers of filter groups and of filters per group for layers K and L are determined by exact division of the original numbers of input channels and filters by Ch; if the division is not exact, the original layer cannot be substituted. In this paper, we refine the previous algorithm so that input channels are replicated and groups can have different numbers of filters, to cope with non-exact divisibility. Thus, the proposed scheme further reduces the number of floating-point computations (by 11%) and trainable parameters (by 10%) relative to the previous method. We tested our optimization on EfficientNet-B0 as a baseline architecture and ran classification tests on the CIFAR-10, Colorectal Cancer Histology, and Malaria datasets. For each dataset, our optimization saves 76%, 89%, and 91% of the trainable parameters of EfficientNet-B0, respectively, while keeping its test classification accuracy.
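A hedged sketch of the K-interleave-L-sum building block in PyTorch, for the exactly divisible case only; the paper's channel-replication trick for non-divisible counts is omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class GroupedPointwise(nn.Module):
    """Two grouped 1x1 convolutions K and L, joined by a channel
    interleave in the middle and summed at the end."""
    def __init__(self, channels=32, ch_per_group=8):
        super().__init__()
        self.g = channels // ch_per_group          # number of groups
        self.K = nn.Conv2d(channels, channels, 1, groups=self.g)
        self.L = nn.Conv2d(channels, channels, 1, groups=self.g)

    def interleave(self, x):
        n, c, h, w = x.shape                       # channel shuffle so each
        x = x.view(n, self.g, c // self.g, h, w)   # L group sees channels
        return x.transpose(1, 2).reshape(n, c, h, w)  # from every K group

    def forward(self, x):
        k = self.K(x)
        return k + self.L(self.interleave(k))

y = GroupedPointwise()(torch.randn(1, 32, 16, 16))   # -> (1, 32, 16, 16)
```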
Affiliation(s)
- Joao Paulo Schwarz Schuler: Departament d'Enginyeria Informatica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Santiago Romani Also: Departament d'Enginyeria Informatica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Domenec Puig: Departament d'Enginyeria Informatica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Hatem Rashwan: Departament d'Enginyeria Informatica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Mohamed Abdel-Nasser: Departament d'Enginyeria Informatica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain; Electronics and Communication Engineering Section, Electrical Engineering Department, Aswan University, Aswan 81528, Egypt
19. CorrNet: Pearson correlation based pruning for efficient convolutional neural networks. Int J Mach Learn Cybern 2022. DOI: 10.1007/s13042-022-01624-5.
20. Batch Gradient Training Method with Smoothing Group L0 Regularization for Feedforward Neural Networks. Neural Process Lett 2022. DOI: 10.1007/s11063-022-10956-w.
21. Straub J. Automating the design and development of gradient descent trained expert system networks. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109465.
22. Ramchoun H, Ettaouil M. Convergence of batch gradient algorithm with smoothing composition of group l0 and l1/2 regularization for feedforward neural networks. Prog Artif Intell 2022. DOI: 10.1007/s13748-022-00285-3.
23. Zhang M, Cao D, Lan X, Shi X, Gao J. An Ensemble-Learning Approach To Predict the Coke Yield of Commercial FCC Unit. Ind Eng Chem Res 2022. DOI: 10.1021/acs.iecr.1c04735.
Affiliation(s)
- Mengxuan Zhang: State Key Laboratory of Heavy Oil Processing, China University of Petroleum-Beijing, Beijing 102249, China
- Daofan Cao: Department of Chemistry & Clean Energy Institute, Southern University of Science and Technology, Shenzhen 518055, China
- Xingying Lan: State Key Laboratory of Heavy Oil Processing, China University of Petroleum-Beijing, Beijing 102249, China
- Xiaogang Shi: State Key Laboratory of Heavy Oil Processing, China University of Petroleum-Beijing, Beijing 102249, China
- Jinsen Gao: State Key Laboratory of Heavy Oil Processing, China University of Petroleum-Beijing, Beijing 102249, China
24. Torres LC, Castro CL, Rocha HP, Almeida GM, Braga AP. Multi-objective neural network model selection with a graph-based large margin approach. Inf Sci (N Y) 2022. DOI: 10.1016/j.ins.2022.03.019.
25. Hafizi A, Koolivand-Salooki M, Esfandyari M, Koulivand M, Fallahiyekta M. Optimization of reaction parameters of Fischer-Tropsch synthesis in the presence of Co-V/Al2O3 nano-catalyst. Int J Chem Kinet 2022. DOI: 10.1002/kin.21577.
Affiliation(s)
- Ali Hafizi: Department of Chemical Engineering, Shiraz University, Shiraz, Iran
- Mohsen Koulivand: Organization for Educational Research and Planning (OERP), Tehran, Iran
26. Dai W, Ao Y, Zhou L, Zhou P, Wang X. Incremental learning paradigm with privileged information for random vector functional-link networks: IRVFL+. Neural Comput Appl 2022. DOI: 10.1007/s00521-021-06793-y.
27. Real-Time Embedded Implementation of Improved Object Detector for Resource-Constrained Devices. J Low Power Electron Appl 2022. DOI: 10.3390/jlpea12020021.
Abstract
Artificial intelligence (A.I.) has revolutionised a wide range of human activities, including the accelerated development of autonomous vehicles. Self-navigating delivery robots are a recent trend in A.I. applications, such as multitarget object detection, image classification, and segmentation, which tackle sociotechnical challenges including the development of autonomous driving vehicles, surveillance systems, intelligent transportation, and smart traffic monitoring systems. In recent years, object detection and its deployment on embedded edge devices have seen a rise in interest compared to other perception tasks. Embedded edge devices have limited computing power, which impedes the deployment of efficient detection algorithms in resource-constrained environments. To improve on-board computational latency, edge devices often sacrifice performance, creating the need for highly efficient A.I. models. This research examines existing loss metrics and their weaknesses and proposes an improved loss metric that can address the bounding box regression problem. The enhanced metric was implemented in an ultraefficient YOLOv5 network and tested on the targeted datasets. The latest version of the PyTorch framework was incorporated in model development. The model was further deployed using the ROS 2 framework running on an NVIDIA Jetson Xavier NX embedded development platform to conduct the experiment in real time.
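The abstract does not spell out the improved metric; as context only, the baseline that such bounding-box regression metrics refine is the plain IoU loss over (x1, y1, x2, y2) boxes, sketched below.

```python
import torch

def iou_loss(pred, target, eps=1e-7):
    """1 - IoU for batches of axis-aligned boxes in (x1, y1, x2, y2)."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    return (1 - inter / (area_p + area_t - inter + eps)).mean()

pred = torch.tensor([[0., 0., 2., 2.]])
target = torch.tensor([[1., 1., 3., 3.]])
print(iou_loss(pred, target))   # IoU = 1/7, so loss is about 0.857
```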
28. Rethinking Weight Decay for Efficient Neural Network Pruning. J Imaging 2022;8:64. PMID: 35324619; PMCID: PMC8950981; DOI: 10.3390/jimaging8030064.
Abstract
Introduced in the late 1980s for generalization purposes, pruning has now become a staple for compressing deep neural networks. Despite many innovations in recent decades, pruning approaches still face core issues that hinder their performance or scalability. Drawing inspiration from early work in the field, and especially the use of weight decay to achieve sparsity, we introduce Selective Weight Decay (SWD), which carries out efficient, continuous pruning throughout training. Our approach, theoretically grounded on Lagrangian smoothing, is versatile and can be applied to multiple tasks, networks, and pruning structures. We show that SWD compares favorably to state-of-the-art approaches, in terms of performance-to-parameters ratio, on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.
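A minimal sketch of the selective idea, assuming a magnitude criterion: at each training step, weight decay is applied only to the weights currently slated for removal, leaving surviving weights untouched. The paper's penalty schedule and Lagrangian-smoothing details are not reproduced.

```python
import torch

def selective_weight_decay(params, prune_ratio=0.9, decay=1e-2):
    """Decay only the smallest-magnitude `prune_ratio` fraction of each
    tensor's weights, pushing future pruning targets toward zero."""
    with torch.no_grad():
        for p in params:
            k = int(prune_ratio * p.numel())
            if k == 0:
                continue
            thresh = p.abs().flatten().kthvalue(k).values
            mask = p.abs() <= thresh           # weights slated for pruning
            p[mask] *= 1.0 - decay

# Called once per optimization step, e.g. right after optimizer.step():
w = torch.randn(64, 64, requires_grad=True)
selective_weight_decay([w])
```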
29. Ioannidis VN, Chen S, Giannakis GB. Efficient and Stable Graph Scattering Transforms via Pruning. IEEE Trans Pattern Anal Mach Intell 2022;44:1232-1246. PMID: 32946387; DOI: 10.1109/tpami.2020.3025258.
Abstract
Graph convolutional networks (GCNs) have well-documented performance in various graph learning tasks, but their analysis is still in its infancy. Graph scattering transforms (GSTs) offer training-free deep GCN models that extract features from graph data and are amenable to generalization and stability analyses. The price paid by GSTs is exponential complexity in space and time that increases with the number of layers, which discourages deployment of GSTs when a deep architecture is needed. The present work addresses the complexity limitation of GSTs by introducing an efficient, so-termed pruned (p)GST approach. The resultant pruning algorithm is guided by a graph-spectrum-inspired criterion and retains informative scattering features on the fly while bypassing the exponential complexity associated with GSTs. Stability of the novel pGST is also established when the input graph data or the network structure are perturbed. Furthermore, the sensitivity of pGST to random and localized signal perturbations is investigated analytically and experimentally. Numerical tests showcase that pGST performs comparably to the baseline GST at considerable computational savings. Furthermore, pGST achieves comparable performance to state-of-the-art GCNs in graph and 3D point cloud classification tasks. An analysis of the pGST pruning patterns shows that graph data in different domains call for different network architectures and that the pruning algorithm may be employed to guide the design choices for contemporary GCNs.
30. Wang M, Yang X, Qian Y, Lei Y, Cai J, Huan Z, Lin X, Dong H. Adaptive Neural Network Structure Optimization Algorithm Based on Dynamic Nodes. Curr Issues Mol Biol 2022;44:817-832. PMID: 35723341; PMCID: PMC8929060; DOI: 10.3390/cimb44020056.
Abstract
Large-scale artificial neural networks have many redundant structures, which can make the network fall into local optima and extend training time. Moreover, existing neural network topology optimization algorithms have the disadvantages of heavy computation and complex network structure modeling. We propose a Dynamic Node-based neural network Structure optimization algorithm (DNS) to handle these issues. DNS consists of two steps: a generation step and a pruning step. In the generation step, the network generates hidden layers layer by layer until accuracy reaches a threshold. In the pruning step, the network then adapts using a pruning algorithm based on Hebb's rule or Pearson's correlation. In addition, we combine a genetic algorithm with DNS to optimize it (GA-DNS). Experimental results show that, compared with traditional neural network topology optimization algorithms, GA-DNS can generate neural networks with higher construction efficiency, lower structural complexity, and higher classification accuracy.
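The Pearson-correlation branch of the pruning step can be pictured as below: hidden units whose activations are almost perfectly correlated across a batch are treated as redundant, and one of each pair is dropped. The threshold and sizes are assumptions; the generation step and the genetic-algorithm variant are not reproduced.

```python
import numpy as np

def redundant_units(activations, threshold=0.95):
    """Return indices of hidden units to prune: for each highly
    correlated pair, keep the first unit and drop the second."""
    corr = np.corrcoef(activations, rowvar=False)  # units as columns
    n = corr.shape[0]
    drop = set()
    for i in range(n):
        if i in drop:
            continue
        for j in range(i + 1, n):
            if j not in drop and abs(corr[i, j]) > threshold:
                drop.add(j)
    return sorted(drop)

acts = np.random.default_rng(0).normal(size=(512, 32))  # batch x units
acts[:, 1] = acts[:, 0] + 0.01 * np.random.default_rng(1).normal(size=512)
print(redundant_units(acts))   # unit 1 duplicates unit 0 -> [1]
```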
Affiliation(s)
- Miao Wang: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Xu Yang: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yunchong Qian: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Yunlin Lei: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Jian Cai: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Ziyi Huan: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Xialv Lin: School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Hao Dong: Suzhou Automotive Research Institute, Tsinghua University, Suzhou 215299, China
32. Katsaouni N, Aul F, Krischker L, Schmalhofer S, Hedrich L, Schulz MH. Energy efficient convolutional neural networks for arrhythmia detection. Array 2022. DOI: 10.1016/j.array.2022.100127.
33.
Abstract
Deep networks often possess a vast number of parameters, and their significant redundancy in parameterization has become a widely recognized property. This presents significant challenges and restricts many deep learning applications, shifting the focus to reducing the complexity of models while maintaining their powerful performance. In this paper, we present an overview of popular methods and review recent works on compressing and accelerating deep neural networks. We consider not only pruning methods but also quantization and low-rank factorization methods. This review also intends to clarify these major concepts and highlights their characteristics, advantages, and shortcomings.
34. Sankar R, Rougier NP, Leblois A. Computational benefits of structural plasticity, illustrated in songbirds. Neurosci Biobehav Rev 2021;132:1183-1196. PMID: 34801257; DOI: 10.1016/j.neubiorev.2021.10.033.
Abstract
The plasticity of nervous systems allows animals to quickly adapt to a changing environment. In particular, the structural plasticity of brain networks is often critical to the development of the central nervous system and the acquisition of complex behaviors. As an example, structural plasticity is central to the development of song-related brain circuits and may be critical for song acquisition in juvenile songbirds. Here, we review current evidence for structural plasticity and its significance from a computational point of view. We start by reviewing evidence for structural plasticity across species and categorizing it along spatial axes as well as along the time course of development. We introduce the vocal learning circuitry in zebra finches as a useful example of structural plasticity and use this specific case to explore the possible contributions of structural plasticity to computational models. Finally, we discuss current modeling studies incorporating structural plasticity and the unexplored questions raised by such models.
Affiliation(s)
- Remya Sankar: Inria Bordeaux Sud-Ouest, Talence, France; Institut des Maladies Neurodégénératives, Université de Bordeaux, Bordeaux, France; Institut des Maladies Neurodégénératives, CNRS, UMR 5293, France; LaBRI, Université de Bordeaux, INP, CNRS, UMR 5800, Talence, France
- Nicolas P Rougier: Inria Bordeaux Sud-Ouest, Talence, France; Institut des Maladies Neurodégénératives, Université de Bordeaux, Bordeaux, France; Institut des Maladies Neurodégénératives, CNRS, UMR 5293, France; LaBRI, Université de Bordeaux, INP, CNRS, UMR 5800, Talence, France
- Arthur Leblois: Institut des Maladies Neurodégénératives, Université de Bordeaux, Bordeaux, France; Institut des Maladies Neurodégénératives, CNRS, UMR 5293, France
35. Ademola OA, Leier M, Petlenkov E. Evaluation of Deep Neural Network Compression Methods for Edge Devices Using Weighted Score-Based Ranking Scheme. Sensors (Basel) 2021;21:7529. PMID: 34833610; PMCID: PMC8622199; DOI: 10.3390/s21227529.
Abstract
The demand for object detection capability in edge computing systems has surged. As such, the need for lightweight Convolutional Neural Network (CNN)-based object detection models has become a focal point. Current models have large memory footprints, and deployment on edge devices is demanding, so the models need to be optimized for the hardware without performance degradation. Several model compression methods exist; however, determining the most efficient one is of major concern. Our goal was to rank the performance of these methods using our application, a real-time vehicle tracking system for cargo ships, as a case study. To address this, we developed a weighted score-based ranking scheme that utilizes the model performance metrics. We demonstrated the effectiveness of this method by applying it to the baseline, compressed, and micro-CNN models trained on our dataset. The results showed that quantization is the most efficient compression method for the application, having the highest rank with an average weighted score of 9.00, followed by binarization with an average weighted score of 8.07. Our proposed method is extendable and can be used as a framework for the selection of suitable model compression methods for edge devices in different applications.
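One way to realize such a weighted score-based ranking (metric names, weights, and numbers below are placeholders, not the paper's measurements): min-max normalize each metric, flip the lower-is-better ones, and sum the weighted scores.

```python
import numpy as np

metrics = {                                       # toy measurements
    "baseline":  {"accuracy": 0.93, "size_mb": 16.8, "latency_ms": 47.0},
    "quantized": {"accuracy": 0.91, "size_mb": 4.1,  "latency_ms": 21.0},
    "binarized": {"accuracy": 0.88, "size_mb": 1.3,  "latency_ms": 18.0},
}
weights = {"accuracy": 0.5, "size_mb": 0.25, "latency_ms": 0.25}
lower_is_better = {"size_mb", "latency_ms"}

def weighted_score(name):
    total = 0.0
    for m, w in weights.items():
        vals = np.array([metrics[k][m] for k in metrics])
        lo, hi = vals.min(), vals.max()
        norm = (metrics[name][m] - lo) / (hi - lo)  # min-max normalize
        if m in lower_is_better:
            norm = 1.0 - norm                       # flip so higher is better
        total += w * norm
    return total

ranking = sorted(metrics, key=weighted_score, reverse=True)
print(ranking)
```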
Affiliation(s)
- Olutosin Ajibola Ademola: Embedded AI Research Laboratory, Department of Computer Systems, Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia
- Mairo Leier: Embedded AI Research Laboratory, Department of Computer Systems, Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia
- Eduard Petlenkov: Centre for Intelligent Systems, Department of Computer Systems, Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia
36. Tonellotto N, Gotta A, Nardini FM, Gadler D, Silvestri F. Neural network quantization in federated learning at the edge. Inf Sci (N Y) 2021. DOI: 10.1016/j.ins.2021.06.039.
38. Wang J, Jiang T, Cui Z, Cao Z. Filter pruning with a feature map entropy importance criterion for convolution neural networks compressing. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.07.034.
39. Zang K, Wu W, Luo W. Deep Sparse Learning for Automatic Modulation Classification Using Recurrent Neural Networks. Sensors (Basel) 2021;21:6410. PMID: 34640730; PMCID: PMC8512957; DOI: 10.3390/s21196410.
Abstract
Deep learning models, especially recurrent neural networks (RNNs), have recently been applied successfully to automatic modulation classification (AMC) problems. However, deep neural networks are usually overparameterized; most of the connections between neurons are redundant. The large model size hinders the deployment of deep neural networks in applications such as Internet-of-Things (IoT) networks. Therefore, reducing parameters without compromising network performance via sparse learning is often desirable, since it alleviates the computational and storage burdens of deep learning models. In this paper, we propose a sparse learning algorithm that can directly train a sparsely connected neural network based on statistics of weight magnitude and gradient momentum. We first used the MNIST and CIFAR10 datasets to demonstrate the effectiveness of this method. Subsequently, we applied it to RNNs with different pruning strategies on recurrent and non-recurrent connections for AMC problems. Experimental results demonstrated that the proposed method can effectively reduce the parameters of the neural networks while maintaining model performance. Moreover, we show that appropriate sparsity can further improve network generalization ability.
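A hedged sketch of one drop-and-grow mask update driven by the two statistics named above: weight magnitude decides which active connections to drop, and gradient momentum decides which inactive connections to regrow. The swap fraction and sizes are assumptions, and just-dropped weights are not excluded from regrowth here as a real schedule would.

```python
import numpy as np

def update_mask(W, momentum, mask, swap_frac=0.05):
    """Swap `swap_frac` of the active connections: drop the weakest by
    |weight|, regrow the inactive ones with the largest |momentum|."""
    k = max(1, int(swap_frac * mask.sum()))
    flat_mask = mask.ravel()                 # view: edits write through
    active = np.flatnonzero(flat_mask)
    drop = active[np.argsort(np.abs(W.ravel()[active]))[:k]]
    flat_mask[drop] = False
    inactive = np.flatnonzero(~flat_mask)
    grow = inactive[np.argsort(np.abs(momentum.ravel()[inactive]))[-k:]]
    flat_mask[grow] = True
    return mask

rng = np.random.default_rng(0)
W, mom = rng.normal(size=(32, 32)), rng.normal(size=(32, 32))
mask = rng.random((32, 32)) < 0.1            # start at ~10% density
mask = update_mask(W, mom, mask)             # density is preserved
```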
Affiliation(s)
- Ke Zang: College of Biomedical Engineering and Instrument Science, Yuquan Campus, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
- Wenqi Wu: College of Biomedical Engineering and Instrument Science, Yuquan Campus, Zhejiang University, 38 Zheda Road, Hangzhou 310027, China
- Wei Luo: Department of Biomedical Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
40. Paluzo-Hidalgo E, Gonzalez-Diaz R, Gutiérrez-Naranjo MA, Heras J. Optimizing the Simplicial-Map Neural Network Architecture. J Imaging 2021;7:173. PMID: 34564099; PMCID: PMC8466576; DOI: 10.3390/jimaging7090173.
Abstract
Simplicial-map neural networks are a recent neural network architecture induced by simplicial maps defined between simplicial complexes. It has been proved that simplicial-map neural networks are universal approximators and that they can be refined to be robust to adversarial attacks. In this paper, the refinement toward robustness is optimized by reducing the number of simplices (i.e., nodes) needed. We have shown experimentally that such a refined neural network is equivalent to the original network as a classification tool but requires much less storage.
Affiliation(s)
- Rocio Gonzalez-Diaz: Department of Applied Mathematics I, University of Sevilla, 41012 Sevilla, Spain
- Jónathan Heras: Department of Mathematics and Computer Science, University of La Rioja, 26004 Logroño, Spain
41. Wang X, Wang J, Zhang K, Lin F, Chang Q. Convergence and objective functions of noise-injected multilayer perceptrons with hidden multipliers. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.03.119.
42. Abbasi-Asl R, Yu B. Structural Compression of Convolutional Neural Networks with Applications in Interpretability. Front Big Data 2021;4:704182. PMID: 34514381; PMCID: PMC8427695; DOI: 10.3389/fdata.2021.704182.
Abstract
Deep convolutional neural networks (CNNs) have been successful in many machine vision tasks; however, millions of weights in the form of thousands of convolutional filters make CNNs difficult for humans to interpret or understand in scientific applications. In this article, we introduce a greedy structural compression scheme that obtains smaller and more interpretable CNNs while achieving accuracy close to the original. The compression is based on pruning the filters with the least contribution to classification accuracy, i.e., the lowest Classification Accuracy Reduction (CAR) importance index. We demonstrate the interpretability of CAR-compressed CNNs by showing that our algorithm prunes filters with visually redundant functionalities, such as color filters. These compressed networks are easier to interpret because they retain the filter diversity of uncompressed networks with an order of magnitude fewer filters. Finally, a variant of CAR is introduced to quantify the importance of each image category to each CNN filter. Specifically, the most and least important class labels are shown to be meaningful interpretations of each filter.
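The CAR index can be read as a leave-one-filter-out accuracy drop. A minimal sketch of that measurement (PyTorch; `accuracy_fn(model, loader) -> float` is a hypothetical helper, not part of the authors' code):

```python
import torch

@torch.no_grad()
def car_importance(model, conv_name, val_loader, accuracy_fn):
    """CAR-style importance: for each filter in a conv layer, zero it out and
    measure the resulting drop in validation accuracy. Illustrative sketch;
    the model is assumed to be in eval mode."""
    conv = dict(model.named_modules())[conv_name]
    base_acc = accuracy_fn(model, val_loader)
    scores = []
    for i in range(conv.out_channels):
        saved = conv.weight[i].clone()
        conv.weight[i].zero_()                               # ablate filter i
        scores.append(base_acc - accuracy_fn(model, val_loader))
        conv.weight[i].copy_(saved)                          # restore filter i
    return scores  # greedily prune the filters with the smallest scores
```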
Affiliation(s)
- Reza Abbasi-Asl: Department of Neurology and Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, United States; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, United States
- Bin Yu: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, United States; Department of Statistics, University of California, Berkeley, Berkeley, CA, United States
43
Awasthi N, Dayal A, Cenkeramaddi LR, Yalavarthy PK. Mini-COVIDNet: Efficient Lightweight Deep Neural Network for Ultrasound Based Point-of-Care Detection of COVID-19. IEEE TRANSACTIONS ON ULTRASONICS, FERROELECTRICS, AND FREQUENCY CONTROL 2021; 68:2023-2037. [PMID: 33755565 PMCID: PMC8544932 DOI: 10.1109/tuffc.2021.3068190] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/31/2021] [Accepted: 03/19/2021] [Indexed: 05/15/2023]
Abstract
Lung ultrasound (US) imaging has the potential to be an effective point-of-care test for detection of COVID-19, due to its ease of operation with minimal personal protective equipment and easy disinfection. The current state-of-the-art deep learning models for detection of COVID-19 are heavy models that may not be easy to deploy on the mobile platforms commonly used in point-of-care testing. In this work, we develop a lightweight, mobile-friendly, efficient deep learning model for detection of COVID-19 using lung US images. Three classes were included in this task: COVID-19, pneumonia, and healthy. The developed network, named Mini-COVIDNet, was benchmarked against other lightweight neural network models as well as a state-of-the-art heavy model. The proposed network achieves the highest accuracy, 83.2%, and requires a training time of only 24 min. Mini-COVIDNet has 4.39 times fewer parameters than the next best-performing network and requires only 51.29 MB of memory, making point-of-care detection of COVID-19 using lung US imaging plausible on a mobile platform. Deployment of these lightweight networks on embedded platforms shows that Mini-COVIDNet is highly versatile, remaining accurate while keeping latency of the same order as other lightweight networks. The developed lightweight models are available at https://github.com/navchetan-awasthi/Mini-COVIDNet.
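Lightweight networks of this kind typically owe their efficiency to depthwise-separable convolutions, the building block of MobileNet-style designs such as Mini-COVIDNet. A sketch of the block (illustrative; see the authors' repository for the exact architecture):

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    """Depthwise-separable convolution: a per-channel 3x3 depthwise conv
    followed by a 1x1 pointwise conv, which is far cheaper than a dense
    3x3 convolution. Illustrative block, not the exact Mini-COVIDNet layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),   # depthwise
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# A stack of such blocks would feed a 3-class head (COVID-19 / pneumonia / healthy).
```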
Affiliation(s)
- Navchetan Awasthi: Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard University, Cambridge, MA 02138, USA
- Aveen Dayal: Department of Information and Communication Technology, University of Agder, 4879 Grimstad, Norway
44
Tan K, Wang D. Towards Model Compression for Deep Learning Based Speech Enhancement. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2021; 29:1785-1794. [PMID: 34179220 PMCID: PMC8224477 DOI: 10.1109/taslp.2021.3082282] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
The use of deep neural networks (DNNs) has dramatically elevated the performance of speech enhancement over the last decade. However, achieving strong enhancement performance typically requires a large DNN, which consumes substantial memory and computation, making it difficult to deploy such speech enhancement systems on devices with limited hardware resources or in applications with strict latency requirements. In this study, we propose two compression pipelines to reduce the model size for DNN-based speech enhancement, incorporating three techniques: sparse regularization, iterative pruning, and clustering-based quantization. We systematically investigate these techniques and evaluate the proposed compression pipelines. Experimental results demonstrate that our approach reduces the sizes of four different models by large margins without significantly sacrificing their enhancement performance. In addition, we find that the proposed approach performs well on speaker separation, which further demonstrates its effectiveness for compressing speech separation models.
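Of the three techniques, clustering-based quantization is the easiest to show compactly: weights are replaced by a small codebook of k-means centroids plus per-weight indices. A minimal sketch of the general technique (NumPy/scikit-learn; the 32-entry codebook and the treatment of pruned zeros are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_quantize(weights, n_clusters=32):
    """Clustering-based weight quantization: replace each nonzero weight with
    its k-means centroid, so the layer stores only centroid indices plus a
    small codebook. Sketch of the general technique."""
    w = weights.reshape(-1, 1)
    nz = w[w != 0].reshape(-1, 1)          # keep pruned zeros at zero
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(nz)
    quantized = w.copy()
    quantized[w != 0] = km.cluster_centers_[km.predict(nz)].ravel()
    return quantized.reshape(weights.shape)
```

Storing 32 centroids means each surviving weight costs only a 5-bit index, which is where the memory saving comes from.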
Affiliation(s)
- Ke Tan: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210-1277, USA
- DeLiang Wang: Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210-1277, USA
45
Kang Q, Fan Q, Zurada JM. Deterministic convergence analysis via smoothing group Lasso regularization and adaptive momentum for Sigma-Pi-Sigma neural network. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
46
Bejani MM, Ghatee M. A systematic review on overfitting control in shallow and deep neural networks. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-09975-1] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
47
Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks. SENSORS 2021; 21:s21030880. [PMID: 33525527 PMCID: PMC7865320 DOI: 10.3390/s21030880] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/24/2020] [Revised: 01/25/2021] [Accepted: 01/25/2021] [Indexed: 11/17/2022]
Abstract
Deep neural networks have evolved significantly in the past decades and can now process sensor data far more effectively. Nonetheless, most deep models follow the ruling maxim of deep learning, "bigger is better", and therefore have very complex structures. As models become more complex, their computational complexity and resource consumption increase significantly, making them difficult to run on resource-limited platforms such as sensor platforms. In this paper, we observe that different layers often have different pruning requirements, and we propose a differential-evolution-based layer-wise weight pruning method. First, the pruning sensitivity of each layer is analyzed; then the network is compressed by iterating the weight pruning process. Unlike methods that set pruning ratios greedily or through statistical analysis, we establish an optimization model to find the optimal pruning sensitivity set for each layer. Differential evolution, an effective population-based optimization method, is used to address this task. Furthermore, we adopt a strategy of recovering some of the removed connections to increase the capacity of the pruned model during the fine-tuning phase. The effectiveness of our method has been demonstrated in experimental studies: it compresses the number of weight parameters in LeNet-300-100, LeNet-5, AlexNet, and VGG16 by 24×, 14×, 29× and 12×, respectively.
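A minimal differential-evolution loop over per-layer pruning ratios could look like the following sketch. The `fitness` callable is an assumption: it would prune a copy of the network at the candidate ratios and return a score (e.g., accuracy loss plus a sparsity penalty) to minimize.

```python
import numpy as np

def differential_evolution(fitness, n_layers, pop_size=20, F=0.5, CR=0.9, iters=50):
    """Minimal DE search over per-layer pruning ratios in [0, 1).
    Illustrative sketch of the general DE scheme, not the paper's exact setup."""
    pop = np.random.rand(pop_size, n_layers)
    scores = np.array([fitness(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            # Mutation: a + F*(b - c) from three random members
            # (for brevity, a, b, c are not forced to differ from i).
            a, b, c = pop[np.random.choice(pop_size, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 0.99)
            # Crossover: mix mutant genes into the current member.
            cross = np.random.rand(n_layers) < CR
            trial = np.where(cross, mutant, pop[i])
            s = fitness(trial)
            if s < scores[i]:            # greedy selection
                pop[i], scores[i] = trial, s
    return pop[scores.argmin()]          # best per-layer pruning ratios found
```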
48
Berggren K, Xia Q, Likharev KK, Strukov DB, Jiang H, Mikolajick T, Querlioz D, Salinga M, Erickson JR, Pi S, Xiong F, Lin P, Li C, Chen Y, Xiong S, Hoskins BD, Daniels MW, Madhavan A, Liddle JA, McClelland JJ, Yang Y, Rupp J, Nonnenmann SS, Cheng KT, Gong N, Lastras-Montaño MA, Talin AA, Salleo A, Shastri BJ, de Lima TF, Prucnal P, Tait AN, Shen Y, Meng H, Roques-Carmes C, Cheng Z, Bhaskaran H, Jariwala D, Wang H, Shainline JM, Segall K, Yang JJ, Roy K, Datta S, Raychowdhury A. Roadmap on emerging hardware and technology for machine learning. NANOTECHNOLOGY 2021; 32:012002. [PMID: 32679577 DOI: 10.1088/1361-6528/aba70f] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Recent progress in artificial intelligence is largely attributed to the rapid development of machine learning, especially in algorithms and neural network models. However, it is the performance of the hardware, in particular the energy efficiency of the computing system, that sets the fundamental limit on the capability of machine learning. Data-centric computing requires a revolution in hardware systems, since traditional digital computers based on transistors and the von Neumann architecture were not purposely designed for neuromorphic computing. A hardware platform based on emerging devices and new architectures is the hope for future computing with dramatically improved throughput and energy efficiency. Building such a system, nevertheless, faces a number of challenges, including materials selection, device optimization, circuit fabrication, and system integration, to name a few. The aim of this Roadmap is to present a snapshot of emerging hardware technologies that are potentially beneficial for machine learning, providing the Nanotechnology readers with a perspective on challenges and opportunities in this burgeoning field.
Affiliation(s)
- Karl Berggren: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Qiangfei Xia: Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA, United States of America
- Dmitri B Strukov: Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA 93106, United States of America
- Hao Jiang: School of Engineering & Applied Science, Yale University, CT, United States of America
- Martin Salinga: Institut für Materialphysik, Westfälische Wilhelms-Universität Münster, Germany
- John R Erickson: Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, United States of America
- Shuang Pi: Lam Research, Fremont, CA, United States of America
- Feng Xiong: Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, United States of America
- Peng Lin: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Can Li: Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong SAR, China
- Yu Chen: School of Information Science and Technology, Fudan University, Shanghai, People's Republic of China
- Shisheng Xiong: School of Information Science and Technology, Fudan University, Shanghai, People's Republic of China
- Brian D Hoskins: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States of America
- Matthew W Daniels: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States of America
- Advait Madhavan: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States of America; Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD, United States of America
- James A Liddle: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States of America
- Jabez J McClelland: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, United States of America
- Yuchao Yang: School of Electronics Engineering and Computer Science, Peking University, Beijing, People's Republic of China
- Jennifer Rupp: Department of Materials Science and Engineering and Department of Electrical Engineering & Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America; Electrochemical Materials, ETHZ Department of Materials, Hönggerbergring 64, Zürich 8093, Switzerland
- Stephen S Nonnenmann: Department of Mechanical & Industrial Engineering, University of Massachusetts-Amherst, MA, United States of America
- Kwang-Ting Cheng: School of Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, People's Republic of China
- Nanbo Gong: IBM T J Watson Research Center, Yorktown Heights, NY 10598, United States of America
- Miguel Angel Lastras-Montaño: Instituto de Investigación en Comunicación Óptica, Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, México
- A Alec Talin: Sandia National Laboratories, Livermore, CA 94551, United States of America
- Alberto Salleo: Department of Materials Science and Engineering, Stanford University, Stanford, California, United States of America
- Bhavin J Shastri: Department of Physics, Engineering Physics & Astronomy, Queen's University, Kingston, ON K7L 3N6, Canada
- Thomas Ferreira de Lima: Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, United States of America
- Paul Prucnal: Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, United States of America
- Alexander N Tait: Physical Measurement Laboratory, National Institute of Standards and Technology (NIST), Boulder, CO 80305, United States of America
- Yichen Shen: Lightelligence, 268 Summer Street, Boston, MA 02210, United States of America
- Huaiyu Meng: Lightelligence, 268 Summer Street, Boston, MA 02210, United States of America
- Charles Roques-Carmes: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Zengguang Cheng: Department of Materials, University of Oxford, Oxford OX1 3PH, United Kingdom; State Key Laboratory of ASIC and System, School of Microelectronics, Fudan University, Shanghai 200433, People's Republic of China
- Harish Bhaskaran: Department of Materials, University of Oxford, Oxford OX1 3PH, United Kingdom
- Deep Jariwala: Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, United States of America
- Han Wang: University of Southern California, Los Angeles, CA 90089, United States of America
- Jeffrey M Shainline: Physical Measurement Laboratory, National Institute of Standards and Technology (NIST), Boulder, CO 80305, United States of America
- Kenneth Segall: Department of Physics and Astronomy, Colgate University, NY 13346, United States of America
- J Joshua Yang: Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA, United States of America
- Kaushik Roy: School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, United States of America
- Suman Datta: University of Notre Dame, Notre Dame, IN 46556, United States of America
- Arijit Raychowdhury: Georgia Institute of Technology, Atlanta, GA 30332, United States of America
49
Cho H, Jang J, Lee C, Yang S. Efficient architecture for deep neural networks with heterogeneous sensitivity. Neural Netw 2020; 134:95-106. [PMID: 33302052 DOI: 10.1016/j.neunet.2020.10.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Revised: 08/27/2020] [Accepted: 10/30/2020] [Indexed: 10/23/2022]
Abstract
In this study, we present a neural network that consists of nodes with heterogeneous sensitivity. Each node in the network is assigned a variable that determines the sensitivity with which it learns to perform a given task. The network is trained via a constrained optimization that maximizes the sparsity of the sensitivity variables while ensuring optimal network performance. As a result, the network learns to perform a given task using only a few sensitive nodes. Insensitive nodes, i.e., nodes with zero sensitivity, can be removed from a trained network to obtain a computationally efficient network. Removing zero-sensitivity nodes has no effect on performance because the network has already been trained to perform the task without them. The regularization parameter used to solve the optimization problem is found simultaneously during training. To validate our approach, we designed networks with computationally efficient architectures for tasks such as autoregression, object recognition, facial expression recognition, and object detection using various datasets. In our experiments, the networks designed by the proposed method provided the same or higher performance with far less computational complexity.
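One simple way to realize per-node sensitivity variables is a learned gate on each unit's output, sparsified with an L1 penalty. This is an illustrative reading of the idea rather than the authors' exact constrained formulation:

```python
import torch
import torch.nn as nn

class SensitiveLayer(nn.Module):
    """Linear layer whose outputs are scaled by per-node sensitivity variables.
    Training with an L1 penalty on the sensitivities drives most of them to
    zero; zero-sensitivity nodes can then be removed without changing the
    network's output. Illustrative sketch."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.sensitivity = nn.Parameter(torch.ones(d_out))

    def forward(self, x):
        return self.sensitivity * torch.relu(self.linear(x))

# Training objective (lam is an assumed sparsity weight):
#   loss = task_loss + lam * layer.sensitivity.abs().sum()
```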
Affiliation(s)
- Hyunjoong Cho: School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Jinhyeok Jang: Electronics and Telecommunications Research Institute (ETRI), Daejeon, Republic of Korea
- Chanhyeok Lee: School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Seungjoon Yang: School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
50
He Y, Dong X, Kang G, Fu Y, Yan C, Yang Y. Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:3594-3604. [PMID: 31478883 DOI: 10.1109/tcyb.2019.2933477] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Deeper and wider convolutional neural networks (CNNs) achieve superior performance but incur expensive computation. Accelerating such overparameterized neural networks has received increasing attention. A typical pruning algorithm is a three-stage pipeline: training, pruning, and retraining. Prevailing approaches fix the pruned filters to zero during retraining and thus significantly reduce the optimization space. Moreover, they directly prune a large number of filters at the start, which can cause unrecoverable information loss. To solve these problems, we propose an asymptotic soft filter pruning (ASFP) method to accelerate the inference of deep neural networks. First, we update the pruned filters during the retraining stage, so the optimization space of the pruned model is not reduced but remains the same as that of the original model; the model therefore has enough capacity to learn from the training data. Second, we prune the network asymptotically: few filters are pruned at first, and progressively more are pruned as training proceeds. With asymptotic pruning, the information in the training set is gradually concentrated in the remaining filters, so the subsequent training and pruning process is stable. Experiments show the effectiveness of ASFP on image classification benchmarks. Notably, on ILSVRC-2012, ASFP reduces more than 40% of the FLOPs of ResNet-50 with only 0.14% top-5 accuracy degradation, an 8% improvement over soft filter pruning.
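The two ideas, soft (recoverable) zeroing and an asymptotically increasing pruning rate, can be sketched in a per-epoch step like the following (PyTorch; the final rate and ramp schedule are illustrative assumptions, not the paper's exact schedule):

```python
import torch

@torch.no_grad()
def soft_prune_step(conv_weights, epoch, final_rate=0.4, ramp_epochs=60):
    """Asymptotic soft filter pruning, one epoch: zero the smallest-L2-norm
    filters at a rate that ramps up over training, but keep updating the
    zeroed filters afterwards so they may recover. Illustrative sketch."""
    rate = final_rate * min(1.0, epoch / ramp_epochs)   # asymptotic schedule
    for w in conv_weights:                  # w: (out_ch, in_ch, k, k)
        n_prune = int(rate * w.size(0))
        if n_prune == 0:
            continue
        norms = w.view(w.size(0), -1).norm(p=2, dim=1)
        idx = torch.topk(norms, n_prune, largest=False).indices
        w[idx] = 0.0                        # soft: not fixed at zero next epoch
    return rate
```

Because the zeroed filters still receive gradient updates, the pruned model keeps the full optimization space until pruning is made permanent at the end of training.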