1. Wu N, Phang J, Park J, Shen Y, Huang Z, Zorin M, Jastrzebski S, Fevry T, Katsnelson J, Kim E, Wolfson S, Parikh U, Gaddam S, Lin LLY, Ho K, Weinstein JD, Reig B, Gao Y, Toth H, Pysarenko K, Lewin A, Lee J, Airola K, Mema E, Chung S, Hwang E, Samreen N, Kim SG, Heacock L, Moy L, Cho K, Geras KJ. Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening. IEEE Trans Med Imaging 2020;39:1184-1194. PMID: 31603772; PMCID: PMC7427471; DOI: 10.1109/tmi.2019.2945514. Cited in RCA: 239.
Abstract
We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting the presence of cancer in the breast, when tested on the screening population. We attribute the high accuracy to a few technical advances. 1) Our network's novel two-stage architecture and training procedure, which allows us to use a high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. 2) A custom ResNet-based network used as a building block of our model, whose balance of depth and width is optimized for high-resolution medical images. 3) Pretraining the network on screening BI-RADS classification, a related task with noisier labels. 4) Combining multiple input views in an optimal way among a number of possible choices. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and show that our model is as accurate as experienced radiologists when presented with the same data. We also show that a hybrid model, averaging the probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To further understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, the model's design, training procedure, errors, and properties of its internal representations. Our best models are publicly available at https://github.com/nyukat/breast_cancer_classifier.
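The hybrid model this abstract describes is simple enough to sketch directly: average the radiologist's probability of malignancy with the network's prediction and compare AUCs. A minimal illustration, assuming synthetic scores rather than the paper's data:

```python
# Hedged sketch of the hybrid radiologist + network model described above.
# All numbers and arrays are illustrative stand-ins, not from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical per-exam malignancy probabilities on a small test set.
y_true = rng.integers(0, 2, size=200)                          # ground truth
p_radiologist = np.clip(y_true * 0.60 + rng.random(200) * 0.5, 0, 1)
p_network = np.clip(y_true * 0.55 + rng.random(200) * 0.5, 0, 1)

# The hybrid simply averages the two probability estimates.
p_hybrid = (p_radiologist + p_network) / 2.0

for name, p in [("radiologist", p_radiologist),
                ("network", p_network),
                ("hybrid", p_hybrid)]:
    print(f"{name:12s} AUC = {roc_auc_score(y_true, p):.3f}")
```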
2. Fuentes A, Yoon S, Kim SC, Park DS. A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pests Recognition. Sensors 2017;17:2022. PMID: 28869539; PMCID: PMC5620500; DOI: 10.3390/s17092022. Cited in RCA: 223.
Abstract
Plant diseases and pests are a major challenge in the agriculture sector. Accurate and fast detection of diseases and pests in plants could help to develop early treatment techniques while substantially reducing economic losses. Recent developments in deep neural networks have allowed researchers to drastically improve the accuracy of object detection and recognition systems. In this paper, we present a deep-learning-based approach to detect diseases and pests in tomato plants using images captured in place by camera devices with various resolutions. Our goal is to find the most suitable deep-learning architecture for our task. Therefore, we consider three main families of detectors: Faster Region-based Convolutional Neural Network (Faster R-CNN), Region-based Fully Convolutional Network (R-FCN), and Single Shot Multibox Detector (SSD), which for the purpose of this work are called "deep learning meta-architectures". We combine each of these meta-architectures with "deep feature extractors" such as VGG net and Residual Network (ResNet). We demonstrate the performance of deep meta-architectures and feature extractors, and additionally propose a method for local and global class annotation and data augmentation to increase the accuracy and reduce the number of false positives during training. We train and test our systems end-to-end on our large Tomato Diseases and Pests Dataset, which contains challenging images with diseases and pests, including several inter- and extra-class variations, such as infection status and location in the plant. Experimental results show that our proposed system can effectively recognize nine different types of diseases and pests, with the ability to deal with complex scenarios from a plant's surrounding area.
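One such meta-architecture/feature-extractor pairing is easy to prototype with off-the-shelf tools. A hedged sketch using torchvision's Faster R-CNN with a ResNet-50 backbone (not the authors' implementation; the 10-class head is an assumption for illustration):

```python
# Faster R-CNN (meta-architecture) + ResNet-50 FPN (feature extractor).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pretrained detector; downloads COCO weights on first use.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Swap the box-classification head for a hypothetical 10-class tomato task
# (9 disease/pest classes + background); the class count is an assumption.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=10)

model.eval()
with torch.no_grad():
    # One dummy 3-channel image; real inputs would be in-field tomato photos.
    detections = model([torch.rand(3, 480, 640)])
print(detections[0]["boxes"].shape, detections[0]["labels"].shape)
```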
3. Zheng W, Li Y, Zhang C, Pearce R, Mortuza SM, Zhang Y. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins 2019;87:1149-1164. PMID: 31365149; PMCID: PMC6851476; DOI: 10.1002/prot.25792. Cited in RCA: 131.
Abstract
We report the results of two fully automated structure prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence profiles for contact prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE, which predicts contact maps by coupling precision matrices with deep residual convolutional neural networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, the average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value < 0.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges remain for predicting quaternary structure of multi-domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
4. Nasrullah N, Sang J, Alam MS, Mateen M, Cai B, Hu H. Automated Lung Nodule Detection and Classification Using Deep Learning Combined with Multiple Strategies. Sensors 2019;19:3722. PMID: 31466261; PMCID: PMC6749467; DOI: 10.3390/s19173722. Cited in RCA: 125.
Abstract
Lung cancer is one of the major causes of cancer-related deaths due to its aggressive nature and delayed detection at advanced stages. Early detection of lung cancer is vital for survival, yet it remains a significant and challenging problem. Generally, chest radiographs (X-ray) and computed tomography (CT) scans are used initially for the diagnosis of malignant nodules; however, the possible existence of benign nodules leads to erroneous decisions. At early stages, benign and malignant nodules show very close resemblance to each other. In this paper, a novel deep learning-based model with multiple strategies is proposed for the precise diagnosis of malignant nodules. Owing to the recent achievements of deep convolutional neural networks (CNN) in image analysis, we have used two deep three-dimensional (3D) customized mixed link network (CMixNet) architectures for lung nodule detection and classification, respectively. Nodule detection was performed through Faster R-CNN on efficiently learned features from CMixNet and a U-Net-like encoder-decoder architecture. Classification of the nodules was performed through a gradient boosting machine (GBM) on the learned features from the designed 3D CMixNet structure. To reduce false positives and misdiagnosis caused by different types of errors, the final decision was made in connection with physiological symptoms and clinical biomarkers. With the advent of the internet of things (IoT) and electro-medical technology, wireless body area networks (WBANs) provide continuous monitoring of patients, which helps in the diagnosis of chronic diseases, especially metastatic cancers. The deep learning model for nodule detection and classification, combined with clinical factors, helps reduce misdiagnosis and false positive (FP) results in early-stage lung cancer diagnosis. The proposed system was evaluated on the LIDC-IDRI dataset, achieving a sensitivity of 94% and a specificity of 91%, better results than those of existing methods.
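The classification stage pairs learned CNN features with a gradient boosting machine. A minimal sketch of that idea, assuming synthetic features in place of the 3D CMixNet encoder outputs:

```python
# Hedged sketch: a GBM classifying nodules from CNN-learned features.
# The feature matrix is synthetic; in the paper it would come from CMixNet.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))             # stand-in 128-d nodule features
y = (X[:, :4].sum(axis=1) > 0).astype(int)  # stand-in malignant/benign labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)

pred = gbm.predict(X_te)
sensitivity = recall_score(y_te, pred)                # true-positive rate
specificity = recall_score(y_te, pred, pos_label=0)   # true-negative rate
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```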
5. Guo L, Wang T, Wu Z, Wang J, Wang M, Cui Z, Ji S, Cai J, Xu C, Chen X. Portable Food-Freshness Prediction Platform Based on Colorimetric Barcode Combinatorics and Deep Convolutional Neural Networks. Adv Mater 2020;32:e2004805. PMID: 33006183; DOI: 10.1002/adma.202004805. Cited in RCA: 92.
Abstract
Artificial scent screening systems (known as electronic noses, E-noses) have been researched extensively. A portable, automatic, accurate, real-time E-nose requires both robust cross-reactive sensing and fingerprint pattern recognition. Few E-noses have been commercialized because they suffer from either sensing or pattern-recognition issues. Here, cross-reactive colorimetric barcode combinatorics and deep convolutional neural networks (DCNNs) are combined to form a system for monitoring meat freshness that concurrently provides scent fingerprints and fingerprint recognition. The barcodes, comprising 20 different types of porous nanocomposites of chitosan, dye, and cellulose acetate, form scent fingerprints that are identifiable by DCNN. A fully supervised DCNN trained using 3475 labeled barcode images predicts meat freshness with an overall accuracy of 98.5%. Incorporating the DCNN into a smartphone application forms a simple platform for rapid barcode scanning and identification of food freshness in real time. The system is fast, accurate, and non-destructive, enabling consumers and all stakeholders in the food supply chain to monitor food freshness.
6. Zheng YY, Kong JL, Jin XB, Wang XY, Su TL, Zuo M. CropDeep: The Crop Vision Dataset for Deep-Learning-Based Classification and Detection in Precision Agriculture. Sensors 2019;19:1058. PMID: 30832283; PMCID: PMC6427818; DOI: 10.3390/s19051058. Cited in RCA: 91.
Abstract
Intelligence is considered the major challenge in promoting the economic potential and production efficiency of precision agriculture. In order to apply advanced deep-learning technology to complete various agricultural tasks in online and offline ways, a large number of crop vision datasets with domain-specific annotation are urgently needed. To encourage further progress in challenging realistic agricultural conditions, we present the CropDeep species classification and detection dataset, consisting of 31,147 images with over 49,000 annotated instances from 31 different classes. In contrast to existing vision datasets, images were collected with different cameras and equipment in greenhouses, captured in a wide variety of situations. It features visually similar species and periodic changes with more representative annotations, which support a stronger benchmark for deep-learning-based classification and detection. To further verify the application prospects, we provide extensive baseline experiments using state-of-the-art deep-learning classification and detection models. Results show that current deep-learning-based methods achieve classification accuracy above 99% but only 92% detection accuracy, illustrating the difficulty of the dataset and the room for improvement in state-of-the-art deep-learning models applied to crop production and management. Specifically, we suggest that the YOLOv3 network has good potential application in agricultural detection tasks.
7. Xie J, Liu R, Luttrell J, Zhang C. Deep Learning Based Analysis of Histopathological Images of Breast Cancer. Front Genet 2019;10:80. PMID: 30838023; PMCID: PMC6390493; DOI: 10.3389/fgene.2019.00080. Cited in RCA: 90.
Abstract
Breast cancer is associated with the highest morbidity rates for cancer diagnoses in the world and has become a major public health issue. Early diagnosis can increase the chance of successful treatment and survival. However, it is a very challenging and time-consuming task that relies on the experience of pathologists. The automatic diagnosis of breast cancer by analyzing histopathological images plays a significant role for patients and their prognosis. However, traditional feature extraction methods can only extract some low-level features of images, and prior knowledge is needed to select useful features, a process strongly influenced by human judgment. Deep learning techniques can extract high-level abstract features from images automatically. Therefore, we introduce them to analyze histopathological images of breast cancer via supervised and unsupervised deep convolutional neural networks. First, we adapted the Inception_V3 and Inception_ResNet_V2 architectures to the binary and multi-class classification of breast cancer histopathological images by utilizing transfer learning techniques. Then, to overcome the influence of the imbalanced histopathological images across subclasses, we balanced the subclasses, with ductal carcinoma as the baseline, by flipping images vertically and horizontally and rotating them counterclockwise by 90 and 180 degrees. Our experimental results on the supervised histopathological image classification of breast cancer, and the comparison to results from other studies, demonstrate that Inception_V3- and Inception_ResNet_V2-based histopathological image classification of breast cancer is superior to the existing methods. Furthermore, these findings show that the Inception_ResNet_V2 network is so far the best deep learning architecture for diagnosing breast cancers by analyzing histopathological images. Therefore, we used Inception_ResNet_V2 to extract features from breast cancer histopathological images to perform unsupervised analysis of the images. We also constructed a new autoencoder network to transform the features extracted by Inception_ResNet_V2 to a low-dimensional space for clustering analysis of the images. The experimental results demonstrate that our proposed autoencoder network yields better clustering than features extracted by the Inception_ResNet_V2 network alone. All of our experimental results demonstrate that Inception_ResNet_V2-based deep transfer learning provides a new means of performing analysis of histopathological images of breast cancer.
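The transfer-learning and augmentation recipe above translates almost directly into code. A hedged sketch with torchvision (which ships Inception_V3 but not Inception_ResNet_V2; the 8-class head and the exact augmentation choices are assumptions for illustration):

```python
# Transfer learning on an ImageNet-pretrained Inception_V3 backbone, with the
# flip/rotation augmentations the abstract lists for class balancing.
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentations that would be applied inside the training Dataset.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomChoice([
        transforms.RandomRotation((0, 0)),       # identity
        transforms.RandomRotation((90, 90)),     # fixed 90-degree turn
        transforms.RandomRotation((180, 180)),   # fixed 180-degree turn
    ]),
    transforms.Resize((299, 299)),               # Inception input size
    transforms.ToTensor(),
])

model = models.inception_v3(weights="DEFAULT")   # downloads ImageNet weights
model.fc = nn.Linear(model.fc.in_features, 8)    # e.g., 8 histology subclasses

model.eval()
with torch.no_grad():
    logits = model(torch.rand(1, 3, 299, 299))   # stand-in histology patch
print(logits.shape)                              # torch.Size([1, 8])
```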
8. Mallat S. Understanding deep convolutional networks. Philos Trans A Math Phys Eng Sci 2016;374:20150203. PMID: 26953183; PMCID: PMC4792410; DOI: 10.1098/rsta.2015.0203. Cited in RCA: 85.
Abstract
Deep convolutional networks provide state-of-the-art classification and regression results over many high-dimensional problems. We review their architecture, which scatters data with a cascade of linear filter weights and nonlinearities. A mathematical framework is introduced to analyse their properties. Computations of invariants involve multiscale contractions with wavelets, the linearization of hierarchical symmetries, and sparse separations. Applications are discussed.
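The cascade of filters and nonlinearities the review analyses is formalized by the wavelet scattering transform. A standard formulation, in conventional notation rather than quoted from the paper, is:

```latex
% Windowed scattering transform: each layer convolves with a wavelet
% \psi_{\lambda_i} and takes a complex modulus; the final averaging by
% \phi_J yields locally translation-invariant, deformation-stable
% coefficients along each path p.
\[
S_J[p]\,x(u) \;=\; \Bigl|\,\bigl|\,x \ast \psi_{\lambda_1}\bigr| \ast \psi_{\lambda_2}\,\cdots\Bigr| \ast \phi_J(u),
\qquad p = (\lambda_1,\lambda_2,\ldots,\lambda_m).
\]
```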
9. An Adaptive Multi-Sensor Data Fusion Method Based on Deep Convolutional Neural Networks for Fault Diagnosis of Planetary Gearbox. Sensors 2017;17:414. PMID: 28230767; PMCID: PMC5335931; DOI: 10.3390/s17020414. Cited in RCA: 80.
Abstract
A fault diagnosis approach based on multi-sensor data fusion is a promising tool to deal with complicated damage detection problems in mechanical systems. Nevertheless, this approach suffers from two challenges: (1) feature extraction from various types of sensory data and (2) selection of a suitable fusion level. It is usually difficult to choose an optimal feature or fusion level for a specific fault diagnosis task, and extensive domain expertise and human labor are required during these selections. To address these two challenges, we propose an adaptive multi-sensor data fusion method based on deep convolutional neural networks (DCNN) for fault diagnosis. The proposed method can learn features from raw data and optimize a combination of different fusion levels adaptively to satisfy the requirements of any fault diagnosis task. The proposed method is tested on a planetary gearbox test rig. Handcrafted features, manually selected fusion levels, single sensory data, and two traditional intelligent models, back-propagation neural networks (BPNN) and a support vector machine (SVM), are used as comparisons in the experiment. The results demonstrate that the proposed method detects the conditions of the planetary gearbox effectively, with the best diagnosis accuracy among all comparative methods in the experiment.
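Feature-level fusion, one of the fusion levels the method searches over, can be sketched as per-sensor convolutional branches whose outputs are concatenated before the classifier. A minimal PyTorch illustration (layer sizes are assumptions, not the paper's DCNN configuration):

```python
# Feature-level multi-sensor fusion: one 1-D CNN branch per sensor channel,
# with learned features concatenated before classification.
import torch
import torch.nn as nn

class FusionDCNN(nn.Module):
    def __init__(self, n_sensors=4, n_classes=5):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            for _ in range(n_sensors)
        ])
        self.classifier = nn.Linear(32 * n_sensors, n_classes)

    def forward(self, x):                          # x: (batch, sensors, time)
        feats = [b(x[:, i:i + 1, :]).flatten(1)    # per-sensor features
                 for i, b in enumerate(self.branches)]
        return self.classifier(torch.cat(feats, dim=1))  # fuse, then classify

model = FusionDCNN()
print(model(torch.randn(8, 4, 1024)).shape)        # torch.Size([8, 5])
```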
10. Zhang J, Liu M, Shen D. Detecting Anatomical Landmarks From Limited Medical Imaging Data Using Two-Stage Task-Oriented Deep Neural Networks. IEEE Trans Image Process 2017;26:4753-4764. PMID: 28678706; PMCID: PMC5729285; DOI: 10.1109/tip.2017.2721106. Cited in RCA: 73.
Abstract
One of the major challenges in anatomical landmark detection based on deep neural networks is the limited availability of medical imaging data for network learning. To address this problem, we present a two-stage task-oriented deep learning method to detect large-scale anatomical landmarks simultaneously in real time, using limited training data. Our method consists of two deep convolutional neural networks (CNN), each focusing on one specific task. To alleviate the problem of limited training data, in the first stage we propose a CNN-based regression model using millions of image patches as input, aiming to learn inherent associations between local image patches and target anatomical landmarks. To further model the correlations among image patches, in the second stage we develop another CNN model, which includes (a) a fully convolutional network that shares the same architecture and network weights as the CNN used in the first stage and (b) several extra layers to jointly predict the coordinates of multiple anatomical landmarks. Importantly, our method can jointly detect large-scale (e.g., thousands of) landmarks in real time. We have conducted experiments detecting 1200 brain landmarks from 3D T1-weighted magnetic resonance images of 700 subjects, and 7 prostate landmarks from 3D computed tomography images of 73 subjects. The experimental results show the effectiveness of our method in terms of both accuracy and efficiency in anatomical landmark detection.
11. Mezgec S, Koroušić Seljak B. NutriNet: A Deep Learning Food and Drink Image Recognition System for Dietary Assessment. Nutrients 2017;9:E657. PMID: 28653995; PMCID: PMC5537777; DOI: 10.3390/nu9070657. Cited in RCA: 67.
Abstract
Automatic food image recognition systems are easing the process of food-intake estimation and dietary assessment. However, due to the nature of food images, their recognition is a particularly challenging task, which is why traditional approaches in the field have achieved low classification accuracy. Deep neural networks have outperformed such solutions, and we present a novel approach to the problem of food and drink image detection and recognition that uses a newly defined deep convolutional neural network architecture, called NutriNet. This architecture was tuned on a recognition dataset containing 225,953 512 × 512 pixel images of 520 different food and drink items from a broad spectrum of food groups, on which we achieved a classification accuracy of 86.72%, along with an accuracy of 94.47% on a detection dataset containing 130,517 images. We also performed a real-world test on a dataset of self-acquired images, combined with images from Parkinson's disease patients, all taken using a smartphone camera, achieving a top-five accuracy of 55%, which is an encouraging result for real-world images. Additionally, we tested NutriNet on the University of Milano-Bicocca 2016 (UNIMIB2016) food image dataset, on which we improved upon the provided baseline recognition result. An online training component was implemented to continually fine-tune the food and drink recognition model on new images. The model is being used in practice as part of a mobile app for the dietary assessment of Parkinson's disease patients.
12. Gao Y, Gao B, Chen Q, Liu J, Zhang Y. Deep Convolutional Neural Network-Based Epileptic Electroencephalogram (EEG) Signal Classification. Front Neurol 2020;11:375. PMID: 32528398; PMCID: PMC7257380; DOI: 10.3389/fneur.2020.00375. Cited in RCA: 49.
Abstract
Electroencephalogram (EEG) signals contain vital information on the electrical activities of the brain and are widely used to aid epilepsy analysis. A challenging element of epilepsy diagnosis, accurate classification of different epileptic states, is of particular interest and has been extensively investigated. A new deep learning-based classification methodology, namely epileptic EEG signal classification (EESC), is proposed in this paper. This methodology first transforms epileptic EEG signals to power spectrum density energy diagrams (PSDEDs), then applies deep convolutional neural networks (DCNNs) and transfer learning to automatically extract features from the PSDED, and finally classifies four categories of epileptic states (interictal, preictal duration to 30 min, preictal duration to 10 min, and seizure). It outperforms the existing epilepsy classification methods in terms of accuracy and efficiency. For instance, it achieves an average classification accuracy of over 90% in a case study with CHB-MIT epileptic EEG data.
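The first EESC step, converting raw EEG into a power-spectral-density energy image, can be sketched with SciPy. A hedged illustration on synthetic data (the sampling rate and window lengths are assumptions, not the paper's settings):

```python
# Turn an EEG segment into a (time, frequency) PSD representation that a
# CNN can consume as an image. The signal here is synthetic noise.
import numpy as np
from scipy.signal import welch

fs = 256                                    # sampling rate (Hz), assumed
rng = np.random.default_rng(0)
eeg = rng.standard_normal(30 * fs)          # 30 s of fake single-channel EEG

# PSD over sliding 2 s windows -> rows of the "energy diagram".
win, hop = 2 * fs, fs
frames = [eeg[s:s + win] for s in range(0, len(eeg) - win + 1, hop)]
psd_image = np.stack([welch(f, fs=fs, nperseg=win)[1] for f in frames])

print(psd_image.shape)  # (n_windows, n_freq_bins), fed to a DCNN as an image
```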
13. Peng P, Zhao X, Pan X, Ye W. Gas Classification Using Deep Convolutional Neural Networks. Sensors 2018;18:157. PMID: 29316723; PMCID: PMC5795481; DOI: 10.3390/s18010157. Cited in RCA: 42.
Abstract
In this work, we propose a novel deep convolutional neural network (DCNN) tailored for gas classification. Inspired by the great success of DCNNs in the field of computer vision, we designed a DCNN with up to 38 layers. The proposed gas neural network, named GasNet, consists of six convolutional blocks (each consisting of six layers), a pooling layer, and a fully-connected layer. Together, these layers make up a powerful deep model for gas classification. Experimental results show that the proposed DCNN method is an effective technique for classifying electronic nose data. We also demonstrate that the DCNN method provides higher classification accuracy than comparable Support Vector Machine (SVM) and Multilayer Perceptron (MLP) methods.
14. Tong X, Wei J, Sun B, Su S, Zuo Z, Wu P. ASCU-Net: Attention Gate, Spatial and Channel Attention U-Net for Skin Lesion Segmentation. Diagnostics (Basel) 2021;11:501. PMID: 33809048; PMCID: PMC7999819; DOI: 10.3390/diagnostics11030501. Cited in RCA: 38.
Abstract
Segmentation of skin lesions is a challenging task because of the wide range of skin lesion shapes, sizes, colors, and texture types. In the past few years, deep learning networks such as U-Net have been successfully applied to medical image segmentation, exhibiting faster and more accurate performance. In this paper, we propose an extended version of U-Net for the segmentation of skin lesions using the concept of a triple attention mechanism. We first selected regions using attention coefficients computed by the attention gate and contextual information. Second, a dual attention decoding module consisting of spatial attention and channel attention was used to capture the spatial correlation between features and improve segmentation performance. The combination of the three attention mechanisms helped the network focus on a more relevant field of view of the target. The proposed model was evaluated on three datasets: ISIC-2016, ISIC-2017, and PH2. The experimental results demonstrate the effectiveness of our method, with strong robustness to irregular borders, smooth transitions between lesion and skin, noise, and artifacts.
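The first of the three mechanisms, the attention gate, is compact enough to sketch. A hedged PyTorch version of the standard additive attention gate (channel sizes are illustrative, not the paper's exact configuration):

```python
# Additive attention gate as used in attention U-Nets: a gating signal from
# the decoder re-weights encoder skip-connection features.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch, skip_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)  # gating signal
        self.w_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # skip features
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, g, x):
        # Attention coefficients in [0, 1] re-weight the skip path.
        alpha = torch.sigmoid(self.psi(torch.relu(self.w_g(g) + self.w_x(x))))
        return x * alpha

gate = AttentionGate(gate_ch=128, skip_ch=64, inter_ch=32)
g = torch.randn(1, 128, 32, 32)   # decoder (gating) features
x = torch.randn(1, 64, 32, 32)    # encoder skip-connection features
print(gate(g, x).shape)           # torch.Size([1, 64, 32, 32])
```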
15. Mao J, Luo Y, Liu L, Lao J, Shao Y, Zhang M, Zhang C, Sun M, Shen L. Automated diagnosis and quantitative analysis of plus disease in retinopathy of prematurity based on deep convolutional neural networks. Acta Ophthalmol 2020;98:e339-e345. PMID: 31559701; DOI: 10.1111/aos.14264. Cited in RCA: 33.
Abstract
BACKGROUND: The purpose of this study was to develop an automated diagnosis and quantitative analysis system for plus disease. The system not only provides a diagnostic decision but also performs quantitative analysis of the typical pathological features of the disease, helping physicians make the best judgement and communicate the decisions. METHODS: The deep learning network provided segmentation of the retinal vessels and the optic disc (OD). Based on the vessel segmentation, plus disease was classified, and tortuosity, width, fractal dimension, and vessel density were evaluated automatically. RESULTS: The trained network achieved a sensitivity of 95.1% with 97.8% specificity for the diagnosis of plus disease. For detection of preplus or worse, the sensitivity and specificity were 92.4% and 97.4%. The quadratic weighted kappa was 0.9244. The tortuosities for the normal, preplus, and plus groups were 3.61 ± 0.08, 5.95 ± 1.57, and 10.67 ± 0.50 (10⁴ cm⁻³). The widths of the blood vessels were 63.46 ± 0.39, 67.21 ± 0.70, and 68.89 ± 0.75 μm. The fractal dimensions were 1.18 ± 0.01, 1.22 ± 0.01, and 1.26 ± 0.02. The vessel densities were 1.39 ± 0.03, 1.60 ± 0.01, and 1.64 ± 0.09 (%). All values were statistically different among the groups. After treatment of plus disease with ranibizumab injection, quantitative analysis showed significant changes in the pathological features. CONCLUSIONS: Our system achieved high accuracy in the diagnosis of plus disease in retinopathy of prematurity and provided a quantitative analysis of the dynamic features of disease progression. This automated system can assist physicians by providing a classification decision with auxiliary quantitative evaluation of the typical pathological features of the disease.
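Among the reported vessel features, tortuosity is the most algorithmically interesting. The units above (10⁴ cm⁻³) suggest a curvature-based definition; the simpler arc-to-chord ratio below is a common stand-in, shown only to make the feature concrete:

```python
# Hedged illustration of one vessel feature: tortuosity as arc/chord ratio
# (not the paper's exact curvature-based metric).
import numpy as np

def arc_chord_tortuosity(points):
    """Tortuosity of a vessel centerline given as an (n, 2) array of points."""
    segs = np.diff(points, axis=0)
    arc = np.linalg.norm(segs, axis=1).sum()         # path length
    chord = np.linalg.norm(points[-1] - points[0])   # endpoint distance
    return arc / chord

# A gently curving synthetic centerline (straight vessels give a ratio of 1).
t = np.linspace(0, np.pi, 100)
vessel = np.column_stack([t, 0.2 * np.sin(3 * t)])
print(f"tortuosity = {arc_chord_tortuosity(vessel):.3f}")
```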
16. Yousif H, Yuan J, Kays R, He Z. Animal Scanner: Software for classifying humans, animals, and empty frames in camera trap images. Ecol Evol 2019;9:1578-1589. PMID: 30847057; PMCID: PMC6392355; DOI: 10.1002/ece3.4747. Cited in RCA: 29.
Abstract
Camera traps are a popular tool to sample animal populations because they are noninvasive, detect a variety of species, and can record many thousands of animal detections per deployment. Cameras are typically set to take bursts of multiple photographs for each detection and are deployed in arrays of dozens or hundreds of sites, often resulting in millions of photographs per study. The task of converting photographs to animal detection records from such large image collections is daunting, and made worse by situations that generate copious empty pictures from false triggers (e.g., camera malfunction or moving vegetation) or pictures of humans. We developed computer vision algorithms to detect and classify moving objects to aid the first step of camera trap image filtering: separating the animal detections from the empty frames and pictures of humans. Our new work couples foreground object segmentation through background subtraction with deep learning classification to provide a fast and accurate scheme for human-animal detection. We provide these programs as both a Matlab GUI and a command-line tool developed with C++. The software reads folders of camera trap images and outputs images annotated with bounding boxes around moving objects and a text file summary of results. The software maintains high accuracy while reducing execution time by a factor of 14; it takes about 6 seconds to process a sequence of ten frames on a 2.6 GHz CPU. For cameras with excessive empty frames due to malfunction or blowing vegetation, it automatically removes 54% of the false-trigger sequences without affecting the human/animal sequences. We achieve 99.58% accuracy on image-level empty-versus-object classification on the Serengeti dataset. We offer the first computer vision tool for processing camera trap images, providing substantial time savings for processing large image datasets and thus improving our ability to monitor wildlife across large scales with camera traps.
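The first stage, foreground segmentation by background subtraction, can be approximated with off-the-shelf OpenCV. A hedged sketch on synthetic frames (MOG2 stands in for the authors' own subtraction algorithm):

```python
# Flag a camera-trap burst as "object" vs "empty" by background subtraction.
import numpy as np
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=10, detectShadows=False)

rng = np.random.default_rng(0)
background = rng.integers(0, 50, size=(240, 320), dtype=np.uint8)

fg_counts = []
for i in range(10):
    frame = background.copy()
    if i >= 5:                       # an "animal" appears halfway through
        frame[100:140, 150:200] = 200
    mask = subtractor.apply(frame)   # foreground mask for this frame
    fg_counts.append(int((mask > 0).sum()))

# Any frame with a sufficiently large foreground blob flags the sequence.
print("sequence classified as", "object" if max(fg_counts) > 500 else "empty")
```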
17. Tian T, Li C, Xu J, Ma J. Urban Area Detection in Very High Resolution Remote Sensing Images Using Deep Convolutional Neural Networks. Sensors 2018;18:904. PMID: 29562651; PMCID: PMC5876601; DOI: 10.3390/s18030904. Cited in RCA: 28.
Abstract
Detecting urban areas in very high resolution (VHR) remote sensing images plays an important role in the field of Earth observation. Recently developed deep convolutional neural networks (DCNNs), which can extract rich features from training data automatically, have achieved outstanding performance on many image classification databases. Motivated by this fact, we propose a new urban area detection method based on DCNNs in this paper. The proposed method mainly includes three steps: (i) a visual dictionary is obtained based on the deep features extracted by pre-trained DCNNs; (ii) urban words are learned from labeled images; (iii) urban regions are detected in a new image based on the nearest-dictionary-word criterion. Qualitative and quantitative experiments on different datasets demonstrate that the proposed method obtains a remarkable overall accuracy (OA) and kappa coefficient, and strikes a good balance between the true positive rate (TPR) and false positive rate (FPR).
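The three-step pipeline maps cleanly onto a clustering sketch: build the dictionary, tag urban words, and classify by the nearest word. A hedged illustration with scikit-learn (synthetic features stand in for pre-trained DCNN descriptors):

```python
# (i) visual dictionary, (ii) urban-word labeling, (iii) nearest-word decision.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 64))   # stand-in deep features of patches
train_urban = train_feats[:, 0] > 0         # stand-in urban/non-urban labels

# (i) visual dictionary from the deep features.
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(train_feats)

# (ii) a word is "urban" if most of its member patches are urban.
words = kmeans.predict(train_feats)
urban_word = np.array([train_urban[words == w].mean() > 0.5 for w in range(20)])

# (iii) classify new patches by their nearest dictionary word.
test_feats = rng.normal(size=(5, 64))
pred = urban_word[kmeans.predict(test_feats)]
print(pred)
```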
18. Cavazos JG, Phillips PJ, Castillo CD, O'Toole AJ. Accuracy comparison across face recognition algorithms: Where are we on measuring race bias? IEEE Trans Biom Behav Identity Sci 2020;3:101-111. PMID: 33585821; DOI: 10.1109/tbiom.2020.3027269. Cited in RCA: 27.
Abstract
Previous generations of face recognition algorithms differ in accuracy for images of different races (race bias). Here, we present the possible underlying factors (data-driven and scenario modeling) and methodological considerations for assessing race bias in algorithms. We discuss data-driven factors (e.g., image quality, image population statistics, and algorithm architecture), and scenario-modeling factors that consider the role of the "user" of the algorithm (e.g., threshold decisions and demographic constraints). To illustrate how these issues apply, we present data from four face recognition algorithms (a previous-generation algorithm and three deep convolutional neural networks, DCNNs) for East Asian and Caucasian faces. First, dataset difficulty affected both overall recognition accuracy and race bias, such that race bias increased with item difficulty. Second, for all four algorithms, the degree of bias varied depending on the identification decision threshold. To achieve equal false accept rates (FARs), East Asian faces required higher identification thresholds than Caucasian faces for all algorithms. Third, demographic constraints on the formulation of the distributions used in the test impacted estimates of algorithm accuracy. We conclude that race bias needs to be measured for individual applications, and we provide a checklist for measuring this bias in face recognition algorithms.
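The threshold effect in the second finding is easy to demonstrate numerically: a single shared threshold yields different false accept rates across groups, and equalizing FAR forces group-specific thresholds. A hedged illustration on synthetic impostor scores:

```python
# FAR at a shared threshold vs group-specific thresholds for equal FAR.
# Similarity scores are synthetic, not from the paper's algorithms.
import numpy as np

rng = np.random.default_rng(0)
# Impostor (non-match) similarity scores for two hypothetical groups.
impostor_a = rng.normal(0.30, 0.10, 10000)
impostor_b = rng.normal(0.38, 0.10, 10000)   # shifted: a harder population

def far(scores, threshold):
    return float((scores >= threshold).mean())

t = 0.55                                     # one shared decision threshold
print(f"shared t={t}: FAR_A={far(impostor_a, t):.4f}  "
      f"FAR_B={far(impostor_b, t):.4f}")

# Group-specific thresholds that hit the same target FAR (e.g., 0.1%).
target = 0.001
t_a, t_b = (np.quantile(s, 1 - target) for s in (impostor_a, impostor_b))
print(f"equal-FAR thresholds: t_A={t_a:.3f}  t_B={t_b:.3f}")
```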
19. Melanoma Classification Using a Novel Deep Convolutional Neural Network with Dermoscopic Images. Sensors 2022;22:1134. PMID: 35161878; PMCID: PMC8838143; DOI: 10.3390/s22031134. Cited in RCA: 21.
Abstract
Automatic melanoma detection from dermoscopic skin samples is a very challenging task. However, a deep learning approach used as a machine vision tool can overcome some of the challenges. This research proposes an automated melanoma classifier based on a deep convolutional neural network (DCNN) to accurately classify malignant vs. benign melanoma. The structure of the DCNN is carefully designed by organizing many layers that extract low- to high-level features of the skin images in a unique fashion. Other vital design criteria are the selection of multiple filters and their sizes, the choice of deep learning layers, the depth of the network, and hyperparameter optimization. The primary objective is to propose a DCNN that is lighter and less complex than other state-of-the-art methods and classifies melanoma skin cancer with high efficiency. For this study, dermoscopic images containing different cancer samples were obtained from the International Skin Imaging Collaboration datastores (ISIC 2016, ISIC 2017, and ISIC 2020). We evaluated the model using accuracy, precision, recall, specificity, and F1-score. The proposed DCNN classifier achieved accuracies of 81.41%, 88.23%, and 90.42% on the ISIC 2016, 2017, and 2020 datasets, respectively, demonstrating high performance compared with other state-of-the-art networks. This approach could therefore provide a less complex, advanced framework for automating the melanoma diagnostic process and expediting identification, potentially saving lives.
20. Bracci S, Op de Beeck HP. Understanding Human Object Vision: A Picture Is Worth a Thousand Representations. Annu Rev Psychol 2023;74:113-135. PMID: 36378917; DOI: 10.1146/annurev-psych-032720-041031. Cited in RCA: 20.
Abstract
Objects are the core meaningful elements in our visual environment. Classic theories of object vision focus upon object recognition and are elegant and simple. Some of their proposals still stand, yet the simplicity is gone. Recent evolutions in behavioral paradigms, neuroscientific methods, and computational modeling have allowed vision scientists to uncover the complexity of the multidimensional representational space that underlies object vision. We review these findings and propose that the key to understanding this complexity is to relate object vision to the full repertoire of behavioral goals that underlie human behavior, running far beyond object recognition. There might be no such thing as core object recognition, and if it exists, then its importance is more limited than traditionally thought.
21. Wang Y, Zhou L, Wang M, Shao C, Shi L, Yang S, Zhang Z, Feng M, Shan F, Liu L. Combination of generative adversarial network and convolutional neural network for automatic subcentimeter pulmonary adenocarcinoma classification. Quant Imaging Med Surg 2020;10:1249-1264. PMID: 32550134; DOI: 10.21037/qims-19-982. Cited in RCA: 19.
Abstract
Background: The efficient and accurate diagnosis of pulmonary adenocarcinoma before surgery is of considerable significance to clinicians. Although computed tomography (CT) examinations are widely used in practice, it is still challenging and time-consuming for radiologists to distinguish between different types of subcentimeter pulmonary nodules. Although many deep learning algorithms have been proposed, their performance largely depends on vast amounts of data, which are difficult to collect in medical imaging. Therefore, we propose an automatic classification system for subcentimeter pulmonary adenocarcinoma, combining a convolutional neural network (CNN) and a generative adversarial network (GAN), to optimize clinical decision-making and to provide design ideas for small-dataset algorithms. Methods: A total of 206 nodules with postoperative pathological labels were analyzed: 30 adenocarcinomas in situ (AIS), 119 minimally invasive adenocarcinomas (MIA), and 57 invasive adenocarcinomas (IAC). Our system consists of two parts, GAN-based image synthesis and CNN classification. First, several popular existing GAN techniques were employed to augment the datasets, and comprehensive experiments were conducted to evaluate the quality of the GAN synthesis. Additionally, our classification system operates on two-dimensional (2D) nodule-centered CT patches without the need for manual labeling information. Results: For GAN-based image synthesis, a visual Turing test showed that even radiologists could not tell the GAN-synthesized images from the raw ones (accuracy: primary radiologist 56%, senior radiologist 65%). For CNN classification, our progressive-growing wGAN improved the performance of the CNN most effectively (area under the curve = 0.83). The experiments indicated that the proposed GAN augmentation method improved classification accuracy by 23.5% (from 37.0% to 60.5%) and 7.3% (from 53.2% to 60.5%) compared with training on raw and commonly augmented images, respectively. The performance of the combined GAN and CNN method (accuracy: 60.5% ± 2.6%) was comparable to state-of-the-art methods, and our CNN was also more lightweight. Conclusions: The experiments revealed that GAN synthesis techniques can effectively alleviate the problem of insufficient data in medical imaging. The proposed GAN-plus-CNN framework can be generalized to building other computer-aided detection (CADx) algorithms and thus assist in diagnosis.
22. Sklan JES, Plassard AJ, Fabbri D, Landman BA. Toward Content Based Image Retrieval with Deep Convolutional Neural Networks. Proc SPIE Int Soc Opt Eng 2015;9417. PMID: 25914507; DOI: 10.1117/12.2081551. Cited in RCA: 16.
Abstract
Content-based image retrieval (CBIR) offers the potential to identify similar case histories, understand rare disorders, and, eventually, improve patient care. Recent advances in database capacity, algorithm efficiency, and deep convolutional neural networks (dCNN), a machine learning technique, have enabled great CBIR success for general photographic images. Here, we investigate applying the leading ImageNet CBIR technique to clinically acquired medical images from the Vanderbilt Medical Center. Briefly, we (1) constructed a dCNN with four hidden layers, reducing the dimensionality of an input scaled to 128×128 to an output encoded layer of 4×384, (2) trained the network using back-propagation on 1 million random magnetic resonance (MR) and computed tomography (CT) images, (3) labeled an independent set of 2100 images, and (4) evaluated classifiers on the projection of the labeled images into manifold space. Quantitative results were disappointing (averaging a true positive rate of only 20%); however, the data suggest that improvements would be possible with more evenly distributed sampling across labels and potential regrouping of label structures. This preliminary effort at automated classification of medical images with ImageNet is promising, but shows that more work is needed beyond direct adaptation of existing techniques.
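Once images are encoded, retrieval reduces to nearest-neighbor search in the encoded space. A hedged sketch with scikit-learn (random vectors stand in for the 4×384 encodings):

```python
# CBIR retrieval step: nearest-neighbor search over encoded images.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
database = rng.normal(size=(2100, 1536))   # e.g., 4x384 codes, flattened

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(database)

query = rng.normal(size=(1, 1536))         # encoding of a new clinical image
dist, idx = index.kneighbors(query)
print("closest case histories:", idx[0], "distances:", np.round(dist[0], 3))
```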
23. Ma P, Xu W, Teng Z, Luo Y, Gong C, Wang Q. An Integrated Food Freshness Sensor Array System Augmented by a Metal-Organic Framework Mixed-Matrix Membrane and Deep Learning. ACS Sens 2022;7:1847-1854. PMID: 35834210; DOI: 10.1021/acssensors.2c00255. Cited in RCA: 16.
Abstract
Static labels, presently prevalent on the food market, are confronted with challenges because they assume that a food product undergoes only a limited range of predefined conditions, an assumption that causes elevated safety risks or waste of perishable food products. Hence, integrated systems for measuring food freshness in real time have been developed to improve the reliability, safety, and sustainability of the food supply. However, these systems are limited by poor sensitivity and accuracy. Here, a metal-organic framework mixed-matrix membrane and deep learning technology were combined to tackle these challenges. UiO-66-OH and polyvinyl alcohol were impregnated with six chromogenic indicators to prepare sensor array composites. The sensors underwent color changes after being exposed to ammonia at different pH values. A limit of detection of 80 ppm for trimethylamine was obtained, which is practically acceptable in the food industry. Four state-of-the-art deep convolutional neural networks were applied to recognize the color change, endowing the system with high-accuracy freshness estimation. A simulation test for chicken freshness estimation achieved accuracy up to 98.95% with the WISeR-50 algorithm. Moreover, 3D printing was used to create a mold for possible scale-up production, and a portable food freshness detector platform was conceptually built. This approach has the potential to advance integrated, real-time food freshness estimation.
24. Zisimopoulos O, Flouty E, Stacey M, Muscroft S, Giataganas P, Nehme J, Chow A, Stoyanov D. Can surgical simulation be used to train detection and classification of neural networks? Healthc Technol Lett 2017;4:216-222. PMID: 29184668; PMCID: PMC5683210; DOI: 10.1049/htl.2017.0064. Cited in RCA: 15.
Abstract
Computer-assisted interventions (CAI) aim to increase the effectiveness, precision, and repeatability of procedures to improve surgical outcomes. The presence and motion of surgical tools are key information inputs for CAI surgical phase recognition algorithms. Vision-based tool detection and recognition approaches are an attractive solution and can be designed to take advantage of the powerful deep learning paradigm that is rapidly advancing image recognition and classification. The challenge for such algorithms is the availability and quality of labelled data used for training. In this Letter, surgical simulation is used to train tool detection and segmentation based on deep convolutional neural networks and generative adversarial networks. The authors experiment with two network architectures for image segmentation in tool classes commonly encountered during cataract surgery. A commercially available simulator is used to create a simulated cataract dataset for training models prior to performing transfer learning on real surgical data. To the best of the authors' knowledge, this is the first attempt to train deep learning models for surgical instrument detection on simulated data while demonstrating promising generalisation to real data. The results indicate that simulated data do have some potential for training advanced classification methods for CAI systems.
25. Pramod RT, Cohen MA, Tenenbaum JB, Kanwisher N. Invariant representation of physical stability in the human brain. eLife 2022;11:e71736. PMID: 35635277; PMCID: PMC9150889; DOI: 10.7554/elife.71736. Cited in RCA: 14.
Abstract
Successful engagement with the world requires the ability to predict what will happen next. Here, we investigate how the brain makes a fundamental prediction about the physical world: whether the situation in front of us is stable, and hence likely to stay the same, or unstable, and hence likely to change in the immediate future. Specifically, we ask if judgments of stability can be supported by the kinds of representations that have proven to be highly effective at visual object recognition in both machines and brains, or instead if the ability to determine the physical stability of natural scenes may require generative algorithms that simulate the physics of the world. To find out, we measured responses in both convolutional neural networks (CNNs) and the brain (using fMRI) to natural images of physically stable versus unstable scenarios. We find no evidence for generalizable representations of physical stability in either standard CNNs trained on visual object and scene classification (ImageNet), or in the human ventral visual pathway, which has long been implicated in the same process. However, in frontoparietal regions previously implicated in intuitive physical reasoning we find both scenario-invariant representations of physical stability, and higher univariate responses to unstable than stable scenes. These results demonstrate abstract representations of physical stability in the dorsal but not ventral pathway, consistent with the hypothesis that the computations underlying stability entail not just pattern classification but forward physical simulation.