1
Zhou C, Cao S, Li M. Coordinate-aware three-dimensional neural network for lower extremity arterial stenosis classification in CT angiography. Heliyon 2024;10:e34309. PMID: 39100455; PMCID: PMC11295843; DOI: 10.1016/j.heliyon.2024.e34309.
Abstract
Background: Lower extremity computed tomography angiography (CTA) is an effective non-invasive diagnostic tool for lower extremity artery disease (LEAD). This study aimed to develop an automatic classification model based on a coordinate-aware 3D deep neural network to evaluate the degree of arterial stenosis in lower extremity CTA. Methods: This retrospective study included 277 patients who underwent lower extremity CTA between May 1, 2017, and August 31, 2023. Radiologists annotated the lower extremity artery segments according to the degree of stenosis, and 12,450 3D patches containing the regions of interest were extracted to construct the dataset. A coordinate-aware three-dimensional neural network was trained on these patches to classify the degree of stenosis of the lower extremity arteries. Metrics including accuracy, sensitivity, specificity, F1 score, and receiver operating characteristic (ROC) curves were used to evaluate the performance of the proposed model. Results: The accuracy, F1 score, and area under the ROC curve (AUC) of the proposed model were 93.08%, 91.96%, and 99.15% for the above-knee arteries, and 91.70%, 89.67%, and 98.2% for the below-knee arteries, respectively. The proposed model outperformed the 3D baseline model by 4-5% in accuracy and the 2D baseline model by more than 10%. Conclusion: We successfully implemented a deep learning model that is a promising tool for assisting radiologists in evaluating lower extremity arterial stenosis on CT angiography.
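The abstract does not specify how coordinate awareness is implemented; one common way to make a 3D CNN position-sensitive is to append normalized coordinate channels (CoordConv-style) to each input patch. A minimal sketch of that preprocessing step, with illustrative shapes and function names that are assumptions, not the authors' code:

```python
import numpy as np

def add_coordinate_channels(patch: np.ndarray) -> np.ndarray:
    """Append three normalized (z, y, x) coordinate channels to a (C, D, H, W) patch."""
    c, d, h, w = patch.shape
    zz, yy, xx = np.meshgrid(
        np.linspace(-1.0, 1.0, d),
        np.linspace(-1.0, 1.0, h),
        np.linspace(-1.0, 1.0, w),
        indexing="ij",
    )
    coords = np.stack([zz, yy, xx]).astype(patch.dtype)
    # The network's first 3D convolution now sees intensity plus position.
    return np.concatenate([patch, coords], axis=0)

patch = np.random.rand(1, 8, 16, 16).astype(np.float32)  # one-channel CT patch
aug = add_coordinate_channels(patch)
print(aug.shape)  # (4, 8, 16, 16)
```

The augmented patch would then be fed to an ordinary 3D convolutional classifier, letting the network condition its prediction on where in the limb the patch lies.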
Affiliation(s)
- Chenwei Zhou
- Department of Radiology, Songjiang Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shengnan Cao
- Department of Radiology, Shanghai TCM - Integrated Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Maolin Li
- Department of Computer Science, City University of London, London, United Kingdom

2
de Silva A, Ranasinghe R, Sounthararajah A, Haghighi H, Kodikara J. Beyond Conventional Monitoring: A Semantic Segmentation Approach to Quantifying Traffic-Induced Dust on Unsealed Roads. Sensors (Basel) 2024;24:510. PMID: 38257603; PMCID: PMC11154504; DOI: 10.3390/s24020510.
Abstract
Road dust is a mixture of fine and coarse particles released into the air by external forces, such as tire-road friction or wind, and it is harmful to human health when inhaled. Continuous dust emission from road surfaces is detrimental to both the road itself and road users, so multiple dust monitoring and control techniques are currently in use worldwide. Current dust monitoring methods, however, require expensive equipment and expertise. This study introduces a novel, pragmatic and robust approach to quantifying traffic-induced road dust using a deep learning method called semantic segmentation. Building on the authors' previous work, the best-performing semantic segmentation models were selected and used to identify dust in an image pixel-wise. The total number of dust pixels was then correlated with real-world dust measurements obtained from a research-grade dust monitor. Our results show that semantic segmentation can reasonably quantify traffic-induced dust: over 90% of the predictions from both correlations fall in the true-positive quadrant, indicating that when dust concentrations are below the threshold, the model predicts them accurately. The results were validated and extended for real-time application. Our code implementation is publicly available.
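The quantification pipeline described above (count segmented dust pixels, then correlate the count with a reference monitor) can be sketched as follows. All numbers here are made-up placeholders for illustration, not data from the paper:

```python
import numpy as np

def dust_pixel_fraction(mask: np.ndarray) -> float:
    """Fraction of pixels labelled as dust in a binary segmentation mask."""
    return float(mask.sum()) / mask.size

# Hypothetical paired samples: per-frame dust-pixel fractions from the
# segmentation model, and readings from a research-grade dust monitor.
fractions = np.array([0.01, 0.05, 0.10, 0.20, 0.30])
monitor_mg_m3 = np.array([0.2, 1.1, 2.0, 4.1, 5.9])

# Least-squares line relating pixel fraction to measured concentration;
# once fitted, pixel counts alone give a cheap concentration estimate.
slope, intercept = np.polyfit(fractions, monitor_mg_m3, 1)

# Estimate for a new frame whose mask flags 10% of pixels as dust.
mask = np.zeros((100, 100))
mask[:20, :50] = 1
est = slope * dust_pixel_fraction(mask) + intercept
print(round(est, 2))
```

This is the essence of replacing an expensive monitor with a camera: the regression is calibrated once against the monitor, then applied to segmentation output alone.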
Affiliation(s)
- Asanka de Silva
- ARC Industrial Transformation Research Hub (ITRH)—SPARC Hub, Department of Civil Engineering, Monash University, Clayton Campus, Clayton, VIC 3800, Australia
- Rajitha Ranasinghe
- ARC Industrial Transformation Research Hub (ITRH)—SPARC Hub, Department of Civil Engineering, Monash University, Clayton Campus, Clayton, VIC 3800, Australia
- Arooran Sounthararajah
- ARC Industrial Transformation Research Hub (ITRH)—SPARC Hub, Department of Civil Engineering, Monash University, Clayton Campus, Clayton, VIC 3800, Australia
- Hamed Haghighi
- Product Development Hub, Road Science, Downer EDI Works Pty Ltd., Somerton, VIC 3061, Australia
- Jayantha Kodikara
- ARC Industrial Transformation Research Hub (ITRH)—SPARC Hub, Department of Civil Engineering, Monash University, Clayton Campus, Clayton, VIC 3800, Australia

3
Cadoni M, Lagorio A, Grosso E. Face detection based on a human attention guided multi-scale model. Biological Cybernetics 2023;117:453-466. PMID: 38038793; PMCID: PMC10752920; DOI: 10.1007/s00422-023-00978-5.
Abstract
Multi-scale models are among the cutting-edge technologies used for face detection and recognition. An example is the deformable part-based model (DPM), which encodes a face as a multiplicity of local areas (parts) at different resolution scales, together with their hierarchical and spatial relationships. Although these models have proven successful and efficient in practical applications, the mutual position and spatial resolution of the parts involved are arbitrarily defined by a human specialist, and the final choice of the optimal scales and parts is based on heuristics. This work seeks to understand whether a multi-scale model can take inspiration from human fixations to select specific areas and spatial scales. In more detail, it shows that a multi-scale pyramid representation can be adopted to extract interest points, and that human attention can be used to select the points at the scales that lead to the best face detection performance. Human fixations can therefore provide a valid methodological basis on which to build a multi-scale model, selecting the spatial scales and areas of interest that are most relevant to humans.
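The multi-scale pyramid representation mentioned above can be illustrated with a simple image pyramid built by repeated 2x2 block averaging; interest points would then be extracted at each level. A sketch under the assumption of averaging-based downsampling (the paper's exact pyramid construction is not specified here):

```python
import numpy as np

def downsample(img: np.ndarray) -> np.ndarray:
    """Halve resolution by 2x2 block averaging (one pyramid level down)."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid(img: np.ndarray, n_levels: int = 3) -> list:
    """Multi-scale pyramid: the same image at progressively coarser scales."""
    levels = [img]
    for _ in range(n_levels - 1):
        levels.append(downsample(levels[-1]))
    return levels

levels = pyramid(np.random.rand(32, 32), n_levels=3)
print([lvl.shape for lvl in levels])  # [(32, 32), (16, 16), (8, 8)]
```

A fixation-guided model would then keep only the points and levels that coincide with where humans actually look.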
Affiliation(s)
- Marinella Cadoni
- Dipartimento di Scienze Biomediche, Università di Sassari, Viale San Pietro 43B, 07100, Sassari, Italy
- Andrea Lagorio
- Dipartimento di Scienze Biomediche, Università di Sassari, Viale San Pietro 43B, 07100, Sassari, Italy
- Enrico Grosso
- Dipartimento di Scienze Biomediche, Università di Sassari, Viale San Pietro 43B, 07100, Sassari, Italy

4
Lee MJ, DiCarlo JJ. How well do rudimentary plasticity rules predict adult visual object learning? PLoS Comput Biol 2023;19:e1011713. PMID: 38079444; PMCID: PMC10754461; DOI: 10.1371/journal.pcbi.1011713.
Abstract
A core problem in visual object learning is using a finite number of images of a new object to accurately identify that object in future, novel images. One longstanding, conceptual hypothesis asserts that adult brains solve this core problem through two connected mechanisms: 1) the re-representation of incoming retinal images as points in a fixed, multidimensional neural space, and 2) the optimization of linear decision boundaries in that space, via simple plasticity rules applied to a single downstream layer. Though this scheme is biologically plausible, the extent to which it explains learning behavior in humans has been unclear, in part because of a historical lack of image-computable models of the putative neural space, and in part because of a lack of measurements of human learning behaviors in difficult, naturalistic settings. Here, we addressed these gaps by 1) drawing from contemporary, image-computable models of the primate ventral visual stream to create a large set of testable learning models (n = 2,408 models), and 2) using online psychophysics to measure human learning trajectories over a varied set of tasks involving novel 3D objects (n = 371,000 trials), which we then used to develop (and publicly release) empirical benchmarks for comparing learning models to humans. We evaluated each learning model on these benchmarks and found that those based on deep, high-level representations from neural networks were surprisingly aligned with human behavior. While no tested model explained the entirety of replicable human behavior, these results establish that rudimentary plasticity rules, when combined with appropriate visual representations, have high explanatory power in predicting human behavior on this core object learning problem.
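The hypothesized scheme, a fixed representation plus a linear readout trained by a simple plasticity rule, can be sketched with a perceptron-style delta rule on frozen feature vectors. The toy Gaussian "neural space" below is an illustrative stand-in for real ventral-stream features, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear_readout(feats, labels, lr=0.1, epochs=20):
    """Delta-rule update of a single linear layer; the features stay frozen."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            pred = 1.0 if x @ w + b > 0 else 0.0
            err = y - pred          # simple, local plasticity signal
            w += lr * err * x
            b += lr * err
    return w, b

# Toy "neural space": two linearly separable clusters of feature vectors,
# standing in for the re-represented images of two novel objects.
pos = rng.normal(loc=+1.0, size=(50, 8))
neg = rng.normal(loc=-1.0, size=(50, 8))
feats = np.vstack([pos, neg])
labels = np.array([1.0] * 50 + [0.0] * 50)

w, b = train_linear_readout(feats, labels)
acc = np.mean([(x @ w + b > 0) == (y == 1.0) for x, y in zip(feats, labels)])
print(acc)
```

With a good representation, even this rudimentary rule separates the objects quickly, which is the intuition the benchmarks in the paper put to a quantitative test.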
Affiliation(s)
- Michael J. Lee
- Department of Brain and Cognitive Sciences, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds and Machines, MIT, Cambridge, Massachusetts, United States of America
- James J. DiCarlo
- Department of Brain and Cognitive Sciences, MIT, Cambridge, Massachusetts, United States of America
- Center for Brains, Minds and Machines, MIT, Cambridge, Massachusetts, United States of America
- McGovern Institute for Brain Research, MIT, Cambridge, Massachusetts, United States of America

5
Osterbrink C, Herwig A. What determines location specificity or generalization of transsaccadic learning? J Vis 2023;23:8. PMID: 36648417; PMCID: PMC9851281; DOI: 10.1167/jov.23.1.8.
Abstract
Humans incorporate knowledge of transsaccadic associations into peripheral object perception. Several studies have shown that learning of new, manipulated transsaccadic associations leads to a presaccadic perceptual bias. However, it has remained disputed whether this learning effect is location specific (Herwig, Weiß, & Schneider, 2018) or generalizes to new locations (Valsecchi & Gegenfurtner, 2016). The current study investigated under what conditions location generalization of transsaccadic learning occurs. In all experiments, there were acquisition phases in which the spatial frequency (Experiment 1) or the size (Experiments 2 and 3) of objects was changed transsaccadically. In the test phases, participants judged the respective feature of peripheral objects, which could appear either at the location where learning had taken place or at new locations. All experiments replicated the perceptual bias effect at the old learning locations. In two experiments, transsaccadic learning remained location specific even when learning occurred at multiple locations (Experiment 1) or with the feature of size (Experiment 2), for which a transfer had previously been shown. Only in Experiment 3 was a transfer of the learning effect to new locations observed; here, learning took place for only one object rather than for several objects that had to be discriminated. One can therefore conclude that when specific associations are learned for multiple objects, transsaccadic learning stays location specific, whereas when a transsaccadic association is learned for only one object, it generalizes to other locations.
Affiliation(s)
- Corinna Osterbrink
- Department of Psychology and Cluster of Excellence Cognitive Interaction Technology, Bielefeld University, Bielefeld, Germany
- Arvid Herwig
- Department of Psychology, Bielefeld University, Bielefeld, Germany

6
Wang S, Linsley JW, Linsley DA, Lamstein J, Finkbeiner S. Fluorescently labeled nuclear morphology is highly informative of neurotoxicity. Front Toxicol 2022;4:935438. PMID: 36093369; PMCID: PMC9449453; DOI: 10.3389/ftox.2022.935438.
Abstract
Neurotoxicity can be detected in live microscopy by morphological changes such as retraction of neurites, fragmentation, blebbing of the neuronal soma, and ultimately the disappearance of fluorescently labeled neurons. However, quantification of these features is often difficult, low-throughput, and imprecise due to overreliance on human curation. Recently, we showed that convolutional neural network (CNN) models can outperform human curators in the assessment of neuronal death from images of fluorescently labeled neurons, suggesting that the images contain information indicative of toxicity that is not apparent to the human eye. In particular, the CNN's decision strategy indicated that information within the nuclear region was essential for its superhuman performance. Here, we systematically tested this prediction by comparing images of fluorescent neuronal morphology from nuclear-localized fluorescent protein to those from freely diffused fluorescent protein for classifying neuronal death. We found that biomarker-optimized (BO-) CNNs could learn to classify neuronal death from nuclear-localized fluorescent protein morphology (mApple-NLS-CNN) alone, with superhuman accuracy. Furthermore, leveraging methods from explainable artificial intelligence, we identified novel features within the nuclear-localized fluorescent protein signal that were indicative of neuronal death. Our findings suggest that a nuclear morphology marker in live imaging, combined with computational models such as mApple-NLS-CNN, can provide an optimal readout of neuronal death, a common result of neurotoxicity.
Affiliation(s)
- Shijie Wang
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Jeremy W. Linsley
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Drew A. Linsley
- Robert J. and Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, United States
- Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI, United States
- Josh Lamstein
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Steven Finkbeiner
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Taube/Koret Center for Neurodegenerative Disease, Gladstone Institutes, San Francisco, CA, United States
- Departments of Neurology and Physiology, University of California, San Francisco, San Francisco, CA, United States
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA, United States
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, United States

7
Pegado F. Written Language Acquisition Is Both Shaped by and Has an Impact on Brain Functioning and Cognition. Front Hum Neurosci 2022;16:819956. PMID: 35754773; PMCID: PMC9226919; DOI: 10.3389/fnhum.2022.819956.
Abstract
Spoken language is a distinctive trait of our species and is naturally acquired during infancy. Written language, in contrast, is artificial, and the correspondences between arbitrary visual symbols and the spoken language used for reading and writing must be explicitly learned with external help. In this paper, I present several examples of how written language acquisition is both shaped by and has an impact on brain function and cognition. They show, on the one hand, how our phylogenetic legacy influences education and, on the other hand, how ontogenetic needs for education can rapidly subdue deeply rooted neurocognitive mechanisms. Understanding these bidirectional influences provides a more dynamic view of how plasticity interfaces phylogeny and ontogeny in human learning, with implications for both neuroscience and education.
Affiliation(s)
- Felipe Pegado
- Aix-Marseille University, CNRS, LPC, Marseille, France

8
Masarwa S, Kreichman O, Gilaie-Dotan S. Larger images are better remembered during naturalistic encoding. Proc Natl Acad Sci U S A 2022;119:e2119614119. PMID: 35046050; PMCID: PMC8794838; DOI: 10.1073/pnas.2119614119.
Abstract
We are constantly exposed to multiple visual scenes, and while freely viewing them without an intentional effort to memorize or encode them, only some are remembered. It has been suggested that image memory is influenced by multiple factors, such as depth of processing, familiarity, and visual category. However, this is typically investigated when people are instructed to perform a task (e.g., remember or make some judgment about the images), which may modulate processing at multiple levels and thus may not generalize to naturalistic visual behavior. Visual memory is assumed to rely on high-level visual perception that shows a degree of size invariance and is therefore not assumed to be highly dependent on image size. Here, we reasoned that during naturalistic vision, free of task-related modulations, bigger images stimulate more visual system processing resources (from retina to cortex) and would therefore be better remembered. In an extensive set of seven experiments, naïve participants (n = 182) were asked to freely view presented images (sized 3° to 24°) without any instructed encoding task. Afterward, they were given a surprise recognition test (midsized images, 50% already seen). Larger images were remembered better than smaller ones across all experiments (∼20% higher accuracy, or ∼1.5 times better). Memory was proportional to image size; faces were remembered best and outdoor scenes least. Results were robust even when controlling for image set, presentation order, screen resolution, image scaling at test, or the amount of information. While multiple factors affect image memory, our results suggest that low- to high-level processes may all contribute to image memory.
Affiliation(s)
- Shaimaa Masarwa
- School of Optometry and Vision Science, Faculty of Life Science, Bar Ilan University, Ramat Gan 5290002, Israel
- The Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat Gan 5290002, Israel
- Olga Kreichman
- School of Optometry and Vision Science, Faculty of Life Science, Bar Ilan University, Ramat Gan 5290002, Israel
- The Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat Gan 5290002, Israel
- Sharon Gilaie-Dotan
- School of Optometry and Vision Science, Faculty of Life Science, Bar Ilan University, Ramat Gan 5290002, Israel
- The Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat Gan 5290002, Israel
- Institute of Cognitive Neuroscience, University College London, London WC1N 3AZ, United Kingdom
10
Biological convolutions improve DNN robustness to noise and generalisation. Neural Netw 2021;148:96-110. PMID: 35114495; DOI: 10.1016/j.neunet.2021.12.005.
Abstract
Deep Convolutional Neural Networks (DNNs) have achieved superhuman accuracy on standard image classification benchmarks. Their success has reignited significant interest in their use as models of the primate visual system, bolstered by claims of their architectural and representational similarities. However, closer scrutiny of these models suggests that they rely on various forms of shortcut learning to achieve their impressive performance, such as using texture rather than shape information. Such superficial solutions to image recognition have been shown to make DNNs brittle in the face of more challenging tests, such as noise-perturbed or out-of-distribution images, casting doubt on their similarity to their biological counterparts. In the present work, we demonstrate that adding fixed biological filter banks, in particular banks of Gabor filters, helps to constrain the networks to avoid reliance on shortcuts, leading them to develop more structured internal representations and greater tolerance to noise. Importantly, they also gained around 20-35% in accuracy over standard end-to-end trained architectures when generalising to our novel out-of-distribution test image sets. We take these findings to suggest that these properties of the primate visual system should be incorporated into DNNs to make them better able to cope with real-world vision and better capture some of the more impressive aspects of human visual perception, such as generalisation.
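The fixed front end described here, a bank of Gabor filters at several orientations, can be sketched directly. The kernel size and parameter values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def gabor_kernel(size=9, theta=0.0, sigma=2.0, lam=4.0):
    """A single fixed Gabor filter: an oriented sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam)       # wavelength lam, in pixels
    return envelope * carrier

def filter_bank(n_orient=4, size=9):
    """Bank of Gabor filters at evenly spaced orientations: a fixed 'first layer'."""
    thetas = np.linspace(0, np.pi, n_orient, endpoint=False)
    return np.stack([gabor_kernel(size, t) for t in thetas])

bank = filter_bank()
print(bank.shape)  # (4, 9, 9)
```

Because these weights are fixed rather than learned end to end, the network cannot reshape its first layer around texture shortcuts; it must build on oriented edge responses, as early visual cortex is thought to.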
11
Blything R, Biscione V, Vankov II, Ludwig CJH, Bowers JS. The human visual system and CNNs can both support robust online translation tolerance following extreme displacements. J Vis 2021;21:9. PMID: 33620380; PMCID: PMC7910631; DOI: 10.1167/jov.21.2.9.
Abstract
Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near complete invariance over 10 degrees and others reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).
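The global average pooling (GAP) manipulation mentioned above discards spatial position by design, which is why it supports translation tolerance. A minimal demonstration on toy feature maps (not the trained DCNNs from the study):

```python
import numpy as np

def global_average_pool(fmaps: np.ndarray) -> np.ndarray:
    """Collapse each (H, W) feature map to its mean, discarding position."""
    return fmaps.mean(axis=(-2, -1))

# The same activation blob placed at two different locations in a feature map.
fm_a = np.zeros((1, 16, 16))
fm_a[0, 2:5, 2:5] = 1.0
fm_b = np.zeros((1, 16, 16))
fm_b[0, 10:13, 11:14] = 1.0

# GAP yields identical descriptors, so a downstream classifier sees no shift.
print(np.allclose(global_average_pool(fm_a), global_average_pool(fm_b)))  # True
```

Any classifier reading the pooled vector is therefore exactly invariant to where the pattern appeared, at the cost of losing all location information.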
Affiliation(s)
- Ryan Blything
- School of Psychological Science, University of Bristol, Bristol, UK
- Valerio Biscione
- School of Psychological Science, University of Bristol, Bristol, UK
- Ivan I Vankov
- Department of Cognitive Science and Psychology, New Bulgarian University, Sofia, Bulgaria
- Jeffrey S Bowers
- School of Psychological Science, University of Bristol, Bristol, UK

12
Kiyokawa H, Tashiro T, Yamauchi Y, Nagai T. Spatial Frequency Effective for Increasing Perceived Glossiness by Contrast Enhancement. Front Psychol 2021;12:625135. PMID: 33613400; PMCID: PMC7892470; DOI: 10.3389/fpsyg.2021.625135.
Abstract
It has been suggested that luminance edges in retinal images are potential cues for glossiness perception, particularly when the perception relies on low-luminance specular regions. However, a previous study showed only statistical correlations between luminance edges and perceived glossiness, not causal relations. Additionally, although specular components should be embedded at various spatial frequencies depending on the micro-roughness of the object surface, it is not well understood which spatial frequencies are essential for glossiness perception on objects with different micro-roughness. To address these issues, we examined the impact of sub-band contrast enhancement on perceived glossiness under two stimulus conditions: the Full condition, in which the stimulus had its natural specular components, and the Dark condition, in which it had specular components only in dark regions. Object images with various degrees of surface roughness were generated as stimuli, and their contrast was increased in various spatial-frequency sub-bands. The results indicate that enhancement of sub-band contrast can significantly increase perceived glossiness, as expected. Furthermore, the effectiveness of each spatial-frequency band depends on the surface roughness in the Full condition, whereas the effective spatial frequency is constant at a middle spatial frequency, regardless of surface roughness, in the Dark condition. These results suggest that, for glossiness perception, our visual system depends on specular-related information embedded in high-spatial-frequency components, but may change its dependency on spatial frequency based on the luminance of the surface being judged.
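Sub-band contrast enhancement of this kind can be sketched as an FFT-domain band-pass gain: amplify Fourier coefficients whose radial frequency falls in a chosen band and leave the rest untouched. The band limits and gain below are illustrative assumptions, not the stimulus parameters of the study:

```python
import numpy as np

def enhance_subband(img: np.ndarray, lo: float, hi: float, gain: float = 1.5) -> np.ndarray:
    """Boost contrast in one spatial-frequency band via an FFT band-pass mask."""
    f = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    r = np.sqrt(fy**2 + fx**2)            # radial frequency, cycles/pixel
    band = (r >= lo) & (r < hi)           # the sub-band to enhance
    f_out = np.where(band, f * gain, f)   # amplify only that band; DC untouched
    return np.real(np.fft.ifft2(f_out))

rng = np.random.default_rng(1)
img = rng.random((64, 64))
out = enhance_subband(img, lo=0.1, hi=0.2, gain=2.0)
print(out.shape)  # (64, 64)
```

Because the DC component lies outside the band, mean luminance is preserved while contrast at the chosen frequencies is doubled, which is the manipulation whose perceptual effect the experiments measure.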
Affiliation(s)
- Hiroaki Kiyokawa
- Department of Electrical Engineering and Informatics, Yamagata University, Yamagata, Japan
- Japan Society for the Promotion of Science, Tokyo, Japan
- Tomonori Tashiro
- Department of Informatics and Electronics, Yamagata University, Yamagata, Japan
- Yasuki Yamauchi
- Department of Informatics and Electronics, Yamagata University, Yamagata, Japan
- Takehiro Nagai
- Department of Information and Communications Engineering, Tokyo Institute of Technology, Yokohama, Japan

13
Abstract
In this article, I present a framework that accommodates the classic ideas of visual information processing together with more recent computational approaches. I use current knowledge about visual crowding, capacity limitations, attention, and saliency to place these phenomena within a standard neural network model, and I suggest some revisions to the traditional mechanisms of attention and feature integration that are required to fit them into this framework. The results allow us to explain some apparent theoretical controversies in vision research, suggesting a rationale for the limited spatial extent of crowding, a role for saliency in crowding experiments, and several amendments to feature integration theory. The scheme can be elaborated or modified by future research.
Affiliation(s)
- Endel Põder
- Institute of Psychology, University of Tartu, Tartu, Estonia
- www.ut.ee/~endelp/