1
Kaheni H, Shiran MB, Kamrava SK, Zare-Sadeghi A. Intra and inter-regional functional connectivity of the human brain due to Task-Evoked fMRI Data classification through CNN & LSTM. J Neuroradiol 2024:S0150-9861(24)00109-3. [PMID: 38408721 DOI: 10.1016/j.neurad.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 01/27/2024] [Accepted: 02/21/2024] [Indexed: 02/28/2024]
Abstract
BACKGROUND AND PURPOSE Olfaction is an early marker of neurodegenerative disease. Normal olfactory function is essential given the importance of olfaction in human life. Olfactory function is commonly assessed by psychophysical evaluation, which is patient-reported, so its results rely on the patient's answers and cooperation. However, methodological difficulties in the psychophysical evaluation of olfactory-related cerebral areas have limited the assessment of olfactory function in the human brain. MATERIALS AND METHODS The current study used clustering approaches to assess olfactory function in fMRI data and used brain activity to parcellate the brain into regions with homogeneous properties. A deep neural network architecture based on ResNet convolutional neural networks (CNN) and long short-term memory (LSTM) was designed to classify healthy subjects and subjects with olfactory disorders. RESULTS The fMRI results obtained with the k-means unsupervised machine learning model were within the expected outcome and similar to those found with the CONN toolbox in detecting active areas. There was no significant difference between the mean over subjects and each individual subject. The proposed CRNN deep learning model, which classifies fMRI data into a healthy group and a group with olfactory disorders, achieved an accuracy of 97%. CONCLUSIONS The k-means unsupervised algorithm can detect active regions in the brain and analyze olfactory function. The classification results indicate that the CNN-LSTM architecture using ResNet provides the best accuracy on olfactory fMRI data. This is the first detailed study of this kind conducted on olfactory fMRI data to date.
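The k-means step mentioned above can be illustrated with a minimal Lloyd's-algorithm sketch. It assumes each voxel is represented by its fMRI time course as a plain list of floats; the toy data, the block-design pattern and the `kmeans` helper are hypothetical and do not reproduce the authors' preprocessing or their comparison with the CONN toolbox.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means on lists of floats (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical voxel time courses: three follow an on/off task block, three stay flat.
block = [0, 1, 0, 1, 0, 1]
active = [[2.0 * v + 0.1 * i for v in block] for i in range(3)]
resting = [[0.1 * i] * len(block) for i in range(3)]
labels, centroids = kmeans(active + resting, k=2)
print(labels)
```

The voxels that follow the task pattern end up in one cluster and the flat ones in the other; in a real analysis the resulting cluster map would then be compared against a GLM-based activation map.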
Affiliation(s)
- Haniyeh Kaheni
- Finetech in Medicine Research Center, Department of Medical Physics, School of Medicine, Iran University of Medical Sciences (IUMS), Tehran, Iran
- Mohammad Bagher Shiran
- Finetech in Medicine Research Center, Department of Medical Physics, School of Medicine, Iran University of Medical Sciences (IUMS), Tehran, Iran
- Seyed Kamran Kamrava
- ENT and Head and Neck Research Center and Department, The Five Senses Health Institute, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Arash Zare-Sadeghi
- Finetech in Medicine Research Center, Department of Medical Physics, School of Medicine, Iran University of Medical Sciences (IUMS), Tehran, Iran.
2
Zhong M, Wen J, Ma J, Cui H, Zhang Q, Parizi MK. A hierarchical multi-leadership sine cosine algorithm to dissolving global optimization and data classification: The COVID-19 case study. Comput Biol Med 2023; 164:107212. [PMID: 37478712 DOI: 10.1016/j.compbiomed.2023.107212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 06/18/2023] [Accepted: 06/25/2023] [Indexed: 07/23/2023]
Abstract
The Sine Cosine Algorithm (SCA) is an outstanding optimizer that is widely used to solve complicated real-world problems. Nevertheless, this algorithm lacks sufficient population diversity and an adequate balance between exploration and exploitation, so effective techniques are required to address these fundamental shortcomings. Accordingly, the present paper proposes an improved version of SCA called Hierarchical Multi-Leadership SCA (HMLSCA), which uses an effective hierarchical multi-leadership search mechanism to lead the search process along multiple paths. The efficiency of HMLSCA was evaluated and compared with a set of well-known metaheuristic algorithms on eighteen classical benchmark functions and the thirty CEC 2017 test functions. The results demonstrate that HMLSCA outperforms all compared algorithms and offers promising efficiency. Moreover, HMLSCA was applied to medical data classification by optimizing the parameters and feature weighting of a support vector machine (SVM) on eight datasets. The experimental outcomes verify the effectiveness of HMLSCA, which achieved the highest classification accuracy and a Friedman mean rank of 1.00 against the other evaluated metaheuristic algorithms. Furthermore, the proposed algorithm was used to diagnose COVID-19, attaining the top accuracy of 98% on the COVID-19 dataset, which supports the effectiveness of the proposed search strategy.
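For orientation, the baseline SCA update that HMLSCA builds on can be sketched as follows. This is the standard single-leader SCA position update (each agent oscillates around the best-so-far solution with a sine or cosine term whose amplitude `r1` shrinks linearly from `a` to 0), not the hierarchical multi-leadership mechanism proposed in the paper; the sphere test function, bounds and parameter values are illustrative assumptions.

```python
import math
import random


def sphere(x):
    """Simple convex test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def sine_cosine_algorithm(f, dim, lb, ub, n_agents=20, n_iter=300, a=2.0, seed=0):
    """Baseline SCA minimiser (single destination point, no hierarchy)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    best_x = min(X, key=f)[:]  # destination point P
    best_f = f(best_x)
    for t in range(n_iter):
        r1 = a - t * a / n_iter  # large early (exploration), small late (exploitation)
        for x in X:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                step = abs(r3 * best_x[j] - x[j])
                if r4 < 0.5:
                    x[j] += r1 * math.sin(r2) * step
                else:
                    x[j] += r1 * math.cos(r2) * step
                x[j] = min(max(x[j], lb), ub)  # keep the agent inside the bounds
            fx = f(x)
            if fx < best_f:
                best_f, best_x = fx, x[:]
    return best_x, best_f


best_x, best_f = sine_cosine_algorithm(sphere, dim=2, lb=-10.0, ub=10.0)
print(best_x, best_f)
```

Because every agent follows the same destination point, diversity can collapse quickly; that is the shortcoming the abstract says the multi-leadership variant targets.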
Affiliation(s)
- Mingyang Zhong
- College of Artificial Intelligence, Southwest University, 400715, China.
- Jiahui Wen
- Defense Innovation Institute, 100085, China.
- Jingwei Ma
- School of Information Science and Engineering, Shandong Normal University, 250399, China.
- Hao Cui
- College of Artificial Intelligence, Southwest University, 400715, China.
- Qiuling Zhang
- College of Artificial Intelligence, Southwest University, 400715, China.
- Morteza Karimzadeh Parizi
- Department of Computer Engineering, Faculty of Shahid Chamran, Kerman Branch, Technical and Vocational University (TVU), Kerman, Iran.
3
EL-Omairi MA, El Garouani A. A review on advancements in lithological mapping utilizing machine learning algorithms and remote sensing data. Heliyon 2023; 9:e20168. [PMID: 37809824 PMCID: PMC10559961 DOI: 10.1016/j.heliyon.2023.e20168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/12/2023] [Accepted: 09/13/2023] [Indexed: 10/10/2023] Open access
Abstract
Lithological mapping is a fundamental undertaking in geological research, mineral resource exploration, and environmental management. However, conventional methods for lithological mapping are often laborious and challenging, particularly in remote or inaccessible areas. Fortunately, a transformative solution has emerged through the integration of remote sensing and machine learning algorithms, providing an efficient and accurate means of deciphering the geological features of the Earth's crust. Remote sensing offers vast and comprehensive data across extensive geographical regions, while machine learning algorithms excel at recognizing intricate patterns and features in the data, enabling the classification of different lithological units. Compared to traditional methods, this approach is faster, more efficient, and remarkably accurate. The combination of remote sensing and machine learning presents numerous advantages, including the ability to amalgamate multiple data sources, provide up-to-date information on rapidly changing regions, and manage vast volumes of data. This review article delves into the invaluable contributions of remote sensing and machine learning algorithms to lithological mapping. It extensively explores diverse remote sensing datasets, such as Landsat, Sentinel-2, ASTER, and Hyperion data, which can be effectively harnessed for this purpose. Additionally, the study investigates a range of machine learning algorithms, including SVM, RF, and ANN, specifically tailored for lithological mapping. By scrutinizing practical use cases, the review underscores the strengths, limitations, and potential future developments of remote sensing and machine learning algorithms in the context of lithological mapping. Practical use cases have demonstrated the immense potential of machine learning algorithms, with the SVM classifier emerging as a reliable and accurate option for lithological mapping. 
Moreover, the choice of the most appropriate data source depends on the specific objectives of the application. Overall, the transformative potential of remote sensing and machine learning in lithological mapping cannot be overstated. This integrated approach not only enhances our understanding of geological features but also enables diverse applications in geological research and environmental management. With the promise of a more informed and sustainable future, the utilization of remote sensing and machine learning in lithological mapping represents a pivotal advancement in the field of geological sciences.
Affiliation(s)
- Mohamed Ali EL-Omairi
- Functional Ecology and Environmental Engineering Laboratory, Sidi Mohamed Ben Abdellah University, 2202, Fez, B.P, Morocco
- Abdelkader El Garouani
- Functional Ecology and Environmental Engineering Laboratory, Sidi Mohamed Ben Abdellah University, 2202, Fez, B.P, Morocco
4
Zhang B, Hu S, Li M. Comparative study of multiple machine learning algorithms for risk level prediction in goaf. Heliyon 2023; 9:e19092. [PMID: 37636440 PMCID: PMC10448475 DOI: 10.1016/j.heliyon.2023.e19092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 08/09/2023] [Accepted: 08/10/2023] [Indexed: 08/29/2023] Open access
Abstract
With the acceleration of the mining process, the goaf has become one of the main sources of danger in underground mines, seriously threatening safe mine production. To predict the risk level of the goaf quickly and accurately, this paper optimizes the goaf features by correlation analysis and feature importance, and constructs a combination of feature parameters for goaf risk-level prediction that resolves the redundancy of evaluation indexes. Multiple machine learning algorithms are applied to 121 sets of goaf data, and the optimal algorithm and the best combination of feature parameters are obtained by evaluating the mining area with multiple indicators such as accuracy and the kappa coefficient. The best combination of feature parameters is groundwater, goaf layout, volume of goaf, goaf volume, span-height ratio, and mining disturbance, and the optimal algorithm is Extra Trees (ET), which solves the goaf risk-level prediction problem with an accuracy of 94%. This model can be used to predict the risk level of the goaf quickly and accurately.
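The two evaluation indicators named above, accuracy and Cohen's kappa coefficient, can be computed as in this plain-Python sketch. Kappa corrects the observed agreement p_o for the agreement p_e expected by chance from the label frequencies, kappa = (p_o - p_e) / (1 - p_e). The risk-level labels below are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa; undefined (division by zero) if chance agreement p_e == 1."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of marginal frequencies, summed over classes.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical goaf risk levels for four evaluated goafs.
y_true = ["I", "I", "II", "II"]
y_pred = ["I", "II", "II", "II"]
acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(acc, kappa)
```

Here 75% raw agreement shrinks to kappa = 0.5 because half of the agreement would be expected by chance, which is why kappa is often reported alongside accuracy for imbalanced risk classes.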
Affiliation(s)
- Bin Zhang
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan, Hubei, 430070, China
- Shaohua Hu
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan, Hubei, 430070, China
- Moxiao Li
- School of Safety Science and Emergency Management, Wuhan University of Technology, Wuhan, Hubei, 430070, China
5
Paul V, Ramesh R, Sreeja P, Jarin T, Sujith Kumar PS, Ansar S, Ashraf GA, Pandey S, Said Z. Hybridization of long short-term memory with Sparrow Search Optimization model for water quality index prediction. Chemosphere 2022; 307:135762. [PMID: 35863408 DOI: 10.1016/j.chemosphere.2022.135762] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Revised: 07/09/2022] [Accepted: 07/14/2022] [Indexed: 06/15/2023]
Abstract
Water quality (WQ) analysis is a critical stage in water resource management and should be carried out promptly in order to control pollutants that could negatively affect the ecosystem. The dramatic increase in population, the use of fertilizers and pesticides, and industrialization have severely affected the WQ environment. As a result, WQ prediction greatly helps in monitoring water pollution. Accurate WQ prediction is the foundation of water environment management and is highly important for protecting the water environment. WQ data take the form of a multivariate time-series dataset, so prediction accuracy improves when both the multivariate relations and the time-series structure of the data are fully exploited. This article presents the Water Quality Prediction using Sparrow Search Optimization with Hybrid Long Short-Term Memory (WQP-SSHLSTM) model, which examines the data and classifies WQ into distinct classes. To achieve this, the WQP-SSHLSTM model first applies a data scaling process to bring the input data into a uniform format. Next, a hybrid long short-term memory-deep belief network (LSTM-DBN) technique is employed for the recognition and classification of WQ. Moreover, the Sparrow Search Optimization Algorithm (SSOA) is used as a hyperparameter optimizer for the LSTM-DBN model. To demonstrate the improved performance of the WQP-SSHLSTM model, a series of experiments was performed and the outcomes were reviewed from different perspectives. The WQP-SSHLSTM model achieved a maximum accuracy of 99.84 percent, and the simulation results confirmed that it outperforms recent methods.
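The abstract does not specify the data-scaling step in detail; a common choice, assumed here, is per-feature min-max scaling to [0, 1], followed by cutting the multivariate series into fixed-length windows that a recurrent model such as an LSTM could consume. The feature values and the `window` length are hypothetical, and the sketch includes neither the LSTM-DBN model nor the sparrow search optimizer.

```python
def min_max_scale(rows):
    """Scale each feature (column) of a list of samples to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant feature (hi == lo) carries no information; map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def make_windows(rows, window):
    """Pair each run of `window` consecutive samples with the sample that follows it."""
    inputs = [rows[i : i + window] for i in range(len(rows) - window)]
    targets = [rows[i + window] for i in range(len(rows) - window)]
    return inputs, targets


# Hypothetical hourly readings: [pH, dissolved oxygen (mg/L)].
readings = [[7.0, 2.0], [8.0, 6.0], [7.5, 4.0], [7.25, 3.0]]
scaled = min_max_scale(readings)
X, y = make_windows(scaled, window=2)
print(scaled)
```

In practice the minimum and maximum should be taken from the training split only and reused on test data, so that no information leaks from the evaluation period.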
Affiliation(s)
- Vince Paul
- Dept. of Computer Science and Engineering, Eranad Knowledge City Technical Campus, Kerala, India
- R Ramesh
- DCA, Cochin University of Science and Technology, Kerala, India
- P Sreeja
- Department of EEE, KMEA Engineering College, Kerala, India
- T Jarin
- Department of EEE, Jyothi Engineering College, Kerala, India.
- P S Sujith Kumar
- Ilahia College of Engineering and Technology, Muvattupuzha, Kerala, India
- Sabah Ansar
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Saud University, P.O. Box 10219, Riyadh, 11433, Saudi Arabia
- Ghulam Abbas Ashraf
- Department of Physics, Zhejiang Normal University, Zhejiang, 321004, Jinhua, China.
- Sadanand Pandey
- Department of Chemistry, College of Natural Science, Yeungnam University, 280 Daehak-Ro, Gyeongsan, Gyeongbuk, 38541, Republic of Korea
- Zafar Said
- Department of Sustainable and Renewable Energy Engineering, University of Sharjah, 27272, Sharjah, United Arab Emirates; U.S.-Pakistan Center for Advanced Studies in Energy (USPCAS-E), National University of Sciences and Technology (NUST), Islamabad, Pakistan
6
Yao J, Luo Y, Zhang Z, Li J, Li C, Li C, Guo Z, Wang L, Zhang W, Zhao H, Zhou L. The development of real-time digital PCR technology using an improved data classification method. Biosens Bioelectron 2021; 199:113873. [PMID: 34953301 DOI: 10.1016/j.bios.2021.113873] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/19/2021] [Revised: 11/26/2021] [Accepted: 12/06/2021] [Indexed: 02/09/2023]
Abstract
For digital polymerase chain reaction (PCR), data classification is always a crucial task. Typical digital PCR analysis ignores the dynamic, real-time amplification information of each partition, which can easily lead to inaccurate outcomes. In this work, an integrated device offering real-time, chip-based digital PCR analysis was established. In addition, an enhanced process-based classification model (PAM) was built and trained. The device and the analytical model were then employed in classification tasks for Epstein-Barr Virus (EBV) plasmid quantification assays at different concentrations. The results indicated that the real-time analysis device achieved a linearity of 0.97, that the classification method was able to distinguish false-positive curves, and that the recognition error for positive wells decreased by 64.4% compared with typical static analysis techniques when low-concentration samples were tested. Given these advantages, the real-time digital PCR analysis apparatus and the improved classification method are expected to enhance the performance of digital PCR technology.
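Positive-well recognition matters because, in standard digital PCR, the target concentration is estimated from the fraction of positive partitions with a Poisson correction: lambda = -ln(1 - k/n) mean copies per partition for k positives among n partitions. The sketch below shows that textbook calculation only; it is not the authors' process-based classification model (PAM), and the partition count, partition volume and positive counts are hypothetical.

```python
import math


def copies_per_partition(n_positive, n_total):
    """Poisson-corrected mean number of target copies per partition."""
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total (an all-positive chip saturates)")
    return -math.log(1.0 - n_positive / n_total)


def concentration_per_ul(n_positive, n_total, partition_volume_ul):
    """Copies per microlitre of reaction mix, given the volume of one partition."""
    return copies_per_partition(n_positive, n_total) / partition_volume_ul


# Hypothetical chip: 20,000 partitions of 0.85 nL (0.00085 uL), 1,500 called positive.
lam = copies_per_partition(1500, 20000)
conc = concentration_per_ul(1500, 20000, 0.00085)
# Miscalling 100 negative wells as positive (false positives) inflates the estimate.
conc_with_false_positives = concentration_per_ul(1600, 20000, 0.00085)
print(lam, conc, conc_with_false_positives)
```

At low concentrations only a few wells are positive, so each misclassified well shifts the estimate proportionally more, which is consistent with the abstract's emphasis on low-concentration samples.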
Affiliation(s)
- Jia Yao
- School of Electronic and Information Engineering, Soochow University, Suzhou, 215006, China; CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Yuanyuan Luo
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Zhiqi Zhang
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Jinze Li
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Chuanyu Li
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Chao Li
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Zhen Guo
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Lirong Wang
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Wei Zhang
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China.
- Heming Zhao
- School of Electronic and Information Engineering, Soochow University, Suzhou, 215006, China.
- Lianqun Zhou
- CAS Key Lab of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China; School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China.
7
Reshi AA, Ashraf I, Rustam F, Shahzad HF, Mehmood A, Choi GS. Diagnosis of vertebral column pathologies using concatenated resampling with machine learning algorithms. PeerJ Comput Sci 2021; 7:e547. [PMID: 34395856 PMCID: PMC8323723 DOI: 10.7717/peerj-cs.547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/16/2020] [Accepted: 04/25/2021] [Indexed: 06/13/2023]
Abstract
Medical diagnosis through the classification of biomedical attributes is one of the exponentially growing fields in bioinformatics. Although a large number of approaches have been presented in the past, the wide use and superior performance of machine learning (ML) methods in medical diagnosis warrant significant attention to automatic diagnostic methods. This study proposes a novel approach called concatenated resampling (CR) to increase the efficacy of traditional ML algorithms. Performance is analyzed using four ML approaches, including tree-based ensemble approaches and a linear machine learning approach, for the automatic diagnosis of inter-vertebral pathologies. In addition, undersampling, oversampling, and the proposed CR technique were applied to the imbalanced training dataset to analyze the impact of each technique on the accuracy of each classification model. Extensive experiments were conducted to compare the classification models using several metrics, including accuracy, precision, recall, and F1 score. A comparative analysis of the experimental results was performed to identify the best-performing classifier together with the resampling technique. The results show that the extra tree classifier achieves an accuracy of 0.99 in combination with the proposed CR technique.
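The abstract does not define concatenated resampling, so the sketch below shows only the two standard baselines it is compared against: random undersampling (drop samples until every class matches the smallest class) and random oversampling (duplicate samples until every class matches the largest class). The class names, counts and feature values are hypothetical placeholders.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect the samples of each class label into a list."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(items) for items in groups.values())
    X_out, y_out = [], []
    for label, items in groups.items():
        for xi in rng.sample(items, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples so every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(items) for items in groups.values())
    X_out, y_out = [], []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(n_max - len(items))]
        for xi in items + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced training set with one feature per sample.
y = ["normal"] * 2 + ["hernia"] * 3 + ["spondylolisthesis"] * 5
X = [[float(i)] for i in range(len(y))]
X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(Counter(y_under), Counter(y_over))
```

Resampling should be applied to the training split only; resampling before the train/test split lets duplicated samples appear on both sides and inflates the reported accuracy.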
Affiliation(s)
- Aijaz Ahmad Reshi
- College of Computer Science and Engineering, Department of Computer Science, Taibah University, Al Madinah Al Munawarah, Saudi Arabia
- Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongbuk, Gyeongsan-si, South Korea
- Furqan Rustam
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Hina Fatima Shahzad
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
- Arif Mehmood
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
- Gyu Sang Choi
- Information and Communication Engineering, Yeungnam University, Gyeongbuk, Gyeongsan-si, South Korea
8
Guerrero MC, Parada JS, Espitia HE. EEG signal analysis using classification techniques: Logistic regression, artificial neural networks, support vector machines, and convolutional neural networks. Heliyon 2021; 7:e07258. [PMID: 34159278 PMCID: PMC8203713 DOI: 10.1016/j.heliyon.2021.e07258] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/29/2020] [Revised: 02/21/2021] [Accepted: 06/03/2021] [Indexed: 12/18/2022] Open access
Abstract
Epilepsy is a brain disorder that causes its patients to suffer seizures, which affect their behavior and lifestyle. Neurologists use an electroencephalogram (EEG) to diagnose this disease. This test shows the signaling behavior of a person's brain, allowing, among other things, the diagnosis of epilepsy. From a visual analysis of these signals, neurologists identify patterns such as peaks or valleys, looking in a purely qualitative way for any indication of a brain disorder that leads to a diagnosis of epilepsy. However, by applying Fourier-based signal analysis, using the fast Fourier transform to move to the frequency domain, patterns can be identified quantitatively to differentiate patients diagnosed with the disease from those who are not. In this article, the EEG signal is analyzed to extract characteristics from patients already classified as epileptic and non-epileptic, and these characteristics are used to train models based on classification techniques such as logistic regression, artificial neural networks, support vector machines, and convolutional neural networks. Based on the results obtained with each technique, an analysis is performed to decide which of them performs best. The classifiers were trained on frequency data from the EEG channels carrying distinctive information, obtained by feature extraction with Fourier analysis over frequency bands. The techniques were implemented in Python, and a comparison of metrics and performance concluded that the best technique for characterizing epileptic patients is the artificial neural network, with an accuracy of 86%.
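The Fourier-based feature extraction described above can be sketched as band-power features computed from a discrete Fourier transform of one EEG channel. The band edges (delta 0.5-4 Hz, theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz) are common conventions assumed here, not values taken from the paper, and the naive O(n^2) DFT is used only to keep the example dependency-free; a real pipeline would call an FFT routine instead.

```python
import cmath
import math

# Assumed conventional EEG bands in Hz, as half-open intervals [low, high).
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_powers(signal, fs, bands=BANDS):
    """Sum squared DFT magnitudes of the non-negative frequency bins in each band."""
    n = len(signal)
    powers = {name: 0.0 for name in bands}
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freq = k * fs / n  # centre frequency of bin k in Hz
        for name, (lo, hi) in bands.items():
            if lo <= freq < hi:
                powers[name] += abs(coeff) ** 2
    return powers


# Hypothetical one-second channel sampled at 128 Hz containing a 10 Hz rhythm.
fs = 128
channel = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = band_powers(channel, fs)
print(max(features, key=features.get))
```

Repeating this per channel yields a fixed-length feature vector (bands times channels) that any of the classifiers listed in the abstract can take as input.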
9
De Lellis P, Nakayama S, Porfiri M. Using demographics toward efficient data classification in citizen science: a Bayesian approach. PeerJ Comput Sci 2019; 5:e239. [PMID: 33816892 PMCID: PMC7924415 DOI: 10.7717/peerj-cs.239] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/07/2019] [Accepted: 10/26/2019] [Indexed: 06/12/2023]
Abstract
Public participation in scientific activities, often called citizen science, makes it possible to collect and analyze unprecedentedly large amounts of data. However, the diversity of volunteers makes it challenging to obtain accurate information when these data are aggregated. To overcome this problem, we propose a classification algorithm using Bayesian inference that harnesses the diversity of volunteers to improve data accuracy. In the algorithm, each volunteer is grouped into a distinct class based on a survey of either their level of education or their motivation for citizen science. We obtained the behavior of each class from a training set, which was then used as prior information to estimate the performance of new volunteers. By applying this approach to an existing citizen science dataset to classify images into categories, we demonstrate an improvement in data accuracy compared to traditional majority voting. Our algorithm offers a simple, yet powerful, way to improve data accuracy under limited volunteer effort by predicting the behavior of a class of individuals, rather than attempting a granular description of each of them.
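The sketch below contrasts majority voting with a simple Bayesian aggregation in the spirit of the approach described: each volunteer belongs to a group (for example an education level) whose accuracy has been estimated on a training set, and a vote is combined with that accuracy to form a posterior over the true category. The error model (wrong votes spread uniformly over the other categories), the group names and the accuracy values are assumptions for illustration, not the paper's exact likelihood.

```python
import math
from collections import Counter


def majority_vote(votes):
    """votes: list of (group, label) pairs; returns the most frequent label."""
    return Counter(label for _, label in votes).most_common(1)[0][0]


def bayesian_posterior(votes, group_accuracy, categories, prior=None):
    """Posterior over the true category; group accuracies must lie strictly in (0, 1)."""
    k = len(categories)
    if prior is None:
        prior = {c: 1.0 / k for c in categories}
    log_post = {}
    for cat in categories:
        lp = math.log(prior[cat])
        for group, label in votes:
            p = group_accuracy[group]
            # Assumed model: correct with prob p, else a uniformly chosen wrong label.
            lp += math.log(p if label == cat else (1.0 - p) / (k - 1))
        log_post[cat] = lp
    # Normalise in log space for numerical stability.
    top = max(log_post.values())
    z = sum(math.exp(v - top) for v in log_post.values())
    return {cat: math.exp(v - top) / z for cat, v in log_post.items()}


# Hypothetical accuracies estimated per volunteer group on a training set.
group_accuracy = {"graduate": 0.95, "non_graduate": 0.55}
votes = [("graduate", "fish"), ("non_graduate", "bird"), ("non_graduate", "bird")]
categories = ["fish", "bird", "plant"]
posterior = bayesian_posterior(votes, group_accuracy, categories)
print(majority_vote(votes), max(posterior, key=posterior.get))
```

With these numbers, one vote from the reliable group outweighs two votes from the less reliable group, so the Bayesian label differs from the majority vote; this is the kind of gain over majority voting the abstract reports.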
Affiliation(s)
- Pietro De Lellis
- Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy
- Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA
- Shinnosuke Nakayama
- Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA
- Maurizio Porfiri
- Department of Mechanical and Aerospace Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA
- Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA
10
Shakhari S, Banerjee I. A multi-class classification system for continuous water quality monitoring. Heliyon 2019; 5:e01822. [PMID: 31193957 PMCID: PMC6545331 DOI: 10.1016/j.heliyon.2019.e01822] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/25/2018] [Revised: 04/14/2019] [Accepted: 05/22/2019] [Indexed: 11/27/2022] Open access
Abstract
The issue addressed in this exposition is the classification of multivariate data collected through different sensors for water quality monitoring. Multivariate data are sequences with multiple attributes in every instance. A few efforts exist to address this issue; however, none of them fully addresses continuous datasets. Another approach reduces each instance to a single attribute, at the cost of losing significant information. Other approaches address both the multivariate and the sequential aspects of the data but yield solutions that lack versatility. The proposed algorithm not only monitors water quality continuously but also produces better classification models for other continuous datasets. Instead of reducing the attributes of the dataset, we introduce three additional reference indicators that depend on the actual attributes. We compare the classification accuracy of the proposed algorithm with that of standard classification models; the proposed method gives better classification accuracy than existing methods.
Affiliation(s)
- Swapan Shakhari
- Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, 711103, India
- Indrajit Banerjee
- Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, West Bengal, 711103, India
11
Carneiro MG, Cheng R, Zhao L, Jin Y. Particle swarm optimization for network-based data classification. Neural Netw 2019; 110:243-255. [PMID: 30616096 DOI: 10.1016/j.neunet.2018.12.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/11/2018] [Revised: 10/20/2018] [Accepted: 12/04/2018] [Indexed: 11/28/2022]
Abstract
Complex networks provide a powerful tool for data representation due to their ability to describe the interplay between topological, functional, and dynamical properties of the input data. A fundamental process in network-based (graph-based) data analysis techniques is constructing the network from the original data, which are usually in vector form. A natural question is therefore: how should an "optimal" network be constructed for a given processing goal? This paper investigates structural optimization in the context of network-based data classification tasks. Specifically, we propose a particle swarm optimization framework that builds a network from a vector-based data set while optimizing a quality function driven by classification accuracy. The classification process considers both topological and physical features of the training and test data, and employs the PageRank measure to classify a test instance according to its importance to each class. Results on artificial and real-world problems reveal that data networks generated using structural optimization generally provide better results than those generated by classical network formation methods. Moreover, this investigation suggests that other network-based machine learning and data mining tasks, such as dimensionality reduction and data clustering, can benefit from the proposed structural optimization method.
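The PageRank measure used in the classification step can be computed by power iteration, as in this sketch. It assumes the network is given as a dictionary of out-neighbour lists (every neighbour must also be a key) and uses the conventional damping factor 0.85; it shows only the ranking computation, not the particle swarm network construction or the paper's class-importance rule.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank; adj maps each node to a list of out-neighbours."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = adj[v]
            for w in out:
                new[w] += damping * rank[v] / len(out)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical undirected star network stored as symmetric adjacency lists.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
ranks = pagerank(star)
print(ranks)
```

The hub receives the highest score because every other node links to it; in network-based classification, a comparable importance score is evaluated for a test instance relative to each class's part of the network.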
Affiliation(s)
- Murillo G Carneiro
- Faculty of Computing, Federal University of Uberlândia, Uberlândia, MG, 38400-902, Brazil.
- Ran Cheng
- Shenzhen Key Laboratory of Computational Intelligence, University Key Laboratory of Evolving Intelligent Systems of Guangdong Province, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China.
- Liang Zhao
- Department of Computing and Mathematics, University of São Paulo, Ribeirão Preto, SP, 14040-901, Brazil.
- Yaochu Jin
- Department of Computer Science, University of Surrey, Guildford, GU2 7XH, UK.
12
Wei JX, Wang J, Zhu YX, Sun J, Xu HM, Li M. Traditional Chinese medicine pharmacovigilance in signal detection: decision tree-based data classification. BMC Med Inform Decis Mak 2018. [PMID: 29523131 PMCID: PMC5845291 DOI: 10.1186/s12911-018-0599-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Traditional Chinese Medicine (TCM) is a style of traditional medicine informed by modern medicine but built on a foundation of more than 2500 years of Chinese medical practice. According to statistics, TCM accounts for approximately 14% of all spontaneously reported adverse drug reaction (ADR) data in China. Because the complexity of the components in TCM formulas makes TCM essentially different from Western medicine, it is critical to determine whether ADR reports of TCM should be analyzed independently. METHODS Reports in the Chinese spontaneous reporting database between 2010 and 2011 were selected. The dataset was processed and divided into the total sample (all data) and the subsample (TCM data only). Four ADR signal detection methods currently in wide use in China (PRR, ROR, MHRA and IC) were applied for signal detection on the two samples. Based on a comparison of the experimental results, three of them (PRR, MHRA and IC) were chosen for the experiment. We designed several indicators for performance evaluation, such as R (recall ratio), P (precision ratio), and D (discrepancy ratio), based on the reference database, and then constructed a decision tree for data classification based on these indicators. RESULTS For PRR: R1-R2 = 0.72%, P1-P2 = 0.16% and D = 0.92%; for MHRA: R1-R2 = 0.97%, P1-P2 = 0.20% and D = 1.18%; for IC: R1-R2 = 1.44%, P2-P1 = 4.06% and D = 4.72%. The thresholds of R, P and D are set at 2%, 2% and 3%, respectively. Based on the decision tree, the result is "separation" for PRR, MHRA and IC. CONCLUSIONS To improve the efficiency and accuracy of signal detection, we suggest that TCM data should be separated from the total sample when conducting analyses.
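PRR, ROR and IC are disproportionality measures computed from a 2x2 contingency table of spontaneous reports. The sketch below uses their common textbook definitions (IC as the point estimate, without Bayesian shrinkage) and the widely cited MHRA screening rule (at least 3 cases, PRR ≥ 2, chi-square ≥ 4, here with Yates' continuity correction). These formula variants, the helper names and the counts are assumptions for illustration; the paper may use different variants or thresholds, and its R, P and D indicators and decision tree are not reproduced.

```python
import math
from dataclasses import dataclass


@dataclass
class Table2x2:
    a: int  # reports with the drug of interest and the reaction of interest
    b: int  # reports with the drug of interest and any other reaction
    c: int  # reports with other drugs and the reaction of interest
    d: int  # reports with other drugs and other reactions

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def prr(t: Table2x2) -> float:
    """Proportional reporting ratio: P(reaction | drug) / P(reaction | other drugs)."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: Table2x2) -> float:
    """Reporting odds ratio: (a/b) / (c/d) = ad / bc."""
    return (t.a * t.d) / (t.b * t.c)


def ic(t: Table2x2) -> float:
    """Information component point estimate: log2(observed / expected co-reports)."""
    return math.log2(t.a * t.n / ((t.a + t.b) * (t.a + t.c)))


def chi_square_yates(t: Table2x2) -> float:
    """Pearson chi-square with Yates' continuity correction."""
    num = t.n * max(0.0, abs(t.a * t.d - t.b * t.c) - t.n / 2) ** 2
    den = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    return num / den


def mhra_signal(t: Table2x2) -> bool:
    """Commonly cited MHRA screening rule: a >= 3, PRR >= 2, chi-square >= 4."""
    return t.a >= 3 and prr(t) >= 2 and chi_square_yates(t) >= 4


if __name__ == "__main__":
    t = Table2x2(a=10, b=90, c=20, d=880)  # hypothetical counts
    print(round(prr(t), 2), round(ror(t), 2), round(ic(t), 2), mhra_signal(t))
```

Running each measure once on the total sample and once on a TCM-only subsample, then comparing the detected signals against a reference database, is the kind of comparison the abstract's R, P and D indicators summarize.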
Affiliation(s)
- Jian-Xiang Wei
- School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China.
- Jing Wang
- School of Computer Science and Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Yun-Xia Zhu
- School of Computer Science and Technology, School of Software, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Jun Sun
- Jiangsu Center for ADR Monitoring, Nanjing, 210002, China
- Hou-Ming Xu
- Jiangsu Center for ADR Monitoring, Nanjing, 210002, China
- Ming Li
- Jiangsu Center for ADR Monitoring, Nanjing, 210002, China
13
Lynch CM, Abdollahi B, Fuqua JD, de Carlo AR, Bartholomai JA, Balgemann RN, van Berkel VH, Frieboes HB. Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int J Med Inform 2017; 108:1-8. [PMID: 29132615 DOI: 10.1016/j.ijmedinf.2017.09.013] [Citation(s) in RCA: 109] [Impact Index Per Article: 15.6] [Received: 04/07/2016] [Revised: 08/29/2017] [Accepted: 09/23/2017] [Indexed: 12/20/2022]
Abstract
Outcomes for cancer patients have previously been estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. For lung cancer in particular, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used to obtain this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power between the various methods. The prediction is treated as a continuous target, rather than as a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best-performing technique was the custom ensemble, with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable because they had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM, with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique.
We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time with the ultimate goal to inform patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods.
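The abstract compares models by root mean square error, RMSE = sqrt(mean((y - ŷ)²)), and reports a "custom ensemble" without describing how it combines its members. The sketch below computes RMSE and, as an assumption, uses a plain unweighted average as the ensemble combiner; the helper names and the numbers are hypothetical and are not taken from the SEER data.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_ensemble(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of several models' predictions, sample by sample (assumed combiner)."""
    return [sum(col) / len(col) for col in zip(*predictions)]


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0]  # hypothetical survival times
    model_a = [12.0, 18.0, 33.0]   # hypothetical predictions from one model
    model_b = [8.0, 22.0, 27.0]    # hypothetical predictions from another model
    ensemble = average_ensemble([model_a, model_b])
    print(rmse(observed, model_a), rmse(observed, model_b), rmse(observed, ensemble))
```

In this toy case the two members err in opposite directions, so averaging cancels their errors exactly; in general, averaging helps only to the extent that the members' errors are not perfectly correlated.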
Affiliation(s)
- Chip M Lynch
- Department of Computer Engineering and Computer Science, University of Louisville, KY, USA
- Behnaz Abdollahi
- Department of Electrical and Computer Engineering, University of Louisville, KY, USA
- Joshua D Fuqua
- Department of Bioengineering, University of Louisville, KY, USA
- Victor H van Berkel
- Department of Cardiovascular and Thoracic Surgery, University of Louisville, KY, USA
- Hermann B Frieboes
- Department of Bioengineering, University of Louisville, KY, USA; James Graham Brown Cancer Center, University of Louisville, KY, USA.
14
Arabani M, Pirouz M. Water treatment plant site location using rough set theory. Environ Monit Assess 2016; 188:552. [PMID: 27613288 DOI: 10.1007/s10661-016-5539-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2015] [Accepted: 08/22/2016] [Indexed: 06/06/2023]
Abstract
Advanced methods have been developed to select an appropriate site for an engineering project. Making a good site-selection decision can help engineers reduce costs, which is very important in large construction projects. In this paper, a new approach to site selection is presented. The method is based on rough set theory, a mathematical theory introduced by Professor Pawlak. In this study, the results of rough set decision-making are compared with those of the regression method in a practical case study of the site location of a water treatment plant in Ardabil Province, in northwest Iran, to demonstrate that rough set theory provides a useful method for site selection. The results of practical studies indicate that using this method for site-selection decision-making can reduce costs and prevent hazards that may arise from civil engineering uncertainties.
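Rough set theory describes a target set of objects (here, sites judged suitable) through the equivalence classes induced by condition attributes: the lower approximation collects the classes that lie entirely inside the target, the upper approximation collects the classes that overlap it, and their difference is the boundary region. The sketch below computes these Pawlak approximations for a small hypothetical site table; the attribute names, values and labels are invented for illustration, and the paper's actual attributes, decision rules and regression comparison are not reproduced.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Table = Dict[str, Dict[str, str]]


def indiscernibility_classes(table: Table, attributes: Sequence[str]) -> List[FrozenSet[str]]:
    """Group objects that share identical values on every chosen attribute."""
    groups: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for obj, row in table.items():
        groups[tuple(row[a] for a in attributes)].add(obj)
    return [frozenset(g) for g in groups.values()]


def approximations(
    table: Table, attributes: Sequence[str], target: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return the (lower, upper) rough approximations of `target`."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:  # every object in the class belongs to the target
            lower |= cls
        if cls & target:   # at least one object in the class belongs to the target
            upper |= cls
    return lower, upper


# Hypothetical candidate sites described by two condition attributes.
sites: Table = {
    "S1": {"slope": "low", "distance": "near"},
    "S2": {"slope": "low", "distance": "near"},
    "S3": {"slope": "high", "distance": "near"},
    "S4": {"slope": "low", "distance": "far"},
    "S5": {"slope": "high", "distance": "far"},
}
suitable = {"S1", "S3", "S4"}  # hypothetical decision labels

if __name__ == "__main__":
    lower, upper = approximations(sites, ["slope", "distance"], suitable)
    print(sorted(lower), sorted(upper), sorted(upper - lower))
```

S1 and S2 are indiscernible on these two attributes, but only S1 is labeled suitable, so both fall in the boundary region: the available attributes alone cannot decide them.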
Affiliation(s)
- M Arabani
- Department of Civil Engineering, University of Guilan, Guilan, Islamic Republic of Iran
- M Pirouz
- University of Guilan, Guilan, Islamic Republic of Iran.