1
Eimen R, Pillai M, Scarpato KR, Bowden AK. Towards improved 3D reconstruction of cystoscopies through real-time feedback for frame reacquisition. BIOMEDICAL OPTICS EXPRESS 2024; 15:3394-3411. [PMID: 38855702 PMCID: PMC11161358 DOI: 10.1364/boe.523361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 04/15/2024] [Accepted: 04/16/2024] [Indexed: 06/11/2024]
Abstract
Cystoscopic video can be cumbersome to review; however, preservation of data in the form of 3D bladder reconstructions has the potential to improve patient care. Unfortunately, not all cystoscopy videos produce viable reconstructions, because their underlying frames contain artifacts such as motion blur and bladder debris, which consequently make them unusable for 3D reconstructions. Here, we develop a real-time pipeline, termed the Assessment and Feedback Pipeline (AFP), that alerts clinicians when unusable frames are detected and encourages them to recollect the last few seconds of data. We show that the AFP classifies frames as usable or unusable with a balanced accuracy of 81.60% and demonstrate that use of the AFP improves 3D reconstruction coverage. These results suggest that clinical implementation of the AFP would improve 3D reconstruction quality through real-time detection and recollection of unusable frames.
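The balanced accuracy reported above is the mean of sensitivity and specificity. A minimal sketch of that metric follows; the toy labels are invented for illustration (1 = usable frame, 0 = unusable frame) and are not data from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # fraction of usable frames recognised
    specificity = tn / (tn + fp)  # fraction of unusable frames recognised
    return (sensitivity + specificity) / 2


# Hypothetical frame labels: 4 usable and 6 unusable frames.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))
```

Unlike plain accuracy, this score weights both classes equally, which matters when unusable frames are much rarer or much more common than usable ones.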
Affiliation(s)
- Rachel Eimen
  - Vanderbilt Biophotonics Center, Vanderbilt University, Nashville, TN 37232, USA
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37232, USA
- Mayaank Pillai
  - Department of Computer Science, Vanderbilt University, Nashville, TN 37232, USA
- Kristen R. Scarpato
  - Department of Urology, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Audrey K. Bowden
  - Vanderbilt Biophotonics Center, Vanderbilt University, Nashville, TN 37232, USA
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37232, USA
  - Department of Electrical Engineering, Vanderbilt University, Nashville, TN 37232, USA

2
Choi Y, Cha J, Choi S. Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES). BMC Bioinformatics 2024; 25:56. [PMID: 38308205 PMCID: PMC10837879 DOI: 10.1186/s12859-024-05677-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 01/26/2024] [Indexed: 02/04/2024]
Abstract
BACKGROUND Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). RESULTS First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression with adjustment for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator, elastic net, smoothly clipped absolute deviation, support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's Kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to deal with class imbalance. CONCLUSIONS Our results show that penalized methods exhibit better predictive performance for asthma than machine learning methods. On the other hand, in the oversampling study, random forest and boosting methods overall showed better prediction performance than penalized methods.
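Two of the metrics listed above, Cohen's Kappa and the Matthews correlation coefficient, can be computed directly from a binary confusion matrix. The sketch below uses the standard textbook definitions; the counts are made up for illustration and do not come from the study.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Hypothetical counts: 60 cases, 40 controls, 50 positive predictions.
counts = dict(tp=40, fp=10, fn=20, tn=30)
print(matthews_corrcoef(**counts), cohen_kappa(**counts))
```

Both scores stay near zero for a classifier that ignores the minority class, which is why they are often reported alongside accuracy on imbalanced data.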
Affiliation(s)
- Yongjun Choi
  - Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea
- Junho Cha
  - Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea
- Sungkyoung Choi
  - Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea
  - Department of Mathematical Data Science, College of Science and Convergence Technology, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea

3
Ensemble filters with harmonize PSO-SVM algorithm for optimal hearing disorder prediction. Neural Comput Appl 2023; 35:10473-10496. [PMID: 36747886 PMCID: PMC9894525 DOI: 10.1007/s00521-023-08244-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Accepted: 01/06/2023] [Indexed: 02/05/2023]
Abstract
Detecting a hearing disorder early is critical for reducing the effects of hearing loss, because approaches that preserve the remaining hearing ability can then be applied to support the development of communication. Recently, the rapid growth in the number of dataset features has made it harder for audiologists to decide on the proper treatment for a patient. In most cases, irrelevant features and improper classifier parameters strongly degrade the accuracy of an audiometry system. Feature selection and parameter tuning depend on each other, so classification accuracy can worsen if the two processes are conducted independently. Although a filter algorithm can eliminate irrelevant features, it does not consider dependencies between features, which results in a poor selection of significant features. Improper kernel parameter settings may also contribute to poor accuracy. In this paper, an ensemble-filter feature selection based on Information Gain (IG), Gain Ratio (GR), Chi-squared (CS), and Relief-F (RF), harmonized with the optimization of Particle Swarm Optimization (PSO) and Support Vector Machine (SVM), is presented to mitigate these problems. Ensemble filters are used so that the top dominant features relevant for classification are considered first. Then, PSO and SVM are optimized simultaneously to reach the optimal solution. Results on a standard Audiology dataset show that the proposed method produces 96.50% accuracy with the optimal solution when compared with classical SVM, which indicates that the method is effective in handling high-dimensional data for hearing disorder prediction.

4
Ebiaredoh-Mienye SA, Swart TG, Esenogho E, Mienye ID. A Machine Learning Method with Filter-Based Feature Selection for Improved Prediction of Chronic Kidney Disease. Bioengineering (Basel) 2022; 9:350. [PMID: 36004875 PMCID: PMC9405039 DOI: 10.3390/bioengineering9080350] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/31/2022] [Revised: 07/06/2022] [Accepted: 07/21/2022] [Indexed: 11/25/2022]
Abstract
The high prevalence of chronic kidney disease (CKD) is a significant public health concern globally. The condition has a high mortality rate, especially in developing countries. CKD often goes undetected since there are no obvious early-stage symptoms. Meanwhile, early detection and timely clinical intervention are necessary to slow disease progression. Machine learning (ML) models can provide an efficient and cost-effective computer-aided diagnosis to assist clinicians in achieving early CKD detection. This research proposed an approach to effectively detect CKD by combining an information-gain-based feature selection technique and a cost-sensitive adaptive boosting (AdaBoost) classifier. An approach like this could save CKD screening time and cost since only a few clinical test attributes would be needed for the diagnosis. The proposed approach was benchmarked against recently proposed CKD prediction methods and well-known classifiers. Among these classifiers, the proposed cost-sensitive AdaBoost trained with the reduced feature set achieved the best classification performance with an accuracy, sensitivity, and specificity of 99.8%, 100%, and 99.8%, respectively. Additionally, the experimental results show that the feature selection positively impacted the performance of the various classifiers. The proposed approach has produced an effective predictive model for CKD diagnosis and could be applied to more imbalanced medical datasets for effective disease detection.
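Information-gain-based filtering, as mentioned above, ranks each feature by how much it reduces the entropy of the class label. The sketch below shows the general technique for discrete features; it is not the authors' implementation, and the feature names and values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) minus the expected entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Names of the k features with the highest information gain."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: feature_a separates the classes, feature_b does not.
labels = [1, 1, 0, 0]
columns = {
    "feature_a": ["high", "high", "low", "low"],
    "feature_b": ["x", "y", "x", "y"],
}
print(top_k_features(columns, labels, 1))
```

Continuous clinical attributes would need to be discretized (or scored with an estimator suited to continuous data) before this kind of ranking applies.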
Affiliation(s)
- Sarah A. Ebiaredoh-Mienye
  - Center for Telecommunications, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
- Theo G. Swart
  - Center for Telecommunications, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
- Ebenezer Esenogho
  - Center for Telecommunications, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
- Ibomoiye Domor Mienye
  - Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa

5
Machine learning models for classification and identification of significant attributes to detect type 2 diabetes. Health Inf Sci Syst 2022; 10:2. [PMID: 35178244 PMCID: PMC8828812 DOI: 10.1007/s13755-021-00168-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/15/2022]
Abstract
Type 2 Diabetes (T2D) is a chronic disease characterized by abnormally high blood glucose levels due to insulin resistance and reduced pancreatic insulin production. The challenge of this work is to identify T2D-associated features that can distinguish T2D sub-types for prognosis and treatment purposes. We thus employed machine learning (ML) techniques to categorize T2D patients using data from the Pima Indian Diabetes Dataset from the Kaggle ML repository. After data preprocessing, several feature selection techniques were used to extract feature subsets, and a range of classification techniques were used to analyze these. We then compared the derived classification results to identify the best classifiers by considering accuracy, kappa statistics, area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and logarithmic loss (logloss). To evaluate the performance of the different classifiers, we investigated their outcomes using summary statistics with a resampling distribution. Generalized Boosted Regression modeling showed the highest accuracy (90.91%), as well as the highest kappa statistic (78.77%) and specificity (85.19%). In addition, Sparse Distance Weighted Discrimination, the Generalized Additive Model using LOESS, and Boosted Generalized Additive Models gave the maximum sensitivity (100%), highest AUROC (95.26%), and lowest logarithmic loss (30.98%), respectively. Notably, the Generalized Additive Model using LOESS was the top-ranked algorithm according to non-parametric Friedman testing. Of the features identified by these machine learning models, glucose levels, body mass index, diabetes pedigree function, and age were consistently identified as the best and most frequently accurate outcome predictors. These results indicate the utility of ML methods in constructing improved prediction models for T2D and successfully identify outcome predictors for this Pima Indian population.

6
Yilmaz S, Fakhouri M, Koyutürk M, Çiçek AE, Tastan O. Uncovering complementary sets of variants for predicting quantitative phenotypes. Bioinformatics 2022; 38:908-917. [PMID: 34864867 DOI: 10.1093/bioinformatics/btab803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2021] [Revised: 09/21/2021] [Accepted: 11/24/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Genome-wide association studies show that variants in individual genomic loci alone are not sufficient to explain the heritability of complex, quantitative phenotypes. Many computational methods have been developed to address this issue by considering subsets of loci that can collectively predict the phenotype. This problem can be considered a challenging instance of feature selection in which the number of dimensions (loci that are screened) is much larger than the number of samples. While currently available methods can achieve decent phenotype prediction performance, they either do not scale to large datasets or have parameters that require extensive tuning. RESULTS We propose a fast and simple algorithm, Macarons, to select a small, complementary subset of variants by avoiding redundant pairs that are likely to be in linkage disequilibrium. Our method features two interpretable parameters that control the time/performance trade-off without requiring parameter tuning. In our computational experiments, we show that Macarons consistently achieves similar or better prediction performance than state-of-the-art selection methods while having a simpler premise and being at least two orders of magnitude faster. Overall, Macarons can seamlessly scale to the human genome with ∼10^7 variants in a matter of minutes while taking the dependencies between the variants into account. AVAILABILITY AND IMPLEMENTATION Macarons is available in Matlab and Python at https://github.com/serhan-yilmaz/macarons. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Serhan Yilmaz
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
- Mohamad Fakhouri
  - Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
- Mehmet Koyutürk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, USA
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
- A Ercüment Çiçek
  - Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Oznur Tastan
  - Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey

7
Caudai C, Galizia A, Geraci F, Le Pera L, Morea V, Salerno E, Via A, Colombo T. AI applications in functional genomics. Comput Struct Biotechnol J 2021; 19:5762-5790. [PMID: 34765093 PMCID: PMC8566780 DOI: 10.1016/j.csbj.2021.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/16/2021] [Revised: 10/05/2021] [Accepted: 10/05/2021] [Indexed: 12/13/2022]
Abstract
We review the current applications of artificial intelligence (AI) in functional genomics. The recent explosion of AI follows the remarkable achievements made possible by "deep learning", along with a burst of "big data" that can meet its hunger. Biology is about to overtake astronomy as the paradigmatic big data producer. This has been made possible by huge advancements in the field of high-throughput technologies, applied to determine how the individual components of a biological system work together to accomplish different processes. The disciplines contributing to this bulk of data are collectively known as functional genomics. They consist of studies of: i) the information contained in the DNA (genomics); ii) the modifications that DNA can reversibly undergo (epigenomics); iii) the RNA transcripts originated by a genome (transcriptomics); iv) the ensemble of chemical modifications decorating different types of RNA transcripts (epitranscriptomics); v) the products of protein-coding transcripts (proteomics); and vi) the small molecules produced from cell metabolism (metabolomics) present in an organism or system at a given time, in physiological or pathological conditions. After reviewing the main applications of AI in functional genomics, we discuss important accompanying issues, including ethical, legal and economic issues and the importance of explainability.
Affiliation(s)
- Claudia Caudai
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Antonella Galizia
  - CNR, Institute of Applied Mathematics and Information Technologies (IMATI), Genoa, Italy
- Filippo Geraci
  - CNR, Institute for Informatics and Telematics (IIT), Pisa, Italy
- Loredana Le Pera
  - CNR, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Bari, Italy
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Veronica Morea
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Emanuele Salerno
  - CNR, Institute of Information Science and Technologies “A. Faedo” (ISTI), Pisa, Italy
- Allegra Via
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy
- Teresa Colombo
  - CNR, Institute of Molecular Biology and Pathology (IBPM), Rome, Italy

8
Endalie D, Tegegne T. Designing a hybrid dimension reduction for improving the performance of Amharic news document classification. PLoS One 2021; 16:e0251902. [PMID: 34019571 PMCID: PMC8139506 DOI: 10.1371/journal.pone.0251902] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Accepted: 05/05/2021] [Indexed: 11/18/2022]
Abstract
The volume of Amharic digital documents has grown rapidly in recent years. As a result, automatic document categorization has become essential. In this paper, we present a novel dimension reduction approach for improving classification accuracy by combining feature selection and feature extraction. The new dimension reduction method utilizes Information Gain (IG), Chi-square test (CHI), and Document Frequency (DF) to select important features and Principal Component Analysis (PCA) to refine the features that have been selected. We evaluate the proposed dimension reduction method with a dataset containing 9 news categories. Our experimental results verified that the proposed dimension reduction method outperforms other methods. Classification accuracy with the new dimension reduction is 92.60%, which is 13.48%, 16.51% and 10.19% higher than with IG, CHI, and DF, respectively. Further work is required since classification accuracy still decreases as we reduce the feature size to save computational time.
Affiliation(s)
- Demeke Endalie
  - Faculty of Computing and Informatics, Jimma Institute of Technology, Jimma, Ethiopia
- Tesfa Tegegne
  - Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar, Ethiopia

9
Using Class-Specific Feature Selection for Cancer Detection with Gene Expression Profile Data of Platelets. SENSORS 2020; 20:s20051528. [PMID: 32164283 PMCID: PMC7085688 DOI: 10.3390/s20051528] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/27/2020] [Revised: 03/04/2020] [Accepted: 03/07/2020] [Indexed: 12/16/2022]
Abstract
A novel multi-classification method, which integrates the elastic net and the probabilistic support vector machine, was proposed for cancer detection with gene expression profile data of platelets, which is mainly a multi-class classification problem with high dimension, small samples, and collinear data. The one-against-all (OVA) strategy was employed to decompose the multi-classification problem into a series of binary classification problems. The elastic net was used to select class-specific features for the binary classification problems, and the probabilistic support vector machine was used to make the outputs of the binary classifiers with class-specific features comparable. Simulation data and gene expression profile data were used to verify the effectiveness of the proposed method. Results indicate that the proposed method can automatically select class-specific features and achieve better classification performance than conventional multi-class classification methods, which are mainly based on global feature selection. This study indicates that the proposed method is suitable for general multi-classification problems characterized by high dimension, small samples, and collinear data.
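The one-against-all decomposition with class-specific features can be sketched as follows. This is an illustration of the general idea only: simple logistic scorers stand in for the probabilistic support vector machines, the feature indices stand in for an elastic-net selection, and every weight, class name, and sample value is hypothetical.

```python
import math


def logistic_scorer(weights, bias):
    """Return a function mapping a feature vector to a probability in (0, 1)."""
    def prob(x):
        z = bias + sum(w * v for w, v in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return prob


def predict_one_vs_all(binary_models, sample):
    """Score the sample with each class's binary model and pick the most probable class.

    binary_models maps a class label to (feature_indices, probability_function);
    each model sees only its own class-specific features.
    """
    probs = {}
    for label, (indices, prob_fn) in binary_models.items():
        probs[label] = prob_fn([sample[i] for i in indices])
    return max(probs, key=probs.get), probs


# Hypothetical models: each class uses a different subset of the three features.
models = {
    "class_a": ((0,), logistic_scorer([2.0], -1.0)),
    "class_b": ((1, 2), logistic_scorer([1.0, 1.0], -1.5)),
    "class_c": ((2,), logistic_scorer([-3.0], 1.0)),
}
sample = [0.2, 0.9, 1.1]
label, probs = predict_one_vs_all(models, sample)
print(label, probs)
```

Because every binary model outputs a probability, the scores are on a common scale even though the models were built on different feature subsets, which is the role the abstract assigns to the probabilistic support vector machine.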

10
Nagi S, Bhattacharyya DK. Classification of microarray cancer data using ensemble approach. Network Modeling Analysis in Health Informatics and Bioinformatics 2013. [DOI: 10.1007/s13721-013-0034-x] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 11/28/2022]

12
Linear B cell epitope prediction for epitope vaccine design against meningococcal disease and their computational validations through physicochemical properties. Network Modeling Analysis in Health Informatics and Bioinformatics 2012. [DOI: 10.1007/s13721-012-0019-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]