1
|
Moradi E, Pepe A, Gaser C, Huttunen H, Tohka J. Machine learning framework for early MRI-based Alzheimer's conversion prediction in MCI subjects. Neuroimage 2015; 104:398-412. [PMID: 25312773 PMCID: PMC5957071 DOI: 10.1016/j.neuroimage.2014.10.002] [Citation(s) in RCA: 382] [Impact Index Per Article: 38.2] [Received: 07/08/2014] [Revised: 09/16/2014] [Accepted: 10/01/2014] [Indexed: 01/20/2023]
Abstract
Mild cognitive impairment (MCI) is a transitional stage between age-related cognitive decline and Alzheimer's disease (AD). For the effective treatment of AD, it would be important to identify MCI patients at high risk for conversion to AD. In this study, we present a novel magnetic resonance imaging (MRI)-based method for predicting the MCI-to-AD conversion from one to three years before the clinical diagnosis. First, we developed a novel MRI biomarker of MCI-to-AD conversion using semi-supervised learning and then integrated it with age and cognitive measures about the subjects using a supervised learning algorithm resulting in what we call the aggregate biomarker. The novel characteristics of the methods for learning the biomarkers are as follows: 1) We used a semi-supervised learning method (low density separation) for the construction of MRI biomarker as opposed to more typical supervised methods; 2) We performed a feature selection on MRI data from AD subjects and normal controls without using data from MCI subjects via regularized logistic regression; 3) We removed the aging effects from the MRI data before the classifier training to prevent possible confounding between AD and age related atrophies; and 4) We constructed the aggregate biomarker by first learning a separate MRI biomarker and then combining it with age and cognitive measures about the MCI subjects at the baseline by applying a random forest classifier. We experimentally demonstrated the added value of these novel characteristics in predicting the MCI-to-AD conversion on data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. With the ADNI data, the MRI biomarker achieved a 10-fold cross-validated area under the receiver operating characteristic curve (AUC) of 0.7661 in discriminating progressive MCI patients (pMCI) from stable MCI patients (sMCI). 
Our aggregate biomarker based on MRI data together with baseline cognitive measurements and age achieved a 10-fold cross-validated AUC score of 0.9020 in discriminating pMCI from sMCI. The results presented in this study demonstrate the potential of the suggested approach for early AD diagnosis and an important role of MRI in the MCI-to-AD conversion prediction. However, it is evident based on our results that combining MRI data with cognitive test results improved the accuracy of the MCI-to-AD conversion prediction.
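The AUC values quoted above can be read as the probability that a randomly chosen pMCI subject receives a higher biomarker score than a randomly chosen sMCI subject. A minimal sketch of that rank-based reading follows; the function name `auc` and the toy scores are illustrative assumptions, not the authors' code or ADNI data.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical biomarker scores (not ADNI data).
pmci = [0.9, 0.8, 0.4]
smci = [0.7, 0.3, 0.2]
print(auc(pmci, smci))
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a sketch; a sort-based version would be preferable for large cohorts.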
|
Research Support, N.I.H., Extramural |
10 |
382 |
2
|
Deepak S, Ameer PM. Brain tumor classification using deep CNN features via transfer learning. Comput Biol Med 2019; 111:103345. [PMID: 31279167 DOI: 10.1016/j.compbiomed.2019.103345] [Citation(s) in RCA: 303] [Impact Index Per Article: 50.5] [Received: 03/18/2019] [Revised: 06/26/2019] [Accepted: 06/26/2019] [Indexed: 11/28/2022]
Abstract
Brain tumor classification is an important problem in computer-aided diagnosis (CAD) for medical applications. This paper focuses on a 3-class classification problem to differentiate among glioma, meningioma and pituitary tumors, which form three prominent types of brain tumor. The proposed classification system adopts the concept of deep transfer learning and uses a pre-trained GoogLeNet to extract features from brain MRI images. Proven classifier models are integrated to classify the extracted features. The experiment follows a patient-level five-fold cross-validation process, on MRI dataset from figshare. The proposed system records a mean classification accuracy of 98%, outperforming all state-of-the-art methods. Other performance measures used in the study are the area under the curve (AUC), precision, recall, F-score and specificity. In addition, the paper addresses a practical aspect by evaluating the system with fewer training samples. The observations of the study imply that transfer learning is a useful technique when the availability of medical images is limited. The paper provides an analytical discussion on misclassifications also.
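Patient-level cross-validation, as mentioned above, means that all images from one patient fall into the same fold, so no patient contributes to both training and testing. The sketch below shows one way such a split could be built; the function `patient_level_folds`, its greedy balancing rule, and the patient labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def patient_level_folds(patient_ids, k=5):
    """Assign sample indices to k folds so that no patient is split across folds."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    folds = [[] for _ in range(k)]
    # Place patients with the most images first, each into the currently smallest fold.
    for pid in sorted(by_patient, key=lambda p: (-len(by_patient[p]), p)):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])
    return folds


# Hypothetical patient labels for 12 image slices.
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6", "p7", "p7"]
folds = patient_level_folds(ids, k=5)
```

Each fold then serves once as the test set while the remaining four are used for training.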
|
Journal Article |
6 |
303 |
3
|
Ramgopal S, Thome-Souza S, Jackson M, Kadish NE, Sánchez Fernández I, Klehm J, Bosl W, Reinsberger C, Schachter S, Loddenkemper T. Seizure detection, seizure prediction, and closed-loop warning systems in epilepsy. Epilepsy Behav 2014; 37:291-307. [PMID: 25174001 DOI: 10.1016/j.yebeh.2014.06.023] [Citation(s) in RCA: 219] [Impact Index Per Article: 19.9] [Received: 04/17/2014] [Revised: 06/04/2014] [Accepted: 06/10/2014] [Indexed: 12/16/2022]
Abstract
Nearly one-third of patients with epilepsy continue to have seizures despite optimal medication management. Systems employed to detect seizures may have the potential to improve outcomes in these patients by allowing more tailored therapies and might, additionally, have a role in accident and SUDEP prevention. Automated seizure detection and prediction require algorithms which employ feature computation and subsequent classification. Over the last few decades, methods have been developed to detect seizures utilizing scalp and intracranial EEG, electrocardiography, accelerometry and motion sensors, electrodermal activity, and audio/video captures. To date, it is unclear which combination of detection technologies yields the best results, and approaches may ultimately need to be individualized. This review presents an overview of seizure detection and related prediction methods and discusses their potential uses in closed-loop warning systems in epilepsy.
|
Case Reports |
11 |
219 |
4
|
Chen W, Feng PM, Deng EZ, Lin H, Chou KC. iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal Biochem 2014; 462:76-83. [PMID: 25016190 DOI: 10.1016/j.ab.2014.06.022] [Citation(s) in RCA: 196] [Impact Index Per Article: 17.8] [Received: 05/20/2014] [Revised: 06/26/2014] [Accepted: 06/27/2014] [Indexed: 01/25/2023]
Abstract
Translation is a key process for gene expression. Timely identification of the translation initiation site (TIS) is very important for conducting in-depth genome analysis. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying TIS. Although some computational methods were proposed in this regard, none of them considered the global or long-range sequence-order effects of DNA, and hence their prediction quality was limited. To account for such effects, a new predictor, called "iTIS-PseTNC," was developed by incorporating the physicochemical properties into the pseudo trinucleotide composition, quite similar to the PseAAC (pseudo amino acid composition) approach widely used in computational proteomics. It was observed by the rigorous cross-validation test on the benchmark dataset that the overall success rate achieved by the new predictor in identifying TIS locations was over 97%. As a web server, iTIS-PseTNC is freely accessible at http://lin.uestc.edu.cn/server/iTIS-PseTNC. To maximize the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without the need to go through detailed mathematical equations, which are presented in this paper just for the integrity of the new prediction method.
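The starting point of a pseudo trinucleotide composition is the plain trinucleotide composition: the normalized frequency of each of the 64 possible 3-mers in a sequence. The sketch below computes only that part, under the assumption of overlapping windows; the additional pseudo components that encode physicochemical properties and long-range sequence order are omitted, and the function name and example sequence are hypothetical.

```python
from itertools import product


def trinucleotide_composition(seq):
    """Return normalized frequencies of the 64 overlapping trinucleotides in seq."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing ambiguous bases such as N
            counts[tri] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in kmers]


# Hypothetical 10-nucleotide fragment, used only to exercise the function.
vec = trinucleotide_composition("ATGGCCATGG")
```

The resulting 64-dimensional vector is the kind of fixed-length input a classifier can consume regardless of sequence length.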
|
Research Support, Non-U.S. Gov't |
11 |
196 |
5
|
Accurate multimodal probabilistic prediction of conversion to Alzheimer's disease in patients with mild cognitive impairment. NEUROIMAGE-CLINICAL 2013; 2:735-45. [PMID: 24179825 PMCID: PMC3777690 DOI: 10.1016/j.nicl.2013.05.004] [Citation(s) in RCA: 168] [Impact Index Per Article: 14.0] [Received: 02/14/2013] [Revised: 05/07/2013] [Accepted: 05/08/2013] [Indexed: 01/23/2023]
Abstract
Accurately identifying the patients that have mild cognitive impairment (MCI) who will go on to develop Alzheimer's disease (AD) will become essential as new treatments will require identification of AD patients at earlier stages in the disease process. Most previous work in this area has centred around the same automated techniques used to diagnose AD patients from healthy controls, by coupling high dimensional brain image data or other relevant biomarker data to modern machine learning techniques. Such studies can now distinguish between AD patients and controls as accurately as an experienced clinician. Models trained on patients with AD and control subjects can also distinguish between MCI patients that will convert to AD within a given timeframe (MCI-c) and those that remain stable (MCI-s), although differences between these groups are smaller and thus, the corresponding accuracy is lower. The most common type of classifier used in these studies is the support vector machine, which gives categorical class decisions. In this paper, we introduce Gaussian process (GP) classification to the problem. This fully Bayesian method produces naturally probabilistic predictions, which we show correlate well with the actual chances of converting to AD within 3 years in a population of 96 MCI-s and 47 MCI-c subjects. Furthermore, we show that GPs can integrate multimodal data (in this study volumetric MRI, FDG-PET, cerebrospinal fluid, and APOE genotype) with the classification process through the use of a mixed kernel. The GP approach aids combination of different data sources by learning parameters automatically from training data via type-II maximum likelihood, which we compare to a more conventional method based on cross validation and an SVM classifier.
When the resulting probabilities from the GP are dichotomised to produce a binary classification, the results for predicting MCI conversion based on the combination of all three types of data show a balanced accuracy of 74%. This is a substantially higher accuracy than could be obtained using any individual modality or using a multikernel SVM, and is competitive with the highest accuracy yet achieved for predicting conversion within three years on the widely used ADNI dataset.
Prediction of MCI to AD conversion using ADNI data and Gaussian processes. 74% accuracy, 0.795 area under ROC curve for predicting conversion within 3 years. Gaussian processes allow automatic parameter tuning including multimodal weights. Statistically significant improvement for multimodal vs best unimodal prediction. Probabilistic interpretation of results to better reflect continuum of disease.
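Balanced accuracy, the figure quoted above for the dichotomised GP output, is the mean of sensitivity and specificity, which keeps the larger MCI-s group from dominating the score. A minimal sketch follows; the 0.5 threshold, the function name, and the toy probabilities are assumptions made for illustration and do not come from the paper.

```python
def balanced_accuracy(probs, labels, threshold=0.5):
    """Mean of sensitivity and specificity after thresholding predicted probabilities."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)  # fraction of converters correctly flagged
    specificity = tn / (tn + fp)  # fraction of stable subjects correctly cleared
    return (sensitivity + specificity) / 2


# Hypothetical conversion probabilities; label 1 = converter, 0 = stable.
probs = [0.9, 0.6, 0.4, 0.3, 0.2, 0.7]
labels = [1, 1, 1, 0, 0, 0]
score = balanced_accuracy(probs, labels)
```

The sketch assumes both classes are present; an empty class would cause a division by zero.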
|
Journal Article |
12 |
168 |
6
|
Wang J, Wu CJ, Bao ML, Zhang J, Wang XN, Zhang YD. Machine learning-based analysis of MR radiomics can help to improve the diagnostic performance of PI-RADS v2 in clinically relevant prostate cancer. Eur Radiol 2017; 27:4082-4090. [PMID: 28374077 DOI: 10.1007/s00330-017-4800-5] [Citation(s) in RCA: 162] [Impact Index Per Article: 20.3] [Received: 01/13/2017] [Accepted: 03/13/2017] [Indexed: 12/22/2022]
Abstract
OBJECTIVE To investigate whether machine learning-based analysis of MR radiomics can help improve the performance of PI-RADS v2 in clinically relevant prostate cancer (PCa). METHODS This IRB-approved study included 54 patients with PCa undergoing multi-parametric (mp) MRI before prostatectomy. Imaging analysis was performed on 54 tumours, 47 normal peripheral (PZ) and 48 normal transitional (TZ) zone based on histological-radiological correlation. Mp-MRI was scored via PI-RADS, and quantified by measuring radiomic features. A predictive model was developed using a novel support vector machine trained with: (i) radiomics, (ii) PI-RADS scores, (iii) radiomics and PI-RADS scores. Paired comparison was made via ROC analysis. RESULTS For PCa versus normal TZ, the model trained with radiomics had a significantly higher area under the ROC curve (Az) (0.955 [95% CI 0.923-0.976]) than PI-RADS (Az: 0.878 [0.834-0.914], p < 0.001). The difference in Az between them was not significant for PCa versus PZ (0.972 [0.945-0.988] vs. 0.940 [0.905-0.965], p = 0.097). When radiomics was added, performance of PI-RADS was significantly improved for PCa versus PZ (Az: 0.983 [0.960-0.995]) and PCa versus TZ (Az: 0.968 [0.940-0.985]). CONCLUSION Machine learning analysis of MR radiomics can help improve the performance of PI-RADS in clinically relevant PCa. KEY POINTS • Machine-based analysis of MR radiomics outperformed in TZ cancer against PI-RADS. • Adding MR radiomics significantly improved the performance of PI-RADS. • DKI-derived Dapp and Kapp were two strong markers for the diagnosis of PCa.
|
Journal Article |
8 |
162 |
7
|
Zhao YQ, Zeng D, Laber EB, Kosorok MR. New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes. J Am Stat Assoc 2015; 110:583-598. [PMID: 26236062 PMCID: PMC4517946 DOI: 10.1080/01621459.2014.937488] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Indexed: 10/25/2022]
Abstract
Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR which will produce the best long term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL), and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into an either sequential or simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing over all DTRs a nonparametric estimator of the expected long-term outcome; this is fundamentally different than regression-based methods, for example Q-learning, which indirectly attempt such maximization and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent, and provide finite sample bounds for the errors using the estimated rules. Simulation results suggest the proposed methods produce superior DTRs compared with Q-learning especially in small samples. We illustrate the methods using data from a clinical trial for smoking cessation.
|
research-article |
10 |
136 |
8
|
Schmitter D, Roche A, Maréchal B, Ribes D, Abdulkadir A, Bach-Cuadra M, Daducci A, Granziera C, Klöppel S, Maeder P, Meuli R, Krueger G. An evaluation of volume-based morphometry for prediction of mild cognitive impairment and Alzheimer's disease. NEUROIMAGE-CLINICAL 2014; 7:7-17. [PMID: 25429357 PMCID: PMC4238047 DOI: 10.1016/j.nicl.2014.11.001] [Citation(s) in RCA: 134] [Impact Index Per Article: 12.2] [Received: 02/12/2014] [Revised: 06/17/2014] [Accepted: 11/04/2014] [Indexed: 01/10/2023]
Abstract
Voxel-based morphometry from conventional T1-weighted images has proved effective to quantify Alzheimer's disease (AD) related brain atrophy and to enable fairly accurate automated classification of AD patients, mild cognitive impaired patients (MCI) and elderly controls. Little is known, however, about the classification power of volume-based morphometry, where features of interest consist of a few brain structure volumes (e.g. hippocampi, lobes, ventricles) as opposed to hundreds of thousands of voxel-wise gray matter concentrations. In this work, we experimentally evaluate two distinct volume-based morphometry algorithms (FreeSurfer and an in-house algorithm called MorphoBox) for automatic disease classification on a standardized data set from the Alzheimer's Disease Neuroimaging Initiative. Results indicate that both algorithms achieve classification accuracy comparable to the conventional whole-brain voxel-based morphometry pipeline using SPM for AD vs elderly controls and MCI vs controls, and higher accuracy for classification of AD vs MCI and early vs late AD converters, thereby demonstrating the potential of volume-based morphometry to assist diagnosis of mild cognitive impairment and Alzheimer's disease.
|
Research Support, Non-U.S. Gov't |
11 |
134 |
9
|
Chen W, Ding H, Zhou X, Lin H, Chou KC. iRNA(m6A)-PseDNC: Identifying N6-methyladenosine sites using pseudo dinucleotide composition. Anal Biochem 2018; 561-562:59-65. [PMID: 30201554 DOI: 10.1016/j.ab.2018.09.002] [Citation(s) in RCA: 133] [Impact Index Per Article: 19.0] [Received: 07/26/2018] [Revised: 08/31/2018] [Accepted: 09/03/2018] [Indexed: 01/28/2023]
Abstract
As a prevalent post-transcriptional modification, N6-methyladenosine (m6A) plays key roles in a series of biological processes. Although experimental technologies have been developed and applied to identify m6A sites, they are still cost-ineffective for transcriptome-wide detections of m6A. As good complements to the experimental techniques, some computational methods have been proposed to identify m6A sites. However, their performance remains unsatisfactory. In this study, we first proposed a Euclidean distance-based method to construct a high quality benchmark dataset. By encoding the RNA sequences using pseudo nucleotide composition, a new predictor called iRNA(m6A)-PseDNC was developed to identify m6A sites in the Saccharomyces cerevisiae genome. It has been demonstrated by the 10-fold cross validation test that the performance of iRNA(m6A)-PseDNC is superior to the existing methods. Meanwhile, for the convenience of most experimental scientists, a web server has been established at http://lin-group.cn/server/iRNA(m6A)-PseDNC.php, by which users can easily get their desired results without needing to go through the detailed mathematics. It is anticipated that iRNA(m6A)-PseDNC will become a useful high throughput tool for identifying m6A sites in the S. cerevisiae genome.
|
Research Support, Non-U.S. Gov't |
7 |
133 |
10
|
Zou Q, Wan S, Ju Y, Tang J, Zeng X. Pretata: predicting TATA binding proteins with novel features and dimensionality reduction strategy. BMC SYSTEMS BIOLOGY 2016; 10:114. [PMID: 28155714 PMCID: PMC5259984 DOI: 10.1186/s12918-016-0353-5] [Citation(s) in RCA: 130] [Impact Index Per Article: 14.4] [Indexed: 02/08/2023]
Abstract
Background It is necessary and essential to discover protein function from novel primary sequences. Wet lab experimental procedures are not only time-consuming, but also costly, so predicting protein structure and function reliably based only on amino acid sequence has significant value. TATA-binding protein (TBP) is a kind of DNA binding protein, which plays a key role in transcription regulation. Our study proposed an automatic approach for identifying TATA-binding proteins efficiently, accurately, and conveniently. This method could serve as a guide for identifying special proteins with computational intelligence strategies. Results Firstly, we proposed novel fingerprint features for TBP based on pseudo amino acid composition, physicochemical properties, and secondary structure. Secondly, hierarchical feature dimensionality reduction strategies were employed to further improve the performance. Currently, Pretata achieves 92.92% TATA-binding protein prediction accuracy, which is better than all other existing methods. Conclusions The experiments demonstrate that our method could greatly improve the prediction accuracy and speed, thus allowing large-scale NGS data prediction to be practical. A web server has been developed to assist other researchers, which can be accessed at http://server.malab.cn/preTata/.
|
Journal Article |
9 |
130 |
11
|
Abbasi M, El Hanandeh A. Forecasting municipal solid waste generation using artificial intelligence modelling approaches. WASTE MANAGEMENT (NEW YORK, N.Y.) 2016; 56:13-22. [PMID: 27297046 DOI: 10.1016/j.wasman.2016.05.018] [Citation(s) in RCA: 113] [Impact Index Per Article: 12.6] [Received: 05/23/2015] [Revised: 04/16/2016] [Accepted: 05/18/2016] [Indexed: 05/20/2023]
Abstract
Municipal solid waste (MSW) management is a major concern for local governments seeking to protect human health and the environment and to preserve natural resources. The design and operation of an effective MSW management system requires accurate estimation of future waste generation quantities. The main objective of this study was to develop a model for accurate forecasting of MSW generation that helps waste related organizations to better design and operate effective MSW management systems. Four intelligent system algorithms including support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN) and k-nearest neighbours (kNN) were tested for their ability to predict monthly waste generation in the Logan City Council region in Queensland, Australia. Results showed artificial intelligence models have good prediction performance and could be successfully applied to establish municipal solid waste forecasting models. Machine learning algorithms can reliably predict monthly MSW generation when trained with waste generation time series. In addition, results suggest that the ANFIS system produced the most accurate forecasts of the peaks while kNN was successful in predicting the monthly averages of waste quantities. Based on the results, the total annual MSW generated in Logan City will reach 9.4×10^7 kg by 2020 while the peak monthly waste will reach 9.37×10^6 kg.
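One simple way a k-nearest-neighbours model can forecast a monthly series is to find the past windows most similar to the latest months and average the values that followed them. The sketch below illustrates that idea only; the window length, the value of k, the Euclidean distance, and the toy series are assumptions, and the study's actual model configuration is not described in the abstract.

```python
def knn_forecast(series, lags=3, k=2):
    """Predict the next value as the mean successor of the k most similar past windows."""
    query = series[-lags:]
    candidates = []
    for start in range(len(series) - lags):
        window = series[start:start + lags]
        target = series[start + lags]  # value that followed this window
        dist = sum((a - b) ** 2 for a, b in zip(window, query)) ** 0.5
        candidates.append((dist, target))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(t for _, t in nearest) / len(nearest)


# Hypothetical periodic series standing in for monthly waste quantities.
monthly = [1, 2, 3, 1, 2, 3, 1, 2, 3]
next_value = knn_forecast(monthly, lags=3, k=2)
```

Averaging over neighbours smooths the prediction, which is consistent with the abstract's remark that kNN did well on monthly averages rather than peaks.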
|
|
9 |
113 |
12
|
Kim S, Jhong JH, Lee J, Koo JY. Meta-analytic support vector machine for integrating multiple omics data. BioData Min 2017; 10:2. [PMID: 28149325 PMCID: PMC5270233 DOI: 10.1186/s13040-017-0126-8] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Received: 08/01/2016] [Accepted: 01/11/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Of late, high-throughput microarray and sequencing data have been extensively used to monitor biomarkers and biological processes related to many diseases. Under this circumstance, the support vector machine (SVM) has been popularly used and been successful for gene selection in many applications. Despite surpassing benefits of the SVMs, single data analysis using small- and mid-size of data inevitably runs into the problem of low reproducibility and statistical power. To address this problem, we propose a meta-analytic support vector machine (Meta-SVM) that can accommodate multiple omics data, making it possible to detect consensus genes associated with diseases across studies. RESULTS Experimental studies show that the Meta-SVM is superior to the existing meta-analysis method in detecting true signal genes. In real data applications, diverse omics data of breast cancer (TCGA) and mRNA expression data of lung disease (idiopathic pulmonary fibrosis; IPF) were applied. As a result, we identified gene sets consistently associated with the diseases across studies. In particular, the ascertained gene set of TCGA omics data was found to be significantly enriched in the ABC transporters pathways well known as critical for the breast cancer mechanism. CONCLUSION The Meta-SVM effectively achieves the purpose of meta-analysis as jointly leveraging multiple omics data, and facilitates identifying potential biomarkers and elucidating the disease process.
|
Journal Article |
8 |
99 |
13
|
Park Y, Cho KH, Park J, Cha SM, Kim JH. Development of early-warning protocol for predicting chlorophyll-a concentration using machine learning models in freshwater and estuarine reservoirs, Korea. THE SCIENCE OF THE TOTAL ENVIRONMENT 2015; 502:31-41. [PMID: 25241206 DOI: 10.1016/j.scitotenv.2014.09.005] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 04/15/2014] [Revised: 08/08/2014] [Accepted: 09/01/2014] [Indexed: 06/03/2023]
Abstract
Chlorophyll-a (Chl-a) is a direct indicator used to evaluate the ecological state of a waterbody, such as algal blooms that degrade the water quality in lakes, reservoirs and estuaries. In this study, artificial neural network (ANN) and support vector machine (SVM) were used to predict Chl-a concentration for the early warning in the Juam Reservoir and Yeongsan Reservoir, which are located in an upstream region (freshwater reservoir) and downstream region (estuarine reservoir), respectively. Weekly water quality data and meteorological data for a 7-year period were used to train and validate both the ANN and SVM models. The Latin-hypercube one-factor-at-a-time (LH-OAT) method and a pattern search algorithm were applied to perform sensitivity analyses for the input variables and to optimize the parameters of the two models, respectively. Results revealed that the two models well-reproduced the temporal variation of Chl-a based on the weekly input variables. In particular, the SVM model showed better performance than the ANN model, displaying a higher prediction accuracy in the validation step. The Williams-Kloot test and sensitivity analysis demonstrated that the SVM model was superior for predicting Chl-a in terms of prediction accuracy and description of the cause-and-effect relationship between Chl-a concentration and environmental variables in both the Juam Reservoir and Yeongsan Reservoir. Furthermore, a 7-day interval was determined as an efficient early warning interval in the two reservoirs. As such, this study suggested an effective early-warning prediction method for Chl-a concentration and improved the eutrophication management scheme for reservoirs.
|
|
10 |
98 |
14
|
Shan J, Zhao J, Liu L, Zhang Y, Wang X, Wu F. A novel way to rapidly monitor microplastics in soil by hyperspectral imaging technology and chemometrics. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2018; 238:121-129. [PMID: 29554560 DOI: 10.1016/j.envpol.2018.03.026] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 11/28/2017] [Revised: 03/08/2018] [Accepted: 03/09/2018] [Indexed: 06/08/2023]
Abstract
Hyperspectral imaging technology has been investigated as a possible way to detect microplastics contamination in soil directly and efficiently in this study. Hyperspectral images with wavelength range between 400 and 1000 nm were obtained from soil samples containing different materials including microplastics, fresh leaves, wilted leaves, rocks and dry branches. Supervised classification algorithms such as support vector machine (SVM), Mahalanobis distance (MD) and maximum likelihood (ML) algorithms were used to identify microplastics from the other materials in hyperspectral images. To investigate the effect of particle size and color, white polyethylene (PE) and black PE particles extracted from soil with two different particle size ranges (1-5 mm and 0.5-1 mm) were studied in this work. The results showed that SVM was the most applicable method for detecting white PE in soil, with a precision of 84% and 77% for PE particles in size ranges of 1-5 mm and 0.5-1 mm respectively. The precision of black PE detection achieved by SVM was 58% and 76% for particles of 1-5 mm and 0.5-1 mm respectively. Six kinds of household polymers including drink bottle, bottle cap, rubber, packing bag, clothes hanger and plastic clip were used to validate the developed method, and the classification precision for these polymers ranged from 79% to 100% and from 86% to 99% for microplastic particles of 1-5 mm and 0.5-1 mm respectively. The results indicate that hyperspectral imaging technology is a potential technique to determine and visualize microplastics with particle size from 0.5 to 5 mm on the soil surface directly.
|
|
7 |
96 |
15
|
Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions. J Med Biol Eng 2017; 38:244-260. [PMID: 29670502 PMCID: PMC5897457 DOI: 10.1007/s40846-017-0297-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 11.9] [Received: 10/15/2016] [Accepted: 05/09/2017] [Indexed: 12/12/2022]
Abstract
The increasing amount of data in biomechanics research has greatly increased the importance of developing advanced multivariate analysis and machine learning techniques, which are better able to handle “big data”. Consequently, advances in data science methods will expand the knowledge for testing new hypotheses about biomechanical risk factors associated with walking and running gait-related musculoskeletal injury. This paper begins with a brief introduction to an automated three-dimensional (3D) biomechanical gait data collection system: 3D GAIT, followed by how the studies in the field of gait biomechanics fit the quantities in the 5 V’s definition of big data: volume, velocity, variety, veracity, and value. Next, we provide a review of recent research and development in multivariate and machine learning methods-based gait analysis that can be applied to big data analytics. These modern biomechanical gait analysis methods include several main modules such as initial input features, dimensionality reduction (feature selection and extraction), and learning algorithms (classification and clustering). Finally, a promising big data exploration tool called “topological data analysis” and directions for future research are outlined and discussed.
|
Journal Article |
8 |
95 |
16
|
Li Q, Rajagopalan C, Clifford GD. A machine learning approach to multi-level ECG signal quality classification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2014; 117:435-447. [PMID: 25306242 DOI: 10.1016/j.cmpb.2014.09.002] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Received: 08/06/2013] [Revised: 09/06/2014] [Accepted: 09/09/2014] [Indexed: 06/04/2023]
Abstract
Current electrocardiogram (ECG) signal quality assessment studies have aimed to provide a two-level classification: clean or noisy. However, clinical usage demands more specific noise level classification for varying applications. This work outlines a five-level ECG signal quality classification algorithm. A total of 13 signal quality metrics were derived from segments of ECG waveforms, which were labeled by experts. A support vector machine (SVM) was trained to perform the classification and tested on a simulated dataset and was validated using data from the MIT-BIH arrhythmia database (MITDB). The simulated training and test datasets were created by selecting clean segments of the ECG in the 2011 PhysioNet/Computing in Cardiology Challenge database, and adding three types of real ECG noise at different signal-to-noise ratio (SNR) levels from the MIT-BIH Noise Stress Test Database (NSTDB). The MITDB was re-annotated for five levels of signal quality. Different combinations of the 13 metrics were trained and tested on the simulated datasets and the best combination that produced the highest classification accuracy was selected and validated on the MITDB. Performance was assessed using classification accuracy (Ac), and a single class overlap accuracy (OAc), which assumes that an individual type classified into an adjacent class is acceptable. An Ac of 80.26% and an OAc of 98.60% on the test set were obtained by selecting 10 metrics while 57.26% (Ac) and 94.23% (OAc) were the numbers for the unseen MITDB validation data without retraining. By performing the fivefold cross validation, an Ac of 88.07±0.32% and OAc of 99.34±0.07% were gained on the validation fold of MITDB.
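The two metrics named above differ only in what counts as a hit: Ac requires the exact quality level, while OAc also accepts a prediction in an adjacent level. The sketch below assumes the five levels are coded as ordered integers 1 to 5 and that "adjacent" means differing by at most one; the function name and the toy labels are hypothetical.

```python
def accuracy_and_overlap_accuracy(true_levels, predicted_levels):
    """Return (Ac, OAc): exact-match accuracy and accuracy allowing adjacent levels."""
    n = len(true_levels)
    pairs = list(zip(true_levels, predicted_levels))
    exact = sum(1 for t, p in pairs if t == p)
    adjacent = sum(1 for t, p in pairs if abs(t - p) <= 1)  # includes exact matches
    return exact / n, adjacent / n


# Hypothetical quality levels for six ECG segments (1 = cleanest, 5 = noisiest).
truth = [1, 2, 3, 4, 5, 3]
predicted = [1, 3, 3, 2, 5, 4]
ac, oac = accuracy_and_overlap_accuracy(truth, predicted)
```

Because every exact match is also an adjacent match, OAc can never be lower than Ac, which matches the pattern of the figures reported in the abstract.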
|
|
11 |
92 |
17
|
Kocak B, Yardimci AH, Bektas CT, Turkcanoglu MH, Erdim C, Yucetas U, Koca SB, Kilickesmez O. Textural differences between renal cell carcinoma subtypes: Machine learning-based quantitative computed tomography texture analysis with independent external validation. Eur J Radiol 2018; 107:149-157. [PMID: 30292260 DOI: 10.1016/j.ejrad.2018.08.014] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 06/20/2018] [Revised: 08/04/2018] [Accepted: 08/13/2018] [Indexed: 12/01/2022]
Abstract
OBJECTIVE To develop externally validated, reproducible, and generalizable models for distinguishing three major subtypes of renal cell carcinomas (RCCs) using machine learning-based quantitative computed tomography (CT) texture analysis (qCT-TA). MATERIALS AND METHODS Sixty-eight RCCs were included in this retrospective study for model development and internal validation. Another 26 RCCs were included from public databases (The Cancer Genome Atlas-TCGA) for independent external validation. Following image preparation steps (reconstruction, resampling, normalization, and discretization), 275 texture features were extracted from unenhanced and corticomedullary phase CT images. Feature selection was done first with reproducibility analysis by three radiologists and then with a wrapper-based classifier-specific algorithm. A nested cross-validation was performed for feature selection and model optimization. Base classifiers were the artificial neural network (ANN) and support vector machine (SVM). Base classifiers were also combined with three additional algorithms to improve generalizability performance. Classifications were done with the following groups: (i) non-clear cell RCC (non-cc-RCC) versus clear cell RCC (cc-RCC) and (ii) cc-RCC versus papillary cell RCC (pc-RCC) versus chromophobe cell RCC (chc-RCC). The main performance metric for comparisons was the Matthews correlation coefficient (MCC). RESULTS The number of reproducible features was smaller for the unenhanced images (93 out of 275) than for the corticomedullary phase images (232 out of 275). Overall performance metrics of the machine learning-based qCT-TA derived from corticomedullary phase images were better than those of unenhanced images. Using corticomedullary phase images, ANN with adaptive boosting algorithm performed best for discrimination of non-cc-RCCs from cc-RCCs (MCC = 0.728) with an external validation accuracy, sensitivity, and specificity of 84.6%, 69.2%, and 100%, respectively. On the other hand, the performance of the machine learning-based qCT-TA is rather poor for distinguishing three major subtypes. The SVM with bagging algorithm performed best for discrimination of pc-RCC from other RCC subtypes (MCC = 0.804) with an external validation accuracy, sensitivity, and specificity of 69.2%, 71.4%, and 100%, respectively. CONCLUSIONS Machine learning-based qCT-TA can distinguish non-cc-RCCs from cc-RCCs with a satisfying performance. On the other hand, the performance of the method for distinguishing three major subtypes is rather poor. Corticomedullary phase CT images provide much more valuable texture parameters than unenhanced images.
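For readers unfamiliar with the Matthews correlation coefficient, the sketch below computes it for a binary confusion matrix together with accuracy, sensitivity, and specificity. The counts are hypothetical: they assume the 26 external cases split into 13 positives and 13 negatives, which the abstract does not state. Under that assumption they are consistent with the reported 84.6% accuracy, 69.2% sensitivity, and 100% specificity for the non-cc-RCC versus cc-RCC task.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, and MCC from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts (assumed 13 positives, 13 negatives; not given in the abstract).
acc, sens, spec, mcc = binary_metrics(tp=9, fp=0, tn=13, fn=4)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when one class dominates or when sensitivity and specificity diverge, as they do here.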
|
Journal Article |
7 |
91 |
18
|
Automated identification of normal and diabetes heart rate signals using nonlinear measures. Comput Biol Med 2013; 43:1523-9. [PMID: 24034744 DOI: 10.1016/j.compbiomed.2013.05.024] [Citation(s) in RCA: 88] [Impact Index Per Article: 7.3] [Received: 01/18/2013] [Revised: 05/28/2013] [Accepted: 05/30/2013] [Indexed: 11/22/2022]
Abstract
Diabetes mellitus (DM) affects a considerable number of people in the world and the number of cases is increasing every year. Due to a strong link to the genetic basis of the disease, it is extremely difficult to cure. However, it can be controlled to prevent severe consequences, such as organ damage. Therefore, diabetes diagnosis and monitoring of its treatment are very important. In this paper, we have proposed a non-invasive diagnosis support system for DM. The system determines whether or not diabetes is present by determining the cardiac health of a patient using heart rate variability (HRV) analysis. This analysis was based on nine nonlinear features, namely: Approximate Entropy (ApEn), largest Lyapunov exponent (LLE), detrended fluctuation analysis (DFA) and recurrence quantification analysis (RQA). Clinically significant measures were used as input to classification algorithms, namely AdaBoost, decision tree (DT), fuzzy Sugeno classifier (FSC), k-nearest neighbor algorithm (k-NN), probabilistic neural network (PNN) and support vector machine (SVM). Ten-fold stratified cross-validation was used to select the best classifier. AdaBoost, with least squares (LS) as weak learner, performed better than the other classifiers, yielding an average accuracy of 90%, sensitivity of 92.5% and specificity of 88.7%.
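Stratified cross-validation, as mentioned above, keeps the class proportions roughly equal across folds. The following is a minimal sketch of one way to build such folds; the function name, the deterministic round-robin assignment (no shuffling), and the class sizes are all illustrative assumptions, since the abstract gives neither the group sizes nor the splitting procedure.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold has a similar class mix."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        # Deal the samples of each class round-robin across the folds.
        for j, idx in enumerate(by_class[lab]):
            folds[(offset + j) % k].append(idx)
        offset += len(by_class[lab])
    return folds


# Hypothetical cohort: 15 diabetic and 15 normal recordings, five folds.
cohort = ["DM"] * 15 + ["normal"] * 15
folds = stratified_folds(cohort, 5)
# Each fold is held out once for testing while the rest are used for training.
splits = [(sorted(set(range(len(cohort))) - set(f)), f) for f in folds]
```

In practice the samples would be shuffled within each class before assignment; that step is omitted here to keep the example deterministic.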
|
Journal Article |
12 |
88 |
19
|
Le NQK, Yapp EKY, Ho QT, Nagasundaram N, Ou YY, Yeh HY. iEnhancer-5Step: Identifying enhancers using hidden information of DNA sequences via Chou's 5-step rule and word embedding. Anal Biochem 2019; 571:53-61. [PMID: 30822398 DOI: 10.1016/j.ab.2019.02.017] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Received: 01/22/2019] [Revised: 02/17/2019] [Accepted: 02/19/2019] [Indexed: 12/22/2022]
Abstract
An enhancer is a short (50-1500bp) region of DNA that plays an important role in gene expression and the production of RNA and proteins. Genetic variation in enhancers has been linked to many human diseases, such as cancer, disorder or inflammatory bowel disease. Due to the importance of enhancers in genomics, the classification of enhancers has become a popular area of research in computational biology. Although a few computational tools have been employed to address this problem, their performance still requires improvement. In this study, we represent enhancers using word embeddings that include sub-word information of their biological words, which then serve as features to be fed into a support vector machine algorithm to classify them. We present iEnhancer-5Step, a web server containing two-layer classifiers to identify enhancers and their strength. We are able to attain an independent test accuracy of 79% and 63.5% in the two layers, respectively. Compared to current predictors on the same dataset, our proposed method yields superior performance. Moreover, this study provides a basis for further research that can enrich the field of applying natural language processing techniques in biological sequences. iEnhancer-5Step is freely accessible via http://biologydeep.com/fastenc/.
|
Research Support, Non-U.S. Gov't |
6 |
88 |
20
|
Basith S, Manavalan B, Shin TH, Lee G. iGHBP: Computational identification of growth hormone binding proteins from sequences using extremely randomised tree. Comput Struct Biotechnol J 2018; 16:412-420. [PMID: 30425802 PMCID: PMC6222285 DOI: 10.1016/j.csbj.2018.10.007] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 08/24/2018] [Revised: 10/04/2018] [Accepted: 10/12/2018] [Indexed: 11/27/2022] Open
Abstract
Growth hormone binding protein (GHBP) is a soluble carrier protein that can selectively and non-covalently interact with growth hormone, thereby acting as a modulator or inhibitor of growth hormone signalling. Accurate identification of the GHBP from a given protein sequence also provides important clues for understanding cell growth and cellular mechanisms. In the postgenomic era, an abundance of protein sequence data has been garnered, hence it is crucial to develop an automated computational method which enables fast and accurate identification of putative GHBPs within a vast number of candidate proteins. In this study, we describe a novel machine-learning-based predictor called iGHBP for the identification of GHBP. In order to predict GHBP from a given protein sequence, we trained an extremely randomised tree with an optimal feature set that was obtained from a combination of dipeptide composition and amino acid index values by applying a two-step feature selection protocol. During cross-validation analysis, iGHBP achieved an accuracy of 84.9%, which was ~7% higher than the control extremely randomised tree predictor trained with all features, thus demonstrating the effectiveness of our feature selection protocol. Furthermore, when objectively evaluated on an independent data set, our proposed iGHBP method displayed superior performance compared to the existing method. Additionally, a user-friendly web server that implements the proposed iGHBP has been established and is available at http://thegleelab.org/iGHBP.
|
research-article |
7 |
87 |
21
|
Sun H, Nguyen K, Kerns E, Yan Z, Yu KR, Shah P, Jadhav A, Xu X. Highly predictive and interpretable models for PAMPA permeability. Bioorg Med Chem 2016; 25:1266-1276. [PMID: 28082071 DOI: 10.1016/j.bmc.2016.12.049] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 09/27/2016] [Revised: 12/22/2016] [Accepted: 12/27/2016] [Indexed: 11/28/2022]
Abstract
Cell membrane permeability is an important determinant for oral absorption and bioavailability of a drug molecule. An in silico model predicting drug permeability is described, which is built based on a large permeability dataset of 7488 compound entries or 5435 structurally unique molecules measured by the same lab using parallel artificial membrane permeability assay (PAMPA). On the basis of customized molecular descriptors, the support vector regression (SVR) model trained with 4071 compounds with quantitative data is able to predict the remaining 1364 compounds with the qualitative data with an area under the curve of receiver operating characteristic (AUC-ROC) of 0.90. The support vector classification (SVC) model trained with half of the whole dataset comprised of both the quantitative and the qualitative data produced accurate predictions to the remaining data with the AUC-ROC of 0.88. The results suggest that the developed SVR model is highly predictive and provides medicinal chemists a useful in silico tool to facilitate design and synthesis of novel compounds with optimal drug-like properties, and thus accelerate the lead optimization in drug discovery.
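The AUC-ROC values quoted above can be read as the probability that a randomly chosen positive example receives a higher model score than a randomly chosen negative one. The sketch below computes AUC directly from that pairwise definition; the function name and the toy scores are hypothetical, and the quadratic loop is meant for illustration rather than for datasets of the size described in the abstract.

```python
def auc_roc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for three positive and three negative compounds.
toy_auc = auc_roc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Because only the ordering of scores matters, the same function applies whether the scores come from a regression model thresholded afterwards or from a classifier's decision values.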
|
Journal Article |
9 |
85 |
22
|
Phinyomark A, Osis ST, Hettinga BA, Kobsar D, Ferber R. Gender differences in gait kinematics for patients with knee osteoarthritis. BMC Musculoskelet Disord 2016; 17:157. [PMID: 27072641 PMCID: PMC4830067 DOI: 10.1186/s12891-016-1013-z] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 08/19/2015] [Accepted: 04/07/2016] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Females have a two-fold risk of developing knee osteoarthritis (OA) as compared to their male counterparts and atypical walking gait biomechanics are also considered a factor in the aetiology of knee OA. However, few studies have investigated sex-related differences in walking mechanics for patients with knee OA and of those, conflicting results have been reported. Therefore, this study was designed to examine the differences in gait kinematics (1) between male and female subjects with and without knee OA and (2) between healthy gender-matched subjects as compared with their OA counterparts. METHODS One hundred subjects with knee OA (45 males and 55 females) and 43 healthy subjects (18 males and 25 females) participated in this study. Three-dimensional kinematic data were collected during treadmill-walking and analysed using (1) a traditional approach based on discrete variables and (2) a machine learning approach based on principal component analysis (PCA) and support vector machine (SVM) using waveform data. RESULTS OA and healthy females exhibited significantly greater knee abduction and hip adduction angles compared to their male counterparts. No significant differences were found in any discrete gait kinematic variable between OA and healthy subjects in either the male or female group. Using PCA and SVM approaches, classification accuracies of 98-100% were found between gender groups as well as between OA groups. CONCLUSIONS These results suggest that care should be taken to account for gender when investigating the biomechanical aetiology of knee OA and that gender-specific analysis and rehabilitation protocols should be developed.
|
Research Support, Non-U.S. Gov't |
9 |
85 |
23
|
Premaladha J, Ravichandran KS. Novel Approaches for Diagnosing Melanoma Skin Lesions Through Supervised and Deep Learning Algorithms. J Med Syst 2016; 40:96. [PMID: 26872778 DOI: 10.1007/s10916-016-0460-2] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 11/17/2015] [Accepted: 02/01/2016] [Indexed: 11/29/2022]
Abstract
Dermoscopy is a technique used to capture images of the skin, and these images are useful for analyzing different types of skin diseases. Malignant melanoma is a kind of skin cancer that can be fatal. Earlier detection of melanoma prevents death, and clinicians can treat the patients to increase the chances of survival. Only a few machine learning algorithms have been developed to detect melanoma using its features. This paper proposes a Computer Aided Diagnosis (CAD) system equipped with efficient algorithms to classify and predict melanoma. Enhancement of the images is done using the Contrast Limited Adaptive Histogram Equalization technique (CLAHE) and a median filter. A new segmentation algorithm called Normalized Otsu's Segmentation (NOS) is implemented to segment the affected skin lesion from the normal skin, which overcomes the problem of variable illumination. Fifteen features derived and extracted from the segmented images are fed into the proposed classification techniques, namely Deep Learning based Neural Networks and Hybrid Adaboost-Support Vector Machine (SVM) algorithms. The proposed system is tested and validated with nearly 992 images (malignant & benign lesions) and it provides a high classification accuracy of 93%. The proposed CAD system can assist dermatologists in confirming the diagnosis and avoiding excisional biopsies.
|
|
9 |
83 |
24
|
Guo H, Jeong K, Lim J, Jo J, Kim YM, Park JP, Kim JH, Cho KH. Prediction of effluent concentration in a wastewater treatment plant using machine learning models. J Environ Sci (China) 2015; 32:90-101. [PMID: 26040735 DOI: 10.1016/j.jes.2015.01.007] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 06/30/2014] [Revised: 12/11/2014] [Accepted: 01/22/2015] [Indexed: 06/04/2023]
Abstract
With the growing amount of food waste, integrated food waste and waste water treatment has been regarded as one of the efficient methods. However, the load of food waste on the conventional waste treatment process might lead to a high concentration of total nitrogen (T-N) that affects the effluent water quality. The objective of this study is to establish two machine learning models, artificial neural networks (ANNs) and support vector machines (SVMs), in order to predict 1-day interval T-N concentration of effluent from a wastewater treatment plant in Ulsan, Korea. Daily water quality data and meteorological data were used, and the performance of both models was evaluated in terms of the coefficient of determination (R2), Nash-Sutcliffe efficiency (NSE), and relative efficiency criteria (drel). Additionally, Latin-Hypercube one-factor-at-a-time (LH-OAT) and a pattern search algorithm were applied to sensitivity analysis and model parameter optimization, respectively. Results showed that both models could be effectively applied to the 1-day interval prediction of T-N concentration of effluent. The SVM model showed a higher prediction accuracy in the training stage and a similar result in the validation stage. However, the sensitivity analysis demonstrated that the ANN model was a superior model for 1-day interval T-N concentration prediction in terms of the cause-and-effect relationship between T-N concentration and modeling input values to integrated food waste and waste water treatment. This study suggested an efficient and robust nonlinear time-series modeling method for early prediction of the water quality of the integrated food waste and waste water treatment process.
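Two of the evaluation criteria named above can be written out briefly. The sketch assumes the usual definition of the Nash-Sutcliffe efficiency (one minus the ratio of residual variance to observed variance) and computes R2 as the squared Pearson correlation, a common convention in hydrology; the abstract does not say which R2 definition the authors used. The function names and the toy concentrations are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - ss_res / ss_tot


def r_squared(observed, simulated):
    """R2 taken here as the squared Pearson correlation (an assumption)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_o = sum(observed) / len(observed)
    mean_s = sum(simulated) / len(simulated)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_o == 0 or var_s == 0:
        raise ValueError("a series has zero variance")
    return cov * cov / (var_o * var_s)


# Hypothetical daily T-N concentrations (mg/L): measured vs. model output.
obs = [10.0, 12.0, 14.0, 16.0]
sim = [11.0, 12.0, 13.0, 17.0]
nse = nash_sutcliffe(obs, sim)
r2 = r_squared(obs, sim)
```

NSE equals 1 for a perfect prediction and drops below 0 when the model is worse than simply predicting the observed mean, whereas the correlation-based R2 is insensitive to systematic bias, which is one reason studies often report both.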
|
|
10 |
83 |
25
|
The impact of machine learning techniques in the study of bipolar disorder: A systematic review. Neurosci Biobehav Rev 2017; 80:538-554. [PMID: 28728937 DOI: 10.1016/j.neubiorev.2017.07.004] [Citation(s) in RCA: 81] [Impact Index Per Article: 10.1] [Received: 02/10/2017] [Revised: 06/15/2017] [Accepted: 07/08/2017] [Indexed: 01/10/2023]
Abstract
Machine learning techniques provide new methods to predict diagnosis and clinical outcomes at an individual level. We aim to review the existing literature on the use of machine learning techniques in the assessment of subjects with bipolar disorder. We systematically searched PubMed, Embase and Web of Science for articles published in any language up to January 2017. We found 757 abstracts and included 51 studies in our review. Most of the included studies used multiple levels of biological data to distinguish the diagnosis of bipolar disorder from other psychiatric disorders or healthy controls. We also found studies that assessed the prediction of clinical outcomes and studies using unsupervised machine learning to build more consistent clinical phenotypes of bipolar disorder. We concluded that given the clinical heterogeneity of samples of patients with BD, machine learning techniques may provide clinicians and researchers with important insights in fields such as diagnosis, personalized treatment and prognosis orientation.
|
Systematic Review |
8 |
81 |