101
Nath A, Leier A. Improved cytokine-receptor interaction prediction by exploiting the negative sample space. BMC Bioinformatics 2020; 21:493. [PMID: 33129275 PMCID: PMC7603689 DOI: 10.1186/s12859-020-03835-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/04/2020] [Accepted: 10/23/2020] [Indexed: 01/19/2023] Open
Abstract
Background: Cytokines act by binding to specific receptors in the plasma membrane of target cells. Knowledge of cytokine–receptor interaction (CRI) is very important for understanding the pathogenesis of various human diseases (notably autoimmune, inflammatory and infectious diseases) and for identifying potential therapeutic targets. Recently, machine learning algorithms have been used to predict CRIs. However, “Gold Standard” negative datasets are still lacking, and strong biases in negative datasets can significantly affect the training of learning algorithms and their evaluation. To mitigate the unrepresentativeness and bias inherent in the selection of negative samples (non-interacting proteins), we propose a clustering-based approach for representative negative sample selection. Results: We used deep autoencoders to investigate the effect of different sampling approaches for non-interacting pairs on the training and the performance of machine learning classifiers. By using the anomaly detection capabilities of deep autoencoders, we deduced the effects of different categories of negative samples on the training of learning algorithms. Random sampling of non-interacting pairs results in either over- or under-representation of hard or easy to classify instances. When K-means based sampling of negative datasets is applied to mitigate the inadequacies of random sampling, random forest (RF) together with the combined feature set of atomic composition, physicochemical-2grams and two different representations of evolutionary information performs best. Average model performances based on leave-one-out cross validation (LOOCV) over the ten different negative sample sets that each model was trained with show that RF models significantly outperform the previous best CRI predictor in terms of accuracy (+5.1%), specificity (+13%), MCC (+0.1) and g-means value (+5.1). Evaluations using tenfold CV and training/testing splits confirm the competitive performance.
Conclusions: A comparative analysis was performed to assess the effect of three different sampling methods (random, K-means and uniform sampling) on the training of learning algorithms using different evaluation methods. Models trained on K-means sampled datasets generally show a significantly improved performance compared to those trained on random selections, with RF seemingly benefiting most in our particular setting. Our findings on the sampling are highly relevant and apply to many applications of supervised learning approaches in bioinformatics.
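The abstract does not spell out how K-means is used to pick negatives, so the following is only a minimal sketch of one plausible reading: cluster the candidate non-interacting pairs in feature space and keep the member closest to each centroid, so that each region of the negative space is represented once. The function names, the farthest-first initialisation, and the toy 2-D feature vectors are assumptions made for illustration, not details from the paper.

```python
import math


def farthest_first(points, k):
    """Deterministic initialisation (an assumption, not the paper's choice):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means on a list of equal-length tuples."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: mean of each cluster (keep the old centroid if empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def representative_negatives(negatives, n_samples):
    """Return one negative per cluster: the member closest to its centroid."""
    centroids, labels = kmeans(negatives, n_samples)
    chosen = []
    for c, centre in enumerate(centroids):
        members = [p for p, lab in zip(negatives, labels) if lab == c]
        if members:
            chosen.append(min(members, key=lambda p: math.dist(p, centre)))
    return chosen


# Toy candidate negatives forming two well-separated groups.
candidates = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(representative_negatives(candidates, 2))
```

Unlike random sampling, this selection cannot draw all negatives from one dense region, which is the kind of over- or under-representation the abstract attributes to random selection.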
Affiliation(s)
- Abhigyan Nath
- Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India.
- André Leier
- Department of Genetics, Department of Cell Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA.
102
Braiki M, Benzinou A, Nasreddine K, Hymery N. Automatic Human Dendritic Cells Segmentation Using K-Means Clustering and Chan-Vese Active Contour Model. Comput Methods Programs Biomed 2020; 195:105520. [PMID: 32497772 DOI: 10.1016/j.cmpb.2020.105520] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/17/2019] [Revised: 03/09/2020] [Accepted: 04/23/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND AND OBJECTIVE The number of food-related pathologies is growing. Mycotoxins are among the most severe food contaminants and have serious effects on human health. Therefore, it is necessary to develop an assessment tool for evaluating their impact on the immune response. Recently, a new investigational method using human dendritic cells was endorsed by biologists. Nevertheless, analysis of the morphological features and the behavior of these cells remains purely visual, and this manual analysis is difficult and time-consuming. Here, we focus mainly on automating the evaluation process by using advanced image processing technology. METHODS An automatic segmentation approach for microscopic dendritic cell images is developed to provide a fast and objective evaluation. First, a combination of K-means clustering and mathematical morphology is used to detect dendritic cells. Second, a region-based Chan-Vese active contour model is used to segment the detected cells more precisely. Finally, dendritic cells are extracted by filtering based on an eccentricity measure. RESULTS The proposed scheme is tested on an actual dataset containing 421 microscopic dendritic cell images. The experimental results show high conformity between the results of the proposed scheme and the ground truth established by a biological expert. Moreover, a comparative study with other state-of-the-art segmentation schemes demonstrates the efficiency of the proposed method. It gives the highest average accuracy rate (99.42%) compared with recently studied approaches. CONCLUSIONS The proposed image segmentation method for morphological analysis of dendrite inhibition can consistently be used as an assessment tool for biologists to facilitate the evaluation of serious health impacts of mycotoxins.
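The final filtering step relies on an eccentricity measure. A common definition, assumed here because the abstract does not give one, is the eccentricity of the ellipse that has the same second-order central moments as the segmented region. The sketch below computes it from a list of (row, column) pixel coordinates; the function name and the toy regions are hypothetical, and the threshold (and its direction) used by the authors is not stated in the abstract.

```python
import math


def eccentricity(pixels):
    """Eccentricity of the ellipse with the same second central moments as
    the region: 0.0 for a circle-like blob, 1.0 for a straight line."""
    n = len(pixels)
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n
    # Second-order central moments (entries of the 2x2 covariance matrix).
    s_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / n
    s_cc = sum((c - mean_c) ** 2 for _, c in pixels) / n
    s_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / n
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (s_rr + s_cc) / 2
    root = math.sqrt(((s_rr - s_cc) / 2) ** 2 + s_rc ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root
    if lam_max == 0:  # single-pixel region: treat as perfectly round
        return 0.0
    return math.sqrt(max(0.0, 1 - lam_min / lam_max))


square = [(0, 0), (0, 1), (1, 0), (1, 1)]   # compact region
line = [(0, 0), (0, 1), (0, 2), (0, 3)]     # maximally elongated region
print(eccentricity(square), eccentricity(line))
```

A filter would then keep or discard each detected region by comparing this value with a chosen cut-off.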
Affiliation(s)
- Marwa Braiki
- ENIB, UMR CNRS 6285 LabSTICC, 29238, Brest, France; UTM, ISTMT, LR13ES07 (LRBTM), 1006, Tunis, Tunisie
103
Utomo D, Hsiung PA. A Multitiered Solution for Anomaly Detection in Edge Computing for Smart Meters. Sensors (Basel) 2020; 20:s20185159. [PMID: 32927672 PMCID: PMC7571075 DOI: 10.3390/s20185159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2020] [Revised: 08/17/2020] [Accepted: 08/25/2020] [Indexed: 06/11/2023]
Abstract
In systems connected to smart grids, smart meters with fast and efficient responses are very helpful in detecting anomalies in real time. However, sending data at intervals of a minute or less is not common with today's technology because of bottlenecks in the communication network and storage media. Because mitigation cannot be done in real time, we propose prediction techniques using a Deep Neural Network (DNN), Support Vector Regression (SVR), and k-Nearest Neighbors (KNN). In addition to these techniques, the prediction timestep is chosen per day and wrapped in sliding windows, and clustering using K-means and intersection K-means and HDBSCAN is also evaluated. The goal is to predict whether anomalies in electricity usage will occur in the next few weeks. The aim is to give users time to check their usage and, on the utility side, to determine whether a sufficient supply needs to be prepared. We also propose reducing latency, relative to the higher latency of a traditional centralized system, by adding an Edge Meter Data Management System (MDMS) layer and a Cloud-MDMS as the inference and training model. Based on experiments running on a Raspberry Pi, the best solution is the DNN, which has the shortest latency (1.25 ms) and a 159 kB persistent file size, at 128 timesteps.
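Wrapping a daily series in sliding windows can be illustrated as follows. This is a generic sketch of the windowing idea only; the function name, the `window` and `horizon` parameters, and the toy readings are assumptions, and the paper's actual window layout is not described in the abstract.

```python
def sliding_windows(series, window, horizon=1):
    """Split a series into (inputs, targets): each input holds `window`
    consecutive values and each target the `horizon` values that follow."""
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window:start + window + horizon])
    return inputs, targets


# Toy daily consumption readings (made-up numbers).
daily_kwh = [1, 2, 3, 4, 5]
x, y = sliding_windows(daily_kwh, window=3)
print(x)  # [[1, 2, 3], [2, 3, 4]]
print(y)  # [[4], [5]]
```

Each input/target pair can then be fed to a regressor such as the DNN, SVR, or KNN models named in the abstract.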
Affiliation(s)
- Darmawan Utomo
- Computer Science and Information Engineering, National Chung Cheng University, No. 168, Sec. 1, University Rd., Minhsiung, Chiayi 62102, Taiwan;
- Faculty of Electronics and Computer Engineering, Satya Wacana Christian University, Jalan Diponegoro 52-60, Salatiga 50711, Indonesia
- Pao-Ann Hsiung
- Computer Science and Information Engineering, National Chung Cheng University, No. 168, Sec. 1, University Rd., Minhsiung, Chiayi 62102, Taiwan;
104
Soltani AA, Bermad A, Boutaghane H, Oukil A, Abdalla O, Hasbaia M, Oulebsir R, Zeroual S, Lefkir A. An integrated approach for assessing surface water quality: Case of Beni Haroun dam (Northeast Algeria). Environ Monit Assess 2020; 192:630. [PMID: 32902799 DOI: 10.1007/s10661-020-08572-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/14/2020] [Accepted: 08/24/2020] [Indexed: 06/11/2023]
Abstract
In this paper, we use an integrated approach to carry out a comprehensive evaluation of water quality in the Beni Haroun (BH) dam, the largest surface water resource in Algeria. Several techniques have been employed under the same framework, including the Canadian Council of Ministers of the Environment Water Quality Index (CCME-WQI), principal component analysis and factor analysis (PCA/FA), K-means clustering, and ordinary least squares (OLS) analysis. A data set of 22 physicochemical parameters has been collected, over a period of 11 years, from three sampling stations: Ain Smara (ST1) and Menia (ST2), both located upstream on "Wadi Rhumel," and BH dam station (ST3), located at the dam site. The PCA/FA enables the identification of seven key factors that significantly influence BH dam water quality. The average values of CCME indices at the BH dam were 17, 40, 42, and 32 for drinking, irrigation, industry, and aquatic life purposes, respectively, which indicate poor water quality according to the CCME categorization scheme. In addition, the K-means algorithm proved to be a very useful machine learning tool for detecting that the major source of BH dam pollution is "Wadi Rhumel." Finally, OLS analysis, along with the Mann-Kendall test, highlighted the positive trend of BH dam's water quality.
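For readers unfamiliar with the index, the sketch below follows the standard CCME-WQI formulation: scope F1 (percentage of failed variables), frequency F2 (percentage of failed tests), and amplitude F3 (derived from the normalised sum of excursions), combined as 100 − √(F1² + F2² + F3²)/1.732. It is a simplified illustration that assumes every objective is an upper limit; the function names, variable names, and sample values are hypothetical and not taken from the study.

```python
import math


def ccme_wqi(measurements, objectives):
    """measurements: {variable: [values]}; objectives: {variable: upper limit}.
    Simplification: only 'must not exceed' objectives are handled."""
    total_tests = sum(len(values) for values in measurements.values())
    failed_vars = 0
    failed_tests = 0
    excursion_sum = 0.0
    for var, values in measurements.items():
        limit = objectives[var]
        failures = [v for v in values if v > limit]
        if failures:
            failed_vars += 1
        failed_tests += len(failures)
        # Excursion: how far each failed test overshoots its objective.
        excursion_sum += sum(v / limit - 1 for v in failures)
    f1 = 100 * failed_vars / len(measurements)   # scope
    f2 = 100 * failed_tests / total_tests        # frequency
    nse = excursion_sum / total_tests            # normalised sum of excursions
    f3 = nse / (0.01 * nse + 0.01)               # amplitude
    return 100 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732


def ccme_category(wqi):
    """CCME categories: Excellent 95-100, Good 80-94, Fair 65-79,
    Marginal 45-64, Poor 0-44."""
    for floor, name in [(95, "Excellent"), (80, "Good"), (65, "Fair"), (45, "Marginal")]:
        if wqi >= floor:
            return name
    return "Poor"


# Made-up example: one of two nitrate tests is at twice its limit.
score = ccme_wqi({"nitrate": [5, 20]}, {"nitrate": 10})
print(round(score, 2), ccme_category(score))
```

Under this scheme the reported BH dam averages (17, 40, 42, and 32) all fall in the lowest band, consistent with the abstract's "poor" rating.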
Affiliation(s)
- Ahmed Amin Soltani
- VESDD Laboratory, Hydraulic Department, University of M'sila, P.O. Box 166, 28000, Ichebilia, M'sila, Algeria
- Abdelmalek Bermad
- Hydraulics Department, Ecole Nationale Polytechnique d'Alger, Algiers, Algeria
- Hamouda Boutaghane
- Hydraulics Department, Engineering Faculty, Badji Mokhtar University, Annaba, Algeria
- Amar Oukil
- Department of Operations Management & Business Statistics, College of Economics & Political Science, Sultan Qaboos University, P.O. Box 20, PC 123, Muscat, Al Khoud, Oman.
- Osman Abdalla
- Water Research Center, Department of Earth Sciences, College of Science, Sultan Qaboos University, P.O. Box 36, PC 123, Muscat, Al Khoud, Oman
- Mahmoud Hasbaia
- VESDD Laboratory, Hydraulic Department, University of M'sila, P.O. Box 166, 28000, Ichebilia, M'sila, Algeria
- Rafik Oulebsir
- Université des Sciences et de la Technologie Houari Boumediene, Algiers, Algeria
- Sara Zeroual
- VESDD Laboratory, Hydraulic Department, University of M'sila, P.O. Box 166, 28000, Ichebilia, M'sila, Algeria
105
Jiang J, Chen Q, Xue J, Wang H, Chen Z. A Novel Method about the Representation and Discrimination of Traffic State. Sensors (Basel) 2020; 20:s20185039. [PMID: 32899826 PMCID: PMC7570472 DOI: 10.3390/s20185039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2020] [Revised: 08/31/2020] [Accepted: 09/03/2020] [Indexed: 06/11/2023]
Abstract
The representation and discrimination of various traffic states play an essential role in addressing traffic accidents and congestion, as the foundation of traffic state prediction. However, the existing representation of the traffic state usually considers only the road congestion layer and divides the traffic state into congested and unblocked. A representation based only on the congestion layer cannot easily reflect the road traffic state comprehensively. Therefore, we select three indicators from the layers of road congestion, road safety, and road stability, respectively, and then use K-means to cluster the traffic state. The clustering results can be regarded as a new type of traffic state representation. As a result, the traffic states are divided into four classes, which comprehensively reflect the level of road congestion, safety, and stability. Using the four traffic states obtained from the clustering results as class labels, we applied a multi-layer perceptron (MLP) to classify the different traffic states, and the receiver operating characteristic (ROC) curve is assessed to verify the superiority of the classification results. Finally, a visual display of the real-time traffic state in a city's central area is given.
Affiliation(s)
- Junfeng Jiang
- College of artificial intelligence, Wuhan Technology and Business University, Wuhan 430073, China;
- Qiushi Chen
- Intelligent Transportation Systems Center (ITSC), Wuhan University of Technology, Wuhan 430000, China; (Q.C.); (H.W.); (Z.C.)
- Jie Xue
- Faculty of Technology, Policy and Management, Safety and Security Science Group (S3G), Delft University of Technology, 2628BX Delft, The Netherlands
- Haobo Wang
- Intelligent Transportation Systems Center (ITSC), Wuhan University of Technology, Wuhan 430000, China; (Q.C.); (H.W.); (Z.C.)
- Zhijun Chen
- Intelligent Transportation Systems Center (ITSC), Wuhan University of Technology, Wuhan 430000, China; (Q.C.); (H.W.); (Z.C.)
106
Jung SH, Lee H, Huh JH. A Novel Model on Reinforce K-Means Using Location Division Model and Outlier of Initial Value for Lowering Data Cost. Entropy (Basel) 2020; 22:E902. [PMID: 33286671 DOI: 10.3390/e22080902] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/18/2020] [Revised: 07/27/2020] [Accepted: 08/11/2020] [Indexed: 11/17/2022]
Abstract
Today, semi-structured and unstructured data are mainly collected and analyzed for data analysis applicable to various systems. Such data have a dense distribution of space and usually contain outliers and noise data. There have been ongoing research studies on clustering algorithms to classify such data (outliers and noise data). The K-means algorithm is one of the most investigated clustering algorithms. Researchers have pointed out a couple of problems such as processing clustering for the number of clusters, K, by an analyst through his or her random choices, producing biased results in data classification through the connection of nodes in dense data, and higher implementation costs and lower accuracy according to the selection models of the initial centroids. Most K-means researchers have pointed out the disadvantage of outliers belonging to external or other clusters instead of the concerned ones when K is big or small. Thus, the present study analyzed problems with the selection of initial centroids in the existing K-means algorithm and investigated a new K-means algorithm of selecting initial centroids. The present study proposed a method of cutting down clustering calculation costs by applying an initial center point approach based on space division and outliers so that no objects would be subordinate to the initial cluster center for dependence lower from the initial cluster center. Since data containing outliers could lead to inappropriate results when they are reflected in the choice of a center point of a cluster, the study proposed an algorithm to minimize the error rates of outliers based on an improved algorithm for space division and distance measurement. The performance experiment results of the proposed algorithm show that it lowered the execution costs by about 13-14% compared with those of previous studies when there was an increase in the volume of clustering data or the number of clusters. 
It also recorded a lower frequency of outliers, a lower effectiveness index, which assesses performance deterioration with outliers, and a reduction of outliers by about 60%.
107
Zhang XK, Lan YB, Huang Y, Zhao X, Duan CQ. Targeted metabolomics of anthocyanin derivatives during prolonged wine aging: Evolution, color contribution and aging prediction. Food Chem 2020; 339:127795. [PMID: 32836023 DOI: 10.1016/j.foodchem.2020.127795] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 04/24/2020] [Revised: 07/29/2020] [Accepted: 08/05/2020] [Indexed: 02/02/2023]
Abstract
Anthocyanin derivatives and chromatic characteristics of 234 red wines of different vintages were investigated based on a targeted HPLC-MS/MS and CIELAB approach. The K-means cluster analysis showed that the evolution pattern varies amongst anthocyanin derivative classes. Their stabilities are: pinotins > flavanyl-pyranoanthocyanins, vitisin A > monomeric anthocyanin, direct anthocyanin-flavan-3-ols condensation products > vitisin B, anthocyanin ethyl-linked flavan-3-ols products. The proportion of most pyranoanthocyanins becomes more significant among all detected anthocyanin derivatives during wine aging, whereas flavanol-related anthocyanin derivatives (except for flavanyl-pyranoanthocyanins) decreased drastically. PLSR showed that tawny aging characteristics are related to pyranoanthocyanins except for vitisin B, especially pinotins, whereas monomeric anthocyanins and flavanol-related derivatives (except for flavanyl-pyranoanthocyanins) contribute to red-violet color. Aging color density, however, is more associated with the content of vitisin A and flavanyl-pyranoanthocyanins. Two predictive models based on random forest and support vector machine modeling showed good performance in predicting the extent of wine aging.
Affiliation(s)
- Xin-Ke Zhang
- Center for Viticulture and Enology, College of Food Science & Nutritional Engineering, China Agricultural University, Beijing 100083, China; Key Laboratory of Viticulture and Enology, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
- Yi-Bin Lan
- Center for Viticulture and Enology, College of Food Science & Nutritional Engineering, China Agricultural University, Beijing 100083, China; Key Laboratory of Viticulture and Enology, Ministry of Agriculture and Rural Affairs, Beijing 100083, China; Cool Climate Oenology and Viticulture Institute (CCOVI), Brock University, St. Catharines, Ontario L2S 3A1, Canada
- Yue Huang
- Center for Viticulture and Enology, College of Food Science & Nutritional Engineering, China Agricultural University, Beijing 100083, China; Key Laboratory of Viticulture and Enology, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
- Xu Zhao
- Center for Viticulture and Enology, College of Food Science & Nutritional Engineering, China Agricultural University, Beijing 100083, China; Key Laboratory of Viticulture and Enology, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
- Chang-Qing Duan
- Center for Viticulture and Enology, College of Food Science & Nutritional Engineering, China Agricultural University, Beijing 100083, China; Key Laboratory of Viticulture and Enology, Ministry of Agriculture and Rural Affairs, Beijing 100083, China.
108
Xu X, Li H, Yin F, Xi L, Qiao H, Ma Z, Shen S, Jiang B, Ma X. Wheat ear counting using K-means clustering segmentation and convolutional neural network. Plant Methods 2020; 16:106. [PMID: 32782453 PMCID: PMC7412807 DOI: 10.1186/s13007-020-00648-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/01/2020] [Accepted: 07/29/2020] [Indexed: 05/17/2023]
Abstract
BACKGROUND Wheat yield is influenced by the number of ears per unit area, and manual counting has traditionally been used to estimate wheat yield. To realize rapid and accurate wheat ear counting, K-means clustering was used for the automatic segmentation of wheat ear images captured by hand-held devices. The segmented data set was constructed by creating four categories of image labels: non-wheat ear, one wheat ear, two wheat ears, and three wheat ears, and was then sent into the convolutional neural network (CNN) model for training and testing to reduce the complexity of the model. RESULTS The recognition accuracies for non-wheat ear, one wheat ear, two wheat ears, and three wheat ears were 99.8, 97.5, 98.07, and 98.5%, respectively. The model R² reached 0.96, the root mean square error (RMSE) was 10.84 ears, the macro F1-score and micro F1-score both achieved 98.47%, and the best performance was observed during the late grain-filling stage (R² = 0.99, RMSE = 3.24 ears). The model could also be applied to the UAV platform (R² = 0.97, RMSE = 9.47 ears). CONCLUSIONS The classification of segmented images, as opposed to target recognition, not only reduces the workload of manual annotation but also significantly improves the efficiency and accuracy of wheat ear counting, thus meeting the requirements of wheat yield estimation in the field environment.
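Because each segmented patch is classified as containing zero, one, two, or three ears, the count for an image is simply the sum of the predicted class labels, and the reported RMSE and R² compare such counts with manual ones. The sketch below illustrates that bookkeeping under two assumptions: class labels equal ear counts, and R² is the coefficient of determination (the paper may instead report a regression-based R²). All names and numbers are illustrative.

```python
import math


def count_ears(patch_labels):
    """Total ears in one image, given per-patch class labels in {0, 1, 2, 3}."""
    return sum(patch_labels)


def rmse(y_true, y_pred):
    """Root mean square error between manual and predicted counts."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up example: predicted patch labels for three images vs. manual counts.
predicted = [count_ears(labels) for labels in ([0, 1, 2, 3, 1], [1, 1, 0], [3, 3, 2])]
manual = [7, 3, 8]
print(predicted, rmse(manual, predicted), r_squared(manual, predicted))
```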
Affiliation(s)
- Xin Xu
- Henan Agricultural University, Zhengzhou, 450002 China
- Henan Grain Crops Collaborative Innovation Center, Zhengzhou, 450002 China
- Haiyang Li
- Henan Agricultural University, Zhengzhou, 450002 China
- Fei Yin
- Henan Agricultural University, Zhengzhou, 450002 China
- Henan Grain Crops Collaborative Innovation Center, Zhengzhou, 450002 China
- Lei Xi
- Henan Agricultural University, Zhengzhou, 450002 China
- Henan Grain Crops Collaborative Innovation Center, Zhengzhou, 450002 China
- Hongbo Qiao
- Henan Agricultural University, Zhengzhou, 450002 China
- Zhaowu Ma
- Henan Agricultural University, Zhengzhou, 450002 China
- Shuaijie Shen
- Henan Agricultural University, Zhengzhou, 450002 China
- Binchao Jiang
- Henan Agricultural University, Zhengzhou, 450002 China
- Xinming Ma
- Henan Agricultural University, Zhengzhou, 450002 China
- Henan Grain Crops Collaborative Innovation Center, Zhengzhou, 450002 China
109
Hao Z, Duan Y, Dang X, Liu Y, Zhang D. Wi-SL: Contactless Fine-Grained Gesture Recognition Uses Channel State Information. Sensors (Basel) 2020; 20:E4025. [PMID: 32698482 DOI: 10.3390/s20144025] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/01/2020] [Revised: 07/14/2020] [Accepted: 07/17/2020] [Indexed: 11/23/2022]
Abstract
In recent years, with the development of wireless sensing technology and the widespread popularity of WiFi devices, human perception based on WiFi has become possible, and gesture recognition has become an active topic in the field of human-computer interaction. As a kind of gesture, sign language is widely used in life. The establishment of an effective sign language recognition system can help people with aphasia and hearing impairment to better interact with the computer and facilitate their daily life. For this reason, this paper proposes a contactless fine-grained gesture recognition method using Channel State Information (CSI), namely Wi-SL. This method uses a commercial WiFi device to establish the correlation mapping between the amplitude and phase difference information of the subcarrier level in the wireless signal and the sign language action, without requiring the user to wear any device. We combine an efficient denoising method to filter environmental interference with an effective selection of optimal subcarriers to reduce the computational cost of the system. We also use K-means combined with a Bagging algorithm to optimize the Support Vector Machine (SVM) classification (KSB) model to enhance the classification of sign language action data. We implemented the algorithms and evaluated them for three different scenarios. The experimental results show that the average accuracy of Wi-SL gesture recognition can reach 95.8%, which realizes device-free, non-invasive, high-precision sign language gesture recognition.
110
Reuter C, Bellettiere J, Liles S, Di C, Sears DD, LaMonte MJ, Stefanick ML, LaCroix AZ, Natarajan L. Diurnal patterns of sedentary behavior and changes in physical function over time among older women: a prospective cohort study. Int J Behav Nutr Phys Act 2020; 17:88. [PMID: 32646435 PMCID: PMC7346671 DOI: 10.1186/s12966-020-00992-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/06/2020] [Accepted: 06/29/2020] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Sedentary behavior (SB) is linked to negative health outcomes in older adults. Most studies use summary values, e.g., total sedentary minutes/day. Diurnal timing of SB accumulation may further elucidate SB-health associations. METHODS Six thousand two hundred four US women (mean age = 79 ± 7; 50% White, 34% African-American) wore accelerometers for 7-days at baseline, yielding 41,356 person-days with > 600 min/day of data. Annual follow-up assessments of health, including physical functioning, were collected from participants for 6 years. A novel two-phase clustering procedure discriminated participants' diurnal SB patterns: phase I grouped day-level SB trajectories using longitudinal k-means; phase II determined diurnal SB patterns based on proportion of phase I trajectories using hierarchical clustering. Mixed models tested associations between SB patterns and longitudinal physical functioning, adjusted for covariates including total sedentary time. Effect modification by moderate-vigorous-physical activity (MVPA) was tested. RESULTS Four diurnal SB patterns were identified: p1 = high-SB-throughout-the-day; p2 = moderate-SB-with-lower-morning-SB; p3 = moderate-SB-with-higher-morning-SB; p4 = low-SB-throughout-the-day. High MVPA mitigated physical functioning decline and correlated with better baseline and 6-year trajectory of physical functioning across patterns. In low MVPA, p2 had worse 6-year physical functioning decline compared to p1 and p4. In high MVPA, p2 had similar 6-year physical functioning decline compared to p1, p3, and p4. CONCLUSIONS In a large cohort of older women, diurnal SB patterns were associated with rates of physical functioning decline, independent of total sedentary time. In particular, we identified a specific diurnal SB subtype defined by less SB earlier and more SB later in the day, which had the steepest decline in physical functioning among participants with low baseline MVPA. 
Thus, diurnal timing of SB, complementary to total sedentary time and MVPA, may offer additional insights into associations between SB and physical health, and provide physicians with early warning of patients at high-risk of physical function decline.
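The link between the two clustering phases can be sketched as a feature-construction step: once phase I has assigned each person-day to a day-level trajectory cluster, each participant is summarised by the proportion of her days falling in each cluster, and those proportion vectors are what phase II clusters hierarchically. The code below shows only that intermediate step; the function name, participant identifiers, and labels are made up, and the phase I and phase II algorithms themselves are not reproduced.

```python
from collections import Counter


def day_cluster_proportions(day_labels_by_participant, n_day_clusters):
    """Map each participant to the share of her days in each phase I cluster."""
    features = {}
    for pid, labels in day_labels_by_participant.items():
        counts = Counter(labels)
        features[pid] = [counts[c] / len(labels) for c in range(n_day_clusters)]
    return features


# Made-up phase I output: day-level cluster labels per participant.
phase_one = {"p001": [0, 0, 1, 1], "p002": [2, 2, 2]}
print(day_cluster_proportions(phase_one, n_day_clusters=3))
# {'p001': [0.5, 0.5, 0.0], 'p002': [0.0, 0.0, 1.0]}
```

Using proportions rather than raw counts keeps participants with different numbers of valid wear days comparable.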
Affiliation(s)
- Chase Reuter
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California 92093 USA
- John Bellettiere
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California 92093 USA
- Center for Behavioral Epidemiology and Community Health (CBEACH), San Diego State University, San Diego, CA 92123 USA
- Sandy Liles
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California 92093 USA
- Center for Behavioral Epidemiology and Community Health (CBEACH), San Diego State University, San Diego, CA 92123 USA
- Chongzhi Di
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109 USA
- Dorothy D. Sears
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California 92093 USA
- College of Health Solutions, Arizona State University, Phoenix, AZ 85004 USA
- Moores Cancer Center, University of California San Diego, 3855 Health Sciences Dr, La Jolla, CA 92037 USA
- Michael J. LaMonte
- Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, University at Buffalo–SUNY, New York, NY 14214 USA
- Marcia L. Stefanick
- Stanford Prevention Research Center, Stanford University School of Medicine, Stanford University, Stanford, CA 94305 USA
- Andrea Z. LaCroix
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California 92093 USA
- Loki Natarajan
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California 92093 USA
- Moores Cancer Center, University of California San Diego, 3855 Health Sciences Dr, La Jolla, CA 92037 USA
111
Shi Z, Rundle A, Genkinger JM, Cheung YK, Ergas IJ, Roh JM, Kushi LH, Kwan ML, Greenlee H. Distinct trajectories of moderate to vigorous physical activity and sedentary behavior following a breast cancer diagnosis: the Pathways Study. J Cancer Surviv 2020; 14:393-403. [PMID: 32130627 PMCID: PMC7955660 DOI: 10.1007/s11764-020-00856-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/13/2019] [Accepted: 01/27/2020] [Indexed: 02/08/2023]
Abstract
PURPOSE To identify distinct trajectories of total moderate-to-vigorous physical activity (MVPA) and sedentary behavior following a breast cancer diagnosis and their correlates. METHODS The analysis examined 3000 female breast cancer survivors within Kaiser Permanente Northern California between 2006 and 2013. Self-reported time spent on total MVPA and sedentary behaviors were assessed at baseline (mean = 1.8 months post-diagnosis) and at 6 and 24 months follow up. Trajectory groups were identified using group-based trajectory modeling and K-means for longitudinal data analysis. Trajectory groups were named by baseline activity level (high, medium, or low) and direction of change (increaser, decreaser, or maintainer). RESULTS Trajectory analyses identified three MVPA trajectories [high decreaser (7%), medium decreaser (35%), low maintainer (58%)] and four sedentary behavior trajectories [high maintainer (18%), high decreaser (27%), low increaser (24%), and low maintainer (31%)]. Women with higher education (ORs: 1.63-4.37), income (OR: 1.37), dispositional optimism (ORs: 1.60-1.86), and social support (OR: 1.33) were more likely to be high or medium decreasers of MVPA (all P < 0.05). High maintainers and high decreasers of sedentary behavior were more likely to have higher education (OR: 1.84) and social support (ORs: 1.42-1.86), but lower income (OR: 0.66; all P < 0.05). CONCLUSIONS In the 24 months following breast cancer diagnosis, 42% of survivors decreased MVPA and 73% maintained or increased time on sedentary behavior. Socioeconomic status and stress coping at diagnosis predicted subsequent PA trajectory. IMPLICATIONS FOR CANCER SURVIVORS It is important to prioritize exercise intervention and counseling during early stage of breast cancer survivorship, especially in survivors who are at high risk of becoming physically inactive post-diagnosis.
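The two headline percentages in the conclusions follow directly from the trajectory shares reported in the results, as this short check shows. The dictionary layout and variable names are ours; the percentages are those stated in the abstract.

```python
# Trajectory group shares (%) as reported in the RESULTS section.
mvpa = {"high decreaser": 7, "medium decreaser": 35, "low maintainer": 58}
sedentary = {
    "high maintainer": 18,
    "high decreaser": 27,
    "low increaser": 24,
    "low maintainer": 31,
}

# Survivors whose MVPA decreased: every "decreaser" group.
decreased_mvpa = sum(pct for name, pct in mvpa.items() if name.endswith("decreaser"))

# Survivors who maintained or increased sedentary time: every non-decreaser group.
maintained_or_increased_sb = sum(
    pct for name, pct in sedentary.items() if not name.endswith("decreaser")
)

print(decreased_mvpa, maintained_or_increased_sb)  # 42 73
```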
Affiliation(s)
- Zaixing Shi
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, China
  - Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, China
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Columbia University Mailman School of Public Health, New York, NY, USA
- Andrew Rundle
  - Columbia University Mailman School of Public Health, New York, NY, USA
  - Herbert Irving Comprehensive Cancer Center, New York, NY, USA
- Jeanine M Genkinger
  - Columbia University Mailman School of Public Health, New York, NY, USA
  - Herbert Irving Comprehensive Cancer Center, New York, NY, USA
- Ying Kuen Cheung
  - Columbia University Mailman School of Public Health, New York, NY, USA
  - Herbert Irving Comprehensive Cancer Center, New York, NY, USA
- Isaac J Ergas
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Janise M Roh
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Lawrence H Kushi
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Marilyn L Kwan
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
- Heather Greenlee
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Columbia University Mailman School of Public Health, New York, NY, USA
  - Seattle Cancer Care Alliance, Seattle, WA, USA
  - University of Washington, Seattle, WA, USA
|
112
|
Abstract
In cognitive diagnostic assessment (CDA), clustering analysis is an efficient approach to classify examinees into attribute-homogeneous groups. Researchers have proposed several methods to achieve this classification goal, such as the nonparametric method with Hamming distance, the K-means method, and hierarchical agglomerative cluster analysis. In this paper, we introduce a spectral clustering algorithm (SCA) to cluster examinees according to their responses. Simulation studies are used to compare the classification accuracy of the SCA, the K-means algorithm, the G-DINA model and its related reduced cognitive diagnostic models. A real data analysis is also conducted to evaluate the feasibility of the SCA. Some research directions are discussed in the final section.
Affiliation(s)
- Lei Guo
  - Faculty of Psychology, Southwest University, Chongqing, China
  - Southwest University Branch, Collaborative Innovation Center of Assessment Toward Basic Education Quality, Chongqing, China
- Jing Yang
  - School of Mathematics and Statistics, Northeast Normal University, Changchun, China
- Naiqing Song
  - Southwest University Branch, Collaborative Innovation Center of Assessment Toward Basic Education Quality, Chongqing, China
  - Basic Education Research Center, Southwest University, Chongqing, China
  - Urban and Rural Education Research Center, Southwest University, Chongqing, China
|
113
|
Wu Q, Wang Y, Gao Z, Ni J, Zheng C. MSCHLMDA: Multi-Similarity Based Combinative Hypergraph Learning for Predicting MiRNA-Disease Association. Front Genet 2020; 11:354. [PMID: 32351545 PMCID: PMC7174776 DOI: 10.3389/fgene.2020.00354] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/22/2019] [Accepted: 03/23/2020] [Indexed: 12/17/2022] Open
Abstract
Accumulating biological and clinical evidence has confirmed the important associations between microRNAs (miRNAs) and a variety of human diseases. Predicting disease-related miRNAs is beneficial for understanding the molecular mechanisms of pathological conditions at the miRNA level, and for facilitating the discovery of new biomarkers for prevention, diagnosis and treatment of complex human diseases. However, the challenge for researchers is to establish methods that can effectively combine different datasets and make reliable predictions. In this work, we propose the method of Multi-Similarity based Combinative Hypergraph Learning for Predicting MiRNA-disease Association (MSCHLMDA). To establish this method, complex features were extracted by two measures for each miRNA-disease pair. Then, the K-nearest neighbor (KNN) and K-means algorithms were used to construct two different hypergraphs. Finally, results from combinative hypergraph learning were used for predicting miRNA-disease association. To evaluate the prediction performance of our method, leave-one-out cross validation and 5-fold cross validation were implemented, showing that our method had significantly improved prediction performance compared to previously used methods. Moreover, three case studies on different human complex diseases were performed, which further demonstrated the predictive performance of MSCHLMDA. It is anticipated that MSCHLMDA will become an excellent complement to the biomedical research field in the future.
Affiliation(s)
- Qingwen Wu
  - School of Software, Qufu Normal University, Qufu, China
- Yutian Wang
  - School of Software, Qufu Normal University, Qufu, China
- Zhen Gao
  - School of Software, Qufu Normal University, Qufu, China
- Jiancheng Ni
  - School of Software, Qufu Normal University, Qufu, China
- Chunhou Zheng
  - School of Software, Qufu Normal University, Qufu, China
  - School of Computer Science and Technology, Anhui University, Hefei, China
|
114
|
Lebel K, Duval C, Goubault E, Bogard S, Blanchet PJ. Can We Predict the Motor Performance of Patients With Parkinson's Disease Based on Their Symptomatology? Front Bioeng Biotechnol 2020; 8:189. [PMID: 32266228 PMCID: PMC7105871 DOI: 10.3389/fbioe.2020.00189] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2019] [Accepted: 02/27/2020] [Indexed: 12/04/2022] Open
Abstract
Introduction: Parkinson's disease hinders the ability of a person to perform daily activities. However, the varying impact of specific symptoms and their interactions on a person's motor repertoire is not understood. The current study investigates the possibility of predicting global motor disabilities based on the patient's symptomatology and medication. Methods: A cohort of 115 patients diagnosed with Parkinson's disease (mean age = 67.0 ± 8.7 years old) participated in the study. Participants performed different tasks, including the Timed-Up & Go (TUG), eating soup and the Purdue Pegboard test. Performance on these tasks was judged using timing, number of errors committed, and count achieved. The K-means method was used to cluster the overall performance and create different motor performance groups. Symptomatology was objectively assessed for each participant from a combination of wearable inertial sensors (bradykinesia, tremor, dyskinesia) and clinical assessment (rigidity, postural instability). A multinomial regression model was derived to predict the performance cluster membership based on the patients' symptomatology, socio-demographic information and medication. Results: Clustering exposed four distinct performance groups: normal behavior, slightly affected in fine motor tasks, affected only in TUG, and affected in all areas. The statistical model revealed that a low to moderate level of dyskinesia increased the likelihood of being in the normal group. A rise in postural instability and rest tremor increased the chance of being affected in TUG. Finally, LEDD did not help distinguish between groups, but the presence of Amantadine as part of the medication regimen appears to decrease the likelihood of being part of the groups affected in TUG. Conclusion: The approach demonstrated the potential of using clinical symptoms to predict the impact of Parkinson's disease on a person's mobility performance.
Affiliation(s)
- Karina Lebel
  - Département de Génie électrique et de Génie Informatique, Faculté de Génie, Université de Sherbrooke, Sherbrooke, QC, Canada
  - Centre de Recherche sur le Vieillissement, Sherbrooke, QC, Canada
- Christian Duval
  - Laboratoire de Simulation et Modélisation du Mouvement, École de Kinésiologie et des Sciences de l'activité physique, Université de Montréal, Montreal, QC, Canada
  - Centre de Recherche Institut Universitaire de Gériatrie de Montréal, Montreal, QC, Canada
- Etienne Goubault
  - Laboratoire de Simulation et Modélisation du Mouvement, École de Kinésiologie et des Sciences de l'activité physique, Université de Montréal, Montreal, QC, Canada
  - Centre de Recherche Institut Universitaire de Gériatrie de Montréal, Montreal, QC, Canada
- Sarah Bogard
  - Laboratoire de Simulation et Modélisation du Mouvement, École de Kinésiologie et des Sciences de l'activité physique, Université de Montréal, Montreal, QC, Canada
  - Centre de Recherche Institut Universitaire de Gériatrie de Montréal, Montreal, QC, Canada
- Pierre J Blanchet
  - Faculté de Médecine Dentaire, Université de Montréal, Montreal, QC, Canada
  - Centre Hospitalier de l'Université de Montréal (C.H.U. Montreal), Montreal, QC, Canada
|
115
|
Jang JY, Oh HS, Lim Y, Cheung YK. Ensemble clustering for step data via binning. Biometrics 2020; 77:293-304. [PMID: 32150282 DOI: 10.1111/biom.13258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2019] [Revised: 02/26/2020] [Accepted: 02/27/2020] [Indexed: 11/29/2022]
Abstract
This paper considers the clustering problem of physical step count data recorded on wearable devices. Clustering step data give an insight into an individual's activity status and further provide the groundwork for health-related policies. However, classical methods, such as K-means clustering and hierarchical clustering, are not suitable for step count data that are typically high-dimensional and zero-inflated. This paper presents a new clustering method for step data based on a novel combination of ensemble clustering and binning. We first construct multiple sets of binned data by changing the size and starting position of the bin, and then merge the clustering results from the binned data using a voting method. The advantage of binning, as a critical component, is that it substantially reduces the dimension of the original data while preserving the essential characteristics of the data. As a result, combining clustering results from multiple binned data can provide an improved clustering result that reflects both local and global structures of the data. Simulation studies and real data analysis were carried out to evaluate the empirical performance of the proposed method and demonstrate its general utility.
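The binning-and-voting idea described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the base clusterer or the voting rule, so the sketch assumes plain K-means with a deterministic farthest-point initialization and a majority vote over pairwise co-clustering. The function names, the bin settings, and the toy hourly step counts are all hypothetical.

```python
def bin_series(series, width, offset=0):
    """Sum step counts in consecutive bins of `width`, starting at index `offset`."""
    return [sum(series[i:i + width]) for i in range(offset, len(series), width)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means; farthest-point initialization is an assumption made for determinism."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ensemble_cluster(series_list, k, binnings):
    """Cluster each binned version of the data, then merge the results by majority vote."""
    n = len(series_list)
    votes = [[0] * n for _ in range(n)]  # votes[i][j]: runs in which i and j share a cluster
    for width, offset in binnings:
        binned = [bin_series(s, width, offset) for s in series_list]
        labels = kmeans(binned, k)
        for i in range(n):
            for j in range(n):
                votes[i][j] += labels[i] == labels[j]
    # Assumed voting rule: link two series if they co-cluster in more than half
    # of the runs, then take connected components as the final clusters.
    final = [-1] * n
    next_label = 0
    for i in range(n):
        if final[i] != -1:
            continue
        final[i] = next_label
        stack = [i]
        while stack:
            u = stack.pop()
            for v in range(n):
                if final[v] == -1 and 2 * votes[u][v] > len(binnings):
                    final[v] = next_label
                    stack.append(v)
        next_label += 1
    return final


def day(active_hours, level):
    """Hypothetical zero-inflated day: `level` steps in active hours, 0 elsewhere."""
    return [level if h in active_hours else 0 for h in range(24)]


steps = [
    day(range(8, 12), 500), day(range(8, 12), 600), day(range(9, 12), 550),      # morning walkers
    day(range(18, 22), 500), day(range(18, 22), 650), day(range(18, 21), 520),  # evening walkers
]
binnings = [(4, 0), (6, 0), (4, 2), (3, 1)]  # (bin width, starting position)
labels = ensemble_cluster(steps, k=2, binnings=binnings)
print(labels)
```

Each binning shrinks a 24-value series to at most 8 values, which illustrates the dimension reduction the abstract attributes to binning. Note that the connected-components step can return a number of final clusters different from `k` when the individual runs disagree.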
Affiliation(s)
- Ja-Yoon Jang
  - Department of Statistics, Stanford University, Stanford, California
- Hee-Seok Oh
  - Department of Statistics, Seoul National University, Seoul, Korea
- Yaeji Lim
  - Department of Applied Statistics, Chung-Ang University, Seoul, Korea
|
116
|
Meirmans PG. genodive version 3.0: Easy-to-use software for the analysis of genetic data of diploids and polyploids. Mol Ecol Resour 2020; 20:1126-1131. [PMID: 32061017 PMCID: PMC7496249 DOI: 10.1111/1755-0998.13145] [Citation(s) in RCA: 128] [Impact Index Per Article: 32.0] [Received: 09/19/2019] [Revised: 01/09/2020] [Accepted: 02/10/2020] [Indexed: 12/28/2022]
Abstract
genodive version 3.0 is a user-friendly program for the analysis of population genetic data. This version presents a major update from the previous version and now offers a wide spectrum of different types of analyses. genodive has an intuitive graphical user interface that allows direct manipulation of the data through transformation, imputation of missing data, and exclusion and inclusion of individuals, population and/or loci. Furthermore, genodive seamlessly supports 15 different file formats for importing or exporting data from or to other programs. One major feature of genodive is that it supports both diploid and polyploid data, up to octaploidy (2n = 8x) for some analyses, but up to hexadecaploidy (2n = 16x) for other analyses. The different types of analyses offered by genodive include multiple statistics for estimating population differentiation (φST, FST, F'ST, GST, G'ST, G''ST, Dest, RST, ρ), analysis of molecular variance-based K-means clustering, Hardy-Weinberg equilibrium, hybrid index, population assignment, clone assignment, Mantel test, Spatial Autocorrelation, 23 ways of calculating genetic distances, and both principal components and principal coordinates analyses. A unique feature of genodive is that it can also open data sets with nongenetic variables, for example environmental data or geographical coordinates that can be included in the analysis. In addition, genodive makes it possible to run several external programs (lfmm, structure, instruct and vegan) directly from its own user interface, avoiding the need for data reformatting and use of the command line. genodive is available for computers running Mac OS X 10.7 or higher and can be downloaded freely from: http://www.patrickmeirmans.com/software.
Affiliation(s)
- Patrick G Meirmans
  - Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
|
117
|
Nakamura K, Ogura K, Nakano H, Naraba H, Takahashi Y, Sonoo T, Hashimoto H, Morimura N. C-reactive protein clustering to clarify persistent inflammation, immunosuppression and catabolism syndrome. Intensive Care Med 2020; 46:437-443. [PMID: 31919541 DOI: 10.1007/s00134-019-05851-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 02/06/2023]
Abstract
PURPOSE Among patients who survive treatment in intensive care units (ICU), some have persistent inflammation with prolonged hospital stays, a condition referred to as persistent inflammatory, immunosuppressed, catabolic syndrome (PIICS). C-reactive protein (CRP) is regarded as the most important marker for PIICS. Nevertheless, the applicable cut-off of CRP for PIICS has never been described in the literature. METHODS Data of patients admitted to the ICU/Emergency ward from May 2015 through June 2019 were analyzed retrospectively. Using K-means clustering, a 14-day CRP transition dataset was analyzed and finally categorized into 7 classes: 4 PIICS classes and 3 non-PIICS classes. Outcomes and the other PIICS characteristics were evaluated. RESULTS Of all 5513 admitted patients, this study examined data of 539 patients who had been admitted for more than 14 days and for whom 14-day CRP transition analysis could be performed. Based on the CRP transitions of the 7 classes, the CRP cut-off for PIICS was taken to be 3.0 mg/dl on day 14. The Barthel Index at discharge, albumin, and total lymphocyte counts on day 14 were significantly lower in PIICS classes than in non-PIICS classes. Creatinine kinase, antithrombin activity and thrombomodulin on admission were regarded as independent risk factors for PIICS. CONCLUSIONS Among patients with prolonged hospital stay, the PIICS population had elevated CRP, but lower Barthel Index, albumin, and total lymphocyte counts. The criterion of day 14 CRP for PIICS should be 3.0 mg/dl.
Affiliation(s)
- Kensuke Nakamura
  - Department of Emergency and Critical Care Medicine, Hitachi General Hospital, 2-1-1, Jonan-cho, Hitachi, Ibaraki, 317-0077, Japan
- Kentaro Ogura
  - Faculty of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo, 113-8655, Japan
  - TXP Medical Co. Ltd, 3-13 Nihonbashiyokoyamacho, Chuo-ku, Tokyo, 103-0003, Japan
- Hidehiko Nakano
  - Department of Emergency and Critical Care Medicine, Hitachi General Hospital, 2-1-1, Jonan-cho, Hitachi, Ibaraki, 317-0077, Japan
- Hiromu Naraba
  - Department of Emergency and Critical Care Medicine, Hitachi General Hospital, 2-1-1, Jonan-cho, Hitachi, Ibaraki, 317-0077, Japan
- Yuji Takahashi
  - Department of Emergency and Critical Care Medicine, Hitachi General Hospital, 2-1-1, Jonan-cho, Hitachi, Ibaraki, 317-0077, Japan
- Tomohiro Sonoo
  - Department of Emergency and Critical Care Medicine, Hitachi General Hospital, 2-1-1, Jonan-cho, Hitachi, Ibaraki, 317-0077, Japan
  - TXP Medical Co. Ltd, 3-13 Nihonbashiyokoyamacho, Chuo-ku, Tokyo, 103-0003, Japan
- Hideki Hashimoto
  - Department of Emergency and Critical Care Medicine, Hitachi General Hospital, 2-1-1, Jonan-cho, Hitachi, Ibaraki, 317-0077, Japan
- Naoto Morimura
  - Department of Emergency and Critical Care Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo, Tokyo, 113-8655, Japan
|
118
|
Liu J, Wang X. Tomato Diseases and Pests Detection Based on Improved Yolo V3 Convolutional Neural Network. Front Plant Sci 2020; 11:898. [PMID: 32612632 PMCID: PMC7309963 DOI: 10.3389/fpls.2020.00898] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 12/19/2019] [Accepted: 06/02/2020] [Indexed: 05/18/2023]
Abstract
Tomato is affected by various diseases and pests during its growth. If control is not timely, the result is reduced yield or even crop failure. Controlling diseases and pests effectively and helping vegetable farmers improve tomato yield is very important, and the most important step is to identify the diseases and insect pests accurately. Compared with traditional pattern recognition methods, a recognition method based on deep learning can take the original image as direct input. Instead of the tedious steps of image preprocessing, feature extraction and feature classification used in traditional methods, an end-to-end structure simplifies the recognition process and addresses the problem that manually designed feature extractors struggle to obtain the feature expression closest to the natural attributes of the object. Applying deep learning object detection not only saves time and effort but also enables real-time judgment and greatly reduces the losses caused by diseases and pests, which gives it important research value and significance. Based on the latest research results in deep learning object detection and the characteristics of tomato disease and pest images, this study builds a dataset of tomato diseases and pests in real natural environments, optimizes the feature layer of the Yolo V3 model by using an image pyramid to achieve multi-scale feature detection, improves the detection accuracy and speed of the Yolo V3 model, and detects the location and category of tomato diseases and pests accurately and quickly. This research addresses the key technology of tomato pest image recognition in natural environments and provides a reference for intelligent recognition and engineering application of plant disease and pest detection.
|
119
|
Muhammad MU, Jiadong R, Muhammad NS, Nawaz B. Stratified Diabetes Mellitus Prevalence for the Northwestern Nigerian States, a Data Mining Approach. Int J Environ Res Public Health 2019; 16:E4089. [PMID: 31652912 PMCID: PMC6928643 DOI: 10.3390/ijerph16214089] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/28/2019] [Revised: 10/20/2019] [Accepted: 10/21/2019] [Indexed: 01/09/2023]
Abstract
An accurate classification of diabetes mellitus (DBM) allows for adequate treatment and management of the disease, particularly in developing countries like Nigeria. This study applies data mining techniques to classify diagnosed diabetes cases and identify their prevalence, stratified by age, gender, diabetic condition and residential area, in the northwestern states of Nigeria, based on real-life data from government-owned hospitals in the region. K-means clustering was used to group the instances; after 12 iterations, the 3022 instances were classified as 2662 (88.09%) non-insulin dependent (NID), 176 (5.82%) insulin-dependent (IND) and 184 (6.09%) gestational diabetes (GTD). The total number of diagnosed diabetes cases was 3022: 1380 males (45.66%) and 1642 females (54.33%). Prevalence was higher in females than in males, and higher in cities and towns than in villages (36.5%, 34.2%, and 29.3%, respectively). The highest prevalence among the age groups was in the age group 50-69 years, which constituted 43.9% of the total diagnosed cases. Furthermore, the NID condition had the highest prevalence of cases (88.09%). These are the first findings on stratified prevalence in the region, and the figures are of significance to healthcare authorities, policymakers, clinicians, and non-governmental organizations for the proper planning and management of diabetes mellitus.
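As a quick consistency check, the percentage shares can be recomputed from the counts reported in this abstract. This is only an arithmetic sketch; the variable names are illustrative and nothing here reproduces the study's clustering.

```python
# Counts as reported in the abstract.
condition_counts = {"NID": 2662, "IND": 176, "GTD": 184}
sex_counts = {"male": 1380, "female": 1642}

total = sum(condition_counts.values())


def shares(counts):
    """Percentage share of each group, rounded to two decimals."""
    group_total = sum(counts.values())
    return {name: round(100 * n / group_total, 2) for name, n in counts.items()}


condition_shares = shares(condition_counts)
sex_shares = shares(sex_counts)

print(total, condition_shares, sex_shares)
```

Both breakdowns sum to 3022, and the condition shares match the reported figures. The male share comes out as 45.67 when rounded, so the reported 45.66% appears to be truncated rather than rounded.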
Affiliation(s)
- Musa Uba Muhammad
  - Department of Information sciences and Technology, Yanshan University, Qinhuangdao 066000, China
- Ren Jiadong
  - Department of Information sciences and Technology, Yanshan University, Qinhuangdao 066000, China
- Noman Sohail Muhammad
  - Department of Information sciences and Technology, Yanshan University, Qinhuangdao 066000, China
- Bilal Nawaz
  - State Key Laboratory of Metastable Materials Science and Technology, Yanshan University, Qinhuangdao 066004, China
|
120
|
Al-Yacoub A, Zhao Y, Lohse N, Goh M, Kinnell P, Ferreira P, Hubbard EM. Symbolic-Based Recognition of Contact States for Learning Assembly Skills. Front Robot AI 2019; 6:99. [PMID: 33501114 PMCID: PMC7805827 DOI: 10.3389/frobt.2019.00099] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/15/2019] [Accepted: 09/26/2019] [Indexed: 11/26/2022] Open
Abstract
Imitation learning is gaining more attention because it enables robots to learn skills from human demonstrations. One of the major industrial activities that can benefit from imitation learning is the learning of new assembly processes. An essential characteristic of an assembly skill is its different contact states (CS). They determine how to adjust movements in order to perform the assembly task successfully. Humans can recognize CSs through haptic feedback. They execute complex assembly tasks accordingly. Hence, CSs are generally recognized using force and torque information. This process is not straightforward due to the variations in assembly tasks, signal noise and ambiguity in interpreting force/torque (F/T) information. In this research, an investigation has been conducted to recognize the CSs during an assembly process with a geometrical variation on the mating parts. The F/T data collected from several human trials were pre-processed, segmented and represented as symbols. Those symbols were used to train a probabilistic model. Then, the trained model was validated using unseen datasets. The primary goal of the proposed approach aims to improve recognition accuracy and reduce the computational effort by employing symbolic and probabilistic approaches. The model successfully recognized CS based only on force information. This shows that such models can assist in imitation learning.
Affiliation(s)
- Ali Al-Yacoub
  - Intelligent Automation Centre, Loughborough University, Loughborough, United Kingdom
- Yuchen Zhao
  - Beijing Ewaybot Technology LLC, Beijing, China
- Niels Lohse
  - Intelligent Automation Centre, Loughborough University, Loughborough, United Kingdom
- Mey Goh
  - Intelligent Automation Centre, Loughborough University, Loughborough, United Kingdom
- Peter Kinnell
  - Intelligent Automation Centre, Loughborough University, Loughborough, United Kingdom
- Pedro Ferreira
  - Intelligent Automation Centre, Loughborough University, Loughborough, United Kingdom
- Ella-Mae Hubbard
  - Intelligent Automation Centre, Loughborough University, Loughborough, United Kingdom
|
121
|
Mas S, Torro A, Bec N, Fernández L, Erschov G, Gongora C, Larroque C, Martineau P, de Juan A, Marco S. Use of physiological information based on grayscale images to improve mass spectrometry imaging data analysis from biological tissues. Anal Chim Acta 2019; 1074:69-79. [PMID: 31159941 DOI: 10.1016/j.aca.2019.04.074] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/28/2018] [Revised: 03/21/2019] [Accepted: 04/30/2019] [Indexed: 10/26/2022]
Abstract
The characterization of cancer tissues by matrix-assisted laser desorption ionization-mass spectrometry images (MALDI-MSI) is of great interest because of the power of MALDI-MS to understand the composition of biological samples and the imaging side that allows for setting spatial boundaries among tissues of different nature based on their compositional differences. In tissue-based cancer research, information on the spatial location of necrotic/tumoral cell populations can be approximately known from grayscale images of the scanned tissue slices. This study proposes as a major novelty the introduction of this physiologically-based information to help in the performance of unmixing methods, oriented to extract the MS signatures and distribution maps of the different tissues present in biological samples. Specifically, the information gathered from grayscale images will be used as a local rank constraint in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) for the analysis of MALDI-MSI of cancer tissues. The use of this constraint, setting absence of certain kind of tissues only in clear zones of the image, will help to improve the performance of MCR-ALS and to provide a more reliable definition of the chemical MS fingerprint and location of the tissues of interest. The general strategy to address the analysis of MALDI-MSI of cancer tissues will involve the study of the MCR-ALS results and the posterior use of MCR-ALS scores as dimensionality reduction for image segmentation based on K-means clustering. The resolution method will provide the MS signatures and their distribution maps for each tissue in the sample. Then, the resolved distribution maps for each biological component (MCR scores) will be submitted as initial information to K-means clustering for image segmentation to obtain information on the boundaries of the different tissular regions in the samples studied. MCR-ALS prior to K-means not only provides the desired dimensionality reduction, but additionally resolved non-biological signal contributions are not used and the weight given to the different biological components in the segmentation process can be modulated by suitable preprocessing methods.
Affiliation(s)
- S Mas
  - Signal and Information Processing for Sensing Systems, Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Baldiri Reixac 10-12, 08028, Barcelona, Spain
  - Chemometrics Group, Department of Chemical Engineering and Analytical Chemistry, Universitat de Barcelona, B. Av. Diagonal, 645, 08028, Barcelona, Spain
- A Torro
  - Institut de Recherche en Cancérologie de Montpellier (IRCM), INSERM U1194, Université de Montpellier, Institut Régional du Cancer de Montpellier (ICM), Montpellier, F-34298, France
- N Bec
  - Institut de Recherche en Cancérologie de Montpellier (IRCM), INSERM U1194, Université de Montpellier, Institut Régional du Cancer de Montpellier (ICM), Montpellier, F-34298, France
  - Institute for Regenerative Medicine & Biotherapy (IRMB), INSERM U1183, CHRU of Montpellier, 80 Rue Augustin Fiche, Montpellier, F-34295, France
- L Fernández
  - Signal and Information Processing for Sensing Systems, Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Baldiri Reixac 10-12, 08028, Barcelona, Spain
  - Department of Electronics and Biomedical Engineering, Universitat de Barcelona, Marti i Franqués 1, Barcelona, 08028, Spain
- G Erschov
  - Institut de Recherche en Cancérologie de Montpellier (IRCM), INSERM U1194, Université de Montpellier, Institut Régional du Cancer de Montpellier (ICM), Montpellier, F-34298, France
- C Gongora
  - Institut de Recherche en Cancérologie de Montpellier (IRCM), INSERM U1194, Université de Montpellier, Institut Régional du Cancer de Montpellier (ICM), Montpellier, F-34298, France
- C Larroque
  - Institut de Recherche en Cancérologie de Montpellier (IRCM), INSERM U1194, Université de Montpellier, Institut Régional du Cancer de Montpellier (ICM), Montpellier, F-34298, France
  - Supportive Care Unit, Institut du Cancer de Montpellier (ICM), 208 Rue des Apothicaires, Montpellier, F-34298, France
- P Martineau
  - Institut de Recherche en Cancérologie de Montpellier (IRCM), INSERM U1194, Université de Montpellier, Institut Régional du Cancer de Montpellier (ICM), Montpellier, F-34298, France
- A de Juan
  - Chemometrics Group, Department of Chemical Engineering and Analytical Chemistry, Universitat de Barcelona, B. Av. Diagonal, 645, 08028, Barcelona, Spain
- S Marco
  - Signal and Information Processing for Sensing Systems, Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Baldiri Reixac 10-12, 08028, Barcelona, Spain
  - Department of Electronics and Biomedical Engineering, Universitat de Barcelona, Marti i Franqués 1, Barcelona, 08028, Spain
|
122
|
Cho J, Zhang S, Kee Y, Spincemaille P, Nguyen TD, Hubertus S, Gupta A, Wang Y. Cluster analysis of time evolution (CAT) for quantitative susceptibility mapping (QSM) and quantitative blood oxygen level-dependent magnitude (qBOLD)-based oxygen extraction fraction (OEF) and cerebral metabolic rate of oxygen (CMRO2) mapping. Magn Reson Med 2019; 83:844-857. [PMID: 31502723 DOI: 10.1002/mrm.27967] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/08/2019] [Revised: 07/07/2019] [Accepted: 08/04/2019] [Indexed: 01/01/2023]
Abstract
PURPOSE To improve the accuracy of QSM plus quantitative blood oxygen level-dependent magnitude (QSM + qBOLD or QQ)-based mapping of the oxygen extraction fraction (OEF) and cerebral metabolic rate of oxygen (CMRO2) using cluster analysis of time evolution (CAT). METHODS 3D multi-echo gradient echo and arterial spin labeling images were acquired in 11 healthy subjects and 5 ischemic stroke patients. DWI was also carried out on patients. CAT was developed for analyzing signal evolution over TE. QQ-based OEF and CMRO2 were reconstructed with and without CAT, and results were compared using region of interest analysis and a paired t-test. RESULTS Simulations demonstrated that CAT substantially reduced noise error in QQ-based OEF. In healthy subjects, QQ-based OEF appeared less noisy and more uniform with CAT than without CAT; average OEF with and without CAT in cortical gray matter was 32.7 ± 4.0% and 37.9 ± 4.5%, with corresponding CMRO2 of 148.4 ± 23.8 and 171.4 ± 22.4 μmol/100 g/min, respectively. In patients, regions of low OEF were confined within the ischemic lesions defined on DWI when using CAT, which was not observed without CAT. CONCLUSION The cluster analysis of time evolution (CAT) significantly improves the robustness of QQ-based OEF against noise.
Affiliation(s)
- Junghun Cho
  - Department of Biomedical Engineering, Cornell University, Ithaca, New York
- Shun Zhang
  - Department of Radiology, Weill Cornell Medical College, New York, New York
  - Department of Radiology, Tongji Hospital, Wuhan, China
- Youngwook Kee
  - Department of Radiology, Weill Cornell Medical College, New York, New York
- Thanh D Nguyen
  - Department of Radiology, Weill Cornell Medical College, New York, New York
- Simon Hubertus
  - Computer Assisted Clinical Medicine, Heidelberg University, Mannheim, Germany
- Ajay Gupta
  - Department of Radiology, Weill Cornell Medical College, New York, New York
- Yi Wang
  - Department of Biomedical Engineering, Cornell University, Ithaca, New York
  - Department of Radiology, Weill Cornell Medical College, New York, New York
|
123
|
Abstract
Clustering is a vital task in magnetic resonance imaging (MRI) brain imaging and plays an important role in the reliability of brain disease detection, diagnosis, and effectiveness of the treatment. Clustering is used in processing and analysis of brain images for different tasks, including segmentation of brain regions and tissues (grey matter, white matter, and cerebrospinal fluid) and clustering of the atrophy in different parts of the brain. This paper presents a state-of-the-art review of brain MRI studies that use clustering techniques for different tasks.
Affiliation(s)
- Golrokh Mirzaei
  - Department of Computer Science and Engineering, The Ohio State University, Marion, OH 43302, USA
- Hojjat Adeli
  - Departments of Biomedical Informatics, Neurology, Neuroscience, The Ohio State University, Columbus, OH 43210, USA
|
124
|
He M, Zha W, Tan F, Rankine L, Fain S, Driehuys B. A Comparison of Two Hyperpolarized 129Xe MRI Ventilation Quantification Pipelines: The Effect of Signal to Noise Ratio. Acad Radiol 2019; 26:949-959. [PMID: 30269957 DOI: 10.1016/j.acra.2018.08.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/21/2018] [Revised: 08/15/2018] [Accepted: 08/28/2018] [Indexed: 12/25/2022]
Abstract
RATIONALE Hyperpolarized 129Xe MRI enables quantitative evaluation of regional ventilation. To this end, multiple classifiers have been proposed to determine ventilation defect percentage (VDP) as well as other cluster populations. However, consensus has not yet been reached regarding which of these methods to deploy for multicenter clinical trials. Here, we compare two published classification techniques, linear-binning and adaptive K-means, to establish their limits of agreement and their robustness against reduced signal-to-noise ratio (SNR). METHODS A total of 29 subjects (age: 38.4 ± 19.0 years) were retrospectively identified for inter-method comparison. For each 129Xe ventilation image, 7 images with reduced SNR were generated with equal decrements relative to the native SNR. All 8 sets of images were then analyzed using both methods independently to classify all lung voxels into four clusters: VDP, low-, medium-, and high-ventilation-percentage (LVP, MVP and HVP). For each cluster, the percentage of the lung it comprised was compared between the two methods, as well as how these values persisted as SNR was degraded. RESULTS The limits of agreement for calculating VDP were [+0.2%, +4.0%] with a +1.5% bias for binning relative to K-means. However, the inter-method agreement for the other clusters was moderate, with biases of -5.7%, 8.1%, and -4.0% for LVP, MVP, and HVP, respectively. As SNR decreased below ∼4, both methods began reporting values that deviated substantially from the native image. By requiring VDP to remain within ≤1.8% of that calculated from the native image, the minimum tolerable SNR values were 2.4 ± 1.0 for the linear-binning, and 3.5 ± 1.5 for the K-means. CONCLUSIONS Both methods agree well in quantifying VDP, but agreement for LVP and MVP remains variable. We suggest a required SNR threshold be two standard deviations above the minimum value of 3.5 ± 1.5 for robust determination of VDP, suggesting a minimum SNR of 6.6. However, robust quantification of the ventilated clusters required an SNR of 13.4.
|
125
|
Núñez P, García A, Mazarrasa I, Juanes JA, Abascal AJ, Méndez F, Castanedo S, Medina R. A methodology to assess the probability of marine litter accumulation in estuaries. Mar Pollut Bull 2019; 144:309-324. [PMID: 31180001 DOI: 10.1016/j.marpolbul.2019.04.077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/16/2019] [Revised: 04/29/2019] [Accepted: 04/29/2019] [Indexed: 06/09/2023]
Abstract
In this study, a general methodology that is based on numerical models and statistical analysis is developed to assist in the definition of marine litter cleanup and mitigation strategies at an estuarine scale. The methodology includes four main steps: k-means clustering to identify representative metocean scenarios; dynamic downscaling to obtain high-resolution drivers with which to force a transport model; numerical transport modelling to generate a database of potential litter trajectories; and a statistical analysis of this database to obtain probabilities of litter accumulation. The efficacy of this methodology is demonstrated by its application to an estuary along the northern coast of Spain by comparing the numerical results with field data. The necessary criteria to ensure its applicability to any other estuary were provided. As the main conclusion, the developed methodology successfully assesses the litter distribution in estuaries with minimum computational effort.
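The first step of the methodology, k-means clustering to pick representative metocean scenarios, might look roughly like the sketch below. The abstract does not say which variables were clustered or how centroids were initialised, so the two features (wind speed and river discharge), the sample values, and the first-k initialisation are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm. Initialisation uses the first k points,
    a deterministic simplification chosen only to keep the sketch reproducible."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: each centroid moves to the mean of its group.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups

# Hypothetical metocean observations: (wind speed in m/s, river discharge in m3/s).
# Real features on different scales would normally be standardised first.
conditions = [(2.0, 10.0), (12.0, 80.0), (2.5, 12.0),
              (11.0, 75.0), (1.5, 9.0), (13.0, 85.0)]

centroids, groups = kmeans(conditions, k=2)

# Use the observed condition closest to each centroid as the representative
# scenario, weighted by the share of observations in its cluster.
representatives = [min(g, key=lambda p: math.dist(p, c))
                   for c, g in zip(centroids, groups)]
weights = [len(g) / len(conditions) for g in groups]
```

Each representative scenario could then drive one run of the transport model, with the cluster weights used when combining trajectories into accumulation probabilities.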
Affiliation(s)
- Paula Núñez
  - Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain.
- Andrés García
  - Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain
- Inés Mazarrasa
  - Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain
- José A Juanes
  - Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain
- Ana J Abascal
  - Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain
- Fernando Méndez
  - Departamento de Ciencias y Técnicas del Agua y del Medio Ambiente, E.T.S.I. de Caminos Canales y Puertos, Universidad de Cantabria, Avda. de los Castros s/n, 39005 Santander, Spain
- Sonia Castanedo
  - Departamento de Ciencias y Técnicas del Agua y del Medio Ambiente, E.T.S.I. de Caminos Canales y Puertos, Universidad de Cantabria, Avda. de los Castros s/n, 39005 Santander, Spain
- Raúl Medina
  - Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011 Santander, Spain
|
126
|
Yan F, Liu M, Ding C, Wang Y, Yan L. Driving Style Recognition Based on Electroencephalography Data From a Simulated Driving Experiment. Front Psychol 2019; 10:1254. [PMID: 31191419 PMCID: PMC6549479 DOI: 10.3389/fpsyg.2019.01254] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/27/2018] [Accepted: 05/13/2019] [Indexed: 11/13/2022] Open
Abstract
Driving style is a very important indicator and a crucial measurement of a driver's performance and ability to drive in a safe and protective manner. A dangerous driving style would possibly result in dangerous behaviors. If the driving styles can be recognized by some appropriate classification methods, much attention could be paid to the drivers with dangerous driving styles. The driving style recognition module can be integrated into the advanced driving assistance system (ADAS), which integrates different modules to improve driving automation, safety and comfort, and then the driving safety could be enhanced by pre-warning the drivers or adjusting the vehicle's controlling parameters when the dangerous driving style is detected. In most previous studies, driver's questionnaire data and vehicle's objective driving data were utilized to recognize driving styles. And promising results were obtained. However, these methods were indirect or subjective in driving style evaluation. In this paper a method based on objective driving data and electroencephalography (EEG) data was presented to classify driving styles. A simulated driving system was constructed and the EEG data and the objective driving data were collected synchronously during the simulated driving. The driving style of each participant was classified by clustering the driving data via K-means. Then the EEG data was denoised and the amplitude and the Power Spectral Density (PSD) of four frequency bands were extracted as the EEG features by Fast Fourier transform and Welch. Finally, the EEG features, combined with the classification results of the driving data were used to train a Support Vector Machine (SVM) model and a leave-one-subject-out cross validation was utilized to evaluate the performance. The SVM classification accuracy was about 80.0%. 
Conservative drivers showed higher PSDs in the parietal and occipital areas in the alpha and beta bands, aggressive drivers showed higher PSD in the temporal area in the delta and theta bands. These results imply that different driving styles were related with different driving strategies and mental states and suggest the feasibility of driving style recognition from EEG patterns.
Affiliation(s)
- Fuwu Yan
  - Hubei Key Laboratory of Advanced Technology for Automotive Components, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
  - Hubei Collaborative Innovation Center for Automotive Components Technology, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
- Mutian Liu
  - Hubei Key Laboratory of Advanced Technology for Automotive Components, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
  - Hubei Collaborative Innovation Center for Automotive Components Technology, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
- Changhao Ding
  - Hubei Key Laboratory of Advanced Technology for Automotive Components, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
  - Hubei Collaborative Innovation Center for Automotive Components Technology, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
- Yi Wang
  - Hubei Key Laboratory of Advanced Technology for Automotive Components, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
  - Hubei Collaborative Innovation Center for Automotive Components Technology, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
- Lirong Yan
  - Hubei Key Laboratory of Advanced Technology for Automotive Components, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
  - Hubei Collaborative Innovation Center for Automotive Components Technology, School of Automotive Engineering, Wuhan University of Technology, Wuhan, China
|
127
|
Hurme E, Gurarie E, Greif S, Herrera M. LG, Flores-Martínez JJ, Wilkinson GS, Yovel Y. Acoustic evaluation of behavioral states predicted from GPS tracking: a case study of a marine fishing bat. Mov Ecol 2019; 7:21. [PMID: 31223482 PMCID: PMC6567457 DOI: 10.1186/s40462-019-0163-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/04/2019] [Accepted: 05/23/2019] [Indexed: 06/09/2023]
Abstract
BACKGROUND Multiple methods have been developed to infer behavioral states from animal movement data, but rarely has their accuracy been assessed from independent evidence, especially for location data sampled with high temporal resolution. Here we evaluate the performance of behavioral segmentation methods using acoustic recordings that monitor prey capture attempts. METHODS We recorded GPS locations and ultrasonic audio during the foraging trips of 11 Mexican fish-eating bats, Myotis vivesi, using miniature bio-loggers. We then applied five different segmentation algorithms (k-means clustering, expectation-maximization and binary clustering, first-passage time, hidden Markov models, and correlated velocity change point analysis) to infer two behavioral states, foraging and commuting, from the GPS data. To evaluate the inference, we independently identified characteristic patterns of biosonar calls ("feeding buzzes") that occur during foraging in the audio recordings. We then compared segmentation methods on how well they correctly identified the two behaviors and if their estimates of foraging movement parameters matched those for locations with buzzes. RESULTS While the five methods differed in the median percentage of buzzes occurring during predicted foraging events, or true positive rate (44-75%), a two-state hidden Markov model had the highest median balanced accuracy (67%). Hidden Markov models and first-passage time predicted foraging flight speeds and turn angles similar to those measured at locations with feeding buzzes and did not differ in the number or duration of predicted foraging events. CONCLUSION The hidden Markov model method performed best at identifying fish-eating bat foraging segments; however, first-passage time was not significantly different and gave similar parameter estimates. This is the first attempt to evaluate segmentation methodologies in echolocating bats and provides an evaluation framework that can be used on other species.
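The two evaluation measures named in the results, true positive rate and balanced accuracy, can be illustrated with a short sketch. The per-location labels below are invented for the example and the function name is hypothetical; the abstract does not describe how the authors aligned GPS fixes with audio, so this only shows the arithmetic of the metrics.

```python
def segmentation_scores(predicted_foraging, buzz_observed):
    """Compare predicted foraging labels with acoustic ground truth.

    Returns (true positive rate, true negative rate, balanced accuracy),
    where balanced accuracy is the mean of the two rates.
    """
    pairs = list(zip(predicted_foraging, buzz_observed))
    tp = sum(1 for p, o in pairs if p and o)          # foraging predicted, buzz heard
    fn = sum(1 for p, o in pairs if not p and o)      # buzz heard, commuting predicted
    tn = sum(1 for p, o in pairs if not p and not o)  # commuting predicted, no buzz
    fp = sum(1 for p, o in pairs if p and not o)      # foraging predicted, no buzz
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr, tnr, (tpr + tnr) / 2

# Hypothetical labels for eight GPS fixes (True = foraging / buzz present).
predicted = [True, True, False, False, True, False, False, True]
observed = [True, False, False, False, True, True, False, True]

tpr, tnr, balanced_accuracy = segmentation_scores(predicted, observed)
```

Balanced accuracy is a reasonable headline metric here because commuting fixes typically outnumber foraging fixes, so plain accuracy would reward a method that rarely predicts foraging.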
Affiliation(s)
- Edward Hurme
  - Department of Biology, University of Maryland, College Park, MD 20742 USA
- Eliezer Gurarie
  - Department of Biology, University of Maryland, College Park, MD 20742 USA
- Stefan Greif
  - School of Zoology, Faculty of Life Sciences, Tel-Aviv University, 6997801 Tel-Aviv, Israel
  - Sagol School of Neuroscience, Tel-Aviv University, 6997801 Tel-Aviv, Israel
- L. Gerardo Herrera M.
  - Estación de Biología de Chamela, Instituto de Biología, Universidad Nacional Autónoma de México, 48980 San Patricio, Mexico
- José Juan Flores-Martínez
  - Laboratorio de Sistemas de Información Geográfica, Departamento de Zoología, Instituto de Biología, Universidad Nacional Autónoma de México, 04510 Ciudad de México, Mexico
- Yossi Yovel
  - School of Zoology, Faculty of Life Sciences, Tel-Aviv University, 6997801 Tel-Aviv, Israel
  - Sagol School of Neuroscience, Tel-Aviv University, 6997801 Tel-Aviv, Israel
|
128
|
Hoyos FT, Martín-Landrove M, Navarro RB, Villadiego JV, Cardenas JC. Study of cervical cancer through fractals and a method of clustering based on quantum mechanics. Appl Radiat Isot 2019; 150:182-191. [PMID: 31174008 DOI: 10.1016/j.apradiso.2019.05.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/26/2018] [Revised: 05/08/2019] [Accepted: 05/08/2019] [Indexed: 12/09/2022]
Abstract
Tumor growth in the cervix is a complex process. Understanding this phenomenon is important for establishing proper diagnosis and therapy strategies, and a possible starting point is to evaluate its complexity through scaling analysis, which defines the tumor growth geometry. In this work, tumor interfaces from primary tumors of squamous cells and adenocarcinomas for cervical cancer were extracted. Fractal dimension and local roughness exponent (Barabási and Stanley (1996)), αloc, were calculated to characterize the in vivo 3-D tumor growth. Image acquisition was carried out according to the standard protocol used for cervical cancer radiotherapy, i.e., axial, magnetic resonance T1-weighted contrast-enhanced images comprising the cervix volume for image registration. Image processing was carried out by a classification scheme based on a quantum clustering algorithm (Mussa et al. (2015)) combined with the application of the K-means procedure upon contrasted images (Demirkaya et al. (2008)). The results show significant variations of the parameters depending on the tumor stage and its histological origin.
Affiliation(s)
- F Torres Hoyos
  - Department of Physics, Universidad de Córdoba, 230002, Montería, Colombia; Department of Systems Engineering, Universidad Cooperativa de Colombia, 230002, Montería, Colombia.
- M Martín-Landrove
  - Center for Medical Visualization, National Institute for Bioengineering, Universidad Central de Venezuela, 1040, Caracas, Venezuela.
- R Baena Navarro
  - Department of Systems Engineering, Universidad Cooperativa de Colombia, 230002, Montería, Colombia.
- J Vergara Villadiego
  - Department of Systems Engineering, Universidad Cooperativa de Colombia, 230002, Montería, Colombia.
- J Causil Cardenas
  - Department of Physics, Universidad de Córdoba, 230002, Montería, Colombia.
|
129
|
Li J, Tang S, Zhang H, Li Z, Deng W, Zhao C, Fan L, Wang G, Liu J, Yin P, Xu G, Zhang L, Tang P. Clustering of morphological fracture lines for identifying intertrochanteric fracture classification with Hausdorff distance-based K-means approach. Injury 2019; 50:939-949. [PMID: 31003702 DOI: 10.1016/j.injury.2019.03.032] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 02/22/2019] [Accepted: 03/17/2019] [Indexed: 02/02/2023]
Abstract
OBJECTIVES The aim of this study was to develop a systematic three-dimensional (3D) classification of intertrochanteric fractures by clustering the morphological features of fracture lines using the Hausdorff distance-based K-means approach and assess the usefulness of it in the clinical setting. METHODS We retrospectively analyzed the data of 504 patients with intertrochanteric fractures who underwent closed reduction and intramedullary internal fixation. The morphological fracture lines of all patients extracted from computed tomography were transcribed freehand onto the template. All fracture lines were then clustered into five distinct types using the Hausdorff distance-based K-means clustering method. Five radiographic parameters and four functional parameters were used to evaluate the postoperative functional states and mobilization levels. Postoperative complications were also recorded. RESULTS Intertrochanteric fractures were classified into five types: type I (108/504, 21.4%), simple fracture with intact lateral femoral wall and greater trochanter fragment; type II (85/504, 16.9%), simple fracture with intact lateral femoral wall with/without lesser trochanter detachment; type III (147/504, 29.2%), fractures with intertrochanteric crest detachment involving the lesser trochanter and greater trochanter with an intact lateral femoral wall; type IV (113/504, 22.4%), fractures with large intertrochanteric crest detachment and large lesser trochanter and greater trochanter detachment partially involving the lateral femoral wall and less medial cortical support; type V (51/504, 10.1%), a combination of pertrochanteric and lateral fracture line involving the entire lateral femoral wall and lesser trochanter detachment. Parameters of femoral neck-shaft angle and sliding distance of the cephalic nail were significantly different among types. The complication rate generally increased from type I to type V (P = 0.035). 
CONCLUSIONS The unsupervised clustering can achieve identification of the type of intertrochanteric fractures with clinical significance. The Tang classification can be used to describe fracture morphology, predict the possibility of achieving stable reduction and the risk of complications following intramedullary fixation.
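To make the "Hausdorff distance-based" part concrete, the sketch below computes the symmetric Hausdorff distance between two fracture lines represented as point sets and assigns a line to the nearest of several prototype lines. The coordinates, the prototypes, and the helper names are hypothetical, and the abstract does not describe how cluster centres were updated, so only the distance and the assignment step are shown.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def assign_to_nearest(lines, prototypes):
    """K-means-style assignment step using Hausdorff distance instead of
    Euclidean distance between feature vectors."""
    return [
        min(range(len(prototypes)), key=lambda k: hausdorff(line, prototypes[k]))
        for line in lines
    ]

# Hypothetical fracture lines traced on a common template (2D for brevity).
line_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
line_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
line_c = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0)]

distance_ab = hausdorff(line_a, line_b)
distance_ac = hausdorff(line_a, line_c)
labels = assign_to_nearest([line_c], prototypes=[line_a, line_b])
```

A set-to-set distance suits this problem because traced fracture lines have different numbers of points, so they cannot be compared as fixed-length vectors.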
Affiliation(s)
- Jiantao Li
  - Department of Orthopaedics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, 100853, China
- Shaojie Tang
  - School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, Shanxi, 710121, China
- Hao Zhang
  - Department of Orthopaedics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, 100853, China
- Zhirui Li
  - Department of Orthopaedics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, 100853, China
- Wanyu Deng
  - School of Computer, Xi'an University of Posts and Telecommunications, Xi'an, Shanxi, 710121, China
- Chen Zhao
  - School of Computer, Xi'an University of Posts and Telecommunications, Xi'an, Shanxi, 710121, China
- Lianghui Fan
  - School of Computer, Xi'an University of Posts and Telecommunications, Xi'an, Shanxi, 710121, China
- Guoqi Wang
  - Department of Pediatrics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing 100853, China
- Jianheng Liu
  - Department of Orthopaedics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, 100853, China
- Peng Yin
  - Department of Orthopedics, Beijing Chaoyang Hospital, Capital Medical University, No. 8 Gong Ren Ti Yu Chang Nan Lu Rd, Beijing 100020, China
- Gaoxiang Xu
  - Department of Orthopaedics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, 100853, China
- Licheng Zhang
  - Department of Orthopaedics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, 100853, China.
- Peifu Tang
  - Department of Orthopaedics, Chinese PLA General Hospital, No. 28 Fuxing Road, Beijing, 100853, China.
|
130
|
Abstract
BACKGROUND Autism prevalence continues to grow, yet a universally agreed upon etiology is lacking despite manifold evidence of abnormalities especially in terms of genetics and epigenetics. The authors postulate that the broad definition of an omnibus 'spectrum disorder' may inhibit delineation of meaningful clinical correlations. This paper presents evidence that an objectively defined, EEG based brain measure may be helpful in illuminating the autism spectrum versus subgroups (clusters) question. METHODS Forty objectively defined EEG coherence factors created in prior studies demonstrated reliable separation of neuro-typical controls from subjects with autism, and reliable separation of subjects with Asperger's syndrome from all other subjects within the autism spectrum and from neurotypical controls. In the current study, these forty previously defined EEG coherence factors were used prospectively within a large (N = 430) population of subjects with autism in order to determine quantitatively the potential existence of separate clusters within this population. RESULTS By use of a recently published software package, NbClust, the current investigation determined that the 40 EEG coherence factors reliably identified two distinct clusters within the larger population of subjects with autism. These two clusters demonstrated highly significant differences. Of interest, many more subjects with Asperger's syndrome fell into one rather than the other cluster. CONCLUSIONS EEG coherence factors provide evidence of two highly significant separate clusters within the subject population with autism. The establishment of a unitary "Autism Spectrum Disorder" does a disservice to patients and clinicians, hinders much needed scientific exploration, and likely leads to less than optimal educational and/or interventional efforts.
Affiliation(s)
- Frank H Duffy
  - Department of Neurology, Boston Children's Hospital and Harvard Medical School, Boston, USA.
- Heidelise Als
  - Department of Psychiatry, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Enders 107, Boston, MA, 02115, USA
|
131
|
Abstract
Background Gene signatures are important to represent the molecular changes in the disease genomes or the cells in specific conditions, and have often been used to separate samples into different groups for better research or clinical treatment. While many methods and applications are available in the literature, powerful ones that can account for the complex data and detect the most informative signatures are still lacking. Methods In this article, we present a new framework for identifying gene signatures using Pareto-optimal cluster size identification for RNA-seq data. We first performed pre-filtering steps and normalization, then utilized the empirical Bayes test in the Limma package to identify the differentially expressed genes (DEGs). Next, we used a multi-objective optimization technique, “Multi-objective optimization for collecting cluster alternatives” (the MOCCA R package), on these DEGs to find the Pareto-optimal cluster size, and then applied k-means clustering to the RNA-seq data based on the optimal cluster size. The best cluster was obtained by computing the average Spearman’s correlation score over all pairs of genes belonging to the module. The best cluster is treated as the signature for the respective disease or cellular condition. Results We applied our framework to a cervical cancer RNA-seq dataset, which included 253 squamous cell carcinoma (SCC) samples and 22 adenocarcinoma (ADENO) samples. We identified a total of 582 DEGs by Limma analysis of SCC versus ADENO samples. Among them, 260 are up-regulated genes and 322 are down-regulated genes. Using MOCCA, we obtained seven Pareto-optimal clusters. The best cluster has a total of 35 DEGs, all of which are up-regulated. For validation, we ran the PAMR (prediction analysis for microarrays) classifier on the selected best cluster, and assessed the classification performance. Our evaluation, measured by sensitivity, specificity, precision, and accuracy, showed high confidence. Conclusions Our framework identified a multi-objective based cluster that is treated as a signature and can classify the disease and control groups of samples with high classification performance (accuracy 0.935) for the corresponding disease. Our method can be used to find signatures in any RNA-seq or microarray dataset.
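The cluster-selection step, scoring each cluster by the average pairwise Spearman correlation of its genes, can be sketched as follows. The expression profiles and cluster names are invented, and the helper functions are illustrative stand-ins rather than the Limma, MOCCA, or PAMR routines named in the abstract.

```python
from itertools import combinations

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_pairwise_spearman(cluster):
    """Average Spearman correlation over all gene pairs in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)

# Hypothetical clusters: each inner list is one gene's expression across samples.
clusters = {
    "cluster_1": [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 2, 5, 4]],
    "cluster_2": [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [2, 1, 4, 3, 5]],
}

scores = {name: mean_pairwise_spearman(genes) for name, genes in clusters.items()}
best_cluster = max(scores, key=scores.get)
```

A high average means the genes in the module rise and fall together across samples, which is the property the framework uses to pick one cluster as the signature.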
Affiliation(s)
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, 77030, TX, USA
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, 77030, TX, USA
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, 37232, TN, USA.
|
132
|
Lai X, Wang H. Combined Channel Estimation with Interference Suppression in CPSS. Sensors (Basel) 2018; 18:E3823. [PMID: 30413015 DOI: 10.3390/s18113823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2018] [Revised: 11/04/2018] [Accepted: 11/05/2018] [Indexed: 11/16/2022]
Abstract
With social characteristics integrated into cyber-physical systems (CPS), the wireless channel has become a complex electromagnetic environment due to the subjectivity of human behaviour. For the low-power and resource-constrained nodes in cyber-physical-social systems (CPSS), little research is available that addresses computational complexity, external interference and transmission fading simultaneously. This study aims to explore channel estimation with interference suppression based on machine learning. A novel channel estimation scheme is proposed, which combines interference suppression in the frequency-domain channel impulse response (CIR) using the K-means algorithm and noise cancellation in the time-domain CIR using the K-nearest neighbor (KNN) algorithm into an integrated process. Complexity analysis and simulation results showed that the proposed scheme has relatively lower complexity and performs better than traditional schemes, which meets the requirements of CPSS in complex electromagnetic environments.
|
133
|
Abu-Jamous B, Kelly S. Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data. Genome Biol 2018; 19:172. [PMID: 30359297 PMCID: PMC6203272 DOI: 10.1186/s13059-018-1536-8] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 03/06/2018] [Accepted: 09/11/2018] [Indexed: 01/24/2023] Open
Abstract
Identifying co-expressed gene clusters can provide evidence for genetic or physical interactions. Thus, co-expression clustering is a routine step in large-scale analyses of gene expression data. We show that commonly used clustering methods produce results that substantially disagree and that do not match the biological expectations of co-expressed gene clusters. We present clust, a method that solves these problems by extracting clusters matching the biological expectations of co-expressed genes and outperforms widely used methods. Additionally, clust can simultaneously cluster multiple datasets, enabling users to leverage the large quantity of public expression data for novel comparative analysis. Clust is available at https://github.com/BaselAbujamous/clust.
Affiliation(s)
- Basel Abu-Jamous
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
|
134
|
Su Y, Reedy J, Carroll RJ. Clustering in General Measurement Error Models. Stat Sin 2018; 28:2337-2351. [PMID: 30636855 PMCID: PMC6329467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
This paper is dedicated to the memory of Peter G. Hall. It concerns a deceptively simple question: if one observes variables corrupted with measurement error of possibly very complex form, can one recreate asymptotically the clusters that would have been found had there been no measurement error? We show that the answer is yes, and that the solution is surprisingly simple and general. The method itself is to simulate, by computer, realizations with the same distribution as that of the true variables, and then to apply clustering to these realizations. Technically, we show that if one uses K-means clustering or any other risk minimizing clustering, and a multivariate deconvolution device with certain smoothness and convergence properties, then, in the limit, the cluster means based on our method converge to the same cluster means as if there is no measurement error. Along with the method and its technical justification, we analyze two important nutrition data sets, finding patterns that make sense nutritionally.
Affiliation(s)
- Ya Su
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143
| | - Jill Reedy
- Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD 20892
| | - Raymond J Carroll
- Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, and School of Mathematical and Physical Sciences, University of Technology, Sydney, Broadway NSW 2007, Australia
|
135
|
Gaudet S, Begon M, Tremblay J. Cluster analysis using physical performance and self-report measures to identify shoulder injury in overhead female athletes. J Sci Med Sport 2019; 22:269-74. [PMID: 30253926 DOI: 10.1016/j.jsams.2018.09.224] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/01/2018] [Revised: 09/04/2018] [Accepted: 09/07/2018] [Indexed: 10/28/2022]
Abstract
OBJECTIVES To evaluate the diagnostic validity of the Kerlan-Jobe orthopedic clinic shoulder and elbow score (KJOC) and the Closed kinetic upper extremity stability test (CKCUEST) to assess functional impairments associated with shoulder injury in overhead female athletic populations. DESIGN Cross-sectional design. METHODS Thirty-four synchronized swimming and team handball female athletes completed the KJOC and the CKCUEST during their respective team selection trials. Unsupervised learning with the k-means algorithm was applied to the collected data to perform group clustering and classify athletes as Injured or Not Injured. Odds ratios, likelihood ratios, sensitivity and specificity were computed based on the self-reported presence of shoulder injury at the time of testing or during the previous year. RESULTS Seven of the 34 athletes were injured or had suffered a time-loss injury in the previous year, representing a 20.5% prevalence rate. The clustering method using KJOC data resulted in a sensitivity of 86%, a specificity of 100% and a diagnostic odds ratio of 229.67. The clustering method using CKCUEST data resulted in a sensitivity of 86%, a specificity of 37% and a diagnostic odds ratio of 3.53. CONCLUSIONS KJOC had good diagnostic validity to assess shoulder function and differentiate between injured and non-injured elite synchronized swimming and team handball female athletes. The CKCUEST seemed to be a poor screening test but may be an interesting test to evaluate functional upper extremity strength and plyometric capacity. Unsupervised learning methods allow decisions to be made based on numerous variables, which is an advantage when considering the usually substantial overlap in screening test scores between high- and low-risk athletes.
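As a rough illustration of how the diagnostic measures quoted in this abstract relate to one another, the sketch below computes sensitivity, specificity and the diagnostic odds ratio from a 2x2 table. The function name and the counts are assumptions: the abstract reports only percentages, and the counts used here (6 true positives, 1 false negative, 17 false positives, 10 true negatives) are simply one split of 7 injured and 27 non-injured athletes that is consistent with the CKCUEST figures. It is not the authors' code.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, diagnostic odds ratio) from 2x2 counts."""
    sensitivity = tp / (tp + fn)   # share of injured athletes flagged as injured
    specificity = tn / (tn + fp)   # share of non-injured athletes flagged as not injured
    # DOR = (TP * TN) / (FP * FN); it is unbounded when FP or FN is zero
    dor = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return sensitivity, specificity, dor


# Hypothetical counts inferred from the reported CKCUEST percentages (assumption)
sens, spec, dor = diagnostic_metrics(tp=6, fn=1, fp=17, tn=10)
print(f"{sens:.0%} {spec:.0%} {dor:.2f}")  # prints "86% 37% 3.53"
```

With a specificity of 100% there are no false positives, so the uncorrected ratio is unbounded; the finite value reported for the KJOC presumably relies on some continuity correction, which is not reproduced here.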
|
136
|
Liu B, Fu Z, Wang P, Liu L, Gao M, Liu J. Big-Data-Mining-Based Improved K-Means Algorithm for Energy Use Analysis of Coal-Fired Power Plant Units: A Case Study. Entropy (Basel) 2018; 20:e20090702. [PMID: 33265791 PMCID: PMC7513223 DOI: 10.3390/e20090702] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/24/2018] [Revised: 09/07/2018] [Accepted: 09/12/2018] [Indexed: 11/16/2022]
Abstract
The energy use analysis of coal-fired power plant units is of significance for energy conservation and consumption reduction. One of the most serious problems attributed to Chinese coal-fired power plants is coal waste. Several units in one plant may experience a practical rated output situation at the same time, which may increase the coal consumption of the power plant. Here, we propose a new hybrid methodology for plant-level load optimization to minimize coal consumption for coal-fired power plants. The proposed methodology includes two parts. One part determines the reference value of the controllable operating parameters of net coal consumption under typical load conditions, based on an improved K-means algorithm and the Hadoop platform. The other part utilizes a support vector machine to determine the sensitivity coefficients of various operating parameters for the net coal consumption under different load conditions. Additionally, the fuzzy rough set attribute reduction method was employed to obtain the minimalist properties reduction method parameters to reduce the complexity of the dataset. This work is based on continuously-measured information system data from a 600 MW coal-fired power plant in China. The results show that the proposed strategy achieves high energy conservation performance. Taking the 600 MW load optimization value as an example, the optimized power supply coal consumption is 307.95 g/(kW·h) compared to the actual operating value of 313.45 g/(kW·h). It is important for coal-fired power plants to reduce their coal consumption.
Affiliation(s)
- Binghan Liu
- School of Energy, Power and Mechanical Engineering, North China Electric Power University, Beijing 102206, China
- Correspondence: (B.L.); (Z.F.); Tel.: +86-10-6177-2361 (Z.F.)
| | - Zhongguang Fu
- School of Energy, Power and Mechanical Engineering, North China Electric Power University, Beijing 102206, China
- Correspondence: (B.L.); (Z.F.); Tel.: +86-10-6177-2361 (Z.F.)
| | - Pengkai Wang
- School of Energy, Power and Mechanical Engineering, North China Electric Power University, Beijing 102206, China
| | - Lu Liu
- National Engineering Laboratory for Biomass Power Generation Equipment, North China Electric Power University, Beijing 102206, China
| | - Manda Gao
- School of Energy, Power and Mechanical Engineering, North China Electric Power University, Beijing 102206, China
| | - Ji Liu
- National Engineering Laboratory for Biomass Power Generation Equipment, North China Electric Power University, Beijing 102206, China
|
137
|
Yang L, Ma R, Zhang HM, Guan W, Jiang S. Driving behavior recognition using EEG data from a simulated car-following experiment. Accid Anal Prev 2018; 116:30-40. [PMID: 29174606 DOI: 10.1016/j.aap.2017.11.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 04/15/2017] [Revised: 10/27/2017] [Accepted: 11/08/2017] [Indexed: 05/13/2023]
Abstract
Driving behavior recognition is the foundation of driver assistance systems, with potential applications in automated driving systems. Most prevailing studies have used subjective questionnaire data and objective driving data to classify driving behaviors, while few studies have used physiological signals such as electroencephalography (EEG) to gather data. To bridge this gap, this paper proposes a two-layer learning method for driving behavior recognition using EEG data. A simulated car-following driving experiment was designed and conducted to simultaneously collect data on the driving behaviors and EEG data of drivers. The proposed learning method consists of two layers. In Layer I, two-dimensional driving behavior features representing driving style and stability were selected and extracted from raw driving behavior data using K-means and support vector machine recursive feature elimination. Five groups of driving behaviors were classified based on these two-dimensional driving behavior features. In Layer II, the classification results from Layer I were utilized as inputs to generate a k-Nearest-Neighbor classifier identifying driving behavior groups using EEG data. Using independent component analysis, a fast Fourier transformation, and linear discriminant analysis sequentially, the raw EEG signals were processed to extract two core EEG features. Classifier performance was enhanced using the adaptive synthetic sampling approach. A leave-one-subject-out cross validation was conducted. The results showed that the average classification accuracy for all tested traffic states was 69.5% and the highest accuracy reached 83.5%, suggesting a significant correlation between EEG patterns and car-following behavior.
Affiliation(s)
- Liu Yang
- MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, Beijing Jiaotong University, Beijing 100044, China.
| | - Rui Ma
- Department of Civil and Environmental Engineering, University of California Davis, Davis, CA 95616, USA.
| | - H Michael Zhang
- Department of Civil and Environmental Engineering, University of California Davis, Davis, CA 95616, USA.
| | - Wei Guan
- MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, Beijing Jiaotong University, Beijing 100044, China.
| | - Shixiong Jiang
- MOE Key Laboratory for Urban Transportation Complex Systems Theory and Technology, Beijing Jiaotong University, Beijing 100044, China.
|
138
|
Le V, Yang D, Zhu Y, Zheng B, Bai C, Shi H, Hu J, Zhai C, Lu S. Quantitative CT analysis of pulmonary nodules for lung adenocarcinoma risk classification based on an exponential weighted grey scale angular density distribution feature. Comput Methods Programs Biomed 2018; 160:141-151. [PMID: 29728241 DOI: 10.1016/j.cmpb.2018.04.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/05/2017] [Revised: 02/21/2018] [Accepted: 04/02/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVES To improve lung nodule classification efficiency, we propose a lung nodule CT image characterization method. We propose a multi-directional feature extraction method to effectively represent nodules of different risk levels. The proposed feature is combined with a pattern recognition model to classify lung adenocarcinoma risk into four categories: Atypical Adenomatous Hyperplasia (AAH), Adenocarcinoma In Situ (AIS), Minimally Invasive Adenocarcinoma (MIA), and Invasive Adenocarcinoma (IA). METHODS First, we constructed the reference map using an integral image and labelled this map using a K-means approach. The density distribution map of the lung nodule image was generated after scanning all pixels in the nodule image. An exponential function was designed to weight the angular histogram for each component of the distribution map, and the features of the image were described. Then, quantitative measurement was performed using a Random Forest classifier. The evaluation data were obtained from the LIDC-IDRI database and the CT database provided by Shanghai Zhongshan hospital (ZSDB). In the LIDC-IDRI, the nodules are categorized into three configurations with five ranks of malignancy ("1" to "5"). In the ZSDB, the nodule categories are AAH, AIS, MIA, and IA. RESULTS The average Student's t-test p-value was less than 0.02. The AUCs for the LIDC-IDRI database were 0.9568, 0.9320, and 0.8288 for Configurations 1, 2, and 3, respectively. The AUCs for the ZSDB were 0.9771, 0.9917, 0.9590, and 0.9971 for AAH, AIS, MIA and IA, respectively. CONCLUSION The experimental results demonstrate that the proposed method outperforms the state-of-the-art and is robust for different lung CT image datasets.
Affiliation(s)
- Vanbang Le
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, Postcode 200237, China
| | - Dawei Yang
- Department of Pulmonary Medicine, ZhongShan Hospital, Fudan University, Shanghai, Postcode 200032, China; Shanghai Respiratory Research Institute, Shanghai, Postcode 200032, China
| | - Yu Zhu
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, Postcode 200237, China.
| | - Bingbing Zheng
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, Postcode 200237, China.
| | - Chunxue Bai
- Department of Pulmonary Medicine, ZhongShan Hospital, Fudan University, Shanghai, Postcode 200032, China; Shanghai Respiratory Research Institute, Shanghai, Postcode 200032, China.
| | - Hongcheng Shi
- Department of Nuclear Medicine, ZhongShan Hospital, Fudan University, Shanghai, Postcode 200032, China.
| | - Jie Hu
- Department of Pulmonary Medicine, ZhongShan Hospital, Fudan University, Shanghai, Postcode 200032, China; Shanghai Respiratory Research Institute, Shanghai, Postcode 200032, China
| | - Changwen Zhai
- Department of Pathology, ZhongShan Hospital, Fudan University, Shanghai, Postcode 200032, China.
| | - Shaohua Lu
- Department of Pathology, ZhongShan Hospital, Fudan University, Shanghai, Postcode 200032, China
|
139
|
Ouabida E, Essadike A, Bouzid A. Automated segmentation of ophthalmological images by an optical based approach for early detection of eye tumor growing. Phys Med 2018; 48:37-46. [PMID: 29728227 DOI: 10.1016/j.ejmp.2018.03.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/23/2018] [Revised: 03/15/2018] [Accepted: 03/23/2018] [Indexed: 11/27/2022] Open
Abstract
PURPOSE Iris neoplasm is an asymptomatic cancer that causes a gradual loss of sight. The first purpose of this study was to present a novel, automatic method for segmenting iris tumors and detecting how the corresponding areas change over time. The second aim was to evaluate several recently published methods applied to iris tumor segmentation. METHODS Our approach first segments the iris region using the Vander Lugt correlator based active contour method. Second, treating only the iris region, a K-means clustering model is used to assign the tumorous tissue to one pixel cluster. This model is quite sensitive to the center initialization and to the choice of the distance measure. To solve these problems, a proportional probability based approach was introduced for the cluster center initialization, and the impact of several distance measures was investigated. The proposed method and the comparative methods were evaluated on two databases: the Eye Cancer and the Miles Research. RESULTS Results reported using several performance metrics reveal that the first step assures the detection of all iris tumors with an accuracy of 100%. Additionally, the proposed method yields better performance compared to the recently published methods.
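The sensitivity of K-means to center initialization mentioned in this abstract is commonly addressed by drawing initial centers with probability proportional to their squared distance from the centers already chosen (the k-means++ scheme). The sketch below shows that general idea only; whether it matches the authors' "proportional probability based approach" is an assumption, and the function names and toy points are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centers(points, k, rng):
    """Pick k initial centers, k-means++ style (distance-proportional sampling).

    Assumes the points are distinct and k <= len(points); with duplicates all
    weights can become zero and random.choices will raise an error.
    """
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # weight of each point = squared distance to its nearest chosen center,
        # so already-chosen points have weight 0 and far-away points are favored
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Toy data (hypothetical): two tight pairs of points and one isolated point
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers = seed_centers(points, 3, random.Random(0))
print(centers)
```

The seeded centers would then be passed to the usual K-means assignment and update iterations; the choice of distance measure, which the abstract also investigates, enters through `sq_dist`.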
Affiliation(s)
- Elhoussaine Ouabida
- Moulay Ismail University, Faculty of Sciences, Department of Physics, BP 11201 Zitoune, Meknes, Morocco.
| | - Abdelaziz Essadike
- Moulay Ismail University, Faculty of Sciences, Department of Physics, BP 11201 Zitoune, Meknes, Morocco.
| | - Abdenbi Bouzid
- Moulay Ismail University, Faculty of Sciences, Department of Physics, BP 11201 Zitoune, Meknes, Morocco.
|
140
|
Katiyar P, Divine MR, Kohlhofer U, Quintanilla-Martinez L, Schölkopf B, Pichler BJ, Disselhorst JA. A Novel Unsupervised Segmentation Approach Quantifies Tumor Tissue Populations Using Multiparametric MRI: First Results with Histological Validation. Mol Imaging Biol 2018; 19:391-397. [PMID: 27734253 PMCID: PMC5332060 DOI: 10.1007/s11307-016-1009-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/16/2022]
Abstract
Purpose We aimed to precisely estimate intra-tumoral heterogeneity using spatially regularized spectral clustering (SRSC) on multiparametric MRI data and compare the efficacy of SRSC with the previously reported segmentation techniques in MRI studies. Procedures Six NMRI nu/nu mice bearing subcutaneous human glioblastoma U87 MG tumors were scanned using a dedicated small animal 7T magnetic resonance imaging (MRI) scanner. The data consisted of T2 weighted images, apparent diffusion coefficient maps, and pre- and post-contrast T2 and T2* maps. Following each scan, the tumors were excised into 2–3-mm thin slices parallel to the axial field of view and processed for histological staining. The MRI data were segmented using SRSC, K-means, fuzzy C-means, and Gaussian mixture modeling to estimate the fractional population of necrotic, peri-necrotic, and viable regions and validated with the fractional population obtained from histology. Results While the aforementioned methods overestimated peri-necrotic and underestimated viable fractions, SRSC accurately predicted the fractional population of all three tumor tissue types and exhibited strong correlations (r_necrotic = 0.92, r_peri-necrotic = 0.82 and r_viable = 0.98) with the histology. Conclusions The precise identification of necrotic, peri-necrotic and viable areas using SRSC may greatly assist in cancer treatment planning and add a new dimension to MRI-guided tumor biopsy procedures.
Affiliation(s)
- Prateek Katiyar
- Werner Siemens Imaging Center, Department of Preclinical Imaging and Radiopharmacy, Eberhard Karls University Tuebingen, Roentgenweg 13, 72076, Tuebingen, Germany.
- Max Planck Institute for Intelligent Systems, Tuebingen, Germany.
| | - Mathew R Divine
- Werner Siemens Imaging Center, Department of Preclinical Imaging and Radiopharmacy, Eberhard Karls University Tuebingen, Roentgenweg 13, 72076, Tuebingen, Germany
| | - Ursula Kohlhofer
- Institute of Pathology and Neuropathology, Eberhard Karls University Tuebingen and Comprehensive Cancer Center, University Hospital Tuebingen, Tuebingen, Germany
| | - Leticia Quintanilla-Martinez
- Institute of Pathology and Neuropathology, Eberhard Karls University Tuebingen and Comprehensive Cancer Center, University Hospital Tuebingen, Tuebingen, Germany
| | | | - Bernd J Pichler
- Werner Siemens Imaging Center, Department of Preclinical Imaging and Radiopharmacy, Eberhard Karls University Tuebingen, Roentgenweg 13, 72076, Tuebingen, Germany
| | - Jonathan A Disselhorst
- Werner Siemens Imaging Center, Department of Preclinical Imaging and Radiopharmacy, Eberhard Karls University Tuebingen, Roentgenweg 13, 72076, Tuebingen, Germany
|
141
|
Abstract
This paper extends the recently proposed and theoretically justified iterative thresholding and K residual means (ITKrM) algorithm to learning dictionaries from incomplete/masked training data (ITKrMM). It further adapts the algorithm to the presence of a low-rank component in the data and provides a strategy for recovering this low-rank component again from incomplete data. Several synthetic experiments show the advantages of incorporating information about the corruption into the algorithm. Further experiments on image data confirm the importance of considering a low-rank component in the data and show that the algorithm compares favourably to its closest dictionary learning counterparts, wKSVD and BPFA, either in terms of computational complexity or in terms of consistency between the dictionaries learned from corrupted and uncorrupted data. To further confirm the appropriateness of the learned dictionaries, we explore an application to sparsity-based image inpainting. There the ITKrMM dictionaries show a similar performance to other learned dictionaries like wKSVD and BPFA and a superior performance to other algorithms based on pre-defined/analytic dictionaries.
Affiliation(s)
- Valeriya Naumova
- Simula Metropolitan Center for Digital Engineering, Martin Linges 25, Fornebu, 1325 Norway
| | - Karin Schnass
- Department of Mathematics, University of Innsbruck, Technikerstraße 13, Innsbruck, 6020 Austria
|
142
|
Khouj Y, Dawson J, Coad J, Vona-Davis L. Hyperspectral Imaging and K-Means Classification for Histologic Evaluation of Ductal Carcinoma In Situ. Front Oncol 2018; 8:17. [PMID: 29468139 PMCID: PMC5808285 DOI: 10.3389/fonc.2018.00017] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 10/09/2017] [Accepted: 01/17/2018] [Indexed: 11/13/2022] Open
Abstract
Hyperspectral imaging (HSI) is a non-invasive optical imaging modality that shows the potential to aid pathologists in breast cancer diagnosis. In this study, breast cancer tissues from different patients were imaged by a hyperspectral system to detect spectral differences between normal and breast cancer tissues. Tissue samples mounted on slides were identified from 10 different patients. Samples from each patient included both normal and ductal carcinoma tissue, both stained with hematoxylin and eosin stain and unstained. Slides were imaged using a snapshot HSI system, and the spectral reflectance differences were evaluated. Analysis of the spectral reflectance values indicated that wavelengths near 550 nm showed the best differentiation between tissue types. This information was used to train image processing algorithms using supervised and unsupervised data. The K-means method was applied to the hyperspectral data cubes, and successfully detected spectral tissue differences with sensitivity of 85.45%, and specificity of 94.64% with true negative rate of 95.8%, and false positive rate of 4.2%. These results were verified by ground-truth marking of the tissue samples by a pathologist. In the hyperspectral image analysis, the image processing algorithm, K-means, shows the greatest potential for building a semi-automated system that could identify and sort between normal and ductal carcinoma in situ tissues.
Affiliation(s)
- Yasser Khouj
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, United States
| | - Jeremy Dawson
- Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, United States
| | - James Coad
- Department of Pathology, West Virginia University, Morgantown, WV, United States; West Virginia University Cancer Institute, Morgantown, WV, United States
| | - Linda Vona-Davis
- West Virginia University Cancer Institute, Morgantown, WV, United States; Department of Surgery, West Virginia University, Morgantown, WV, United States
|
143
|
Mendizabal-Ruiz G, Román-Godínez I, Torres-Ramos S, Salido-Ruiz RA, Vélez-Pérez H, Morales JA. Genomic signal processing for DNA sequence clustering. PeerJ 2018; 6:e4264. [PMID: 29379686 PMCID: PMC5786891 DOI: 10.7717/peerj.4264] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/17/2017] [Accepted: 12/24/2017] [Indexed: 11/20/2022] Open
Abstract
Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.
Affiliation(s)
| | - Israel Román-Godínez
- Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
| | - Sulema Torres-Ramos
- Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
| | - Ricardo A Salido-Ruiz
- Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
| | - Hugo Vélez-Pérez
- Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
| | - J Alejandro Morales
- Departamento de Ciencias Computacionales, Universidad de Guadalajara, Guadalajara, Mexico
|
144
|
de la Hoz CF, Ramos E, Puente A, Méndez F, Menéndez M, Juanes JA, Losada ÍJ. Ecological typologies of large areas. An application in the Mediterranean Sea. J Environ Manage 2018; 205:59-72. [PMID: 28964975 DOI: 10.1016/j.jenvman.2017.09.058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/31/2017] [Revised: 09/20/2017] [Accepted: 09/21/2017] [Indexed: 06/07/2023]
Abstract
One approach to identifying and mapping the state of marine biophysical conditions is the identification of large-scale ecological units for which conditions are similar and the strategies of management may also be similar. Because biological processes are difficult to directly record over large areas, abiotic characteristics are used as surrogate parameters. In this work, the Mediterranean Sea was classified into homogeneous spatial areas based on abiotic variables. Eight parameters were selected based on salinity, sea surface temperature, photosynthetically active radiation, sea-wave heights and depth variables. The parameters were gathered in grid points of 0.5° spatial resolution in the open sea and 0.125° in coastal areas. The typologies were obtained by data mining the eight parameters throughout the Mediterranean and combining two clustering techniques: self-organizing maps and the k-means algorithm. The result is a division of the Mediterranean Sea into seven typologies. For these typologies, the classification recognizes differences in temperature, salinity and radiation. In addition, it separates coastal from deep areas. The influence of river discharges and the entrance of water from other seas are also reflected. These results are consistent with the ecological requirements of the five studied seagrasses (Posidonia oceanica, Zostera marina, Zostera noltei, Cymodocea nodosa, Halophila stipulacea), supporting the suitability of the resulting classification and the proposed methodology. The approach thus provides a tool for the sustainable management of large marine areas and the ability to address not only present threats but also future conditions, such as climate change.
Affiliation(s)
- Camino F de la Hoz
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain
| | - Elvira Ramos
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain
| | - Araceli Puente
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain.
| | - Fernando Méndez
- Departamento Ciencias y Tecnicas del Agua y del Medio Ambiente, Universidad de Cantabria, Santander, Spain
| | - Melisa Menéndez
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain
| | - José A Juanes
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain
| | - Íñigo J Losada
- Environmental Hydraulics Institute, Universidad de Cantabria, Avda. Isabel Torres, 15, Parque Científico y Tecnológico de Cantabria, 39011, Santander, Spain
|
145
|
Su J, Liu S, Song J. A segmentation method based on HMRF for the aided diagnosis of acute myeloid leukemia. Comput Methods Programs Biomed 2017; 152:115-123. [PMID: 29054251 DOI: 10.1016/j.cmpb.2017.09.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/14/2017] [Accepted: 09/11/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVES The diagnosis of acute myeloid leukemia (AML) is purely dependent on counting the percentages of blasts (>20%) in the peripheral blood or bone marrow. Manual microscopic examination of peripheral blood or bone marrow aspirate smears is time consuming and less accurate. The first and very important step in blast recognition is the segmentation of the cells from the background for further cell feature extraction and cell classification. In this paper, we aimed to utilize computer technologies in image analysis and artificial intelligence to develop an automatic program for blast recognition and counting in the aspirate smears. METHODS We proposed a method to analyze the aspirate smear images, which first segments the cells by k-means clustering, then builds a cell image representation model using an HMRF (hidden Markov random field), estimates the model parameters by EM (expectation maximization), iterates until convergence, and finally produces a refined second-stage segmentation. Furthermore, the segmentation results are compared with several other methods using six classes of cells respectively. RESULTS The proposed method was applied to six groups of cells from 61 bone marrow aspirate images, and compared with other algorithms for its performance on the analysis of the whole images, the segmentation of the nucleus, and the efficiency of calculation. It showed improved segmentation results in both the cropped images and the whole images, which provide the basis for downstream cell feature extraction and identification. CONCLUSIONS Segmentation of the aspirate smear images using the proposed method helps the analyst in differentiating six groups of cells and in counting blasts, which will be of great significance for the diagnosis of acute myeloid leukemia.
Affiliation(s)
- Jie Su
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang, China.
| | - Shuai Liu
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang, China
| | - Jinming Song
- Department of Hematopathology and Lab Medicines, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
|
146
|
Abstract
Background Research interest in single-cell analysis has increased greatly in basic, translational and clinical research, as advances in whole-transcriptome amplification techniques allow scientists to obtain accurate sequencing results at the single-cell level. An important step in single-cell transcriptome analysis is to identify distinct cell groups that have different gene expression patterns. Few bioinformatics approaches are currently available for single-cell RNA-seq analysis. Many studies rely on principal component analysis (PCA) with arbitrary parameters to identify the genes that will be used to cluster the single cells. Results We have developed a novel algorithm, called SAIC (Single cell Analysis via Iterative Clustering), that identifies the optimal set of signature genes to separate single cells into distinct groups. Our method uses an iterative clustering approach to perform an exhaustive search for the best parameters within the search space, which is defined by a number of initial centers and P values. The end point is the identification of a signature gene set that gives the best separation of the cell clusters. Using a simulated data set, we showed that SAIC can successfully identify the predefined signature gene sets that correctly separate the cells into predefined clusters. We applied SAIC to two published single-cell RNA-seq datasets. For both datasets, SAIC was able to identify a subset of signature genes that cluster the single cells into groups consistent with the published results. The signature genes identified by SAIC resulted in better clusters of cells based on DB index score, and many genes also showed tissue-specific expression. Conclusions In summary, we have developed an efficient algorithm to identify the optimal subset of genes that separate single cells into distinct clusters based on their expression patterns. We have shown that it performs better than the PCA method on published single-cell RNA-seq datasets.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4019-5) contains supplementary material, which is available to authorized users.
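The abstract above ranks candidate gene sets by the DB (Davies-Bouldin) index of the resulting cell clusters. The following is a minimal sketch of that index in plain Python, not the SAIC implementation: the function names and the toy expression values are hypothetical, and the cluster labels are assumed to come from a separate clustering step.

```python
import math

def centroid(points):
    """Mean of a list of equal-length numeric vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower means tighter, better-separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    keys = sorted(clusters)
    cents = {k: centroid(clusters[k]) for k in keys}
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }
    total = 0.0
    for i in keys:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in keys if j != i
        )
    return total / len(keys)

# Hypothetical toy data: four cells measured on two candidate gene sets.
labels = [0, 0, 1, 1]
signature_genes = [[0, 0], [0, 1], [10, 10], [10, 11]]
noisy_genes = [[0, 5], [1, 0], [2, 5], [3, 0]]
db_signature = davies_bouldin(signature_genes, labels)
db_noisy = davies_bouldin(noisy_genes, labels)
```

In this toy case the gene set that separates the two groups cleanly receives the lower index, which is the sense in which a lower DB score indicates "better clusters".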
Affiliation(s)
- Lu Yang
  - Integrative Genomics Core, Beckman Research Institute, City of Hope, Duarte, CA, 91010, USA
- Jiancheng Liu
  - Department of Developmental and Stem Cell Biology, Beckman Research Institute, City of Hope, Duarte, CA, 91010, USA
- Qiang Lu
  - Department of Developmental and Stem Cell Biology, Beckman Research Institute, City of Hope, Duarte, CA, 91010, USA
- Arthur D Riggs
  - Diabetes and Metabolism Research Institute, City of Hope, Duarte, CA, 91010, USA
- Xiwei Wu
  - Integrative Genomics Core, Beckman Research Institute, City of Hope, Duarte, CA, 91010, USA
  - Department of Molecular and Cellular Biology, Beckman Research Institute, City of Hope, Duarte, CA, 91010, USA
|
147
|
Abstract
We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.
|
148
|
Kakushadze Z, Yu W. *K-means and cluster models for cancer signatures. Biomol Detect Quantif 2017; 13:7-31. [PMID: 29021969 DOI: 10.1016/j.bdq.2017.07.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/13/2017] [Revised: 07/18/2017] [Accepted: 07/18/2017] [Indexed: 01/03/2023]
Abstract
We present the *K-means clustering algorithm and source code, expanding the statistical clustering methods applied to quantitative finance in https://ssrn.com/abstract=2802753. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). The computational cost of *K-means is a fraction of that of NMF. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.
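Ordinary k-means depends on randomly chosen initial centers, which is the non-determinism the abstract refers to. One generic way to reduce that dependence is to repeat k-means many times and keep the partition that occurs most often. The sketch below illustrates only this general idea under that assumption; it is not the *K-means algorithm or the authors' source code, and all names and data in it are hypothetical.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, iters=50):
    """One run of Lloyd's algorithm from randomly sampled initial centers."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def canonical(labels):
    """Relabel clusters by order of first appearance so that identical
    partitions compare equal regardless of label numbering."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)

def most_frequent_partition(points, k, runs=100, seed=0):
    """Run k-means `runs` times; return the modal partition and its frequency."""
    rng = random.Random(seed)
    counts = Counter(canonical(kmeans_once(points, k, rng)) for _ in range(runs))
    partition, hits = counts.most_common(1)[0]
    return list(partition), hits / runs

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
partition, frequency = most_frequent_partition(points, 2, runs=20, seed=1)
```

The returned frequency gives a rough indication of how stable the modal partition is across random starts; a low value suggests the data lack a clear cluster structure at that `k`.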
|
149
|
Zou YB, Chen YM, Gao MK, Liu Q, Jiang SY, Lu JH, Huang C, Li ZY, Zhang DH. Coronary Heart Disease Preoperative Gesture Interactive Diagnostic System Based on Augmented Reality. J Med Syst 2017; 41:126. [PMID: 28718051 DOI: 10.1007/s10916-017-0768-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/05/2016] [Accepted: 06/29/2017] [Indexed: 10/19/2022]
Abstract
Preoperative diagnosis of coronary heart disease plays an important role in vascular interventional surgery. In practice, most doctors locate the vascular stenosis and then estimate it empirically from selective coronary angiography images rather than using a mouse, keyboard and computer during preoperative diagnosis. This invasive diagnostic modality lacks intuitive and natural interaction, and its results are not accurate enough. To address these problems, a coronary heart disease preoperative gesture interactive diagnostic system based on Augmented Reality is proposed. The system uses the Leap Motion Controller to capture hand gesture video sequences and extract features, namely the position and orientation vectors of the gesture motion trajectory and the change in hand shape. The training planet is determined by the K-means algorithm, and the effect of gesture training is then improved by using multiple features and multiple observation sequences. The reusability of gestures is improved by establishing a state transition model. Algorithm efficiency is improved by gesture prejudgment, which applies threshold discrimination before recognition. The integrity of the trajectory is preserved and the gesture motion space is extended by applying a space rotation transformation to the gesture manipulation plane. Ultimately, gesture recognition based on SRT-HMM is realized. The diagnosis and measurement of vascular stenosis are carried out intuitively and naturally by operating and measuring the coronary artery model with augmented reality and gesture interaction techniques. The gesture recognition experiments show the discriminative and generalization ability of the algorithm, and the gesture interaction experiments demonstrate the availability and reliability of the system.
|
150
|
Vera JF, Macías R. Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data. Psychometrika 2017; 82:275-294. [PMID: 28194550 DOI: 10.1007/s11336-017-9561-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/24/2013] [Revised: 04/28/2016] [Indexed: 06/06/2023]
Abstract
One of the main problems in cluster analysis is that of determining the number of groups in the data. In general, the approach taken depends on the cluster method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter, regarding a two-mode data set of N points in p dimensions, which are optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters, in the general situation in which the available information for clustering is a one-mode [Formula: see text] dissimilarity matrix describing the objects. In this framework, p and the coordinates of points are usually unknown, and the application of criteria originally formulated for two-mode data sets is dependent on their possible reformulation in the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are subsequently formulated in order to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, greater efficiency in recovering the number of clusters is obtained when the criteria are calculated from the related Euclidean distances instead of the known two-mode data set, in general, for unequal-sized clusters and for low dimensionality situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when these criteria are calculated from their original formulation, using dissimilarities instead of distances.
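The decomposition described above can be illustrated with a standard identity: for Euclidean distances, the sum of squared deviations from a centroid equals the sum of squared pairwise distances divided by twice the group size, so within-block and between-block dispersion can be computed from the dissimilarity matrix alone, without coordinates. The sketch below assumes that identity and uses the Calinski-Harabasz ratio as one familiar variance-based criterion; the criteria actually derived in the paper may differ, and the function names are hypothetical.

```python
def dispersion_from_dissimilarities(d, labels):
    """Split total dispersion into within- and between-block parts.

    d is a symmetric matrix of pairwise dissimilarities with a zero diagonal.
    Entries are squared, so with Euclidean distances the results equal the
    usual k-means total, within-cluster and between-cluster sums of squares.
    """
    n = len(d)
    total = sum(d[i][j] ** 2 for i in range(n) for j in range(n)) / (2 * n)
    within = 0.0
    for lab in set(labels):
        idx = [i for i, l in enumerate(labels) if l == lab]
        # Diagonal block of the partitioned dissimilarity matrix.
        within += sum(d[i][j] ** 2 for i in idx for j in idx) / (2 * len(idx))
    return total, within, total - within

def calinski_harabasz(d, labels):
    """Variance-ratio criterion computed from dissimilarities only.

    Assumes 2 <= k < n and a non-zero within-block dispersion.
    """
    n, k = len(d), len(set(labels))
    _, within, between = dispersion_from_dissimilarities(d, labels)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical example: four objects lying at 0, 1, 10 and 11 on a line,
# represented only by their pairwise distances.
coords = [0, 1, 10, 11]
d = [[abs(a - b) for b in coords] for a in coords]
labels = [0, 0, 1, 1]
total, within, between = dispersion_from_dissimilarities(d, labels)
ch = calinski_harabasz(d, labels)
```

Evaluating such a criterion for several candidate values of K and comparing the results is the usual way it is used to choose the number of groups.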
Affiliation(s)
- J Fernando Vera
  - Department of Statistics and O.R., Faculty of Sciences, University of Granada, 18071, Granada, Spain
- Rodrigo Macías
  - Centro de Investigación en Matemáticas, Unidad Monterrey, Monterrey, Mexico
|