101
Cecilia JM, Cano JC, Morales-García J, Llanes A, Imbernón B. Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms. Sensors 2020; 20:s20216335. [PMID: 33172017 PMCID: PMC7664181 DOI: 10.3390/s20216335] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/12/2020] [Revised: 10/30/2020] [Accepted: 11/03/2020] [Indexed: 11/16/2022]
Abstract
The Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis, but this data is currently considered "dark data", i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the next generation of IoT applications that benefit society. Artificial Intelligence (AI) techniques are very well suited to identifying hidden patterns and correlations in this data deluge. In particular, clustering algorithms are of the utmost importance for performing exploratory data analysis to identify a set (a.k.a., cluster) of similar objects. Clustering algorithms are computationally heavy workloads and must be executed on high-performance computing (HPC) clusters, especially to deal with large datasets. This execution on HPC infrastructures is an energy-hungry procedure with additional issues, such as high-latency communications or privacy. Edge computing is a recently proposed paradigm that enables lightweight computations at the edge of the network to solve these issues. In this paper, we provide an in-depth analysis of emergent edge computing architectures that include low-power Graphics Processing Units (GPUs) to speed up these workloads. Our analysis includes performance and power consumption figures for Nvidia's latest AGX Xavier to compare the energy-performance ratio of these low-cost platforms with a high-performance cloud-based counterpart. Three different clustering algorithms (i.e., k-means, Fuzzy Minimals (FM), and Fuzzy C-Means (FCM)) are designed to be optimally executed on edge and cloud platforms, showing a speed-up factor of up to 11× for the GPU code compared to sequential counterpart versions on the edge platforms and energy savings of up to 150% between the edge computing and HPC platforms.
Affiliation(s)
- José M. Cecilia
- Computer Engineering Department (DISCA), Universitat Politécnica de Valencia (UPV), 46022 Valencia, Spain;
- Correspondence:
- Juan-Carlos Cano
- Computer Engineering Department (DISCA), Universitat Politécnica de Valencia (UPV), 46022 Valencia, Spain;
- Juan Morales-García
- Computer Science Department, Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain; (J.M.-G.); (A.L.); (B.I.)
- Antonio Llanes
- Computer Science Department, Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain; (J.M.-G.); (A.L.); (B.I.)
- Baldomero Imbernón
- Computer Science Department, Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain; (J.M.-G.); (A.L.); (B.I.)
102
Blumenberg L, Ruggles KV. Hypercluster: a flexible tool for parallelized unsupervised clustering optimization. BMC Bioinformatics 2020; 21:428. [PMID: 32993491 PMCID: PMC7525959 DOI: 10.1186/s12859-020-03774-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/19/2020] [Accepted: 09/22/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. RESULTS We present hypercluster, a Python package and Snakemake pipeline for flexible and parallelized clustering evaluation and selection. Users can efficiently evaluate a huge range of clustering results from multiple models and hyperparameters to identify an optimal model. CONCLUSIONS Hypercluster improves ease of use, robustness and reproducibility for unsupervised clustering applications in high-throughput biology. Hypercluster is available on pip and bioconda; installation, documentation and example workflows can be found at: https://github.com/ruggleslab/hypercluster.
Affiliation(s)
- Lili Blumenberg
- Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
- Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
- Kelly V. Ruggles
- Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
- Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
103
Vaura FC, Salomaa VV, Kantola IM, Kaaja R, Lahti L, Niiranen TJ. Unsupervised hierarchical clustering identifies a metabolically challenged subgroup of hypertensive individuals. J Clin Hypertens (Greenwich) 2020; 22:1546-1553. [PMID: 33460260 PMCID: PMC8029868 DOI: 10.1111/jch.13984] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/18/2020] [Revised: 06/15/2020] [Accepted: 06/27/2020] [Indexed: 11/29/2022]
Abstract
The current classification of hypertension does not reflect the heterogeneity in characteristics or cardiovascular outcomes of hypertensive individuals. Our objective was to identify distinct phenotypes of hypertensive individuals with potentially different cardiovascular risk profiles using data-driven cluster analysis. We performed clustering, a procedure that identifies groups with similar characteristics, in 3726 individuals (mean age 59.4 years, 49% women) with grade 2 hypertension (blood pressure ≥160/100 mmHg or antihypertensive medication) selected from FINRISK 1997, 2002, and 2007 cohorts. We computed clusters based on eight factors associated with hypertension: mean arterial pressure, pulse pressure, non-high-density lipoprotein cholesterol, blood glucose, BMI, C-reactive protein, estimated glomerular filtration rate, and alcohol. After that, we used Cox regression models adjusted for age and sex to assess the relative risk of cardiovascular disease (CVD) outcomes between the clusters and a reference group of 11 020 individuals. We observed two comparable clusters in both men and women. The Metabolically Challenged (MC) cluster was characterized by high blood glucose (Z-score 4.4 ± 1.1 vs 0.2 ± 0.8, men; 3.5 ± 1.1 vs 0.0 ± 0.6, women) and elevated BMI (30.4 ± 4.1 vs 28.9 ± 4.3, men; 32.7 ± 4.9 vs 29.3 ± 5.5, women). Over a 10-year follow-up (1034 CVD events), MC had 1.6-fold (95% CI 1.1-2.4) CVD risk compared to non-MC and 2.5-fold (95% CI 1.7-3.7) CVD risk compared to the reference group (P ≤ .009 for both). Using unsupervised hierarchical clustering, we found two phenotypically distinct hypertension subgroups with different risks of CVD complications. This substratification could be used to design studies that explore the differential effects of antihypertensive therapies among subgroups of hypertensive individuals.
Affiliation(s)
- Risto Kaaja
- Department of Medicine, University of Turku, Turku, Finland
- Division of Medicine, Turku University Hospital, Turku, Finland
- Leo Lahti
- Department of Future Technologies, University of Turku, Turku, Finland
- Teemu J. Niiranen
- Department of Medicine, University of Turku, Turku, Finland
- Finnish Institute for Health and Welfare (THL), Helsinki, Finland
- Division of Medicine, Turku University Hospital, Turku, Finland
104
Kumar S, Suhaib M, Asjad M. Narrowing the barriers to Industry 4.0 practices through PCA-Fuzzy AHP-K means. Journal of Advances in Management Research 2020. [DOI: 10.1108/jamr-06-2020-0098] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/17/2022]
Abstract
Purpose: The study aims to analyze the barriers to the adoption of Industry 4.0 (I4.0) practices in terms of prioritization, cluster formation and clustering of empirical responses, and then to narrow them down by identifying the most influential barriers, with managerial implications for the adoption of I4.0 practices through an enhanced understanding of I4.0.
Design/methodology/approach: For the survey-based empirical research, barriers to I4.0 are synthesized from a review of the relevant literature and further discussions with academicians and industry practitioners. Three widely acclaimed statistical techniques, viz. principal component analysis (PCA), fuzzy analytical hierarchical process (fuzzy AHP) and K-means clustering, are applied.
Findings: The novel integrated approach shows that the lack of a transparent cost-benefit analysis with clear comprehension of the benefits is the major barrier to the adoption of I4.0, followed by “IT infrastructure,” “Missing standards,” “Lack of properly skilled manpower,” “Fitness of present machines/equipment in the new regime” and “Concern to data security,” which are the other prominent barriers to the adoption of I4.0 practices. The availability of funds, a transparent cost-benefit analysis and clear comprehension of the benefits will motivate business owners to adopt it, overcoming the other barriers.
Research limitations/implications: The present study brings out new fundamental insights from the barriers to I4.0. These insights will help managers and policymakers understand the concept and the barriers hindering its smooth implementation. The factors identified are the major thrust areas for a manager to focus on for the smooth implementation of I4.0 practices. The removal of these barriers will act as a booster for implementing I4.0. Real-world testing of the findings is not available yet, and this will be a new direction for further research.
Practical implications: The new production paradigm is highly complex and evolving. The study will act as a handy tool for the implementing manager in deciding what to push first and what to push later while implementing I4.0 practices. It will also empower a manager to assess the implementation capabilities of the industry in advance.
Originality/value: PCA, fuzzy AHP and K-means are deployed for the first time to identify the significant barriers to I4.0. The paper is the result of the original conceptual work of integrating the three techniques to prioritize and narrow the barriers from 16 to 6.
105
Hwang Y, Um JS, Schlüter S. Evaluating the Mutual Relationship between IPAT/Kaya Identity Index and ODIAC-Based GOSAT Fossil-Fuel CO2 Flux: Potential and Constraints in Utilizing Decomposed Variables. International Journal of Environmental Research and Public Health 2020; 17:ijerph17165976. [PMID: 32824606 PMCID: PMC7459989 DOI: 10.3390/ijerph17165976] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/15/2020] [Revised: 08/12/2020] [Accepted: 08/14/2020] [Indexed: 11/21/2022]
Abstract
The IPAT/Kaya identity is the most popular index used to analyze the driving forces of individual factors on CO2 emissions. It represents the CO2 emissions as a product of factors, such as the population, gross domestic product (GDP) per capita, energy intensity of the GDP, and carbon footprint of energy. In this study, we evaluated the mutual relationship of the factors of the IPAT/Kaya identity and their decomposed variables with the fossil-fuel CO2 flux, as measured by the Greenhouse Gases Observing Satellite (GOSAT). We built two regression models to explain this flux; one using the IPAT/Kaya identity factors as the explanatory variables and the other one using their decomposed factors. The factors of the IPAT/Kaya identity have less explanatory power than their decomposed variables and comparably low correlation with the fossil-fuel CO2 flux. However, the model using the decomposed variables shows significant multicollinearity. We performed a multivariate cluster analysis for further investigating the benefits of using the decomposed variables instead of the original factors. The results of the cluster analysis showed that except for the M factor, the IPAT/Kaya identity factors are inadequate for explaining the variations in the fossil-fuel CO2 flux, whereas the decomposed variables produce reasonable clusters that can help identify the relevant drivers of this flux.
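For reference, the product of factors described in this abstract is commonly written as shown below. The symbols are the conventional ones and are an assumption here, since the abstract does not define any notation: F is CO2 emissions, P is population, G is GDP, and E is energy consumption.

```latex
F = P \times \frac{G}{P} \times \frac{E}{G} \times \frac{F}{E}
```

The right-hand side cancels term by term back to F, which is why the expression is an identity rather than a fitted model: G/P is GDP per capita, E/G is the energy intensity of the GDP, and F/E is the carbon footprint of energy.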
Affiliation(s)
- YoungSeok Hwang
- Department of Climate Change, Kyungpook National University, Daegu 41566, Korea;
- Jung-Sup Um
- Department of Geography, Kyungpook National University, Daegu 41566, Korea;
- Stephan Schlüter
- Department of Mathematics, Natural and Economic Sciences, Ulm University of Applied Sciences, 89075 Ulm, Germany
- Correspondence:
106
Mustafa HMJ, Ayob M, Albashish D, Abu-Taleb S. Solving text clustering problem using a memetic differential evolution algorithm. PLoS One 2020; 15:e0232816. [PMID: 32525869 PMCID: PMC7289410 DOI: 10.1371/journal.pone.0232816] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/14/2019] [Accepted: 04/22/2020] [Indexed: 12/03/2022] Open
Abstract
Text clustering is considered one of the most effective text document analysis methods and is applied to cluster documents in response to the growth of big data and online information. According to a review of related work, existing text clustering algorithms achieve reasonable clustering results on some datasets but fail on a wide variety of benchmark datasets. Furthermore, the performance of these algorithms is not robust because of an inefficient balance between their exploitation and exploration capabilities. Accordingly, this research proposes a Memetic Differential Evolution algorithm (MDETC) to solve the text clustering problem, which investigates the effect of hybridizing the differential evolution (DE) mutation strategy with the memetic algorithm (MA). This hybridization is intended to enhance the quality of text clustering and improve the exploitation and exploration capabilities of the algorithm. Our experimental results on six standard text clustering benchmark datasets (i.e., from the Laboratory of Computational Intelligence (LABIC)) show that the MDETC algorithm outperformed the other compared clustering algorithms based on the AUC metric, F-measure, and statistical analysis. Furthermore, MDETC was compared with state-of-the-art text clustering algorithms and obtained almost the best results on the standard benchmark datasets.
Affiliation(s)
- Hossam M. J. Mustafa
- Data Mining and Optimization Research Group, Center of Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- * E-mail:
- Masri Ayob
- Data Mining and Optimization Research Group, Center of Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Dheeb Albashish
- Computer Science Department, Prince Abdullah bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Salt, Jordan
- Sawsan Abu-Taleb
- Computer Science Department, Prince Abdullah bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Salt, Jordan
107
Gong Z, Cai T, Thill JC, Hale S, Graham M. Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election. PLoS One 2020; 15:e0233660. [PMID: 32442212 PMCID: PMC7244148 DOI: 10.1371/journal.pone.0233660] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/29/2019] [Accepted: 05/10/2020] [Indexed: 11/19/2022] Open
Abstract
Social media has emerged as an alternative to opinion polls for collecting public opinion, yet as a passive data source it still poses many challenges, such as structurelessness, quantifiability, and representativeness. Social media data with geotags provide new opportunities to unveil the geographic locations of users expressing their opinions. This paper aims to answer two questions: 1) whether a quantifiable measurement of public opinion can be obtained from social media and 2) whether it can produce better or complementary measures compared to opinion polls. This research proposes a novel approach to measure the relative opinion of Twitter users towards public issues in order to accommodate more complex opinion structures and take advantage of the geography pertaining to the public issues. To ensure that this new measure is technically feasible, a modeling framework is developed that includes building a training dataset by adopting a state-of-the-art approach and devising a new deep learning method called Opinion-Oriented Word Embedding. With a case study of tweets on the 2016 U.S. presidential election, we demonstrate the predictive superiority of our relative opinion approach and show how it can aid visual analytics and support opinion predictions. Although the relative opinion measure proves to be more robust than polling, our study also suggests that the former can advantageously complement the latter in opinion prediction.
Affiliation(s)
- Zhaoya Gong
- School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
- * E-mail:
- Tengteng Cai
- Public Policy Program, University of North Carolina at Charlotte, Charlotte, NC, United States of America
- Jean-Claude Thill
- Public Policy Program, University of North Carolina at Charlotte, Charlotte, NC, United States of America
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC, United States of America
- Scott Hale
- Oxford Internet Institute, University of Oxford, Oxford, England, United Kingdom
- Mark Graham
- Oxford Internet Institute, University of Oxford, Oxford, England, United Kingdom
108
Bremer PL, De Boer D, Alvarado W, Martinez X, Sorin EJ. Overcoming the Heuristic Nature of k-Means Clustering: Identification and Characterization of Binding Modes from Simulations of Molecular Recognition Complexes. J Chem Inf Model 2020; 60:3081-3092. [PMID: 32383869 DOI: 10.1021/acs.jcim.9b01137] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Abstract
The accurate and reproducible detection and description of thermodynamic states in computational data is a nontrivial problem, particularly when the number of states is unknown a priori and for large, flexible chemical systems and complexes. To this end, we report a novel clustering protocol that combines high-resolution structural representation, brute-force repeat clustering, and optimization of clustering statistics to reproducibly identify the number of clusters present in a data set (k) for simulated ensembles of butyrylcholinesterase in complex with two previously studied organophosphate inhibitors. Each structure within our simulated ensembles was depicted as a high-dimensionality vector with components defined by specific protein-inhibitor contacts at the chemical group level and the magnitudes of these components defined by their respective extents of pair-wise atomic contact, thus allowing for algorithmic differentiation between varying degrees of interaction. These surface-weighted interaction fingerprints were tabulated for each of over 1 million structures from more than 100 μs of all-atom molecular dynamics simulation per complex and used as the input for repetitive k-means clustering. Minimization of cluster population variance and range afforded accurate and reproducible identification of k, thereby allowing for the characterization of discrete binding modes from molecular simulation data in the form of contact tables that concisely encapsulate the observed intermolecular contact motifs. While the protocol presented herein to determine k and achieve non-heuristic clustering is demonstrated on data from massive atomistic simulation, our approach is generalizable to other data types and clustering algorithms, and is tractable with limited computational resources.
109
Profiling of Chlorogenic Acids from Bidens pilosa and Differentiation of Closely Related Positional Isomers with the Aid of UHPLC-QTOF-MS/MS-Based In-Source Collision-Induced Dissociation. Metabolites 2020; 10:metabo10050178. [PMID: 32365739 PMCID: PMC7281500 DOI: 10.3390/metabo10050178] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/11/2020] [Revised: 04/20/2020] [Accepted: 04/21/2020] [Indexed: 12/14/2022] Open
Abstract
Bidens pilosa is an edible herb from the Asteraceae family which is traditionally consumed as a leafy vegetable. B. pilosa has many bioactivities owing to its diverse phytochemicals, which include aliphatics, terpenoids, tannins, alkaloids, hydroxycinnamic acid (HCA) derivatives and other phenylpropanoids. The latter include compounds such as chlorogenic acids (CGAs), which are produced as either regio- or geometrical isomers. To profile the CGA composition of B. pilosa, methanol extracts from tissues, callus and cell suspensions were utilized for liquid chromatography coupled to mass spectrometric detection (UHPLC-QTOF-MS/MS). An optimized in-source collision-induced dissociation (ISCID) method capable of discriminating between closely related HCA derivatives of quinic acids, based on MS-based fragmentation patterns, was applied. Careful control of collision energies resulted in fragment patterns similar to MS2 and MS3 fragmentation, obtainable by a typical ion trap MSn approach. For the first time, an ISCID approach was shown to efficiently discriminate between positional isomers of chlorogenic acids containing two different cinnamoyl moieties, such as a mixed di-ester of feruloyl-caffeoylquinic acid (m/z 529) and coumaroyl-caffeoylquinic acid (m/z 499). The results indicate that tissues and cell cultures of B. pilosa contained a combined total of 30 mono-, di-, and tri-substituted chlorogenic acids, with positional isomers dominating the composition thereof. In addition, the tartaric acid esters, caftaric and chicoric acids, were also identified. Profiling revealed that these HCA derivatives were differentially distributed across tissue types and cell culture lines derived from leaf and stem explants.
110
Nwadiugwu MC. Gene-Based Clustering Algorithms: Comparison Between Denclue, Fuzzy-C, and BIRCH. Bioinform Biol Insights 2020; 14:1177932220909851. [PMID: 32284672 PMCID: PMC7133071 DOI: 10.1177/1177932220909851] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/29/2020] [Accepted: 02/02/2020] [Indexed: 11/17/2022] Open
Abstract
The current study seeks to compare 3 clustering algorithms that can be used in gene-based bioinformatics research to understand disease networks, protein-protein interaction networks, and gene expression data. Denclue, Fuzzy-C, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) were the 3 gene-based clustering algorithms selected. These algorithms were explored in relation to the subfield of bioinformatics that analyzes omics data, which include but are not limited to genomics, proteomics, metagenomics, transcriptomics, and metabolomics data. The objective was to compare the efficacy of the 3 algorithms and determine their strengths and drawbacks. The review showed that, unlike Denclue and Fuzzy-C, which are more efficient in handling noisy data, BIRCH can handle data sets with outliers and has a better time complexity.
Affiliation(s)
- Martin C Nwadiugwu
- Department of Biomedical Informatics, University of Nebraska Omaha, Omaha, NE, USA
111
Licen S, Di Gilio A, Palmisani J, Petraccone S, de Gennaro G, Barbieri P. Pattern Recognition and Anomaly Detection by Self-Organizing Maps in a Multi Month E-nose Survey at an Industrial Site. Sensors 2020; 20:s20071887. [PMID: 32235302 PMCID: PMC7180849 DOI: 10.3390/s20071887] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/05/2020] [Revised: 03/21/2020] [Accepted: 03/23/2020] [Indexed: 11/29/2022]
Abstract
People are currently aware of the risks related to pollution exposure, so odor annoyances are considered a warning about the possible presence of toxic volatile compounds. Malodor often generates immediate alarm among citizens, and electronic noses are convenient instruments to detect mixtures of odorant compounds with high monitoring frequency. In this paper we present a study on pattern recognition of ambient air composition in the proximity of a gas and oil pretreatment plant, based on data from an electronic nose equipped with 10 metal-oxide-semiconductor (MOS) sensors and positioned outdoors continuously for three months. A total of 80,017 e-nose vectors were processed by applying the self-organizing map (SOM) algorithm and then k-means clustering on the SOM outputs for the whole data set, revealing an anomalous data cluster. Retaining the data characterized by dynamic responses of the multisensory system, a SOM with 264 recurrent sensor responses to the air mixtures sampled at the site and four main air type profiles (clusters) were identified. One of these sensor profiles was related to the fugitive odor emissions of the plant by using ancillary data from a total volatile organic compound (VOC) detector together with wind speed and direction data. The overall and daily cluster frequencies were evaluated, allowing us to identify the daily duration of the presence, at the monitoring site, of air related to industrial emissions. The refined model allowed us to confirm the anomaly detection of the sensor responses.
Affiliation(s)
- Sabina Licen
- Department of Chemical and Pharmaceutical Sciences, University of Trieste, Via L. Giorgieri 1, 34127 Trieste, Italy;
- Alessia Di Gilio
- Department of Biology, University of Bari “Aldo Moro”, Via Orabona 4, 70126 Bari, Italy; (J.P.); (S.P.); (G.d.G.)
- Correspondence: (A.D.G.); (P.B.)
- Jolanda Palmisani
- Department of Biology, University of Bari “Aldo Moro”, Via Orabona 4, 70126 Bari, Italy; (J.P.); (S.P.); (G.d.G.)
- Stefania Petraccone
- Department of Biology, University of Bari “Aldo Moro”, Via Orabona 4, 70126 Bari, Italy; (J.P.); (S.P.); (G.d.G.)
- Gianluigi de Gennaro
- Department of Biology, University of Bari “Aldo Moro”, Via Orabona 4, 70126 Bari, Italy; (J.P.); (S.P.); (G.d.G.)
- Pierluigi Barbieri
- Department of Chemical and Pharmaceutical Sciences, University of Trieste, Via L. Giorgieri 1, 34127 Trieste, Italy;
- Correspondence: (A.D.G.); (P.B.)
112
Brito ACM, Silva FN, Amancio DR. A complex network approach to political analysis: Application to the Brazilian Chamber of Deputies. PLoS One 2020; 15:e0229928. [PMID: 32191720 PMCID: PMC7081992 DOI: 10.1371/journal.pone.0229928] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/14/2019] [Accepted: 02/17/2020] [Indexed: 11/29/2022] Open
Abstract
In this paper, we introduce a network-based methodology to study how political entities evolve over time. We constructed networks of voting data from the Brazilian Chamber of Deputies, where deputies are nodes and edges represent voting similarity among deputies. The Brazilian Chamber of Deputies is characterized by a multi-party political system. Thus, we would expect a broad spectrum of ideas to be represented. Our results, however, revealed that plurality of ideas is not present at all: the effective number of communities representing ideas based on agreement/disagreement in propositions is about 3 over the entire studied time span. The obtained results also revealed different patterns of coalitions between distinct parties. Finally, we also found signs of early party isolation before presidential impeachment proceedings effectively started. We believe that the proposed framework could be used to complement the study of political dynamics and even applied in similar social networks where individuals are organized in a complex manner.
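The network construction described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the vote encoding, the agreement-fraction similarity, the 0.7 edge threshold, the use of connected components as a stand-in for community detection, and the inverse-Simpson definition of the effective number of communities are all assumptions made for the example, as is the toy roll-call data.

```python
from itertools import combinations


def agreement(votes_a, votes_b):
    """Fraction of propositions on which two deputies cast the same vote."""
    same = sum(1 for a, b in zip(votes_a, votes_b) if a == b)
    return same / len(votes_a)


def build_network(votes, threshold=0.7):
    """Link two deputies when their agreement meets the (assumed) threshold."""
    edges = {name: set() for name in votes}
    for a, b in combinations(votes, 2):
        if agreement(votes[a], votes[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def communities(edges):
    """Connected components, a simple stand-in for community detection."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(edges[node] - group)
        seen |= group
        groups.append(group)
    return groups


def effective_number(groups):
    """Inverse Simpson index of community sizes: 1 / sum(p_i ** 2) (assumed definition)."""
    total = sum(len(g) for g in groups)
    return 1.0 / sum((len(g) / total) ** 2 for g in groups)


# Hypothetical roll-call data: 1 = yes, 0 = no, one entry per proposition.
votes = {
    "A": [1, 1, 0, 1],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 1, 0],
    "D": [0, 0, 1, 1],
}
network = build_network(votes)
groups = communities(network)
print(sorted(sorted(g) for g in groups))  # [['A', 'B'], ['C', 'D']]
print(effective_number(groups))           # 2.0
```

With an index of this kind, many nominal parties can still yield a small effective number when most deputies fall into a few large voting blocs, which is the sort of pattern the abstract reports.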
Affiliation(s)
- Filipi Nascimento Silva
- São Carlos Institute of Physics, University of São Paulo, São Carlos, SP, Brazil
- Indiana University Network Science Institute, Bloomington, Indiana, United States of America
- Diego Raphael Amancio
- Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, SP, Brazil
113
Personalized prediction of smartphone-based psychotherapeutic micro-intervention success using machine learning. J Affect Disord 2020; 264:430-437. [PMID: 31787419 DOI: 10.1016/j.jad.2019.11.071] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/14/2019] [Revised: 09/18/2019] [Accepted: 11/12/2019] [Indexed: 12/29/2022]
Abstract
BACKGROUND Tailoring healthcare to patients' individual needs is a central goal of precision medicine. Combining smartphone-based interventions with machine learning approaches may help attain this goal. The aim of our study was to explore the predictability of the success of smartphone-based psychotherapeutic micro-interventions in eliciting mood changes using machine learning. METHODS Participants conducted daily smartphone-based psychotherapeutic micro-interventions, guided by short video clips, for 13 consecutive days. Participants chose one of four intervention techniques used in psychotherapeutic approaches. Mood changes were assessed using the Multidimensional Mood State Questionnaire. Micro-intervention success was predicted using random forest (RF) tree-based mixed-effects logistic regression models. Data from 27 participants were used, totaling 324 micro-interventions, randomly split 100 times into training and test samples, using within-subject and between-subject sampling. RESULTS Mood improved from pre- to post-intervention in 137 sessions (initial success-rate: 42.3%). The RF approach resulted in predictions of micro-intervention success significantly better than the initial success-rate within and between subjects (positive predictive value: 0.732 (95%-CI: 0.607; 0.820) and 0.698 (95%-CI: 0.564; 0.805), respectively). Prediction quality was highest using the RF approach within subjects (rand accuracy: 0.75 (95%-CI: 0.641; 0.840), Matthews correlation coefficient: 0.483 (95%-CI: 0.323; 0.723)). LIMITATIONS The RF approach does not allow firm conclusions about the exact contribution of each factor to the algorithm's predictions. We included a limited number of predictors and did not compare whether predictability differed between psychotherapeutic techniques. CONCLUSIONS Our findings may pave the way for translation and encourage scrutinizing personalized prediction in the psychotherapeutic context to improve treatment efficacy.
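The evaluation metrics named in this abstract all follow from a binary confusion matrix, and a short sketch may help readers unfamiliar with them. Only the 137-of-324 baseline comes from the abstract; the confusion-matrix counts below are hypothetical and are not meant to reproduce the study's results, and the function names are illustrative.

```python
import math


def positive_predictive_value(tp, fp):
    """Share of predicted successes that were real successes."""
    return tp / (tp + fp)


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Baseline reported in the abstract: 137 successful sessions out of 324.
baseline = 137 / 324
print(round(baseline, 3))  # 0.423

# Hypothetical test-split counts, chosen only to illustrate the formulas.
tp, fp, fn, tn = 30, 10, 10, 50
print(positive_predictive_value(tp, fp))            # 0.75
print(accuracy(tp, fp, fn, tn))                     # 0.8
print(round(matthews_corrcoef(tp, fp, fn, tn), 3))  # 0.583
```

Comparing the positive predictive value against the baseline success rate is what makes a figure such as 0.732 informative: a model that labelled every session a success would only match the 42.3% baseline.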
|
114
|
Kotiang S, Eslami A. A probabilistic graphical model for system-wide analysis of gene regulatory networks. Bioinformatics 2020; 36:3192-3199. [DOI: 10.1093/bioinformatics/btaa122] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/28/2019] [Revised: 01/15/2020] [Accepted: 02/18/2020] [Indexed: 01/28/2023] Open
Abstract
Motivation
The inference of gene regulatory networks (GRNs) from DNA microarray measurements forms a core element of systems biology-based phenotyping. In the recent past, numerous computational methodologies have been formalized to enable the deduction of reliable and testable predictions in today’s biology. However, little focus has been aimed at quantifying how well existing state-of-the-art GRNs correspond to measured gene-expression profiles.
Results
Here, we present a computational framework that combines the formulation of probabilistic graphical modeling, standard statistical estimation, and integration of high-throughput biological data to explore the global behavior of biological systems and the global consistency between experimentally verified GRNs and corresponding large microarray compendium data. The model is represented as a probabilistic bipartite graph, which can handle highly complex network systems and accommodates partial measurements of diverse biological entities, e.g. messenger RNAs, proteins, metabolites and various stimulators participating in regulatory networks. This method was tested on microarray expression data from the M3D database, corresponding to sub-networks of one of the best-researched model organisms, Escherichia coli. Results show a surprisingly high correlation between the observed states and the inferred system’s behavior under various experimental conditions.
Availability and implementation
Processed data and software implementation using Matlab are freely available at https://github.com/kotiang54/PgmGRNs. Full dataset available from the M3D database.
Affiliation(s)
- Stephen Kotiang
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
- Ali Eslami
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260, USA
|
115
|
Weißer T, Saßmannshausen T, Ohrndorf D, Burggräf P, Wagner J. A clustering approach for topic filtering within systematic literature reviews. MethodsX 2020; 7:100831. [PMID: 32195145 PMCID: PMC7078380 DOI: 10.1016/j.mex.2020.100831] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/07/2019] [Accepted: 02/12/2020] [Indexed: 11/23/2022] Open
Abstract
Within a systematic literature review (SLR), researchers are confronted with vast numbers of articles from scientific databases, which have to be manually evaluated regarding their relevance for a certain field of observation. The evaluation and filtering phase of prevalent SLR methodologies is therefore time consuming and hardly expressible to the intended audience. The proposed method applies natural language processing (NLP) on article metadata and a k-means clustering algorithm to automatically convert large article corpora into a distribution of focal topics. This allows efficient filtering as well as objectifying the process through the discussion of the clustering results. Beyond that, it allows scientific communities to be identified quickly and therefore provides an iterative perspective for the so far linear SLR methodology.
- NLP and k-means clustering to filter large article corpora during systematic literature reviews.
- Automated clustering allows filtering very efficiently as well as effectively compared to manual selection.
- Presentation and discussion of the clustering results helps to objectify the nontransparent filtering step in systematic literature reviews.
Affiliation(s)
- Tim Weißer
- Chair for International Production Engineering and Management, University of Siegen
- Till Saßmannshausen
- Chair for International Production Engineering and Management, University of Siegen
- Dennis Ohrndorf
- Chair for International Production Engineering and Management, University of Siegen
- Peter Burggräf
- Chair for International Production Engineering and Management, University of Siegen
- Johannes Wagner
- Chair for International Production Engineering and Management, University of Siegen
|
116
|
Rich-Griffin C, Stechemesser A, Finch J, Lucas E, Ott S, Schäfer P. Single-Cell Transcriptomics: A High-Resolution Avenue for Plant Functional Genomics. TRENDS IN PLANT SCIENCE 2020; 25:186-197. [PMID: 31780334 DOI: 10.1016/j.tplants.2019.10.008] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 06/03/2019] [Revised: 09/30/2019] [Accepted: 10/17/2019] [Indexed: 05/19/2023]
Abstract
Plant function is the result of the concerted action of single cells in different tissues. Advances in RNA-seq technologies and tissue processing allow us now to capture transcriptional changes at single-cell resolution. The incredible potential of single-cell RNA-seq lies in the novel ability to study and exploit regulatory processes in complex tissues based on the behaviour of single cells. Importantly, the independence from reporter lines allows the analysis of any given tissue in any plant. While there are challenges associated with the handling and analysis of complex datasets, the opportunities are unique to generate knowledge of tissue functions in unprecedented detail and to facilitate the application of such information by mapping cellular functions and interactions in a plant cell atlas.
Affiliation(s)
- Annika Stechemesser
- Warwick Mathematics Institute, The University of Warwick, Coventry CV4 7AL, UK
- Jessica Finch
- School of Life Sciences, The University of Warwick, Coventry CV4 7AL, UK
- Emma Lucas
- Warwick Medical School, The University of Warwick, Coventry CV4 7AL, UK
- Sascha Ott
- Department of Computer Science, The University of Warwick, Coventry CV4 7AL, UK
- Patrick Schäfer
- School of Life Sciences, The University of Warwick, Coventry CV4 7AL, UK; Warwick Integrative Synthetic Biology Centre, The University of Warwick, Coventry CV4 7AL, UK
|
117
|
Kabir KL, Akhter N, Shehu A. From molecular energy landscapes to equilibrium dynamics via landscape analysis and Markov state models. J Bioinform Comput Biol 2020; 17:1940014. [DOI: 10.1142/s0219720019400146] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Molecular dynamics (MD) simulation software allows probing the equilibrium structural dynamics of a molecule of interest, revealing how a molecule navigates its structure space one structure at a time. To obtain a broader view of dynamics, typically one needs to launch many such simulations, obtaining many trajectories. A summarization of the equilibrium dynamics requires integrating the information in the various trajectories, and Markov State Models (MSM) are increasingly being used for this task. At its core, the task involves organizing the structures accessed in simulation into structural states, and then constructing a transition probability matrix revealing the transitions between states. While now considered a mature technology and widely used to summarize equilibrium dynamics, the underlying computational process in the construction of an MSM ignores energetics even though the transition of a molecule between two nearby structures in an MD trajectory is governed by the corresponding energies. In this paper, we connect theory with simulation and analysis of equilibrium dynamics. A molecule navigates the energy landscape underlying the structure space. The structural states that are identified via off-the-shelf clustering algorithms need to be connected to thermodynamically-stable and semi-stable (macro)states among which transitions can then be quantified. Leveraging recent developments in the analysis of energy landscapes that identify basins in the landscape, we evaluate the hypothesis that basins, directly tied to stable and semi-stable states, lead to better models of dynamics. Our analysis indicates that basins lead to MSMs of better quality and thus can be useful to further advance this widely-used technology for summarization of molecular equilibrium dynamics.
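The core MSM construction step described in this abstract, turning trajectories of state labels into a transition probability matrix, can be sketched as follows. This is a minimal illustration and not the authors' method: it assumes every structure has already been assigned an integer state label (by clustering or by basin identification), uses a lag of one step, and applies plain row-normalised counting with no energetic information. The names `transition_matrix` and `trajectories` are hypothetical.

```python
def transition_matrix(trajectories, n_states):
    """Estimate a row-stochastic transition matrix from state-label trajectories.

    Each trajectory is a list of integer state labels in [0, n_states).
    Transitions are counted between consecutive frames (lag of one step).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Pair every frame with its successor within the same trajectory only.
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no observed outgoing transitions gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Two short, made-up trajectories over two states.
trajectories = [[0, 0, 1, 1, 0], [1, 1, 1, 0]]
msm = transition_matrix(trajectories, n_states=2)
```

Counting within each trajectory separately avoids creating a spurious transition between the last frame of one simulation and the first frame of the next.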
Affiliation(s)
- Kazi Lutful Kabir
- Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
- Nasrin Akhter
- Department of Computer Science, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
- Amarda Shehu
- Department of Computer Science, Department of Bioengineering, School of Systems Biology, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA
|
118
|
Defne Z, Aretxabaleta AL, Ganju NK, Kalra TS, Jones DK, Smith KEL. A geospatially resolved wetland vulnerability index: Synthesis of physical drivers. PLoS One 2020; 15:e0228504. [PMID: 31999806 PMCID: PMC6992177 DOI: 10.1371/journal.pone.0228504] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/11/2019] [Accepted: 01/16/2020] [Indexed: 11/18/2022] Open
Abstract
Assessing wetland vulnerability to chronic and episodic physical drivers is fundamental for establishing restoration priorities. We synthesized multiple data sets from E.B. Forsythe National Wildlife Refuge, New Jersey, to establish a wetland vulnerability metric that integrates a range of physical processes, anthropogenic impact and physical/biophysical features. The geospatial data are based on aerial imagery, remote sensing, regulatory information, and hydrodynamic modeling; and include elevation, tidal range, unvegetated to vegetated marsh ratio (UVVR), shoreline erosion, potential exposure to contaminants, residence time, marsh condition change, change in salinity, salinity exposure and sediment concentration. First, we delineated the wetland complex into individual marsh units based on surface contours, and then defined a wetland vulnerability index that combined contributions from all parameters. We applied principal component and cluster analyses to explore the interrelations between the data layers, and separate regions that exhibited common characteristics. Our analysis shows that the spatial variation of vulnerability in this domain cannot be explained satisfactorily by a smaller subset of the variables. The most influential factor on the vulnerability index was the combined effect of elevation, tide range, residence time, and UVVR. Tide range and residence time had the highest correlation, and similar bay-wide spatial variation. Some variables (e.g., shoreline erosion) had no significant correlation with the rest of the variables. The aggregated index based on the complete dataset allows us to assess the overall state of a given marsh unit and quickly locate the most vulnerable units in a larger marsh complex. The application of geospatially complete datasets and consideration of chronic and episodic physical drivers represents an advance over traditional point-based methods for wetland assessment.
Affiliation(s)
- Zafer Defne
- Woods Hole Coastal and Marine Science Center, U.S. Geological Survey, Woods Hole, MA, United States of America
- Alfredo L. Aretxabaleta
- Woods Hole Coastal and Marine Science Center, U.S. Geological Survey, Woods Hole, MA, United States of America
- Neil K. Ganju
- Woods Hole Coastal and Marine Science Center, U.S. Geological Survey, Woods Hole, MA, United States of America
- Tarandeep S. Kalra
- Integrated Statistics, U.S. Geological Survey, Woods Hole, MA, United States of America
- Daniel K. Jones
- Utah Water Science Center, U.S. Geological Survey, Salt Lake City, UT, United States of America
- Kathryn E. L. Smith
- St. Petersburg Coastal and Marine Science Center, U.S. Geological Survey, St. Petersburg, FL, United States of America
|
119
|
Liu J, Zhao M, Kong W. Sub-Graph Regularization on Kernel Regression for Robust Semi-Supervised Dimensionality Reduction. ENTROPY 2019; 21:1125. [PMCID: PMC7514469 DOI: 10.3390/e21111125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/07/2019] [Accepted: 11/07/2019] [Indexed: 06/17/2023]
Abstract
Dimensionality reduction has always been a major problem in handling high-dimensional datasets. Because they exploit labeled data, supervised dimensionality reduction methods such as Linear Discriminant Analysis tend to achieve better classification performance than unsupervised methods. However, supervised methods need sufficient labeled data in order to achieve satisfying results. Therefore, semi-supervised learning (SSL) methods can be a practical alternative to relying on labeled data alone. In this paper, we develop a novel SSL method by extending anchor graph regularization (AGR) for dimensionality reduction. In detail, AGR is an accelerated semi-supervised learning method that propagates the class labels to unlabeled data. However, it cannot handle new incoming samples. We therefore improve AGR by adding kernel regression to its basic objective function. As a result, the proposed method can not only estimate the class labels of unlabeled data but also achieve dimensionality reduction. Extensive simulations on several benchmark datasets are conducted, and the simulation results verify the effectiveness of the proposed work.
Affiliation(s)
- Jiao Liu
- School of Management Studies, Shanghai University of Engineering Science, Shanghai 201600, China
- Mingbo Zhao
- School of Information Science and Technology, Donghua University, Shanghai 201620, China
- Weijian Kong
- School of Information Science and Technology, Donghua University, Shanghai 201620, China
|
120
|
Permutation Entropy: Enhancing Discriminating Power by Using Relative Frequencies Vector of Ordinal Patterns Instead of Their Shannon Entropy. ENTROPY 2019. [PMCID: PMC7514234 DOI: 10.3390/e21101013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/14/2022]
Abstract
Many measures to quantify the nonlinear dynamics of a time series are based on estimating the probability of certain features from their relative frequencies. Once a normalised histogram of events is computed, a single result is usually derived. This process can be broadly viewed as a nonlinear mapping from $\mathbb{R}^n$ into $\mathbb{R}$, where $n$ is the number of bins in the histogram. However, this mapping might entail a loss of information that could be critical for time series classification purposes. In this respect, the present study assessed this impact using permutation entropy (PE) and a diverse set of time series. We first devised a method of generating synthetic sequences of ordinal patterns using hidden Markov models. This way, it was possible to control the histogram distribution and quantify its influence on classification results. Next, real body temperature records were also used to illustrate the same phenomenon. The experimental results confirmed the improved classification accuracy achieved using raw histogram data instead of the PE final values. Thus, this study can provide valuable guidance for improving the discriminating capability not only of PE, but of many similar histogram-based measures.
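The information loss described in this abstract can be illustrated with a small sketch. It is not the authors' experimental setup (no hidden Markov models or classifiers are involved); it only assumes the standard definition of permutation entropy as the Shannon entropy of ordinal-pattern relative frequencies, here normalised by the logarithm of the number of possible patterns. The function names and the toy series are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def ordinal_pattern_frequencies(series, order=3):
    """Relative frequency of every ordinal pattern of length `order`.

    The result is a vector with one entry per permutation of range(order),
    in the order produced by itertools.permutations.
    """
    counts = Counter()
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # The pattern is the index order that sorts the window
        # (ties are broken by position because sorted() is stable).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    total = sum(counts.values())
    return [counts[p] / total for p in permutations(range(order))]


def permutation_entropy(freqs):
    """Shannon entropy of the frequency vector, normalised to [0, 1]."""
    h = sum(-p * math.log2(p) for p in freqs if p > 0)
    return h / math.log2(len(freqs))


# A strictly rising and a strictly falling series.
rising = [1, 2, 3, 4, 5, 6]
falling = [6, 5, 4, 3, 2, 1]

freq_rising = ordinal_pattern_frequencies(rising)
freq_falling = ordinal_pattern_frequencies(falling)
pe_rising = permutation_entropy(freq_rising)
pe_falling = permutation_entropy(freq_falling)
```

Both toy series contain a single ordinal pattern, so their entropies coincide, while their frequency vectors place all the mass on different patterns. A classifier given only the scalar entropy could not separate them, whereas one given the frequency vector could.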
|
121
|
Classifying fishing behavioral diversity using high-frequency movement data. Proc Natl Acad Sci U S A 2019; 116:16811-16816. [PMID: 31399551 DOI: 10.1073/pnas.1906766116] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022] Open
Abstract
Effective management of social-ecological systems (SESs) requires an understanding of human behavior. In many SESs, there are hundreds of agents or more interacting with governance and regulatory institutions, driving management outcomes through collective behavior. Agents in these systems often display consistent behavioral characteristics over time that can help reduce the dimensionality of SES data by enabling the assignment of types. Typologies of resource-user behavior both enrich our knowledge of user cultures and provide critical information for management. Here, we develop a data-driven framework to identify resource-user typologies in SESs with high-dimensional data. To demonstrate policy applications, we apply the framework to a tightly coupled SES, commercial fishing. We leverage large fisheries-dependent datasets that include mandatory vessel logbooks, observer datasets, and high-resolution geospatial vessel tracking technologies. We first quantify vessel and behavioral characteristics using data that encode fishers' spatial decisions and behaviors. We then use clustering to classify these characteristics into discrete fishing behavioral types (FBTs), determining that 3 types emerge in our case study. Finally, we investigate how a series of disturbances applied selection pressure on these FBTs, causing the disproportionate loss of one group. Our framework not only provides an efficient and unbiased method for identifying FBTs in near real time, but it can also improve management outcomes by enabling ex ante investigation of the consequences of disturbances such as policy actions.
|
122
|
Mustafa HMJ, Ayob M, Nazri MZA, Kendall G. An improved adaptive memetic differential evolution optimization algorithms for data clustering problems. PLoS One 2019; 14:e0216906. [PMID: 31137034 PMCID: PMC6538400 DOI: 10.1371/journal.pone.0216906] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/20/2019] [Accepted: 04/30/2019] [Indexed: 11/23/2022] Open
Abstract
The performance of data clustering algorithms is mainly dependent on their ability to balance between the exploration and exploitation of the search process. Although some data clustering algorithms have achieved reasonable quality solutions for some datasets, their performance across real-life datasets could be improved. This paper proposes an adaptive memetic differential evolution optimisation algorithm (AMADE) for addressing data clustering problems. The memetic algorithm (MA) employs an adaptive differential evolution (DE) mutation strategy, which can offer superior mutation performance across many combinatorial and continuous problem domains. We propose that hybridising an adaptive DE mutation operator with the MA can lead to faster convergence and a better balance between the exploration and exploitation of the search. We would also expect the performance of AMADE to be better than that of MA and DE executed separately. Our experimental results, based on several real-life benchmark datasets, show that AMADE outperformed the other clustering algorithms under statistical analysis. We conclude that the hybridisation of MA and the adaptive DE is a suitable approach for addressing data clustering problems and can improve the balance between global exploration and local exploitation of the optimisation algorithm.
Affiliation(s)
- Hossam M. J. Mustafa
- Data Mining and Optimization Research Group, Center of Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Malaysia
- Masri Ayob
- Data Mining and Optimization Research Group, Center of Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Malaysia
- Mohd Zakree Ahmad Nazri
- Data Mining and Optimization Research Group, Center of Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Malaysia
- Graham Kendall
- ASAP Research Group, University of Nottingham Malaysia, Malaysia
|