1
Manjunatha KKH, Baron G, Benozzo D, Silvestri E, Corbetta M, Chiuso A, Bertoldo A, Suweis S, Allegra M. Controlling target brain regions by optimal selection of input nodes. PLoS Comput Biol 2024; 20:e1011274. [PMID: 38215166 PMCID: PMC10810536 DOI: 10.1371/journal.pcbi.1011274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 01/25/2024] [Accepted: 12/04/2023] [Indexed: 01/14/2024] Open
Abstract
The network control theory framework holds great potential to inform neurostimulation experiments aimed at inducing desired activity states in the brain. However, the current applicability of the framework is limited by inappropriate modeling of brain dynamics, and an overly ambitious focus on whole-brain activity control. In this work, we leverage recent progress in linear modeling of brain dynamics (effective connectivity) and we exploit the concept of target controllability to focus on the control of a single region or a small subnetwork of nodes. We discuss when control may be possible with a reasonably low energy cost and few stimulation loci, and give general predictions on where to stimulate depending on the subset of regions one wishes to control. Importantly, using the robustly asymmetric effective connectome instead of the symmetric structural connectome (as in previous research), we highlight the fundamentally different roles in- and out-hubs have in the control problem, and the relevance of inhibitory connections. The large degree of inter-individual variation in the effective connectome implies that the control problem is best formulated at the individual level, but we discuss to what extent group results may still prove useful.
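The target-control idea in the abstract above can be illustrated on a small linear system. This is a minimal sketch and not the authors' pipeline. It assumes a discrete-time model x(t+1) = A x(t) + B u(t), where A stands in for an effective connectivity matrix, B selects the stimulated nodes, and C selects the target regions. The function names and the toy 3-node chain are hypothetical. Target (output) controllability is checked through the rank of C[B, AB, ..., A^(n-1)B], and the minimum input energy is computed from the finite-horizon controllability Gramian.

```python
import numpy as np


def target_controllability_matrix(A, B, C):
    """Return C @ [B, AB, ..., A^(n-1) B] for the system x(t+1) = A x(t) + B u(t)."""
    n = A.shape[0]
    blocks = []
    AkB = B.copy()
    for _ in range(n):
        blocks.append(AkB)
        AkB = A @ AkB
    return C @ np.hstack(blocks)


def is_target_controllable(A, B, C):
    """Targets are controllable when the matrix above has full row rank."""
    return np.linalg.matrix_rank(target_controllability_matrix(A, B, C)) == C.shape[0]


def min_target_energy(A, B, C, y_final, T):
    """Minimum sum of ||u(t)||^2 steering y = C x from 0 to y_final in T steps."""
    n = A.shape[0]
    W = np.zeros((n, n))
    Ak = np.eye(n)
    for _ in range(T):
        # Finite-horizon Gramian: W_T = sum_k A^k B B^T (A^T)^k
        W += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    W_target = C @ W @ C.T
    return float(y_final @ np.linalg.solve(W_target, y_final))


# Toy directed chain 0 -> 1 -> 2 with self-decay 0.5 (hypothetical weights)
A = np.array([[0.5, 0.0, 0.0],
              [0.3, 0.5, 0.0],
              [0.0, 0.3, 0.5]])
B_node0 = np.array([[1.0], [0.0], [0.0]])  # stimulate node 0
C_node2 = np.array([[0.0, 0.0, 1.0]])      # target node 2
print(is_target_controllable(A, B_node0, C_node2))
print(min_target_energy(A, B_node0, C_node2, np.array([1.0]), T=3))
```

In this toy chain, stimulating the upstream node can reach the downstream target, while stimulating the downstream node cannot reach the upstream one. This is one way to see why the direction of connections (in-hubs versus out-hubs) matters once the connectome is asymmetric. When C W C^T is singular (for example with T = 2 here), np.linalg.solve raises LinAlgError because the target is not yet reachable.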
Affiliation(s)
- Karan Kabbur Hanumanthappa Manjunatha
- Physics and Astronomy Department “Galileo Galilei”, University of Padova, Padova, Italy
- Modeling and Engineering Risk and Complexity, Scuola Superiore Meridionale, Napoli, Italy
- Giorgia Baron
- Information Engineering Department, University of Padova, Padova, Italy
- Danilo Benozzo
- Information Engineering Department, University of Padova, Padova, Italy
- Erica Silvestri
- Information Engineering Department, University of Padova, Padova, Italy
- Maurizio Corbetta
- Neuroscience Department, University of Padova, Padova, Italy
- Venetian Institute of Molecular Medicine (VIMM), Padova, Italy
- Padova Neuroscience Center, University of Padova, Padova, Italy
- Alessandro Chiuso
- Information Engineering Department, University of Padova, Padova, Italy
- Alessandra Bertoldo
- Information Engineering Department, University of Padova, Padova, Italy
- Padova Neuroscience Center, University of Padova, Padova, Italy
- Samir Suweis
- Physics and Astronomy Department “Galileo Galilei”, University of Padova, Padova, Italy
- Padova Neuroscience Center, University of Padova, Padova, Italy
- Michele Allegra
- Physics and Astronomy Department “Galileo Galilei”, University of Padova, Padova, Italy
- Padova Neuroscience Center, University of Padova, Padova, Italy
2
Li W, Wang T, Ng WWY. Population-Based Hyperparameter Tuning With Multitask Collaboration. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:5719-5731. [PMID: 34878983 DOI: 10.1109/tnnls.2021.3130896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Population-based optimization methods are widely used for hyperparameter (HP) tuning for a given specific task. In this work, we propose the population-based hyperparameter tuning with multitask collaboration (PHTMC), which is a general multitask collaborative framework with parallel and sequential phases for population-based HP tuning methods. In the parallel HP tuning phase, a shared population for all tasks is kept and the intertask relatedness is considered to both yield a better generalization ability and avoid data bias to a single task. In the sequential HP tuning phase, a surrogate model is built for each new-added task so that the metainformation from the existing tasks can be extracted and used to help the initialization for the new task. Experimental results show significant improvements in generalization abilities yielded by neural networks trained using the PHTMC and better performances achieved by multitask metalearning. Moreover, a visualization of the solution distribution and the autoencoder's reconstruction of both the PHTMC and a single-task population-based HP tuning method is compared to analyze the property with the multitask collaboration.
3
Harris CS, Miaskowski CA, Conley YP, Hammer MJ, Dunn LB, Dhruva AA, Levine JD, Olshen AB, Kober KM. Epigenetic Regulation of Inflammatory Mechanisms and a Psychological Symptom Cluster in Patients Receiving Chemotherapy. Nurs Res 2023; 72:200-210. [PMID: 36929768 PMCID: PMC10121746 DOI: 10.1097/nnr.0000000000000643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
BACKGROUND A psychological symptom cluster is the most common cluster identified in oncology patients. Although inflammatory mechanisms are hypothesized to underlie this cluster, epigenetic contributions are unknown. OBJECTIVES This study's purpose was to evaluate associations between the occurrence of a psychological symptom cluster and levels of DNA methylation for inflammatory genes in a heterogeneous sample of patients with cancer receiving chemotherapy. METHODS Prior to their second or third cycle of chemotherapy, 1,071 patients reported the occurrence of 38 symptoms using the Memorial Symptom Assessment Scale. A psychological cluster was identified using exploratory factor analysis. Differential methylation analyses were performed in two independent samples using Illumina Infinium 450K and EPIC microarrays. Expression-associated CpG (eCpG) loci in the promoter region of 114 inflammatory genes on the 450K and 112 genes on the EPIC microarray were evaluated for associations with the psychological cluster. Robust rank aggregation was used to identify differentially methylated genes across both samples. Significance was assessed using a false discovery rate of 0.05 under the Benjamini-Hochberg procedure. RESULTS Cluster of differentiation 40 (CD40) was differentially methylated across both samples. All six promoter eCpGs for CD40 that were identified across both samples were hypomethylated in the psychological cluster group. CONCLUSIONS This study is the first to suggest associations between a psychological symptom cluster and differential DNA methylation of a gene involved in tissue inflammation and cell-mediated immunity. Our findings suggest that increased CD40 expression through hypomethylation of promoter eCpG loci is involved in the occurrence of a psychological symptom cluster in patients receiving chemotherapy. These findings suggest a direction for mechanistic studies.
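The abstract above assesses significance with the Benjamini-Hochberg procedure at a false discovery rate of 0.05. Below is a minimal pure-Python sketch of that step-up procedure. The function name and example p-values are illustrative assumptions and do not come from the study's data or code. The procedure sorts the m p-values, finds the largest rank k with p_(k) ≤ (k/m)·α, and rejects every hypothesis up to that rank.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at false discovery rate alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max smallest p-values, reporting results in the original order
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.039, 0.001, 0.205, 0.008]))  # → [False, True, False, True]
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected if a larger p-value further down the sorted list meets its threshold. For example, all four of [0.01, 0.04, 0.045, 0.049] are rejected at α = 0.05.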
4
Jiang X, Luo D, Fernández E, Yang J, Li H, Jin KW, Zhan Y, Yao B, Bedi S, Xiao G, Zhan X, Li Q, Xie Y. Spatial Transcriptomics Arena (STAr): an Integrated Platform for Spatial Transcriptomics Methodology Research. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.10.532127. [PMID: 36945650 PMCID: PMC10028992 DOI: 10.1101/2023.03.10.532127] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/13/2023]
Abstract
The emerging field of spatially resolved transcriptomics (SRT) has revolutionized biomedical research. SRT quantifies expression levels at different spatial locations, providing a new and powerful tool to interrogate novel biological insights. An essential question in the analysis of SRT data is to identify spatially variable (SV) genes; the expression levels of such genes have spatial variation across different tissues. SV genes usually play an important role in underlying biological mechanisms and tissue heterogeneity. Currently, several computational methods have been developed to detect such genes; however, there is a lack of unbiased assessment of these approaches to guide researchers in selecting the appropriate methods for their specific biomedical applications. In addition, it is difficult for researchers to implement different existing methods for either biological study or methodology development. Furthermore, currently available public SRT datasets are scattered across different websites and preprocessed in different ways, posing additional obstacles for quantitative researchers developing computational methods for SRT data analysis. To address these challenges, we designed Spatial Transcriptomics Arena (STAr), an open platform comprising 193 curated datasets from seven technologies, seven statistical methods, and analysis results. This resource allows users to retrieve high-quality datasets, apply or develop spatial gene detection methods, as well as browse and compare spatial gene analysis results. It also enables researchers to comprehensively evaluate SRT methodology research in both simulated and real datasets. Altogether, STAr is an integrated research resource intended to promote reproducible research and accelerate rigorous methodology development, which can eventually lead to an improved understanding of biological processes and diseases. STAr can be accessed at https://lce.biohpc.swmed.edu/star/ .
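Identifying spatially variable genes, the central task in the abstract above, is often illustrated with Moran's I, a classical spatial autocorrelation statistic. The abstract does not say whether Moran's I is among the seven methods STAr hosts, so treat this sketch as a generic illustration. The Gaussian-kernel weights, the bandwidth parameter and the function name are hypothetical choices. Positive values mean nearby spots have similar expression. Negative values indicate an alternating, checkerboard-like pattern.

```python
import math


def morans_i(values, coords, bandwidth=1.0):
    """Moran's I for one gene across spots, using Gaussian-kernel spatial weights.

    values: expression level at each spot; coords: (x, y) position of each spot.
    The kernel and bandwidth are illustrative choices, not a fixed standard.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant expression has no spatial pattern to measure")
    weighted_cross = 0.0
    weight_total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-weights
            d2 = (coords[i][0] - coords[j][0]) ** 2 + (coords[i][1] - coords[j][1]) ** 2
            w = math.exp(-d2 / (2 * bandwidth ** 2))
            weighted_cross += w * dev[i] * dev[j]
            weight_total += w
    # I = (n / sum of weights) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2)
    return (n / weight_total) * (weighted_cross / denom)


# Six spots on a line: a smooth left-right gradient versus an alternating pattern
spots = [(float(x), 0.0) for x in range(6)]
print(morans_i([0, 0, 0, 10, 10, 10], spots))  # positive: clustered expression
print(morans_i([0, 10, 0, 10, 0, 10], spots))  # negative: alternating expression
```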
5
Bałchanowski M, Boryczka U. A Comparative Study of Rank Aggregation Methods in Recommendation Systems. ENTROPY (BASEL, SWITZERLAND) 2023; 25:132. [PMID: 36673273 PMCID: PMC9857885 DOI: 10.3390/e25010132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 01/03/2023] [Accepted: 01/05/2023] [Indexed: 06/17/2023]
Abstract
The aim of a recommender system is to suggest to the user certain products or services that most likely will interest them. Within the context of personalized recommender systems, a number of algorithms have been suggested to generate a ranking of items tailored to individual user preferences. However, these algorithms do not generate identical recommendations, and for this reason it has been suggested in the literature that the results of these algorithms can be combined using aggregation techniques, hoping that this will translate into an improvement in the quality of the final recommendation. In order to see which of these techniques increase the quality of recommendations to the greatest extent, the authors of this publication conducted experiments in which they considered five recommendation algorithms and 20 aggregation methods. The research was carried out on the popular and publicly available MovieLens 100k and MovieLens 1M datasets, and the results were confirmed by statistical tests.
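The abstract above compares methods for aggregating the ranked outputs of several recommenders. Borda count is one classical aggregation rule, and the sketch below uses it purely for illustration. It makes no claim about which of the 20 methods performed best. Each list gives an item n points for first place, down to 1 point for last place. Items missing from a list get no points from that list. The recommender lists are invented.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine best-first ranked lists by Borda count.

    In a list of length n the top item earns n points and the last earns 1;
    items missing from a list earn nothing from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties broken alphabetically so the output is deterministic
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical top-4 lists from three recommendation algorithms
lists = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["B", "C", "A", "D"],
]
print(borda_aggregate(lists))  # → ['B', 'A', 'C', 'D']
```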
6
Aghayerashti M, Bahrami Samani E, Ganjali M. Bayesian joint modeling of binomial and rank response with non-ignorable missing data for primate cognition. COMMUN STAT-THEOR M 2023. [DOI: 10.1080/03610926.2022.2163367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
Affiliation(s)
- Maryam Aghayerashti
- Department of Statistics, Faculty of Mathematical Science, Shahid Beheshti University, Tehran, Iran
- Ehsan Bahrami Samani
- Department of Statistics, Faculty of Mathematical Science, Shahid Beheshti University, Tehran, Iran
- Mojtaba Ganjali
- Department of Statistics, Faculty of Mathematical Science, Shahid Beheshti University, Tehran, Iran
7
Influence of context on users’ views about explanations for decision-tree predictions. COMPUT SPEECH LANG 2023. [DOI: 10.1016/j.csl.2023.101483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
8
Wang B, Law A, Regan T, Parkinson N, Cole J, Russell CD, Dockrell DH, Gutmann MU, Baillie JK. Systematic comparison of ranking aggregation methods for gene lists in experimental results. Bioinformatics 2022; 38:4927-4933. [PMID: 36094347 PMCID: PMC9620830 DOI: 10.1093/bioinformatics/btac621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 06/24/2022] [Accepted: 09/09/2022] [Indexed: 11/17/2022] Open
Abstract
MOTIVATION A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Evaluating a ranking aggregation method on a specific type of data before using it is required to support the reliability since the property of a dataset can influence the performance of an algorithm. Such evaluation on gene lists is usually based on a simulated database because of the lack of a known truth for real data. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists. RESULTS In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data were used to explore the performance of the aggregation methods as a function of emulating the common scenarios of real genomic data, with various heterogeneity of quality, noise level and a mix of unranked and ranked data using 20 000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (non-small cell lung cancer) and bacteria (macrophage apoptosis) was performed. We summarize the results of our evaluation in a simple flowchart to select a ranking aggregation method, and in an automated implementation using the meta-analysis by information content algorithm to infer heterogeneity of data quality across input datasets. AVAILABILITY AND IMPLEMENTATION The code for simulated data generation and running edited version of algorithms: https://github.com/baillielab/comparison_of_RA_methods. 
Code to perform an optimal selection of methods based on the results of this review, using the MAIC algorithm to infer the characteristics of an input dataset, can be downloaded here: https://github.com/baillielab/maic. An online service for running MAIC: https://baillielab.net/maic. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bo Wang
- Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK
- Andy Law
- Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK
- Tim Regan
- Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, UK
- Joby Cole
- University of Sheffield, Sheffield S10 2NT, UK
- Clark D Russell
- Centre for Inflammation Research, The Queen’s Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK
- David H Dockrell
- Centre for Inflammation Research, The Queen’s Medical Research Institute, University of Edinburgh, Edinburgh EH16 4TJ, UK
- Michael U Gutmann
- School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK
9
Newman MEJ. Ranking with multiple types of pairwise comparisons. Proc Math Phys Eng Sci 2022. [DOI: 10.1098/rspa.2022.0517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
Abstract
The task of ranking individuals or teams, based on a set of comparisons between pairs, arises in various contexts, including sporting competitions and the analysis of dominance hierarchies among animals and humans. Given data on which competitors beat which others, the challenge is to rank the competitors from best to worst. Here we study the problem of computing rankings when there are multiple, potentially conflicting types of comparison, such as multiple types of dominance behaviours among animals. We assume that we do not know a priori what information each behaviour conveys about the ranking, or even whether they convey any information at all. Nonetheless, we show that it is possible to compute a ranking in this situation and present a fast method for doing so, based on a combination of an expectation–maximization algorithm and a modified Bradley–Terry model. We give a selection of example applications to both animal and human competition.
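Newman's method builds on the Bradley–Terry model. In that model, competitor i beats competitor j with probability p_i / (p_i + p_j). The sketch below fits only the standard single-type Bradley–Terry model, using the classical Zermelo (minorization-maximization) iteration. It does not implement the paper's expectation-maximization extension for multiple comparison types. The wins matrix is made up. The iteration also assumes the comparison data are strongly connected, meaning every competitor can be linked to every other by a chain of wins. This is the condition under which finite maximum-likelihood strengths exist.

```python
def bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Fit Bradley-Terry strengths; wins[i][j] counts how often i beat j.

    Uses the Zermelo / minorization-maximization update
    p_i <- W_i / sum_j n_ij / (p_i + p_j), then rescales so strengths sum to 1.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom)
        scale = sum(new_p)
        new_p = [x / scale for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical head-to-head results among three competitors
wins = [[0, 3, 2],
        [1, 0, 2],
        [1, 1, 0]]
strengths = bradley_terry(wins)
ranking = sorted(range(len(wins)), key=lambda i: -strengths[i])
print(strengths, ranking)
```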
Affiliation(s)
- Mark E. J. Newman
- Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109, USA
10
Abstract
Low accuracy in the estimation of construction costs at early stages of projects has driven the research on alternative costing methods that take advantage of computing advances, however, direct implications in their use for practice is not clear. The purpose of this study was to investigate how predictive analytics could enhance cost estimation of buildings at early stages by performing a systematic literature review on predictive analytics implementations for the early-stage cost estimation of building projects. The outputs of the study are: (1) an extensive database; (2) a list of cost drivers; and (3) a comparison between the various techniques. The findings suggest that predictive analytic techniques are appropriate for practice due to their higher level of accuracy. The discussion has three main implications: (a) predictive analytics for cost estimation have not followed the best practices and standard methodologies; (b) predictive analytics techniques are ready for industry adoption; and (c) the study can be a reference for high-level decision-makers to implement predictive analytics in cost estimation. Knowledge of predictive analytics could assist stakeholders in playing a key role in improving the accuracy of cost forecast in the construction market, thus, enabling pro-active management of the project owner’s budget.
11
Review: Preference elicitation methods for appropriate breeding objectives. Animal 2022; 16:100535. [PMID: 35588584 DOI: 10.1016/j.animal.2022.100535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 11/21/2022] Open
Abstract
Breeding objectives of livestock and other agricultural species are usually profit maximising. The selection emphasis placed on specific traits to achieve a breeding objective is often informed by the financial value of a trait to a farm system. However, there are alternative, and complementary approaches to defining both the breeding objective and the selection emphasis placed on traits that are included in associated selection tools. These are based on the preferences of stakeholders, which are often heterogeneous and include broader values and motivations than profit. In this regard, stated preference methods are useful when considering traits that have either no discernible market value or whose value is not fully transferred via the market. Such approaches can guide more appropriate breeding decisions that are amenable to changing societal values, for example with reduced negative environmental externalities. However, while stated preference methods offer promising conceptualisations of value in genetic improvement programmes, there is still a substantial knowledge gap in terms of the current state of research and a catalogue of publications to date. This paper reviews publications of stated preference approaches in the field of livestock breeding (and some relevant crop breeding examples), providing a knowledge base of published applications and promoting their continued development and implementation towards the formulation of appropriate breeding objectives and selection indices. A systematic review of 84 peer-reviewed publications and an aggregate ranking of traits for the most commonly studied subject (cattle) reveals uncertainty in preference estimates which may be driven by (i) a diverse set of non-standardised methodologies, (ii) common oversights in the selection, inclusion and description of traits, and (iii) inaccurate representations of the respondent population. 
We discuss key considerations to help overcome these limitations, including avoiding methodological confinement to a disciplinary silo and reducing complexity so that the values of broader respondent groups may be accounted for.
12
Hussain A, Chun J. Cloud service scrutinization and selection framework (C3SF): A novel unified approach to cloud service selection with consensus. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.11.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
13
Li X, Yi D, Liu JS. Bayesian Analysis of Rank Data with Covariates and Heterogeneous Rankers. Stat Sci 2022. [DOI: 10.1214/20-sts818] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Xinran Li
- Xinran Li is Assistant Professor, Department of Statistics, University of Illinois, Champaign, Illinois 61820, USA
- Dingdong Yi
- Dingdong Yi is Quantitative Researcher, Citadel Americas LLC, New York, New York 10022, USA
- Jun S. Liu
- Jun S. Liu is Professor, Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA
14
Yoo Y, Escobedo AR. A New Binary Programming Formulation and Social Choice Property for Kemeny Rank Aggregation. DECISION ANALYSIS 2021. [DOI: 10.1287/deca.2021.0433] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022]
Abstract
Rank aggregation is widely used in group decision making and many other applications, where it is of interest to consolidate heterogeneous ordered lists. Oftentimes, these rankings may involve a large number of alternatives, contain ties, and/or be incomplete, all of which complicate the use of robust aggregation methods. In particular, these characteristics have limited the applicability of the aggregation framework based on the Kemeny-Snell distance, which satisfies key social choice properties that have been shown to engender improved decisions. This work introduces a binary programming formulation for the generalized Kemeny rank aggregation problem—whose ranking inputs may be complete and incomplete, with and without ties. Moreover, it leverages the equivalence of two ranking aggregation problems, namely, that of minimizing the Kemeny-Snell distance and of maximizing the Kendall-τ correlation, to compare the newly introduced binary programming formulation to a modified version of an existing integer programming formulation associated with the Kendall-τ distance. The new formulation has fewer variables and constraints, which leads to faster solution times. Moreover, we develop a new social choice property, the nonstrict extended Condorcet criterion, which can be regarded as a natural extension of the well-known Condorcet criterion and the Extended Condorcet criterion. Unlike its parent properties, the new property is adequate for handling complete rankings with ties. The property is leveraged to develop a structural decomposition algorithm, through which certain large instances of the NP-hard Kemeny rank aggregation problem can be solved exactly in a practical amount of time. To test the practical implications of the new formulation and social choice property, we work with instances constructed from a probabilistic distribution and with benchmark instances from PrefLib, a library of preference data.
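The Kemeny problem in the abstract above asks for the consensus ordering closest to the input rankings. For complete rankings without ties, this is equivalent to minimizing the total Kendall-τ distance, which counts the pairwise disagreements. The brute-force sketch below enumerates every permutation. It therefore works only for a handful of items and only for complete, tie-free rankings. It is a reference point for checking small cases and is not the paper's binary programming formulation. The function names and example votes are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Count item pairs that two complete, tie-free rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_brute_force(rankings):
    """Exact Kemeny consensus by checking all n! orderings (tiny inputs only)."""
    best, best_cost = None, float("inf")
    for candidate in permutations(rankings[0]):
        cost = sum(kendall_tau_distance(candidate, r) for r in rankings)
        if cost < best_cost:  # strict '<' keeps the first optimum found
            best, best_cost = list(candidate), cost
    return best, best_cost


votes = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(kemeny_brute_force(votes))  # → (['A', 'B', 'C'], 2)
```

The n! enumeration is the reason exact Kemeny aggregation needs integer programming formulations or decompositions such as the one described above once the number of alternatives grows.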
Affiliation(s)
- Yeawon Yoo
- Department of Applied Mathematics and Statistics, SNF Agora Institute, Johns Hopkins University, Baltimore, Maryland 21218
- Adolfo R. Escobedo
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, Arizona 85281
15
Buzzelli M, Erba I. On the evaluation of temporal and spatial stability of color constancy algorithms. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA. A, OPTICS, IMAGE SCIENCE, AND VISION 2021; 38:1349-1356. [PMID: 34613142 DOI: 10.1364/josaa.434860] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2021] [Accepted: 08/10/2021] [Indexed: 06/13/2023]
Abstract
Computational color constancy algorithms are commonly evaluated only through angular error analysis on annotated datasets of static images. The widespread use of videos in consumer devices motivated us to define a richer methodology for color constancy evaluation. To this extent, temporal and spatial stability are defined here to determine the degree of sensitivity of color constancy algorithms to variations in the scene that do not depend on the illuminant source, such as moving subjects or a moving camera. Our evaluation methodology is applied to compare several color constancy algorithms on stable sequences belonging to the Gray Ball and Burst Color Constancy video datasets. The stable sequences, identified using a general-purpose procedure, are made available for public download to encourage future research. Our investigation proves the importance of evaluating color constancy algorithms according to multiple metrics, instead of angular error alone. For example, the popular fully convolutional color constancy with confidence-weighted pooling algorithm is consistently the best performing solution for error evaluation, but it is often surpassed in terms of stability by the traditional gray edge algorithm, and by the more recent sensor-independent illumination estimation algorithm.
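The abstract above contrasts the standard angular-error metric with new stability measures. Angular error is the angle between the estimated and ground-truth illuminant RGB vectors, so it ignores overall brightness. The second function is a hypothetical frame-to-frame proxy for temporal stability: the mean angular change between consecutive estimates. It is an assumption for illustration, not the definition used in the paper. The example illuminant values are invented.

```python
import math


def angular_error_deg(estimate, ground_truth):
    """Angle in degrees between estimated and ground-truth RGB illuminant vectors."""
    dot = sum(e * g for e, g in zip(estimate, ground_truth))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    norm_g = math.sqrt(sum(g * g for g in ground_truth))
    cos_angle = dot / (norm_e * norm_g)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_angle))


def frame_to_frame_instability(estimates):
    """Hypothetical stability proxy: mean angular change between consecutive frame estimates."""
    if len(estimates) < 2:
        raise ValueError("need at least two frames")
    changes = [angular_error_deg(a, b) for a, b in zip(estimates, estimates[1:])]
    return sum(changes) / len(changes)


print(angular_error_deg([0.9, 1.0, 0.8], [1.0, 1.0, 1.0]))
print(frame_to_frame_instability([[0.9, 1.0, 0.8], [0.92, 1.0, 0.79], [0.7, 1.0, 1.1]]))
```

An algorithm can score well on per-frame angular error yet still jump between frames. This is why the abstract argues for reporting stability alongside error.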
16
17
Nguyen QP, Karagas MR, Madan JC, Dade E, Palys TJ, Morrison HG, Pathmasiri WW, McRitche S, Sumner SJ, Frost HR, Hoen AG. Associations between the gut microbiome and metabolome in early life. BMC Microbiol 2021; 21:238. [PMID: 34454437 PMCID: PMC8400760 DOI: 10.1186/s12866-021-02282-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 09/29/2020] [Accepted: 07/14/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The infant intestinal microbiome plays an important role in metabolism and immune development with impacts on lifelong health. The linkage between the taxonomic composition of the microbiome and its metabolic phenotype is undefined and complicated by redundancies in the taxon-function relationship within microbial communities. To inform a more mechanistic understanding of the relationship between the microbiome and health, we performed an integrative statistical and machine learning-based analysis of microbe taxonomic structure and metabolic function in order to characterize the taxa-function relationship in early life. RESULTS Stool samples collected from infants enrolled in the New Hampshire Birth Cohort Study (NHBCS) at approximately 6-weeks (n = 158) and 12-months (n = 282) of age were profiled using targeted and untargeted nuclear magnetic resonance (NMR) spectroscopy as well as DNA sequencing of the V4-V5 hypervariable region from the bacterial 16S rRNA gene. There was significant inter-omic concordance based on Procrustes analysis (6 weeks: p = 0.056; 12 months: p = 0.001), however this association was no longer significant when accounting for phylogenetic relationships using generalized UniFrac distance metric (6 weeks: p = 0.376; 12 months: p = 0.069). Sparse canonical correlation analysis showed significant correlation, as well as identifying sets of microbe/metabolites driving microbiome-metabolome relatedness. Performance of machine learning models varied across different metabolites, with support vector machines (radial basis function kernel) being the consistently top ranked model. However, predictive R² values demonstrated poor predictive performance across all models assessed (avg: -5.06% at 6 weeks; -3.7% at 12 months). Conversely, the Spearman correlation metric was higher (avg: 0.344 at 6 weeks; 0.265 at 12 months). This demonstrated that taxonomic relative abundance was not predictive of metabolite concentrations.
CONCLUSIONS Our results suggest a degree of overall association between taxonomic profiles and metabolite concentrations. However, lack of predictive capacity for stool metabolic signatures reflects, in part, the possible role of functional redundancy in defining the taxa-function relationship in early life as well as the bidirectional nature of the microbiome-metabolome association. Our results provide evidence in favor of a multi-omic approach for microbiome studies, especially those focused on health outcomes.
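The results above report negative predictive R² alongside positive Spearman correlations. The two metrics answer different questions. R² = 1 − SS_res/SS_tot penalizes any offset or scale error and becomes negative when predictions are worse than simply predicting the mean. Spearman correlation only checks whether the predicted ordering matches the observed one. The made-up numbers below are not data from the study. They describe predictions that are perfectly ordered but shifted by 10, which yields a Spearman correlation of 1.0 and an R² of -49.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def predictive_r2(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when worse than always predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [11.0, 12.0, 13.0, 14.0, 15.0]  # right order, offset by 10
print(spearman(observed, predicted))  # → 1.0
print(predictive_r2(observed, predicted))  # → -49.0
```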
Affiliation(s)
- Quang P. Nguyen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Margaret R. Karagas
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Children’s Environmental Health & Disease Prevention Research Center at Dartmouth, Lebanon, NH USA
- Juliette C. Madan
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Children’s Environmental Health & Disease Prevention Research Center at Dartmouth, Lebanon, NH USA
- Department of Pediatrics, Children’s Hospital at Dartmouth, Hanover, NH USA
- Erika Dade
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Thomas J. Palys
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Hilary G. Morrison
- Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, MA USA
- Wimal W. Pathmasiri
- Department of Nutrition, Nutrition Research Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC USA
- Susan McRitche
- Department of Nutrition, Nutrition Research Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC USA
- Susan J. Sumner
- Department of Nutrition, Nutrition Research Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC USA
- H. Robert Frost
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Anne G. Hoen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, NH USA
- Children’s Environmental Health & Disease Prevention Research Center at Dartmouth, Lebanon, NH USA
| |
Collapse
|
18
|
Klotz S, Westner M, Strahringer S. Critical Success Factors of Business-managed IT: It Takes Two to Tango. INFORMATION SYSTEMS MANAGEMENT 2021. [DOI: 10.1080/10580530.2021.1938300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Stefan Klotz
- Chair of Business Informatics, esp. Information Systems in Trade and Industry, TU Dresden, Dresden, Germany
- Markus Westner
- Faculty of Computer Science and Mathematics, OTH Regensburg, Regensburg, Germany
- Susanne Strahringer
- Chair of Business Informatics, esp. Information Systems in Trade and Industry, TU Dresden, Dresden, Germany
19
Miebs G, Kadziński M. Heuristic algorithms for aggregation of incomplete rankings in multiple criteria group decision making. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.01.055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/22/2022]
20
Lyu Y, Gao F, Wu IS, Lim BY. Imma Sort by Two or More Attributes With Interpretable Monotonic Multi-Attribute Sorting. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2369-2384. [PMID: 33296304 DOI: 10.1109/tvcg.2020.3043487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Many choice problems involve multiple attributes and are mentally challenging, because only one attribute is neatly sorted while the others may appear randomly arranged. We hypothesize that perceiving approximately monotonic trends across multiple attributes is key to the overall interpretability of sorted results, because users can easily predict the attribute values of the next items. We extend a ranking principal curve model to tune monotonic trends in attributes and present Imma Sort, which sorts items by multiple attributes simultaneously by trading off monotonicity in the primary sorted attribute to increase human predictability of the other attributes. We characterize how it performs for varying attribute correlations, attribute preferences, list lengths and numbers of attributes. We further extend Imma Sort with ImmaAnchor and ImmaCenter to improve the learnability and efficiency of searching sorted items with conflicting attributes. We demonstrate usage scenarios for two applications and evaluate its learnability, usability, interpretability, and user performance in prediction and search tasks. We find that Imma Sort improves the interpretability and satisfaction of sorting by ≥ 2 attributes. We discuss why, when, where, and how to deploy Imma Sort for real-world applications.
21
Velásquez-Zapata V, Elmore JM, Banerjee S, Dorman KS, Wise RP. Next-generation yeast-two-hybrid analysis with Y2H-SCORES identifies novel interactors of the MLA immune receptor. PLoS Comput Biol 2021; 17:e1008890. [PMID: 33798202 PMCID: PMC8046355 DOI: 10.1371/journal.pcbi.1008890] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/14/2020] [Revised: 04/14/2021] [Accepted: 03/17/2021] [Indexed: 12/21/2022]
Abstract
Protein-protein interaction networks are one of the most effective representations of cellular behavior. Building these models requires high-throughput techniques. Next-generation interaction screening (NGIS) protocols that combine yeast two-hybrid (Y2H) with deep sequencing are promising approaches to generate interactome networks in any organism. However, challenges remain in mining reliable information from these screens, which limits their broader implementation. Here, we present a computational framework, designated Y2H-SCORES, for analyzing high-throughput Y2H screens. Y2H-SCORES considers key aspects of NGIS experimental design and important characteristics of the resulting data that distinguish it from RNA-seq expression datasets. Three quantitative ranking scores were implemented to identify interacting partners, comprising: 1) significant enrichment under selection for positive interactions, 2) degree of interaction specificity among multi-bait comparisons, and 3) selection of in-frame interactors. Using simulation and an empirical dataset, we provide a quantitative assessment to predict interacting partners under a wide range of experimental scenarios, facilitating independent confirmation by one-to-one bait-prey tests. Simulation of Y2H-NGIS enabled us to identify conditions that maximize detection of true interactors, which can be achieved with protocols such as prey library normalization, maintenance of larger culture volumes and replication of experimental treatments. Y2H-SCORES can be implemented in different yeast-based interaction screens, with performance equivalent or superior to that of existing methods. Proof-of-concept was demonstrated by discovery and validation of novel interactions between the barley nucleotide-binding leucine-rich repeat (NLR) immune receptor MLA6 and fourteen proteins, including those that function in signaling, transcriptional regulation, and intracellular trafficking.
Affiliation(s)
- Valeria Velásquez-Zapata
- Program in Bioinformatics & Computational Biology, Iowa State University, Ames, Iowa, United States of America
- Department of Plant Pathology & Microbiology, Iowa State University, Ames, Iowa, United States of America
- J. Mitch Elmore
- Department of Plant Pathology & Microbiology, Iowa State University, Ames, Iowa, United States of America
- Corn Insects and Crop Genetics Research, USDA-Agricultural Research Service, Ames, Iowa, United States of America
- Sagnik Banerjee
- Program in Bioinformatics & Computational Biology, Iowa State University, Ames, Iowa, United States of America
- Department of Statistics, Iowa State University, Ames, Iowa, United States of America
- Karin S. Dorman
- Program in Bioinformatics & Computational Biology, Iowa State University, Ames, Iowa, United States of America
- Department of Statistics, Iowa State University, Ames, Iowa, United States of America
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, United States of America
- Roger P. Wise
- Program in Bioinformatics & Computational Biology, Iowa State University, Ames, Iowa, United States of America
- Department of Plant Pathology & Microbiology, Iowa State University, Ames, Iowa, United States of America
- Corn Insects and Crop Genetics Research, USDA-Agricultural Research Service, Ames, Iowa, United States of America
22
Wiesenfarth M, Reinke A, Landman BA, Eisenmann M, Saiz LA, Cardoso MJ, Maier-Hein L, Kopp-Schneider A. Methods and open-source toolkit for analyzing and visualizing challenge results. Sci Rep 2021; 11:2369. [PMID: 33504883 PMCID: PMC7841186 DOI: 10.1038/s41598-021-82017-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/12/2020] [Accepted: 01/11/2021] [Indexed: 01/12/2023]
Abstract
Grand challenges have become the de facto standard for benchmarking image analysis algorithms. While the number of these international competitions is steadily increasing, surprisingly little effort has been invested in ensuring high-quality design, execution and reporting for them. Specifically, results analysis and visualization in the event of uncertainties have been given almost no attention in the literature. Given these shortcomings, the contribution of this paper is two-fold: (1) we present a set of methods to comprehensively analyze and visualize the results of single-task and multi-task challenges and apply them to a number of simulated and real-life challenges to demonstrate their specific strengths and weaknesses; (2) we release the open-source framework challengeR as part of this work to enable fast and wide adoption of the methodology proposed in this paper. Our approach offers an intuitive way to gain important insights into the relative and absolute performance of algorithms, which cannot be revealed by commonly applied visualization techniques. This is demonstrated by the experiments performed in the specific context of biomedical image analysis challenges. Our framework could thus become an important tool for analyzing and visualizing challenge results in the field of biomedical image analysis and beyond.
Affiliation(s)
- Manuel Wiesenfarth
- Division of Biostatistics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 581, Heidelberg, 69120, Germany
- Annika Reinke
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120, Heidelberg, Germany
- Bennett A Landman
- Electrical Engineering, Vanderbilt University, Nashville, TN, 37235-1679, USA
- Matthias Eisenmann
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120, Heidelberg, Germany
- Laura Aguilera Saiz
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120, Heidelberg, Germany
- M Jorge Cardoso
- School of Biomedical Engineering and Imaging Sciences, King's College London, London, WC2R 2LS, UK
- Lena Maier-Hein
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, 69120, Heidelberg, Germany
- Annette Kopp-Schneider
- Division of Biostatistics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 581, Heidelberg, 69120, Germany
23
Venkatesh B, Anuradha J. A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data. INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS 2021. [DOI: 10.3233/kes-190134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
In microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant and noisy features; such data also contain many more gene expression features than samples. To increase the classification accuracy and processing speed of the model, an optimal subset of features needs to be extracted, which can be achieved by applying feature selection. In this paper, we propose a hybrid ensemble feature selection method with two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks produced by three filter feature selection methods: Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC). The ranks are aggregated using Fuzzy Gaussian membership function ordering. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, with an RBF kernel-based Support Vector Machine (SVM) classifier as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using Accuracy, Recall, Precision, and F1-Score as evaluation metrics. The experimental results show that the proposed method outperforms the other feature selection methods.
24
Kolmykov S, Yevshin I, Kulyashov M, Sharipov R, Kondrakhin Y, Makeev VJ, Kulakovskiy IV, Kel A, Kolpakov F. GTRD: an integrated view of transcription regulation. Nucleic Acids Res 2021; 49:D104-D111. [PMID: 33231677 PMCID: PMC7778956 DOI: 10.1093/nar/gkaa1057] [Citation(s) in RCA: 122] [Impact Index Per Article: 40.7] [Received: 09/15/2020] [Revised: 10/18/2020] [Accepted: 11/03/2020] [Indexed: 12/24/2022]
Abstract
The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.
Affiliation(s)
- Semyon Kolmykov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Federal Research Center Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russian Federation
- Ivan Yevshin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Mikhail Kulyashov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Novosibirsk State University, Novosibirsk 630090, Russian Federation
- Ruslan Sharipov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Novosibirsk State University, Novosibirsk 630090, Russian Federation
- Yury Kondrakhin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Vsevolod J Makeev
- Vavilov Institute of General Genetics RAS, Moscow 119991, Russian Federation
- Moscow Institute of Physics and Technology (State University), Dolgoprudny 141700, Russian Federation
- NRC «Kurchatov Institute» - GOSNIIGENETIKA, Kurchatov Genomic Center, Moscow 123182, Russian Federation
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russian Federation
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics RAS, Moscow 119991, Russian Federation
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russian Federation
- Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russian Federation
- Alexander Kel
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- geneXplain GmbH, 38302 Wolfenbüttel, Germany
- Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk 630090, Russian Federation
- Fedor Kolpakov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
25
Brown D, Van den Bergh I, de Bruin S, Machida L, van Etten J. Data synthesis for crop variety evaluation. A review. AGRONOMY FOR SUSTAINABLE DEVELOPMENT 2020; 40:25. [PMID: 32863892 PMCID: PMC7440334 DOI: 10.1007/s13593-020-00630-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 06/23/2020] [Indexed: 05/12/2023]
Abstract
Crop varieties should fulfill multiple requirements, including agronomic performance and product quality. Variety evaluations depend on data generated from field trials and sensory analyses, performed with different levels of participation from farmers and consumers. Such multi-faceted variety evaluation is expensive and time-consuming; hence, any use of these data should be optimized. Data synthesis can help to take advantage of existing and new data, combining data from different sources and combining it with expert knowledge to produce new information and understanding that supports decision-making. Data synthesis for crop variety evaluation can partly build on extant experiences and methods, but it also requires methodological innovation. We review the elements required to achieve data synthesis for crop variety evaluation, including (1) data types required for crop variety evaluation, (2) main challenges in data management and integration, (3) main global initiatives aiming to solve those challenges, (4) current statistical approaches to combine data for crop variety evaluation and (5) existing data synthesis methods used in evaluation of varieties to combine different datasets from multiple data sources. We conclude that currently available methods have the potential to overcome existing barriers to data synthesis and could set in motion a virtuous cycle that will encourage researchers to share data and collaborate on data-driven research.
Affiliation(s)
- David Brown
- Laboratory of Geo-Information Science and Remote Sensing, Wageningen University & Research, Droevendaalsesteeg 3, 6708 PB Wageningen, The Netherlands
- Bioversity International, Turrialba, 30501 Costa Rica
- Inge Van den Bergh
- Bioversity International, C/O KU Leuven, W. De Croylaan 42, P.O. Box 2455, 3001 Leuven, Belgium
- Sytze de Bruin
- Laboratory of Geo-Information Science and Remote Sensing, Wageningen University & Research, Droevendaalsesteeg 3, 6708 PB Wageningen, The Netherlands
- Lewis Machida
- Bioversity International, C/O International Institute of Tropical Agriculture (IITA), Nelson Mandela African Institute of Science and Technology, P.O. Box 447, Arusha, Tanzania
26
Rolland A, Cugliari J. Sensitivity index to measure dependence on parameters for rankings and top-k rankings. J Appl Stat 2020; 47:1191-1207. [DOI: 10.1080/02664763.2019.1671963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Antoine Rolland
- ERIC EA 3083, Université de Lyon, Université Lumière Lyon 2, Lyon, France
- Jairo Cugliari
- ERIC EA 3083, Université de Lyon, Université Lumière Lyon 2, Lyon, France
27
Chen ATY, Biglari-Abhari M, Wang KIK. Fusing Appearance and Spatio-Temporal Models for Person Re-Identification and Tracking. J Imaging 2020; 6:jimaging6050027. [PMID: 34460729 PMCID: PMC8321031 DOI: 10.3390/jimaging6050027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2020] [Revised: 04/26/2020] [Accepted: 04/29/2020] [Indexed: 11/16/2022]
Abstract
Knowing who is where is a common task for many computer vision applications. Most of the literature focuses on one of two approaches: determining who a detected person is (appearance-based re-identification) and collating positions into a list, or determining the motion of a person (spatio-temporal-based tracking) and assigning identity labels based on the tracks formed. This paper presents a model fusion approach that combines both sources of information to increase the accuracy of determining identity classes for detected people using re-ranking. First, a Sequential k-Means re-identification approach is presented, followed by a Kalman filter-based spatio-temporal tracking approach. A linear weighting approach is used to fuse the outputs of these models, with the weights modified by a decay function and a rule-based system to reflect the strengths and weaknesses of the models under different conditions. Preliminary experimental results with two different person detection algorithms on an indoor person tracking dataset show that fusing the appearance and spatio-temporal models significantly increases the overall accuracy of the classification operation.
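To illustrate the kind of linear weighted fusion with a decaying weight described in this abstract, the Python sketch below fuses per-identity scores from an appearance model and a spatio-temporal model and then re-ranks the identities. It is not the authors' implementation: the exponential half-life decay, the 0.5 starting weight, the function names `fused_scores` and `rerank`, and the score dictionaries are all assumptions, and the paper's rule-based weight adjustments are omitted.

```python
import math


def fused_scores(appearance, spatio_temporal, seconds_since_seen, half_life=2.0):
    """Linearly fuse per-identity scores from two models (illustrative sketch).

    appearance, spatio_temporal: dict mapping identity -> score in [0, 1].
    seconds_since_seen: dict mapping identity -> seconds since its track was
    last updated. The spatio-temporal weight starts at 0.5 and halves every
    `half_life` seconds (an assumed exponential decay), so identities with
    stale tracks rely increasingly on appearance alone.
    """
    fused = {}
    for identity, app_score in appearance.items():
        age = seconds_since_seen.get(identity, math.inf)
        w_st = 0.5 * 0.5 ** (age / half_life)  # decayed spatio-temporal weight
        w_app = 1.0 - w_st                     # the two weights always sum to 1
        st_score = spatio_temporal.get(identity, 0.0)
        fused[identity] = w_app * app_score + w_st * st_score
    return fused


def rerank(fused):
    """Return identities ordered from highest to lowest fused score."""
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    app = {"A": 0.4, "B": 0.6}
    st = {"A": 0.9, "B": 0.3}
    print(rerank(fused_scores(app, st, {"A": 0.0, "B": 0.0})))   # ['A', 'B']
    print(rerank(fused_scores(app, st, {"A": 20.0, "B": 0.0})))  # ['B', 'A']
```

With both tracks fresh, identity A ranks first on the strength of its spatio-temporal score; once A's track has gone 20 seconds without an update, that evidence is almost fully discounted and B ranks first.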
28
Oliveira SEL, Diniz V, Lacerda A, Merschmanm L, Pappa GL. Is Rank Aggregation Effective in Recommender Systems? An Experimental Analysis. ACM T INTEL SYST TEC 2020. [DOI: 10.1145/3365375] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/24/2022]
Abstract
Recommender Systems are tools designed to help users find relevant information from the myriad of content available online. They work by actively suggesting items that are relevant to users according to their historical preferences or observed actions. Among recommender systems, top-N recommenders work by suggesting a ranking of N items that can be of interest to a user. Although a significant number of top-N recommenders have been proposed in the literature, they often disagree in their returned rankings, offering an opportunity for improving the final recommendation ranking by aggregating the outputs of different algorithms.
Rank aggregation has been successfully used in a significant number of areas, but only a few rank aggregation methods have been proposed in the recommender systems literature. Furthermore, there is a lack of studies regarding rankings’ characteristics and their possible impacts on the improvements achieved through rank aggregation. This work presents an extensive two-phase experimental analysis of rank aggregation in recommender systems. In the first phase, we investigate the characteristics of rankings recommended by 15 different top-N recommender algorithms regarding agreement and diversity. In the second phase, we look at the results of 19 rank aggregation methods and identify different scenarios where they perform best or worst according to the input rankings’ characteristics.
Our results show that supervised rank aggregation methods improve the recommended rankings in six out of seven datasets. These methods remain robust even in the presence of a large set of weak recommendation rankings. However, when the input was a set of non-diverse, high-quality rankings, supervised and unsupervised algorithms produced similar results. In these cases, we can avoid the cost of the former in favor of the latter.
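As a point of reference for the unsupervised rank aggregation methods mentioned in this abstract, Borda count is a classic baseline for merging top-N lists. The Python sketch below is illustrative only: it is not claimed to be one of the 19 methods evaluated, and the scoring convention (an item at 0-based position i of a list of length L earns L - i points, items missing from a list earn nothing) and the name `borda_aggregate` are assumptions.

```python
from collections import defaultdict


def borda_aggregate(rankings, n=None):
    """Merge top-N rankings (lists of item ids, best first) by Borda count.

    In a list of length L, the item at 0-based position i earns L - i points;
    items absent from a list earn nothing from it. Ties in total points are
    broken by item id so the output is deterministic. If n is given, only
    the first n items of the consensus are returned.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        length = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += length - position
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return ordered if n is None else ordered[:n]


if __name__ == "__main__":
    runs = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(borda_aggregate(runs, n=2))  # ['b', 'a']
```

For these three lists the totals are b = 8, a = 6, c = 3 and d = 1, so the top-2 consensus is `['b', 'a']`. Because points depend only on list positions, Borda count needs no training data, which is what distinguishes it from the supervised aggregators the study found more effective.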
Affiliation(s)
- Victor Diniz
- Computer Science Department, Universidade Federal de Minas Gerais
- Anisio Lacerda
- Computer Science Department, Universidade Federal de Minas Gerais
- Gisele L. Pappa
- Computer Science Department, Universidade Federal de Minas Gerais
29
Quillet A, Saad C, Ferry G, Anouar Y, Vergne N, Lecroq T, Dubessy C. Improving Bioinformatics Prediction of microRNA Targets by Ranks Aggregation. Front Genet 2020; 10:1330. [PMID: 32047509 PMCID: PMC6997536 DOI: 10.3389/fgene.2019.01330] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 02/13/2019] [Accepted: 12/05/2019] [Indexed: 12/18/2022]
Abstract
microRNAs are noncoding RNAs that downregulate a large number of target mRNAs and modulate cell activity. Despite continued progress, bioinformatics prediction of microRNA targets remains a challenge since available software still suffers from a lack of accuracy and sensitivity. Moreover, these tools give fairly inconsistent results relative to one another. Thus, in an attempt to circumvent these difficulties, we aggregated all human results of four important prediction algorithms (miRanda, PITA, SVmicrO, and TargetScan), together with additional characteristics, in order to rerank them into a single list. Instead of requiring biologists to decide which prediction tool to use, our method clearly helps them get the best microRNA target predictions from all aggregated databases. The resulting database is freely available through a webtool called miRabel, which can take either a list of miRNAs, genes, or signaling pathways as search inputs. Receiver operating characteristic curve and precision-recall curve analyses carried out using experimentally validated data and very large data sets show that miRabel significantly improves the prediction of miRNA targets compared to the four algorithms used separately. Moreover, using the same analytical methods, miRabel shows significantly better predictions than other popular algorithms such as MBSTAR, miRWalk, ExprTarget and miRMap. Interestingly, an F-score analysis revealed that miRabel also significantly improves the relevance of the top results. The aggregation of results from different databases is therefore a powerful approach to improving miRNA target predictions, and one that can be generalized to many other species. Thus, miRabel is an efficient tool to guide biologists in their search for miRNA targets and to integrate them into a biological context.
Affiliation(s)
- Aurélien Quillet
- Normandie Univ, UNIROUEN, INSERM, Laboratoire Différenciation et Communication Neuronale et Neuroendocrine, Rouen, France
- Chadi Saad
- Normandie Univ, UNIROUEN, INSERM, Laboratoire Différenciation et Communication Neuronale et Neuroendocrine, Rouen, France
- Gaëtan Ferry
- Normandie Univ, UNIROUEN, UNIHAVRE, INSA Rouen, Laboratoire d'Informatique du Traitement de l'Information et des Systèmes, Rouen, France
- Youssef Anouar
- Normandie Univ, UNIROUEN, INSERM, Laboratoire Différenciation et Communication Neuronale et Neuroendocrine, Rouen, France
- Nicolas Vergne
- Normandie Univ, UNIROUEN, CNRS, Laboratoire de Mathématiques Raphaël Salem, Rouen, France
- Thierry Lecroq
- Normandie Univ, UNIROUEN, UNIHAVRE, INSA Rouen, Laboratoire d'Informatique du Traitement de l'Information et des Systèmes, Rouen, France
- Christophe Dubessy
- Normandie Univ, UNIROUEN, INSERM, Laboratoire Différenciation et Communication Neuronale et Neuroendocrine, Rouen, France
30
Modified fuzzy TOPSIS + TFNs ranking model for candidate selection using the qualifying criteria. Soft comput 2019. [DOI: 10.1007/s00500-019-04521-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/25/2022]
31
Cui S, Luo Y, Tseng HH, Ten Haken RK, El Naqa I. Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage. Med Phys 2019; 46:2497-2511. [PMID: 30891794 DOI: 10.1002/mp.13497] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/06/2018] [Revised: 02/18/2019] [Accepted: 03/08/2019] [Indexed: 12/23/2022]
Abstract
PURPOSE There has been burgeoning interest in applying machine learning methods to predict radiotherapy outcomes. However, the imbalanced ratio of a large number of variables to a limited sample size in radiation oncology constitutes a major challenge. Therefore, dimensionality reduction methods can be a key to success. The study investigates and contrasts the application of traditional machine learning methods and deep learning approaches for outcome modeling in radiotherapy. In particular, new joint architectures based on a variational autoencoder (VAE) for dimensionality reduction are presented and their application is demonstrated for the prediction of lung radiation pneumonitis (RP) from a large-scale heterogeneous dataset. METHODS A large-scale heterogeneous dataset containing a pool of 230 variables including clinical factors (e.g., dose, KPS, stage) and biomarkers (e.g., single nucleotide polymorphisms (SNPs), cytokines, and micro-RNAs) in a population of 106 nonsmall cell lung cancer (NSCLC) patients who received radiotherapy was used for modeling RP. Twenty-two patients had grade 2 or higher RP. Four methods were investigated, including feature selection (case A) and feature extraction (case B) with traditional machine learning methods, a VAE-MLP joint architecture (case C) with deep learning and lastly, the combination of feature selection and joint architecture (case D). For feature selection, Random forest (RF), Support Vector Machine (SVM), and multilayer perceptron (MLP) were implemented to select relevant features. Specifically, each method was run multiple times to rank features within several cross-validated (CV) resampled sets. A collection of ranking lists was then aggregated by top 5% and Kemeny graph methods to identify the final ranking for prediction. A synthetic minority oversampling technique was applied to correct for class imbalance during this process.
For deep learning, a VAE-MLP joint architecture was developed in which a VAE performs dimensionality reduction and an MLP performs classification. In this architecture, reconstruction loss and prediction loss were combined into a single loss function to enable simultaneous training, and weights were assigned to different classes to mitigate class imbalance. To evaluate prediction performance and conduct comparisons, areas under the receiver operating characteristic curves (AUCs) were computed using nested CVs for both the handcrafted feature selection methods and the deep learning approach. The significance of differences in AUCs was assessed using the DeLong test of U-statistics. RESULTS An MLP-based method using weight pruning (WP) feature selection yielded the best performance among the different handcrafted feature selection methods (case A), reaching an AUC of 0.804 (95% CI: 0.761-0.823) with 29 top features. A VAE-MLP joint architecture (case C) achieved a comparable but slightly lower AUC of 0.781 (95% CI: 0.737-0.808) with a latent dimension of size 2. The combination of handcrafted features (case A) and latent representation (case D) achieved a significantly improved AUC of 0.831 (95% CI: 0.805-0.863) with 22 features (P-value = 0.000642 compared with handcrafted features only (Case A) and P-value = 0.000453 compared to VAE alone (Case C)) with an MLP classifier. CONCLUSION The potential of combining traditional machine learning methods and deep learning VAE techniques has been demonstrated for dealing with limited datasets in modeling radiotherapy toxicities. Specifically, latent variables from a VAE-MLP joint architecture are able to complement handcrafted features for the prediction of RP and improve prediction over either method alone.
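The abstract mentions aggregating cross-validated feature-ranking lists with Kemeny methods. A Kemeny consensus is the ranking that minimizes the total Kendall tau distance (the number of pairwise disagreements) to all input rankings. The brute-force Python sketch below computes it exactly for a handful of items. It assumes every input is a full ranking of the same features, uses hypothetical names (`kendall_tau_distance`, `kemeny_aggregate`), and does not reproduce the graph-based procedure or the top 5% method used in the paper.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Count item pairs that the two full rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    disagreements = 0
    for a, b in combinations(r1, 2):
        # Opposite signs mean the pair (a, b) is ordered differently.
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0:
            disagreements += 1
    return disagreements


def kemeny_aggregate(rankings):
    """Exact Kemeny consensus by exhaustive search over all permutations.

    Assumes every ranking is a full ranking of the same items. Runtime grows
    factorially, so this is only practical for roughly eight items or fewer.
    Returns (consensus ranking, total Kendall tau distance).
    """
    items = rankings[0]
    best, best_cost = None, None
    for candidate in permutations(items):
        cost = sum(kendall_tau_distance(candidate, r) for r in rankings)
        if best_cost is None or cost < best_cost:  # keep the first minimum found
            best, best_cost = list(candidate), cost
    return best, best_cost


if __name__ == "__main__":
    lists = [["f1", "f2", "f3"], ["f1", "f3", "f2"], ["f2", "f1", "f3"]]
    print(kemeny_aggregate(lists))  # (['f1', 'f2', 'f3'], 2)
```

For these three feature rankings the consensus is `['f1', 'f2', 'f3']`, which disagrees with the inputs on only 2 feature pairs in total. Finding a Kemeny consensus is NP-hard in general, so realistic feature counts call for integer programming or heuristic approximations rather than this exhaustive search.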
Collapse
Affiliation(s)
- Sunan Cui
- Applied Physics Program, University of Michigan, Ann Arbor, MI, USA
| | - Yi Luo
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
| | - Huan-Hsin Tseng
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
| | - Randall K Ten Haken
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
| | - Issam El Naqa
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
32
33
Taškova K, Fontaine JF, Mrowka R, Andrade-Navarro MA. Literature optimized integration of gene expression for organ-specific evaluation of toxicogenomics datasets. PLoS One 2019; 14:e0210467. [PMID: 30640953 PMCID: PMC6331104 DOI: 10.1371/journal.pone.0210467] [Received: 11/20/2018] [Accepted: 12/24/2018]
Abstract
The study of drug toxicity in human organs is complicated by their complex interrelations and by the obvious difficulty of testing drug effects on biologically relevant material. Animal models and human cell cultures offer alternatives for systematic, large-scale profiling of drug effects at the gene expression level, as typically found in so-called toxicogenomics datasets. However, the complexity of these data, which include variable drug doses, time points, and experimental setups, makes it difficult to choose and integrate them. It is equally difficult to evaluate how appropriate a given model system is for studying the toxicity of particular drugs in particular human organs. Here, we define a protocol to integrate drug-wise rankings of gene expression changes in toxicogenomics data. We apply it to the TG-GATEs dataset to prioritize genes for association with drug toxicity in the liver or kidney. Contrasting the results with literature-derived sets of human genes known to be associated with drug toxicity allows us to compare different rank aggregation approaches for the task at hand. Collectively, the ranks from multiple models point to genes not previously associated with toxicity, notably the PCNA clamp associated factor (PCLAF). They also point to genes regulated by NFE2L2, the master regulator of the antioxidant response, such as NQO1 and SRXN1. In addition, comparing gene ranks from different models allowed us to evaluate striking differences in toxicity-associated genes between human and rat hepatocytes, and between rat liver and rat hepatocytes. We interpret these results as pointing to the different molecular functions associated with organ toxicity that are best described by each model. We conclude that the expected production of toxicogenomics panels with larger numbers of drugs and models, combined with the ongoing growth of the experimental literature on organ toxicity, will lead to increasingly better associations of genes with organism toxicity.
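The abstract compares several rank aggregation approaches but does not name them. The simplest baseline of this kind is mean-rank (Borda-style) aggregation of full drug-wise rankings, sketched below. It is a hypothetical illustration: the function name and the placeholder gene names are assumptions, and this is not the protocol the authors applied to TG-GATEs.

```python
from statistics import mean


def mean_rank_aggregate(rankings):
    """Aggregate full rankings (each ordered best-first) by average position.

    All rankings must contain the same items exactly once. Ties in mean rank
    are broken alphabetically so that the result is deterministic.
    """
    items = set(rankings[0])
    if any(set(r) != items or len(r) != len(items) for r in rankings):
        raise ValueError("every ranking must list the same items exactly once")
    positions = {item: [] for item in items}
    for ranking in rankings:
        for pos, item in enumerate(ranking, start=1):
            positions[item].append(pos)
    avg = {item: mean(pos_list) for item, pos_list in positions.items()}
    return sorted(items, key=lambda item: (avg[item], item))


# Hypothetical drug-wise rankings of four placeholder genes.
drug_rankings = [
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneB", "geneA", "geneD", "geneC"],
    ["geneA", "geneC", "geneB", "geneD"],
]
print(mean_rank_aggregate(drug_rankings))  # ['geneA', 'geneB', 'geneC', 'geneD']
```

Mean rank treats every drug-wise list as equally reliable and needs complete lists. Aggregators that weight lists or accept partial lists can order the same genes differently.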
Affiliation(s)
- Ralf Mrowka
- Experimentelle Nephrologie, Universitätsklinikum Jena, KIM III, Jena, Germany
34
Galdi P, Fratello M, Trojsi F, Russo A, Tedeschi G, Tagliaferri R, Esposito F. Stochastic Rank Aggregation for the Identification of Functional Neuromarkers. Neuroinformatics 2019; 17:479-496. [PMID: 30604083 DOI: 10.1007/s12021-018-9412-y]
Abstract
The main challenge in analysing functional magnetic resonance imaging (fMRI) data from large samples of subjects (N > 100) is to extract as much relevant information as possible from large amounts of noisy data. When studying neurodegenerative diseases with resting-state fMRI, one objective is to identify regions whose background activity is abnormal relative to a healthy brain. This is often achieved with comparative statistical models applied to single voxels or brain parcels within one or several functional networks. In this work, we propose a novel approach based on clustering and stochastic rank aggregation to identify parcels that behave coherently in groups of subjects affected by the same disorder. We apply it to default-mode network independent component maps from resting-state fMRI data sets. Brain voxels are partitioned into parcels through k-means clustering, and the solutions are then enhanced by means of consensus techniques. For each subject, clusters are ranked according to their median value. A stochastic rank aggregation method, TopKLists, is then applied to combine the individual rankings within each class of subjects. For comparison, the same approach was tested on an anatomical parcellation. We found parcels whose rankings differed between control subjects and subjects affected by Parkinson's disease or amyotrophic lateral sclerosis. We also found evidence in the literature for the relevance of the top-ranked regions in default-mode brain activity. The proposed framework is a valid method for identifying functional neuromarkers from resting-state fMRI data. It may therefore constitute a step towards fully automated, data-driven techniques to support the early diagnosis of neurodegenerative diseases.
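The per-subject step, "clusters are ranked according to their median value", can be sketched as follows. The sketch assumes each subject's map has already been partitioned into parcels, each given as a plain list of voxel values. The data layout, the parcel labels, and the highest-median-first ordering are assumptions, not details from the abstract. The k-means/consensus step and the stochastic TopKLists aggregation that follow are not reproduced.

```python
from statistics import median


def rank_parcels_by_median(parcels):
    """Rank parcels from highest to lowest median voxel value (assumed direction).

    parcels: dict mapping a parcel label to the list of voxel values it contains.
    Returns the labels ordered best-first; ties are broken by label.
    """
    medians = {label: median(values) for label, values in parcels.items()}
    return sorted(medians, key=lambda label: (-medians[label], label))


# Hypothetical component-map values for two subjects and three parcels.
subjects = {
    "subj01": {"p1": [0.2, 0.9, 0.4], "p2": [1.5, 1.1, 1.3], "p3": [0.7, 0.8]},
    "subj02": {"p1": [0.1, 0.3, 0.2], "p2": [0.9, 1.2, 1.0], "p3": [1.4, 1.6]},
}
for subject, parcels in subjects.items():
    print(subject, rank_parcels_by_median(parcels))
# subj01 ['p2', 'p3', 'p1']
# subj02 ['p3', 'p2', 'p1']
```

The median makes each subject's ranking insensitive to a few extreme voxels. The resulting per-subject lists are what a group-level aggregator would then combine.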
Affiliation(s)
- Paola Galdi
- NeuRoNe Lab, Department of Management and Innovation Systems, University of Salerno, Fisciano, Salerno, Italy
- Michele Fratello
- Department of Medical, Surgical, Neurological, Metabolic and Aging Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
- Francesca Trojsi
- Department of Medical, Surgical, Neurological, Metabolic and Aging Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
- Antonio Russo
- Department of Medical, Surgical, Neurological, Metabolic and Aging Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
- Gioacchino Tedeschi
- Department of Medical, Surgical, Neurological, Metabolic and Aging Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
- Roberto Tagliaferri
- NeuRoNe Lab, Department of Management and Innovation Systems, University of Salerno, Fisciano, Salerno, Italy
- Fabrizio Esposito
- Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana", University of Salerno, Via S. Allende, 84081, Baronissi, Salerno, Italy.
35
Li X, Choudhary PK, Biswas S, Wang X. A Bayesian latent variable approach to aggregation of partial and top-ranked lists in genomic studies. Stat Med 2018; 37:4266-4278. [PMID: 30094911 DOI: 10.1002/sim.7920] [Received: 01/02/2018] [Revised: 06/13/2018] [Accepted: 07/03/2018]
Abstract
In genomic research, it is becoming increasingly popular to perform meta-analysis, the practice of combining results from multiple studies that target a common essential biological problem. Rank aggregation, a robust meta-analytic approach, consolidates such studies at the rank level. There exists extensive research on this topic, and various methods have been developed in the past. However, these methods have two major limitations when they are applied in the genomic context. First, they are mainly designed to work with full lists, whereas partial and/or top-ranked lists prevail in genomic studies. Second, the component studies are often clustered, and the existing methods fail to utilize such information. To address the above concerns, a Bayesian latent variable approach, called BiG, is proposed to formally deal with partial and top-ranked lists and incorporate the effect of clustering. Various reasonable prior specifications for variance parameters in hierarchical models are carefully studied and compared. Simulation results demonstrate the superior performance of BiG compared with other popular rank aggregation methods under various practical settings. A non-small-cell lung cancer data example is analyzed for illustration.
Affiliation(s)
- Xue Li
- Department of Statistical Science, Southern Methodist University, Dallas, Texas
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Dallas, Texas
36
Zhao Y, Zhang J, Xie H, Zhang S, Gu L. Minimization of annotation work: diagnosis of mammographic masses via active learning. Phys Med Biol 2018; 63:115003. [PMID: 29697059 DOI: 10.1088/1361-6560/aac042]
Abstract
The prerequisite for establishing an effective prediction system for mammographic diagnosis is the annotation of each mammographic image. Manual annotation is time-consuming and laborious, which makes it a major obstacle for researchers. In this article, we propose a novel active learning algorithm that addresses this problem by minimizing labeling costs while guaranteeing performance. Unlike existing active learning methods designed for the general problem, our method is designed specifically for mammographic images. Through its modified discriminant functions and improved sample query criteria, it can fully exploit the pairing of mammographic images and select the most valuable images from both the mediolateral and craniocaudal views. Moreover, mammographic diagnosis is not only a classification task but also an ordinal regression task: predicting an ordinal variable, namely the malignancy risk of lesions. Extending active learning to ordinal regression has no precedent in existing studies, yet it is essential for mammographic diagnosis. Doing so requires multiple sample query criteria to be considered simultaneously. We formulate this as a criteria integration problem and present an algorithm based on self-adaptive weighted rank aggregation to obtain a good solution. The efficacy of the proposed method was demonstrated on thousands of mammographic images from the Digital Database for Screening Mammography. The labeling costs of obtaining optimal performance in the classification and ordinal regression tasks fell to 33.8% and 19.8% of their original costs, respectively. Compared with other state-of-the-art active learning algorithms, the proposed method also achieved 1228 wins, 369 ties and 47 losses for the classification task, and 1933 wins, 258 ties and 185 losses for the ordinal regression task. By taking the particularities of mammographic images into account, the proposed active learning (AL) method can substantially reduce manual annotation work without sacrificing the performance of the prediction system for mammographic diagnosis.
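The idea of integrating several query criteria by rank aggregation can be illustrated with a fixed-weight, Borda-style scheme. Each unlabeled sample is ranked under every criterion, and the sample with the lowest weighted rank sum is queried next. This is a simplified stand-in, not the authors' algorithm. The fixed weights, the criterion names, the higher-is-more-informative convention, and the function `select_query` are all assumptions, and the self-adaptive weight update described above is not reproduced.

```python
def select_query(criterion_scores, weights):
    """Pick the index of the unlabeled sample to annotate next.

    criterion_scores: one list of scores per query criterion (higher = more
        informative), each with one entry per unlabeled sample.
    weights: one non-negative weight per criterion (fixed here, not adaptive).
    """
    if len(criterion_scores) != len(weights):
        raise ValueError("need exactly one weight per criterion")
    n = len(criterion_scores[0])
    total = [0.0] * n
    for scores, w in zip(criterion_scores, weights):
        # Rank 1 goes to the most informative sample under this criterion.
        order = sorted(range(n), key=lambda i: -scores[i])
        for rank, i in enumerate(order, start=1):
            total[i] += w * rank
    # Lowest weighted rank sum wins; ties go to the lower sample index.
    return min(range(n), key=lambda i: (total[i], i))


uncertainty = [0.9, 0.2, 0.6]  # hypothetical per-sample criterion scores
diversity = [0.1, 0.8, 0.7]
print(select_query([uncertainty, diversity], [0.7, 0.3]))  # 0
print(select_query([uncertainty, diversity], [0.2, 0.8]))  # 1
```

Changing the weights changes which sample is queried. That is why an adaptive weighting rule, such as the one the abstract describes, matters in practice.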
Affiliation(s)
- Yu Zhao
- Department of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, People's Republic of China. Author contributed to this work
37
Hubert G, Pitarch Y, Pinel-Sauvagnat K, Tournier R, Laporte L. TournaRank: When retrieval becomes document competition. Inf Process Manag 2018. [DOI: 10.1016/j.ipm.2017.11.006]
38
Abdulrahman SM, Brazdil P, van Rijn JN, Vanschoren J. Speeding up algorithm selection using average ranking and active testing by introducing runtime. Mach Learn 2017. [DOI: 10.1007/s10994-017-5687-8]
39
A novel method for estimating the common signals for consensus across multiple ranked lists. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2017.05.010]
40
Thiam P, Meudt S, Palm G, Schwenker F. A Temporal Dependency Based Multi-modal Active Learning Approach for Audiovisual Event Detection. Neural Process Lett 2017. [DOI: 10.1007/s11063-017-9719-y]
41
Li X, Wang X, Xiao G. A comparative study of rank aggregation methods for partial and top ranked lists in genomic applications. Brief Bioinform 2017; 20:178-189. [PMID: 28968705 PMCID: PMC6357556 DOI: 10.1093/bib/bbx101] [Received: 05/13/2017]
Abstract
Rank aggregation (RA), the process of combining multiple ranked lists into a single ranking, has played an important role in integrating information from individual genomic studies that address the same biological question. In previous research, attention has been focused on aggregating full lists. However, partial and/or top ranked lists are prevalent because of the great heterogeneity of genomic studies and limited resources for follow-up investigation. To be able to handle such lists, some ad hoc adjustments have been suggested in the past, but how RA methods perform on them (after the adjustments) has never been fully evaluated. In this article, a systematic framework is proposed to define different situations that may occur based on the nature of individually ranked lists. A comprehensive simulation study is conducted to examine the performance characteristics of a collection of existing RA methods that are suitable for genomic applications under various settings simulated to mimic practical situations. A non-small cell lung cancer data example is provided for further comparison. Based on our numerical results, general guidelines about which methods perform the best/worst, and under what conditions, are provided. Also, we discuss key factors that substantially affect the performance of the different methods.
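The abstract refers to ad hoc adjustments for partial and top-ranked lists without specifying them. One adjustment of this kind, assumed here for illustration and not necessarily among those evaluated in the article, gives every item missing from a top-k list the average of the unused positions k+1, ..., n, i.e. (k + 1 + n)/2. After this step, full-list aggregation methods can be applied. The function name and gene identifiers below are hypothetical.

```python
def complete_top_k(top_list, universe):
    """Convert a top-k list into full ranks over all n items in `universe`.

    Items in `top_list` keep their positions 1..k; every other item receives
    the mean of the unused positions k+1..n, which equals (k + 1 + n) / 2.
    """
    n, k = len(universe), len(top_list)
    if not set(top_list) <= set(universe):
        raise ValueError("top_list contains items outside the universe")
    ranks = {item: float(pos) for pos, item in enumerate(top_list, start=1)}
    fill = (k + 1 + n) / 2
    return {item: ranks.get(item, fill) for item in universe}


genes = ["g1", "g2", "g3", "g4", "g5"]
print(complete_top_k(["g3", "g1"], genes))
# {'g1': 2.0, 'g2': 4.0, 'g3': 1.0, 'g4': 4.0, 'g5': 4.0}
```

Because the unranked items share the leftover positions evenly, the completed ranks still sum to n(n + 1)/2 (15 here). The adjustment does, however, erase any real ordering among the unranked genes, which is one reason dedicated methods for partial lists exist.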
Affiliation(s)
- Xue Li
- Department of Statistical Science at Southern Methodist University, Dallas, TX
- Xinlei Wang
- Department of Statistical Science at Southern Methodist University, Dallas, TX. Corresponding author: Xinlei Wang, Department of Statistical Science, Southern Methodist University, 3225 Daniel Avenue, P O Box 750332, Dallas, Texas 75275, USA. Tel: 214-768-2459; Fax: (214) 768-4035; E-mail:
- Guanghua Xiao
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX
42
Mandal M, Mukhopadhyay A. Multiobjective PSO-based rank aggregation: Application in gene ranking from microarray data. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.12.037]
43
Kavakiotis I, Xochelli A, Agathangelidis A, Tsoumakas G, Maglaveras N, Stamatopoulos K, Hadzidimitriou A, Vlahavas I, Chouvarda I. Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia. BMC Bioinformatics 2016; 17 Suppl 5:173. [PMID: 27295298 PMCID: PMC4905615 DOI: 10.1186/s12859-016-1044-3]
Abstract
BACKGROUND Somatic hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells. It has led to strong prognostic markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper, we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high-quality datasets for SHM analysis in the IG receptors of CLL patients. This dataset is used as the basis for a higher-level integration procedure inspired by social choice theory. The procedure is applied in the Towards analysis, our attempt to investigate the potential ontogenetic transformation, through SHM, of genes belonging to specific stereotyped CLL subsets towards other genes or gene families. RESULTS The data integration process, followed by feature extraction, produced a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset using voting techniques, revealed the distinct behaviour of subset #201 compared with other subsets regarding SHM-related movements among gene clans, in both allele-conserved and non-conserved gene areas. Regarding movement between genes, a high percentage of movement towards pseudogenes was found in all CLL subsets. CONCLUSIONS This data integration and feature extraction process can provide the basis for exploratory analysis, or for a fully automated computational data mining approach, addressing many as yet unanswered, clinically relevant biological questions.
Affiliation(s)
- Ioannis Kavakiotis
- Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
- Aliki Xochelli
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Andreas Agathangelidis
- Division of Molecular Oncology and Department of Onco-Hematology, San Raffaele Scientific Institute, Milan, Italy
- Grigorios Tsoumakas
- Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Nicos Maglaveras
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Kostas Stamatopoulos
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Anastasia Hadzidimitriou
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ioannis Vlahavas
- Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Ioanna Chouvarda
- Institute of Applied Biosciences, CERTH, Thessaloniki, Greece; Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
44
Abstract
Thanks to the information explosion, data on objects of interest can be collected from an increasing number of sources. However, the information collected from multiple sources about the same object is often conflicting. To tackle this challenge, truth discovery, which integrates noisy multi-source information by estimating the reliability of each source, has emerged as a hot topic. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this survey, we provide a comprehensive overview of truth discovery methods and summarize them from different aspects. We also discuss future directions for truth discovery research. We hope that this survey will promote a better understanding of current progress in truth discovery and offer guidelines on how to apply these approaches in application domains.
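The core loop shared by many truth discovery methods alternates two steps. First, truths are estimated by reliability-weighted voting. Second, each source's reliability is re-estimated from how well it agrees with those truths. The sketch below shows this loop for categorical claims. It is a simplified illustration, not any specific method from the survey: using plain agreement rate as the reliability weight, the fixed iteration count, and the toy sources are all assumptions.

```python
from collections import defaultdict


def truth_discovery(claims, iterations=10):
    """Alternate weighted voting and reliability estimation (simplified sketch).

    claims: dict source -> dict object -> claimed categorical value.
    Returns (truths, weights): estimated value per object, reliability per source.
    """
    sources = sorted(claims)
    weights = {s: 1.0 for s in sources}  # start with equal reliability
    truths = {}
    for _ in range(iterations):
        # Step 1: reliability-weighted vote for each object.
        votes = defaultdict(lambda: defaultdict(float))
        for s in sources:
            for obj, value in claims[s].items():
                votes[obj][value] += weights[s]
        # max() over sorted values breaks ties alphabetically.
        truths = {obj: max(sorted(tally), key=lambda v: tally[v])
                  for obj, tally in votes.items()}
        # Step 2: reliability = fraction of a source's claims matching the truths.
        for s in sources:
            made = claims[s]
            agree = sum(truths[o] == v for o, v in made.items())
            weights[s] = agree / len(made) if made else 0.0
    return truths, weights


claims = {
    "s1": {"x": "a", "y": "b", "z": "c"},
    "s2": {"x": "a", "y": "b", "z": "d"},
    "s3": {"x": "e", "y": "f", "z": "d"},
}
truths, weights = truth_discovery(claims)
print(truths)   # {'x': 'a', 'y': 'b', 'z': 'd'}
print(weights)  # s2 ends up most reliable (1.0), s3 least (about 0.33)
```

Published methods differ mainly in how they map agreement to weights, for example using log-scaled errors or probabilistic models, and in how they handle continuous values.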
Affiliation(s)
- Qi Li
- SUNY Buffalo, Buffalo, NY
- Lu Su
- SUNY Buffalo, Buffalo, NY
- Wei Fan
- Baidu Big Data Lab, Sunnyvale, CA
45
46
Zollinger A, Davison AC, Goldstein DR. Meta-analysis of incomplete microarray studies. Biostatistics 2015; 16:686-700. [PMID: 25987649 DOI: 10.1093/biostatistics/kxv014] [Received: 12/19/2013] [Accepted: 03/12/2015]
Abstract
Meta-analysis of microarray studies to produce an overall gene list is relatively straightforward when complete data are available. When some studies lack information (providing only a ranked list of genes, for example), it is common to reduce all studies to ranked lists before combining them. Since this entails a loss of information, we consider a hierarchical Bayes approach to meta-analysis that uses different types of information from different studies: the full data matrix, summary statistics, or ranks. The model uses an informative prior for the parameter of interest to aid the detection of differentially expressed genes. Simulations show that the new approach can give substantial power gains compared with classical meta-analysis and list aggregation methods. A meta-analysis of 11 published studies with different data types identifies genes known to be involved in ovarian cancer and shows significant enrichment.
Affiliation(s)
- Alix Zollinger
- Ecole Polytechnique Fédérale de Lausanne, EPFL-FSB-MATHAA-STAT, Station 8, 1015 Lausanne, Switzerland
- Anthony C Davison
- Ecole Polytechnique Fédérale de Lausanne, EPFL-FSB-MATHAA-STAT, Station 8, 1015 Lausanne, Switzerland
- Darlene R Goldstein
- Ecole Polytechnique Fédérale de Lausanne, EPFL-FSB-MATHAA-STAT, Station 8, 1015 Lausanne, Switzerland
47
Wang W, Zhou X, Liu Z, Sun F. Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 2015; 16 Suppl 1:S6. [PMID: 25708095 PMCID: PMC4331705 DOI: 10.1186/1471-2105-16-s1-s6]
Abstract
With the development of various high-throughput technologies and analysis methods, researchers can study different aspects of a biological phenomenon simultaneously, or one aspect repeatedly with different experimental techniques and analysis methods. The output from each study is a ranked list of components of interest. Aggregating the ranked lists of components, such as proteins, genes and single nucleotide variants (SNVs), produced by these experiments has proven helpful both for filtering noise and for providing a more complete understanding of the biological problems. Currently available rank aggregation methods do not consider network information, which has been observed to make vital contributions in many data integration studies. We developed network-tuned rank aggregation methods that incorporate network information and demonstrated their superior performance over aggregation methods without network information. The methods are tested on predicting the Gene Ontology function of yeast proteins. We validate the methods using combinations of three gene expression data sets and three protein interaction networks, as well as an integrated network obtained by combining the three networks. Results show that the aggregated ranked lists are more meaningful when the protein interaction network is incorporated. Among the methods compared, CGI_RRA and CGI_Endeavour perform best. These methods integrate ranked lists with networks using CGI [1], followed by rank aggregation using either robust rank aggregation (RRA) [2] or Endeavour [3]. Finally, we use the methods to locate target genes of transcription factors.
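Robust rank aggregation (RRA), one of the aggregators named above, is usually described as scoring each item against a uniform null. Each of the item's ranks is first divided by the length of its list. With these normalized ranks sorted as r(1) ≤ ... ≤ r(n), the score ρ is the minimum over k of the probability that the k-th smallest of n independent Uniform(0, 1) draws is at most r(k). A minimal sketch of that ρ score under this description follows. The function names are hypothetical, and any multiple-testing correction or exact p-value computation used by reference implementations is omitted.

```python
from math import comb


def order_stat_cdf(x, k, n):
    """P(U_(k) <= x): CDF of the k-th smallest of n independent Uniform(0,1) draws."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks):
    """RRA-style rho score for one item (smaller = ranked higher than chance).

    normalized_ranks: the item's rank in each list divided by that list's
    length, so every value lies in (0, 1].
    """
    r = sorted(normalized_ranks)
    n = len(r)
    return min(order_stat_cdf(r[k - 1], k, n) for k in range(1, n + 1))


print(rho_score([0.1, 0.2, 0.5]))  # approximately 0.104 (attained at k = 2)
```

Taking the minimum over k lets an item score well when it is ranked highly in only some of the lists. That is what makes the statistic robust to lists that carry no signal for the item.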
48
da Silva SF, Avalhais LPS, Batista MA, Barcelos CAZ, Traina AJM. Findings on ranking evaluation functions for feature weighting in image retrieval. Journal of the Brazilian Computer Society 2014. [DOI: 10.1186/1678-4804-20-7]
Abstract
Background
There are substantial benefits to be gained from ranking optimization in several information retrieval and recommendation systems. However, ranking evaluation functions (REFs), which play a major role in many ranking optimization models, need further investigation. An analysis of previous studies of REFs indicated that the choice of an appropriate REF is context-sensitive.
Methods
In this study, we analyze a broad set of REFs for feature weighting aimed at increasing image retrieval effectiveness. Ten REFs were analyzed, including the most successful and representative REFs from the literature. The REFs were embedded into a genetic algorithm (GA)-based relevance feedback (RF) model, called WLSP-C ±, which aims to improve image retrieval results by learning weights for image descriptors and image regions.
Results
Analyses of precision-recall curves on five real-world image data sets showed that one non-parameterized REF, named F5, which had not been analyzed in previous studies, outperformed the recommended REFs, which require parameter adjustment. We also provide a computational analysis of the GA-based RF model and show that its cost is linear in the cardinality of the image data set.
Conclusions
We conclude that REF F5 should be investigated in other contexts and problem scenarios centered on ranking optimization, as ranking optimization techniques rely heavily on the ranking quality measure.
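Precision-recall curves like those analyzed above are built from precision and recall at successive cutoffs of a ranked result list. The sketch below computes those points for a single query, assuming a known set of relevant images. The image identifiers and function name are hypothetical, and the sketch implements none of the ten REFs or the GA-based feedback model.

```python
def precision_recall_curve(ranked_ids, relevant_ids):
    """Return (precision, recall) pairs at each cutoff k = 1..len(ranked_ids)."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    points, hits = [], 0
    for k, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
        # Precision: share of the top k that is relevant.
        # Recall: share of all relevant items found so far.
        points.append((hits / k, hits / len(relevant)))
    return points


ranking = ["img3", "img7", "img1", "img9"]  # hypothetical retrieval result
print(precision_recall_curve(ranking, {"img3", "img1"}))
```

Averaging such curves over many queries gives the per-data-set curves used to compare ranking functions.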