1. A novel state space reduction algorithm for team formation in social networks. PLoS One 2021; 16:e0259786. PMID: 34855771; PMCID: PMC8638979; DOI: 10.1371/journal.pone.0259786.
Abstract
Team formation (TF) in social networks uses graphs (vertices representing experts, edges representing shared skills) to model possible collaborations between experts. Such networks make it possible to build cost-effective research teams irrespective of the experts' geolocation and the size of the dataset. Previous work did not closely inspect large datasets for the large-scale distributions and relationships among researchers, so the resulting algorithms failed to scale well. This paper therefore presents a novel TF algorithm for expert team formation, SSR-TF, based on two metrics, communication cost and graph reduction, that can serve as a basis for future TF algorithms. In SSR-TF, the communication cost estimates the likelihood of collaboration between researchers, while graph reduction prunes the large dataset to only the relevant skills and experts, enabling real-time extraction of experts for collaboration. The approach is tested on five organic and benchmark datasets: UMP, DBLP, ACM, IMDB, and Bibsonomy. SSR-TF builds cost-effective teams with the most appropriate experts, resulting in more communicative teams with high expertise levels.
2. A Hybrid Method to Predict Postoperative Survival of Lung Cancer Using Improved SMOTE and Adaptive SVM. Computational and Mathematical Methods in Medicine 2021; 2021:2213194. PMID: 34545291; PMCID: PMC8449740; DOI: 10.1155/2021/2213194.
Abstract
Predicting the postoperative survival of lung cancer patients (LCPs) is an important problem in medical decision-making, but the imbalanced distribution of patient survival in the dataset increases the difficulty of prediction. Although the synthetic minority oversampling technique (SMOTE) can be used to deal with imbalanced data, it cannot identify data noise. Many studies combine a support vector machine (SVM) with resampling techniques to handle imbalanced data, but most require manual setting of the SVM parameters, which makes it difficult to obtain the best performance. In this paper, a hybrid method combining improved SMOTE and an adaptive SVM is proposed for imbalanced data to predict the postoperative survival of LCPs. The method has two stages: in the first, the cross-validated committees filter (CVCF) removes noisy samples to improve the performance of SMOTE; in the second, we propose an adaptive SVM that uses fuzzy self-tuning particle swarm optimization (FPSO) to optimize the SVM parameters. Compared with other advanced algorithms, the proposed method obtains the best performance, with 95.11% accuracy, 95.10% G-mean, 95.02% F1, and 95.10% area under the curve (AUC) for predicting postoperative survival of LCPs.
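As an illustration of the two-stage idea described in this abstract (and not the authors' implementation), the sketch below filters noisy samples with a simple cross-validated committee vote standing in for CVCF, rebalances with SMOTE, and tunes an SVM with a plain random search standing in for FPSO; all names and parameter ranges are placeholders.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.svm import SVC

def filter_noise(X, y, n_folds=5):
    """Drop samples whose cross-validated committee prediction disagrees
    with their label (a simplified stand-in for the CVCF filter)."""
    committee = RandomForestClassifier(n_estimators=50, random_state=0)
    y_hat = cross_val_predict(committee, X, y, cv=n_folds)
    keep = y_hat == y
    return X[keep], y[keep]

def tune_svm(X, y, n_trials=50, seed=0):
    """Random search over (C, gamma), a lightweight stand-in for FPSO."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, -np.inf
    for _ in range(n_trials):
        C, gamma = 10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-4, 1)
        score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()
        if score > best_score:
            best_params, best_score = (C, gamma), score
    return SVC(C=best_params[0], gamma=best_params[1]).fit(X, y)

# Stage 1: remove noise, then oversample the minority class with SMOTE.
# Stage 2: fit an SVM whose parameters are tuned automatically.
# X, y = ...                                # the LCP dataset (placeholder)
# X_f, y_f = filter_noise(X, y)
# X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_f, y_f)
# model = tune_svm(X_bal, y_bal)
```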
3. An Optimized Method for Skin Cancer Diagnosis Using Modified Thermal Exchange Optimization Algorithm. Computational and Mathematical Methods in Medicine 2021; 2021:5527698. PMID: 34239598; PMCID: PMC8235991; DOI: 10.1155/2021/5527698.
Abstract
Skin cancer is the most common form of cancer; it is estimated that more than one million people worldwide develop it each year, and early detection strongly improves treatment outcomes. In this paper, a new optimal and automatic pipeline approach is proposed for diagnosing this disease from dermoscopy images. The proposed method begins with a noise-reduction step to eliminate image noise. Then, the Otsu method, one of the most widely used thresholding techniques, is used to delineate the region of interest. Afterward, 20 different features are extracted from the image. To reduce the method's complexity, a new modified version of the Thermal Exchange Optimization algorithm is applied to the features, which improves the method's precision and consistency. To validate the proposed method's efficiency, it is applied to the American Cancer Society database and its results are compared with several state-of-the-art methods; the final results show the superiority of the proposed method over the others.
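A minimal sketch of the first pipeline stages named above (denoising, Otsu segmentation, feature extraction); the concrete features shown are simplified assumptions, and the metaheuristic feature-selection step is omitted, so this is not the paper's code.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def segment_lesion(gray_image):
    """Median-filter the image, then threshold it with Otsu's method."""
    denoised = ndimage.median_filter(gray_image, size=3)
    mask = denoised < threshold_otsu(denoised)   # lesions assumed darker
    return denoised, mask

def extract_features(denoised, mask):
    """A few region features of the kind such pipelines typically compute."""
    region = denoised[mask]
    area = int(mask.sum())
    perimeter = np.count_nonzero(ndimage.binary_dilation(mask) ^ mask)
    compactness = perimeter ** 2 / max(area, 1)
    return np.array([region.mean(), region.std(), area, perimeter, compactness])
```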
4. Combining diversity and dispersion criteria for anticlustering: A bicriterion approach. The British Journal of Mathematical and Statistical Psychology 2020; 73:375-396. PMID: 31512759; DOI: 10.1111/bmsp.12186.
Abstract
Most partitioning methods used in psychological research seek to produce homogeneous groups (i.e., groups with low intra-group dissimilarity). However, there are also applications where the goal is to provide heterogeneous groups (i.e., groups with high intra-group dissimilarity). Examples of these anticlustering contexts include construction of stimulus sets, formation of student groups, assignment of employees to project work teams, and assembly of test forms from a bank of items. Unfortunately, most commercial software packages are not equipped to accommodate the objective criteria and constraints that commonly arise for anticlustering problems. Two important objective criteria for anticlustering based on information in a dissimilarity matrix are: a diversity measure based on within-cluster sums of dissimilarities; and a dispersion measure based on the within-cluster minimum dissimilarities. In many instances, it is possible to find a partition that provides a large improvement in one of these two criteria with little (or no) sacrifice in the other criterion. For this reason, it is of significant value to explore the trade-offs that arise between these two criteria. Accordingly, the key contribution of this paper is the formulation of a bicriterion optimization problem for anticlustering based on the diversity and dispersion criteria, along with heuristics to approximate the Pareto efficient set of partitions. A motivating example and computational study are provided within the framework of test assembly.
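The two criteria are easy to state in code. Below is a small sketch (ours, not the authors' bicriterion heuristic) that computes diversity and dispersion from a dissimilarity matrix D and a partition vector labels:

```python
import numpy as np

def diversity(D, labels):
    """Sum, over clusters, of all within-cluster pairwise dissimilarities."""
    total = 0.0
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        total += D[np.ix_(idx, idx)].sum() / 2.0   # each pair counted once
    return total

def dispersion(D, labels):
    """Minimum within-cluster dissimilarity over all clusters."""
    best = np.inf
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        if len(idx) > 1:
            sub = D[np.ix_(idx, idx)]
            best = min(best, sub[np.triu_indices_from(sub, k=1)].min())
    return best

# Anticlustering seeks partitions for which BOTH values are large; the paper
# approximates the Pareto-efficient set of trade-offs between the two.
```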
5. A review of auditing techniques for the Unified Medical Language System. J Am Med Inform Assoc 2020; 27:1625-1638. PMID: 32766692; PMCID: PMC7566540; DOI: 10.1093/jamia/ocaa108.
Abstract
OBJECTIVE The study sought to describe the literature related to the development of methods for auditing the Unified Medical Language System (UMLS), with particular attention to identifying errors and inconsistencies of attributes of the concepts in the UMLS Metathesaurus. MATERIALS AND METHODS We applied the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach by searching the MEDLINE database and Google Scholar for studies referencing the UMLS and any of several terms related to auditing, error detection, and quality assurance. A qualitative analysis and summarization of articles that met the inclusion criteria were performed. RESULTS Eighty-three studies were reviewed in detail. We first categorized the auditing techniques by the aspect addressed: concepts, concept names, and synonymy (n = 37); semantic type assignments (n = 36); hierarchical relationships (n = 24); lateral relationships (n = 12); ontology enrichment (n = 8); and ontology alignment (n = 18). We also categorized the methods according to their level of automation (ie, automated systematic, automated heuristic, or manual) and the type of knowledge used (ie, intrinsic or extrinsic). CONCLUSIONS This study is a comprehensive review of the published methods for auditing the various conceptual aspects of the UMLS. Categorizing the auditing techniques by aspect gives the curators of the UMLS, as well as researchers, easy and comprehensive access to this wealth of knowledge (eg, for auditing lateral relationships in the UMLS). We also reviewed ontology enrichment and alignment techniques because of their critical use of, and impact on, the UMLS.
6. Ant Colony Optimization for the Control of Pollutant Spreading on Social Networks. IEEE Transactions on Cybernetics 2020; 50:4053-4065. PMID: 31295135; DOI: 10.1109/tcyb.2019.2922266.
Abstract
The rapid development of online social networks not only enables prompt and convenient dissemination of desirable information but also incurs fast and wide propagation of undesirable information. A common way to control the spread of pollutants is to block some nodes, but such a strategy may degrade the service quality of a social network and lead to a high control cost if too many nodes are blocked. This paper formulates node selection as a biobjective optimization problem: find a subset of nodes to block so that the effect of the control is maximized while its cost is minimized. To solve this problem, we design an ant colony optimization algorithm with adaptive dimension-size selection under the decomposition-based multiobjective evolutionary algorithm framework (MOEA/D-ADACO). The proposed algorithm divides the biobjective problem into a set of single-objective subproblems, and each ant takes charge of optimizing one subproblem. Moreover, two types of pheromone and heuristic information are incorporated into MOEA/D-ADACO: one for dimension-size selection and one for node selection. While constructing solutions, the ants first determine the dimension size according to the former type of pheromone and heuristic information, and then select that number of nodes according to the latter type. Experiments conducted on a set of real-world online social networks confirm that the proposed biobjective optimization model and the developed MOEA/D-ADACO are promising for controlling pollutant spreading.
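As a sketch of the decomposition idea that MOEA/D-ADACO builds on (our illustration; the pheromone machinery and node-selection rules are omitted), the biobjective problem can be split into scalar subproblems, one per ant, via Tchebycheff scalarisation over a spread of weight vectors:

```python
import numpy as np

def tchebycheff(objs, weight, ideal):
    """Scalarise an objective vector for one subproblem (minimisation form)."""
    return np.max(weight * np.abs(objs - ideal))

n_subproblems = 20   # one scalar subproblem per ant
weights = np.stack([np.linspace(0.0, 1.0, n_subproblems),
                    np.linspace(1.0, 0.0, n_subproblems)], axis=1)

# Each ant i builds a blocked-node set, its two objective values are computed
# (negated control effect and control cost, both to be minimised), and the
# candidate is judged by tchebycheff(objs_i, weights[i], ideal_point).
```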
7.
Abstract
We propose a process graph (P-graph) approach to develop ecosystem networks from knowledge of the properties of the component species. Originally developed as a process engineering tool for designing industrial plants, the P-graph framework has key advantages over conventional ecological network analysis techniques based on input-output models. A P-graph is a bipartite graph consisting of two types of nodes, which we propose to represent components of an ecosystem. Compartments within ecosystems (e.g., organism species) are represented by one class of nodes, while the roles or functions that they play relative to other compartments are represented by a second class of nodes. This bipartite representation enables a powerful, unambiguous description of relationships among ecosystem compartments, which can come in tangible (e.g., mass flow in predation) or intangible form (e.g., symbiosis). For example, within a P-graph, the distinct roles of bees as pollinators for some plants and as prey for some animals can be explicitly represented, which would not otherwise be possible using conventional ecological network analysis. After discussing the mapping of ecosystems into P-graphs, we discuss how this framework can guide the understanding of complex networks that exist in nature. Two component algorithms of the P-graph framework, maximal structure generation (MSG) and solution structure generation (SSG), are shown to be particularly useful for ecological network analysis. These algorithms enable candidate ecosystem networks to be deduced from current scientific knowledge of the individual ecosystem components. The method can be used to determine (a) the effects of the loss of specific ecosystem compartments due to extinction, (b) the potential efficacy of ecosystem reconstruction efforts, and (c) the maximum sustainable exploitation of ecosystem services by humans. We illustrate the use of P-graphs for the analysis of ecosystem compartment loss using a small-scale stylized case study, and further propose a new criticality index that can be easily derived from SSG results.
8. Introducing Heuristic Information Into Ant Colony Optimization Algorithm for Identifying Epistasis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1253-1261. PMID: 30403637; DOI: 10.1109/tcbb.2018.2879673.
Abstract
Epistasis learning, which aims to detect associations between multiple single nucleotide polymorphisms (SNPs) and complex diseases, has gained increasing attention in genome-wide association studies. Although much work has been done on mapping the SNPs underlying complex diseases, detecting epistatic interactions remains difficult owing to the lack of heuristic information to expedite the search process. In this study, a method called EACO is proposed to detect epistatic interactions based on the ant colony optimization (ACO) algorithm; its highlights are the introduced heuristic information, the fitness functions, and a candidate-solution filtration strategy. The heuristic information, multi-SURF*, is incorporated into the ant-decision rules to guide the search in linear time. Two functionally complementary fitness functions, mutual information and the Gini index, are combined to effectively evaluate the associations between SNP combinations and the phenotype. Furthermore, a candidate-solution filtration strategy adaptively retains all optimal solutions, yielding a more accurate search for epistasis. Experiments with EACO, three ACO-based methods (AntEpiSeeker, MACOED, and epiACO), and four commonly used methods (BOOST, SNPRuler, TEAM, and epiMODE) are performed on both simulated datasets and a real dataset on age-related macular degeneration. The results indicate that EACO is promising for identifying epistasis.
9. Two New Heuristic Methods for Protein Model Quality Assessment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1430-1439. PMID: 30418914; PMCID: PMC8988942; DOI: 10.1109/tcbb.2018.2880202.
Abstract
Protein tertiary structure prediction is an important open challenge in bioinformatics and requires effective methods to accurately evaluate the quality of protein 3-D models generated computationally. Many quality assessment (QA) methods have been proposed over the past three decades, but their accuracy and robustness remain unsatisfactory for practical applications. In this paper, two new heuristic QA methods are proposed: MUfoldQA_S and MUfoldQA_C. MUfoldQA_S is a quasi-single-model QA method that assesses model quality based on known protein structures with similar sequences. The algorithm can be applied directly to protein fragments without building a full structural model, and a BLOSUM-based heuristic helps differentiate accurate templates from poor ones. MUfoldQA_C combines the ideas of MUfoldQA_S with the consensus approach to create a multi-model QA method that also utilizes information from existing reference models, yielding improved performance. Extensive experiments show that both methods significantly improve over existing methods. In addition, both were blindly tested in the CASP12 worldwide competition in the protein structure prediction field and ranked among the top performers in their respective categories.
10.
Abstract
In many-objective optimization problems (MaOPs), more than three distinct objectives are optimized simultaneously. The challenge in MaOPs is to obtain a Pareto approximation (PA) with high diversity and good convergence. In the literature, many approaches based on different multi-objective evolutionary algorithms (MOEAs) have been proposed to address diversity and convergence in MaOPs. To obtain better results, researchers use sets of reference points to differentiate solutions and guide the search process, evaluating and selecting non-dominated solutions against the reference set; this technique has also been used in some swarm-based evolutionary algorithms. In this paper, we combine several effective adaptations of the bat algorithm with the above approach to handle many-objective problems, and we call the resulting algorithm the many-objective bat algorithm (MaOBAT). This biologically inspired algorithm exploits the echolocation of microbats. Each bat represents a complete solution, which is evaluated with a problem-specific fitness function; non-dominated solutions are then selected based on a dominance relationship. In the proposed MaOBAT, dominance rank is used as the dominance relationship (the dominance rank of a solution is the number of other solutions that dominate it). In our proposed strategy, a dynamically allocated set of reference points allows the algorithm to achieve Pareto fronts (PFs) with good convergence and high diversity. Experimental results show that the proposed algorithm has significant advantages over several state-of-the-art algorithms in terms of solution quality.
11. Evaluation of user satisfaction and usability of a mobile app for smoking cessation. Computer Methods and Programs in Biomedicine 2019; 182:105042. PMID: 31473444; DOI: 10.1016/j.cmpb.2019.105042.
Abstract
BACKGROUND Mobile apps have great potential to support patients in healthcare and to encourage healthy behavioral changes such as smoking cessation. Nevertheless, user rejection levels are still high. A set of factors that affects app effectiveness relates to the quality of the features that lead to positive user experiences. This work aims to evaluate the user experience, specifically the usability of and user satisfaction with a mobile application for smoking cessation, and thereby provide a basis for future improvements. METHODS We provided a smoking cessation Android app to two different user cohorts, smokers as valid users and experts, for three weeks. The app featured the usual functionalities to help quit smoking, including an achieved-benefits section, mini-games to provide distraction during cravings, and supportive motivational messages. We collected information about user experience through game playability and message satisfaction questionnaires and the experts' opinions. We also considered the usage of app sections, the duration of the mini-game sessions, and the user ratings for motivational messages. RESULTS We included 45 valid users and 25 experts in this study. The questionnaire indicated an 80% satisfaction rate for the motivational messages. According to the game questionnaires, over 69% of the participants agreed that the games have good usability features; however, for questions related to mobility and gameplay heuristics, agreement was below 67%. The most accessed app sections were the achieved benefits and the motivational messages. The experts described issues that could help improve the application. CONCLUSIONS The combination of questionnaires with expert reports allowed us to identify several problems and possible corrections. Our study showed that motivational messages have a good satisfaction rate, although technical features of some mobile devices that may hinder message reception must be considered. The games have good usability, and the addition of difficulty levels and better access to the game menu is expected to make them more attractive and increase their usage. Future development of mHealth apps based on gamification and motivational messages needs to consider these factors for better user satisfaction and usability.
12. Extractive single document summarization using binary differential evolution: Optimization of different sentence quality measures. PLoS One 2019; 14:e0223477. PMID: 31725721; PMCID: PMC6855635; DOI: 10.1371/journal.pone.0223477.
Abstract
With the increase in the amount of text information in different real-life applications, automatic text-summarization systems have become more important for extracting relevant information. In the current study, we formulate extractive text summarization as a binary optimization problem and employ a multi-objective binary differential evolution (DE) based optimization strategy to solve it. The solutions of DE encode a possible subset of sentences to be present in the summary, which is then evaluated based on several statistical features (objective functions), namely the position of the sentence in the document, the similarity of a sentence with the title, sentence length, cohesion, readability, and coverage. These objective functions, measuring different aspects of a summary, are optimized simultaneously using the search capability of DE. Some newly designed self-organizing map (SOM) based genetic operators are incorporated into the optimization process to improve convergence: the SOM generates a mating pool containing solutions and their neighborhoods, and this pool takes part in the genetic operations (crossover and mutation) to create new solutions. To measure the similarity or dissimilarity between sentences, different existing measures such as normalized Google distance, word mover's distance, and cosine similarity are explored. For evaluation, two standard summarization datasets, DUC2001 and DUC2002, are utilized, and the obtained results are compared with various supervised, unsupervised, and optimization-based existing summarization techniques using ROUGE measures. The results illustrate the superiority of our approach in terms of convergence rate and ROUGE scores compared with state-of-the-art methods. We obtained 45% and 5% improvements over two recent state-of-the-art methods on ROUGE-2 and ROUGE-1 scores, respectively, for the DUC2001 dataset; for the DUC2002 dataset, the improvements are 20% and 5% on ROUGE-2 and ROUGE-1, respectively. In addition to these standard datasets, the CNN news dataset is also utilized to evaluate the efficacy of our proposed approach. The best performance is shown to depend not only on the objective functions used but also on the correct choice of similarity/dissimilarity measure between sentences.
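To make the encoding concrete, here is a minimal binary-DE generation of the DE/rand/1/bin kind the summariser relies on; the SOM-based mating pool and the six objectives are left out, the fitness function is a placeholder, and the sigmoid binarisation is one common choice, not necessarily the paper's.

```python
import numpy as np

def de_step(pop, fitness, F=0.8, CR=0.5, rng=np.random.default_rng(0)):
    """One DE/rand/1/bin generation with sigmoid binarisation; pop is a
    0/1 matrix, one row per candidate summary (1 = sentence included)."""
    new_pop = pop.copy()
    n = pop.shape[1]
    for i in range(len(pop)):
        # Three distinct donors (index i not excluded, for brevity).
        a, b, c = pop[rng.choice(len(pop), size=3, replace=False)]
        prob = 1.0 / (1.0 + np.exp(-(a + F * (b - c))))   # map to [0, 1]
        cross = rng.random(n) < CR
        trial = np.where(cross, (rng.random(n) < prob).astype(int), pop[i])
        if fitness(trial) > fitness(pop[i]):              # keep improvements
            new_pop[i] = trial
    return new_pop
```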
13. TSSCM: A synergism-based three-step cascade model for influence maximization on large-scale social networks. PLoS One 2019; 14:e0221271. PMID: 31479453; PMCID: PMC6719832; DOI: 10.1371/journal.pone.0221271.
Abstract
Identification of the most influential spreaders that maximize information propagation in social networks is a classic optimization problem, called the influence maximization (IM) problem. A reasonable diffusion model that can accurately simulate information propagation in social networks is the key step to efficiently solving the IM problem. Synergism of neighbor nodes plays an important role in information propagation dynamics. Some known diffusion models have considered the reinforcement mechanism in defining the activation threshold. Most of these models focus on the synergetic effects of nodes on their common neighbors, but the accumulation of synergism has been neglected in previous studies. Inspired by these facts, we first discuss the catalytic role of synergism in the spreading dynamics of social networks and then propose a novel diffusion model called the synergism-based three-step cascade model (TSSCM) based on the above analysis and the three-degree influence theory. Finally, we devise an algorithm for solving the IM problem based on the TSSCM. Experiments on five real large-scale social networks demonstrate the efficacy of our method, which achieves competitive results in terms of influence spreading compared to the four other algorithms tested.
14. Generating New Space-Filling Test Instances for Continuous Black-Box Optimization. Evolutionary Computation 2019; 28:379-404. PMID: 31295020; DOI: 10.1162/evco_a_00262.
Abstract
This article presents a method to generate diverse and challenging new test instances for continuous black-box optimization. Each instance is represented as a feature vector of exploratory landscape analysis measures. By projecting the features into a two-dimensional instance space, the location of existing test instances can be visualized, and their similarities and differences revealed. New instances are generated through genetic programming which evolves functions with controllable characteristics. Convergence to selected target points in the instance space is used to drive the evolutionary process, such that the new instances span the entire space more comprehensively. We demonstrate the method by generating two-dimensional functions to visualize its success, and ten-dimensional functions to test its scalability. We show that the method can recreate existing test functions when target points are co-located with existing functions, and can generate new functions with entirely different characteristics when target points are located in empty regions of the instance space. Moreover, we test the effectiveness of three state-of-the-art algorithms on the new set of instances. The results demonstrate that the new set is not only more diverse than a well-known benchmark set, but also more challenging for the tested algorithms. Hence, the method opens up a new avenue for developing test instances with controllable characteristics, necessary to expose the strengths and weaknesses of algorithms, and drive algorithm development.
15. Simple Hyper-Heuristics Control the Neighbourhood Size of Randomised Local Search Optimally for LeadingOnes. Evolutionary Computation 2019; 28:437-461. PMID: 31120773; DOI: 10.1162/evco_a_00258.
Abstract
Selection hyper-heuristics (HHs) are randomised search methodologies which choose and execute heuristics during the optimisation process from a set of low-level heuristics. A machine learning mechanism is generally used to decide which low-level heuristic should be applied in each decision step. In this article, we analyse whether sophisticated learning mechanisms are always necessary for HHs to perform well. To this end we consider the most simple HHs from the literature and rigorously analyse their performance for the LeadingOnes benchmark function. Our analysis shows that the standard Simple Random, Permutation, Greedy, and Random Gradient HHs show no signs of learning. While the former HHs do not attempt to learn from the past performance of low-level heuristics, the idea behind the Random Gradient HH is to continue to exploit the currently selected heuristic as long as it is successful. Hence, it is embedded with a reinforcement learning mechanism with the shortest possible memory. However, the probability that a promising heuristic is successful in the next step is relatively low when perturbing a reasonable solution to a combinatorial optimisation problem. We generalise the "simple" Random Gradient HH so success can be measured over a fixed period of time τ, instead of a single iteration. For LeadingOnes we prove that the Generalised Random Gradient (GRG) HH can learn to adapt the neighbourhood size of Randomised Local Search to optimality during the run. As a result, we prove it has the best possible performance achievable with the low-level heuristics (Randomised Local Search with different neighbourhood sizes), up to lower-order terms. We also prove that the performance of the HH improves as the number of low-level local search heuristics to choose from increases. In particular, with access to k low-level local search heuristics, it outperforms the best-possible algorithm using any subset of the k heuristics. Finally, we show that the advantages of GRG over Randomised Local Search and Evolutionary Algorithms using standard bit mutation increase if the anytime performance is considered (i.e., the performance gap is larger if approximate solutions are sought rather than exact ones). Experimental analyses confirm these results for different problem sizes (up to n=108) and shed some light on the best choices for the parameter τ in various situations.
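The Generalised Random Gradient mechanism is simple enough to sketch directly. The code below is an illustration under our reading of the abstract, with tau and k as free parameters; it is not the analysed algorithm verbatim.

```python
import random

def leading_ones(x):
    """Number of leading 1-bits (the benchmark analysed in the paper)."""
    n = 0
    for bit in x:
        if bit == 0:
            break
        n += 1
    return n

def grg(n=100, k=2, tau=50, rng=random.Random(0)):
    """Generalised Random Gradient: keep re-using a randomly chosen
    neighbourhood size while it yields an improvement within tau steps."""
    x = [rng.randint(0, 1) for _ in range(n)]
    while leading_ones(x) < n:
        size = rng.randint(1, k)          # pick a low-level heuristic
        improved = True
        while improved:                   # exploit it while it succeeds
            improved = False
            for _ in range(tau):          # success judged over tau steps
                y = x[:]
                for j in rng.sample(range(n), size):
                    y[j] ^= 1             # flip `size` distinct bits
                if leading_ones(y) > leading_ones(x):
                    x, improved = y, True
                    break
    return x
```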
16. Parameterized Analysis of Multiobjective Evolutionary Algorithms and the Weighted Vertex Cover Problem. Evolutionary Computation 2019; 27:559-575. PMID: 31012735; DOI: 10.1162/evco_a_00255.
Abstract
Evolutionary multiobjective optimization for the classical vertex cover problem has been analysed in Kratsch and Neumann (2013) in the context of parameterized complexity analysis. This article extends the analysis to the weighted vertex cover problem, in which integer weights are assigned to the vertices and the goal is to find a vertex cover of minimum weight. Using an alternative mutation operator introduced in Kratsch and Neumann (2013), we provide a fixed-parameter evolutionary algorithm with respect to OPT, the cost of an optimal solution for the problem. Moreover, we present a multiobjective evolutionary algorithm with standard mutation operator that keeps the population size in a polynomial order by means of a proper diversity mechanism, and therefore manages to find a 2-approximation in expected polynomial time. We also introduce a population-based evolutionary algorithm which finds a (1+ε)-approximation in expected time O(n·2^{min{n, 2(1-ε)OPT}} + n^3).
17. A heuristic method for fast and accurate phasing and imputation of single-nucleotide polymorphism data in bi-parental plant populations. Theoretical and Applied Genetics 2018; 131:2345-2357. PMID: 30078163; PMCID: PMC6208939; DOI: 10.1007/s00122-018-3156-9.
Abstract
Key message: A new fast and accurate method for phasing and imputation of SNP chip genotypes within diploid bi-parental plant populations. This paper presents a new heuristic method for phasing and imputation of genomic data in diploid plant species. Our method, called AlphaPlantImpute, explicitly leverages features of plant breeding programmes to maximise the accuracy of imputation: a small number of parents, which can be inbred and usually have high-density genomic data, and few recombinations separating parents from focal individuals genotyped at low density (i.e., the descendants that are the imputation targets). AlphaPlantImpute works in three steps. First, it identifies informative low-density genotype markers in the parents. Second, it tracks the inheritance of parental alleles and haplotypes to focal individuals at the informative markers. Finally, it uses this low-density information as anchor points to impute the focal individuals to high density. We tested the imputation accuracy of AlphaPlantImpute in simulated bi-parental populations across different scenarios and compared its accuracy to that of existing software called PlantImpute. In general, AlphaPlantImpute had better or equal imputation accuracy, while its computational time and memory requirements were far smaller. For example, the imputation accuracy was 0.96 in a scenario where both parents were inbred and genotyped at 25,000 markers per chromosome and a focal F2 individual was genotyped with 50 markers per chromosome; this scenario required at most 0.08 GB of memory and 37 s to complete.
18. Solution to travelling salesman problem by clusters and a modified multi-restart iterated local search metaheuristic. PLoS One 2018; 13:e0201868. PMID: 30133477; PMCID: PMC6104944; DOI: 10.1371/journal.pone.0201868.
Abstract
This article finds feasible solutions to the travelling salesman problem: obtaining the route with the shortest distance that visits n cities exactly once and returns to the starting city. Our approach first clusters the cities, then uses the NEH heuristic to provide an initial solution that is refined with a modification of the Multi-Restart Iterated Local Search (MRSILS) metaheuristic; finally, the clusters are joined to complete the minimum-distance route. The contribution of this research is the use of the MRSILS metaheuristic, which to our knowledge had not previously been applied to the travelling salesman problem with clusters. The main objective of this article is to demonstrate that the proposed algorithm is more efficient than Genetic Algorithms when clusters are used. To demonstrate this, both algorithms are compared on cases taken from the literature and against the best-known results, and statistical studies are performed under the same conditions. Our method obtains better results in all 10 cases compared.
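For illustration, here is a bare-bones version of the per-cluster construct-and-refine step, with nearest-neighbour construction and 2-opt standing in for the NEH/MRSILS pair (our sketch, not the paper's code):

```python
import numpy as np
from scipy.spatial.distance import cdist

def nn_tour(points):
    """Greedy nearest-neighbour tour over one cluster of cities."""
    D = cdist(points, points)
    tour, left = [0], set(range(1, len(points)))
    while left:
        nxt = min(left, key=lambda j: D[tour[-1], j])
        tour.append(nxt)
        left.remove(nxt)
    return tour, D

def two_opt(tour, D):
    """Repeatedly reverse segments while doing so shortens the tour."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 2):
            for j in range(i + 1, len(tour) - 1):
                if (D[tour[i - 1], tour[j]] + D[tour[i], tour[j + 1]] <
                        D[tour[i - 1], tour[i]] + D[tour[j], tour[j + 1]]):
                    tour[i:j + 1] = reversed(tour[i:j + 1])
                    improved = True
    return tour

# Cluster the cities (e.g., with k-means), build and refine a tour per
# cluster, then join the cluster tours into one closed route.
```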
19. Discovering Gene Regulatory Elements Using Coverage-Based Heuristics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1290-1300. PMID: 26540692; DOI: 10.1109/tcbb.2015.2496261.
Abstract
Data mining algorithms and sequencing methods (such as RNA-seq and ChIP-seq) are being combined to discover genomic regulatory motifs that relate to a variety of phenotypes. However, motif discovery algorithms often produce very long lists of putative transcription factor binding sites, hindering the discovery of phenotype-related regulatory elements by making it difficult to select a manageable set of candidate motifs for experimental validation. To address this issue, the authors introduce the motif selection problem and provide coverage-based search heuristics for its solution. Analysis of 203 ChIP-seq experiments from the ENCyclopedia of DNA Elements project shows that our algorithms produce motifs that have high sensitivity and specificity and reveals new insights about the regulatory code of the human genome. The greedy algorithm performs the best, selecting a median of two motifs per ChIP-seq transcription factor group while achieving a median sensitivity of 77 percent.
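The greedy strategy described here is essentially greedy set cover over ChIP-seq peaks. A schematic version follows (the paper's scoring and stopping criteria differ; names below are illustrative):

```python
def greedy_motif_selection(motif_hits, peaks, max_motifs=10):
    """motif_hits: dict mapping motif -> set of peak ids it matches."""
    uncovered, chosen = set(peaks), []
    while uncovered and len(chosen) < max_motifs:
        # Pick the motif covering the most still-uncovered peaks.
        motif = max(motif_hits, key=lambda m: len(motif_hits[m] & uncovered))
        gain = motif_hits[motif] & uncovered
        if not gain:                  # nothing left to cover
            break
        chosen.append(motif)
        uncovered -= gain
    return chosen
```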
20. Theoretical Analysis of Local Search and Simple Evolutionary Algorithms for the Generalized Travelling Salesperson Problem. Evolutionary Computation 2018; 27:525-558. PMID: 29932364; DOI: 10.1162/evco_a_00233.
Abstract
The generalized travelling salesperson problem is an important NP-hard combinatorial optimization problem for which metaheuristics, such as local search and evolutionary algorithms, have been used very successfully. Two hierarchical approaches with different neighbourhood structures, namely a cluster-based approach and a node-based approach, have been proposed by Hu and Raidl (2008) for solving this problem. In this article, local search algorithms and simple evolutionary algorithms based on these approaches are investigated from a theoretical perspective. For local search algorithms, we point out the complementary abilities of the two approaches by presenting instances where they mutually outperform each other. Afterwards, we introduce an instance which is hard for both approaches when initialized on a particular point of the search space, but where a variable neighbourhood search combining them finds the optimal solution in polynomial time. Then we turn our attention to analysing the behaviour of simple evolutionary algorithms that use these approaches. We show that the node-based approach solves the hard instance of the cluster-based approach presented in Corus et al. (2016) in polynomial time. Furthermore, we prove an exponential lower bound on the optimization time of the node-based approach for a class of Euclidean instances.
21. A Hybrid Genetic Programming Algorithm for Automated Design of Dispatching Rules. Evolutionary Computation 2018; 27:467-496. PMID: 29863420; DOI: 10.1162/evco_a_00230.
Abstract
Designing effective dispatching rules for production systems is a difficult and time-consuming task if it is done manually. In the last decade, the growth of computing power, advanced machine learning, and optimisation techniques has made the automated design of dispatching rules possible and automatically discovered rules are competitive or outperform existing rules developed by researchers. Genetic programming is one of the most popular approaches to discovering dispatching rules in the literature, especially for complex production systems. However, the large heuristic search space may restrict genetic programming from finding near optimal dispatching rules. This article develops a new hybrid genetic programming algorithm for dynamic job shop scheduling based on a new representation, a new local search heuristic, and efficient fitness evaluators. Experiments show that the new method is effective regarding the quality of evolved rules. Moreover, evolved rules are also significantly smaller and contain more relevant attributes.
22. Anatomy of the Attraction Basins: Breaking with the Intuition. Evolutionary Computation 2018; 27:435-466. PMID: 29786459; DOI: 10.1162/evco_a_00227.
Abstract
Solving combinatorial optimization problems efficiently requires the development of algorithms that consider the specific properties of the problems. In this sense, local search algorithms are designed over a neighborhood structure that partially accounts for these properties. Considering a neighborhood, the space is usually interpreted as a natural landscape, with valleys and mountains. Under this perception, it is commonly believed that, if maximizing, the solutions located in the slopes of the same mountain belong to the same attraction basin, with the peaks of the mountains being the local optima. Unfortunately, this is a widespread erroneous visualization of a combinatorial landscape. Thus, our aim is to clarify this aspect, providing a detailed analysis of, first, the existence of plateaus where the local optima are involved, and second, the properties that define the topology of the attraction basins, picturing a reliable visualization of the landscapes. Some of the features explored in this article have never been examined before. Hence, new findings about the structure of the attraction basins are shown. The study is focused on instances of permutation-based combinatorial optimization problems considering the 2-exchange and the insert neighborhoods. As a consequence of this work, we break away from the extended belief about the anatomy of attraction basins.
23. Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing. PLoS Comput Biol 2017; 13:e1005777. PMID: 28968408; PMCID: PMC5645146; DOI: 10.1371/journal.pcbi.1005777.
Abstract
With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence contains a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: the first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and, by applying them to real genomes, show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively.

Author summary
High-throughput sequencing data have been accumulating at an extreme pace, and the need to efficiently analyze and process them has become a critical challenge of the field. Many of the data structures and algorithms for this task rely on k-mer sets (DNA words of length k) to represent the sequences in a dataset, and their runtime and memory usage depend heavily on the size of the k-mer sets used. Thus, a minimum-size k-mer hitting set, namely a set of k-mers that hits (has non-empty overlap with) all sequences, is desirable. In this work, we create universal k-mer hitting sets that hit any L-long sequence. We present several heuristic approaches for constructing such small sets; the approaches vary in the trade-off between the size of the produced set and runtime and memory usage. We show the practical benefit of the produced universal k-mer hitting sets compared to minimizers and randomly created hitting sets on the human genome.
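The universal-hitting property itself is easy to state in code. The brute-force check below (ours, feasible only for tiny k and L) makes the definition concrete; DOCKS avoids this enumeration by working on the de Bruijn graph.

```python
from itertools import product

def hits(seq, kmers, k):
    """True if the sequence contains at least one k-mer from the set."""
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))

def is_universal(kmers, k, L):
    """Brute-force UHS check: every L-long DNA string must be hit."""
    return all(hits(''.join(s), kmers, k) for s in product('ACGT', repeat=L))

# Trivially, the set of all four 1-mers hits every sequence of length >= 1.
assert is_universal({'A', 'C', 'G', 'T'}, k=1, L=3)
```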
24. Local introduction and heterogeneous spatial spread of dengue-suppressing Wolbachia through an urban population of Aedes aegypti. PLoS Biol 2017; 15:e2001894. PMID: 28557993; PMCID: PMC5448718; DOI: 10.1371/journal.pbio.2001894.
Abstract
Dengue-suppressing Wolbachia strains are promising tools for arbovirus control, particularly as they have the potential to self-spread following local introductions. To test this, we followed the frequency of the transinfected Wolbachia strain wMel through Ae. aegypti in Cairns, Australia, following releases at 3 nonisolated locations within the city in early 2013. Spatial spread was analysed graphically using interpolation and by fitting a statistical model describing the position and width of the wave. For the larger 2 of the 3 releases (covering 0.97 km² and 0.52 km²), we observed slow but steady spatial spread, at about 100–200 m per year, roughly consistent with theoretical predictions. In contrast, the smallest release (0.11 km²) produced erratic temporal and spatial dynamics, with little evidence of spread after 2 years. This is consistent with the prediction concerning fitness-decreasing Wolbachia transinfections that a minimum release area is needed to achieve stable local establishment and spread in continuous habitats. Our graphical and likelihood analyses produced broadly consistent estimates of wave speed and wave width. Spread at all sites was spatially heterogeneous, suggesting that environmental heterogeneity will affect large-scale Wolbachia transformations of urban mosquito populations. The persistence and spread of Wolbachia in release areas meeting minimum area requirements indicates the promise of successful large-scale population transformation.

Author summary
Wolbachia are bacteria that live inside insect cells. In insects that act as viral vectors, Wolbachia can suppress virus transmission to new hosts. Wolbachia have been experimentally introduced into Aedes aegypti mosquito populations to reduce the transmission of dengue, Zika, and other arboviruses that cause human disease. Wolbachia invade populations by causing cytoplasmic incompatibility, a phenomenon whereby embryos from crosses between infected males and uninfected females fail to hatch. While Wolbachia have been shown to successfully invade and remain established in isolated Ae. aegypti populations, outward spread from urban release zones has not been previously documented. This is an important step in demonstrating that Wolbachia can be used to combat mosquito-borne infectious disease in cities. Here we describe Wolbachia spread from 2 introduction areas within Cairns in northeastern Australia at a rate of about 100–200 meters per year. Spread occurs only when introduction areas are sufficiently large. The slow rates of observed spread are broadly consistent with mathematical predictions based on estimated Ae. aegypti dispersal distances, Wolbachia dynamics, and effects seen in isolated populations. Spread is uneven and likely depends on local characteristics (e.g., barriers) that affect mosquito density and dispersal. Our data indicate that Wolbachia can be introduced locally in large cities, remain established where released, and slowly spread from release areas. These dynamics indicate that high Wolbachia infection frequencies can be established gradually across large urban areas through local releases.
25. Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix. J Struct Funct Genomics 2017; 17:147-154. PMID: 28083762; PMCID: PMC5274646; DOI: 10.1007/s10969-016-9210-4.
Abstract
Searching public protein databases is a fundamental step in the target selection of proteins in structural and functional genomics, and also in inferring protein structure, function, and evolution. Most database search methods employ amino acid substitution matrices to score amino acid pairs, and the choice of substitution matrix strongly affects homology detection performance. We earlier proposed a substitution matrix named MIQS that was optimized for distant protein homology search. Herein we further evaluate MIQS in combination with LAST, a heuristic and fast database search tool with a tunable sensitivity parameter m, where larger m denotes higher sensitivity. Results show that MIQS substantially improves the homology detection and alignment quality of LAST across diverse m parameters. Against a protein database consisting of approximately 15 million sequences, LAST with m = 10^5 achieves better homology detection performance than BLASTP, and completes the search 20 times faster. Compared to the most sensitive existing methods in use today, CS-BLAST and SSEARCH, LAST with MIQS and m = 10^6 shows comparable homology detection performance at 2.0 and 3.9 times greater speed, respectively. These results demonstrate that MIQS-powered LAST is a time-efficient method for sensitive and accurate homology search.
26. Scheduling Independent Partitions in Integrated Modular Avionics Systems. PLoS One 2016; 11:e0168064. PMID: 27942013; PMCID: PMC5152929; DOI: 10.1371/journal.pone.0168064.
Abstract
Recently, the integrated modular avionics (IMA) architecture has been widely adopted by the avionics industry because of its strong partition mechanism. Although the IMA architecture achieves effective cost reduction and reliability enhancement in the development of avionics systems, it results in a complex allocation and scheduling problem. All partitions in an IMA system should be integrated according to a proper schedule such that their deadlines are met even in worst-case situations. To help provide a proper scheduling table for all partitions in IMA systems, we study the schedulability of independent partitions on a multiprocessor platform. We first present an exact formulation to calculate the maximum scaling factor and determine whether all partitions are schedulable on a limited number of processors. Then, using a game-theoretic analogy, we design an approximation algorithm that solves the partition scheduling problem by allowing each partition to optimize its own schedule given the allocations of the others. Finally, simulation experiments show the efficiency and reliability of the proposed approach in terms of time consumption and acceptance ratio.
27. Beacon-based opportunistic scheduling in wireless body area network. Annu Int Conf IEEE Eng Med Biol Soc 2016; 2016:4995-4998. PMID: 28269390; DOI: 10.1109/embc.2016.7591849.
Abstract
Wireless Body Area Networks (WBANs) are one of the key technologies supporting the development of digital health care, which has attracted increasing attention in recent years. Compared with general Wireless Sensor Networks (WSNs), WBANs have more stringent requirements on reliability and energy efficiency. Although WBANs operate over a limited transmission range, the on-body channel can be very challenging because the body blocks or absorbs the signal. In this paper, we examine the design of Medium Access Control (MAC) protocols and propose an opportunistic scheduling scheme that applies heuristic scheduling and dynamic superframe-length adjustment to improve system performance. Simulations show the advantages of the proposed solution in outage-rate performance compared with existing solutions.
28. Hybrid Self-Adaptive Evolution Strategies Guided by Neighborhood Structures for Combinatorial Optimization Problems. Evolutionary Computation 2016; 24:637-666. PMID: 27258842; DOI: 10.1162/evco_a_00187.
Abstract
This article presents an Evolution Strategy (ES)-based algorithm designed to self-adapt its mutation operators, guiding the search through the solution space using a Self-Adaptive Reduced Variable Neighborhood Search procedure. Because each individual carries its own local search operators, the proposed population-based approach also fits into the context of memetic algorithms. The proposed variant uses the Greedy Randomized Adaptive Search Procedure with different greedy parameters to generate its initial population, providing an interesting exploration-exploitation balance. To validate the proposal, the framework is applied to three different NP-hard combinatorial optimization problems: an Open-Pit-Mining Operational Planning Problem with dynamic allocation of trucks, an Unrelated Parallel Machine Scheduling Problem with Setup Times, and the calibration of a hybrid fuzzy model for Short-Term Load Forecasting. Computational results demonstrate the convergence of the proposed model and highlight its ability to combine move operations from distinct neighborhood structures during optimization. The results gathered and reported in this article represent collective evidence of the method's performance on challenging combinatorial optimization problems from different application domains. The proposed evolution strategy demonstrates an ability to adapt the strength of the mutation disturbance over the generations of its evolution process. The effectiveness of the proposal motivates the application of this novel evolutionary framework to other combinatorial optimization problems.
29. The Unrestricted Black-Box Complexity of Jump Functions. Evolutionary Computation 2016; 24:719-744. PMID: 27243329; DOI: 10.1162/evco_a_00185.
Abstract
We analyze the unrestricted black-box complexity of the Jump function classes for different jump sizes. For upper bounds, we present three algorithms for small, medium, and extreme jump sizes. We prove a matrix lower bound theorem which is capable of giving better lower bounds than the classic information theory approach. Using this theorem, we prove lower bounds that almost match the upper bounds. For the case of extreme jump functions, which apart from the optimum reveal only the middle fitness value(s), we use an additional lower bound argument to show that any black-box algorithm does not gain significant insight about the problem instance from the first [Formula: see text] fitness evaluations. This, together with our upper bound, shows that the black-box complexity of extreme jump functions is [Formula: see text].
30.
Abstract
We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
31. A general proof of consistency of heuristic classification for cognitive diagnosis models. The British Journal of Mathematical and Statistical Psychology 2015; 68:387-409. PMID: 25872467; DOI: 10.1111/bmsp.12055.
Abstract
The Asymptotic Classification Theory of Cognitive Diagnosis (Chiu et al., 2009, Psychometrika, 74, 633-665) determined the conditions that cognitive diagnosis models must satisfy so that the correct assignment of examinees to proficiency classes is guaranteed when non-parametric classification methods are used. These conditions have only been proven for the Deterministic Input, Noisy AND gate (DINA) model. For other cognitive diagnosis models, no theoretical legitimization exists for using non-parametric classification techniques to assign examinees to proficiency classes. The specific statistical properties of different cognitive diagnosis models require tailored proofs of the conditions of the Asymptotic Classification Theory of Cognitive Diagnosis for each individual model, a tedious undertaking in light of the numerous models presented in the literature. This paper takes a different route: the unified mathematical framework of general cognitive diagnosis models is used as a theoretical basis for a general proof that, under mild regularity conditions, any cognitive diagnosis model is covered by the Asymptotic Classification Theory of Cognitive Diagnosis.
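For readers unfamiliar with the heuristic classification being legitimised, the sketch below shows the standard non-parametric rule under a conjunctive (DINA-style) ideal-response assumption; it is an illustration of that rule only, not the paper's generalised framework.

```python
import numpy as np
from itertools import product

def ideal_responses(Q):
    """All attribute profiles and their ideal item responses under a
    conjunctive rule: an item is answered correctly iff the profile
    masters every attribute the item requires (per the Q-matrix)."""
    K = Q.shape[1]
    profiles = np.array(list(product([0, 1], repeat=K)))
    eta = (profiles @ Q.T == Q.sum(axis=1)).astype(int)
    return profiles, eta

def classify(responses, Q):
    """Assign the examinee to the profile whose ideal response pattern
    is closest in Hamming distance to the observed responses."""
    profiles, eta = ideal_responses(Q)
    dist = np.abs(eta - responses).sum(axis=1)
    return profiles[dist.argmin()]

# Example: 3 items, 2 attributes; responses [1, 0, 0] map to profile [1, 0].
# Q = np.array([[1, 0], [0, 1], [1, 1]])
# classify(np.array([1, 0, 0]), Q)
```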