Multi-marker tagging single nucleotide polymorphism selection using estimation of distribution algorithms.
Artif Intell Med 2010;
50:193-201. [PMID:
20650616 DOI:
10.1016/j.artmed.2010.05.010]
[Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2009] [Revised: 05/27/2010] [Accepted: 05/30/2010] [Indexed: 11/22/2022]
Abstract
OBJECTIVES
This paper presents an optimization algorithm for the automatic selection of a minimal subset of tagging single nucleotide polymorphisms (SNPs).
METHODS AND MATERIALS
The determination of the set of minimal tagging SNPs is approached as an optimization problem in which each tagged SNP can be covered by a single tagging SNP or by a pair of tagging SNPs. The problem is solved using an estimation of distribution algorithm (EDA) which takes advantage of the underlying topological structure defined by the SNP correlations to model the problem interactions. The EDA stochastically searches the constrained space of feasible solutions. It is evaluated across HapMap reference panel data sets.
RESULTS
The EDA was compared with a SAT solver, able to find the single-marker minimal tagging sets, and with the Tagger program. The percentage of reduction ranged from 10% to 43% in the number of tagging SNPs of the minimal multi-marker tagging set found by the EDA with respect to the other algorithms.
CONCLUSIONS
The introduced algorithm is effective for the identification of minimal multi-marker SNP sets, which considerably reduce the dimension of the tagging SNP set in comparison with single-marker sets. Other variants of the SNP problem can be treated following the same approach.
Collapse