Domínguez C, Heras J, Mata E, Pascual V, Vázquez-Garcidueñas MS, Vázquez-Marrufo G. Extending GelJ for interoperability: Filling the gap in the bioinformatics resources for population genetics analysis with dominant markers.
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017;140:69-76. [PMID: 28254092 DOI: 10.1016/j.cmpb.2016.12.001]
[Received: 05/26/2016] [Revised: 10/14/2016] [Accepted: 12/05/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVE
The manual transformation of DNA fingerprints of dominant markers into the input of tools for population genetics analysis is a time-consuming and error-prone task, especially when the researcher deals with a large number of samples. The situation worsens when the researcher needs to use several population genetics analysis tools, because data formats are incompatible across tools. The goal of this work is to automate the generation of input for the great diversity of tools devoted to population genetics analysis, starting directly from the banding patterns of gel images.
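The core of this manual transformation is scoring each lane's bands as present (1) or absent (0) at a common set of loci. The sketch below illustrates that step under stated assumptions: band sizes are already estimated per sample, and bands within a fixed relative tolerance are binned greedily into the same locus. The function name `score_bands` and the 2% default tolerance are hypothetical choices for illustration, not GelJ's actual method.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def score_bands(
    profiles: Dict[str, List[float]], tolerance: float = 0.02
) -> Tuple[List[float], Dict[str, List[int]]]:
    """Encode gel banding patterns as a dominant-marker 0/1 matrix.

    `profiles` maps a sample name to the estimated sizes (e.g. in bp) of the
    bands detected in its lane. Bands within a relative `tolerance` of a bin's
    smallest size are treated as the same locus (greedy binning, an assumption
    made for this sketch only).
    """
    # Build bins over all observed band sizes, smallest first; a new bin
    # (locus) starts whenever a size exceeds the current bin's reference
    # size by more than the tolerance.
    bins: List[float] = []
    for size in sorted(s for sizes in profiles.values() for s in sizes):
        if not bins or size > bins[-1] * (1 + tolerance):
            bins.append(size)

    # Score each sample: 1 if it has a band at the locus, 0 otherwise.
    matrix: Dict[str, List[int]] = {}
    for name, sizes in profiles.items():
        row = [0] * len(bins)
        for size in sizes:
            # A band belongs to the last bin whose reference size is <= size.
            row[bisect_right(bins, size) - 1] = 1
        matrix[name] = row
    return bins, matrix


if __name__ == "__main__":
    profiles = {"A": [500.0, 300.0], "B": [505.0, 200.0], "C": [300.0]}
    loci, matrix = score_bands(profiles)
    print(loci)    # [200.0, 300.0, 500.0]
    print(matrix)  # {'A': [0, 1, 1], 'B': [1, 0, 1], 'C': [0, 1, 0]}
```

Greedy binning is the simplest way to align bands across lanes; real gel software typically also corrects for lane distortion and uses molecular-weight standards before this step.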
METHODS
After a thorough analysis of tools for population genetics analysis with dominant markers and of tools for working with phylogenetic trees, we identified the input requirements of those systems. Programs devoted to phylogenetic trees widely employ the Newick and Nexus formats, whereas each population genetics analysis tool uses its own specific format. To handle this diversity of formats in the latter case, we developed a new XML format, called PopXML, that captures the variety of information required by each population genetics analysis tool. Moreover, this knowledge has been incorporated into the pipeline of the GelJ system, a tool for analysing DNA fingerprint gel images, to achieve our automation goal.
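As an illustration of one of the shared formats mentioned above, the following is a minimal sketch that writes a 0/1 dominant-marker matrix as a NEXUS `DATA` block with `DATATYPE=STANDARD`. The function names `to_nexus` and `_nexus_name` are hypothetical, and the sketch is not GelJ's exporter; PopXML is not modelled here because its schema is not described in this abstract.

```python
from typing import Dict, List


def _nexus_name(name: str) -> str:
    # NEXUS names containing blanks or punctuation must be single-quoted,
    # with any embedded single quotes doubled.
    if name and all(c.isalnum() or c == "_" for c in name):
        return name
    return "'" + name.replace("'", "''") + "'"


def to_nexus(matrix: Dict[str, List[int]]) -> str:
    """Serialise a 0/1 presence/absence matrix as a NEXUS DATA block."""
    if not matrix:
        raise ValueError("matrix must contain at least one sample")
    nchar = len(next(iter(matrix.values())))
    if any(len(row) != nchar for row in matrix.values()):
        raise ValueError("all samples must be scored for the same loci")
    names = [_nexus_name(n) for n in matrix]
    width = max(len(n) for n in names)  # align the matrix rows
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for name, row in zip(names, matrix.values()):
        lines.append(f"    {name.ljust(width)}  {''.join(str(v) for v in row)}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_nexus({"A": [0, 1, 1], "B": [1, 0, 1], "sample C": [0, 1, 0]}))
```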
RESULTS
We have implemented in the GelJ system a pipeline that automatically generates, from gel banding patterns, the input of tools for population genetics analysis and for phylogenetic trees. This pipeline has been used to successfully generate, from thousands of banding patterns, the input of 29 population genetics analysis tools and 32 tools for managing phylogenetic trees.
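To make the step from banding patterns to tree-tool input concrete, the following hedged sketch clusters samples with UPGMA on Jaccard distances and emits the resulting tree in Newick format. Jaccard is assumed here because it ignores shared band absences, which carry little information for dominant markers; the choice of distance and clustering method, and the names `jaccard_distance` and `upgma_newick`, are illustrative assumptions rather than the configuration GelJ uses. Sample names are assumed to contain no Newick-reserved characters.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard_distance(a: List[int], b: List[int]) -> float:
    """1 - Jaccard similarity of two 0/1 band profiles (shared absences ignored)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def upgma_newick(matrix: Dict[str, List[int]]) -> str:
    """Cluster samples with UPGMA and return the tree as a Newick string.

    Assumes sample names contain no spaces, parentheses, commas, colons
    or semicolons (they are written to the Newick string unquoted).
    """
    if not matrix:
        raise ValueError("matrix must contain at least one sample")
    names = list(matrix)
    # cluster id -> (Newick subtree, number of samples, node height)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # Pairwise distances keyed by (smaller id, larger id).
    dist: Dict[Tuple[int, int], float] = {
        (i, j): jaccard_distance(matrix[names[i]], matrix[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        height = dist.pop((i, j)) / 2.0
        sub_i, n_i, h_i = clusters.pop(i)
        sub_j, n_j, h_j = clusters.pop(j)
        # Branch length = new node height minus child height.
        subtree = f"({sub_i}:{height - h_i:.4f},{sub_j}:{height - h_j:.4f})"
        # UPGMA update: size-weighted average distance to the merged cluster.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = (subtree, n_i + n_j, height)
        next_id += 1
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


if __name__ == "__main__":
    print(upgma_newick({"A": [0, 1, 1], "B": [1, 0, 1], "C": [0, 1, 0]}))
    # (B:0.4167,(A:0.2500,C:0.2500):0.1667);
```

UPGMA is used only because it is short to write; neighbour joining or other linkage rules would produce Newick output in the same way.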
CONCLUSIONS
GelJ has become the first tool that fills the gap between gel image processing software and population genetics analysis with dominant markers, phylogenetic reconstruction, and tree editing software. This has been achieved by automating the process of generating the input for the latter software from gel banding patterns processed by GelJ.