López-Fernández H, Duque P, Henriques S, Vázquez N, Fdez-Riverola F, Vieira CP, Reboiro-Jato M, Vieira J. Bioinformatics Protocols for Quickly Obtaining Large-Scale Data Sets for Phylogenetic Inferences. Interdiscip Sci 2018; 11:1-9. [PMID: 30511150 DOI: 10.1007/s12539-018-0312-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/24/2018] [Revised: 11/19/2018] [Accepted: 11/28/2018] [Indexed: 01/22/2023]
Abstract
Useful insight into the evolution of genes and gene families can be provided by the analysis of all available genome datasets rather than just a few, which are usually those of model species. Handling and transforming such datasets into the desired format for downstream analyses is, however, often a difficult and time-consuming task for researchers without a background in informatics. Therefore, we present two simple and fast protocols for data preparation, using an easy-to-install, open-source, cross-platform software application with a user-friendly, rich graphical user interface (SEDA; http://www.sing-group.org/seda/index.html). The first protocol is a substantial improvement over one recently published (López-Fernández et al. Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88-96 (2019)[1]), which was used to study the evolution of GULO, a gene that encodes the enzyme responsible for the last step of vitamin C synthesis. In this paper, we show how the sequence data file used for the phylogenetic analyses can now be obtained much faster by changing the way coding sequence isoforms are removed, using the newly implemented SEDA operation "Remove isoforms". This protocol can be used to easily show that putative functional GULO genes are present in several Protostomian groups such as Molluscs, Priapulida and Arachnida. Such findings could have been easily missed if only a few Protostomian model species had been used. The second protocol allowed us to identify positively selected amino acid sites in a set of 19 primate HLA immunity genes. Interestingly, the proteins encoded by MHC class II genes can show just as many positively selected amino acid sites as those encoded by classical MHC class I genes. Although a significant percentage of codons, which can be as high as 14.8%, are evolving under positive selection, the main mode of evolution of HLA immunity genes is purifying selection. When a large number of primate species is used, the probability of failing to identify positively selected amino acid sites is lower. Both projects were performed in less than one week, and most of the time was spent running the analyses rather than preparing the files. Such protocols can be easily adapted to answer many other questions using a phylogenetic approach.
Affiliation(s)
- Hugo López-Fernández
- ESEI - Escuela Superior de Ingeniería Informática, Universidad de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain; Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal
- Pedro Duque
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre 1021/1055, 4169-007, Porto, Portugal
- Sílvia Henriques
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal
- Noé Vázquez
- ESEI - Escuela Superior de Ingeniería Informática, Universidad de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Vigo, Spain
- Florentino Fdez-Riverola
- ESEI - Escuela Superior de Ingeniería Informática, Universidad de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Cristina P Vieira
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal
- Miguel Reboiro-Jato
- ESEI - Escuela Superior de Ingeniería Informática, Universidad de Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Jorge Vieira
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135, Porto, Portugal