1
|
Oliveira LS, Reyes A, Dutilh BE, Gruber A. Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons. Viruses 2023; 15:519. [PMID: 36851733 PMCID: PMC9966878 DOI: 10.3390/v15020519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 02/01/2023] [Accepted: 02/09/2023] [Indexed: 02/15/2023] Open
Abstract
Profile hidden Markov models (HMMs) are a powerful way of modeling biological sequence diversity and constitute a very sensitive approach to detecting divergent sequences. Here, we report the development of protocols for the rational design of profile HMMs. These methods were implemented on TABAJARA, a program that can be used to either detect all biological sequences of a group or discriminate specific groups of sequences. By calculating position-specific information scores along a multiple sequence alignment, TABAJARA automatically identifies the most informative sequence motifs and uses them to construct profile HMMs. As a proof-of-principle, we applied TABAJARA to generate profile HMMs for the detection and classification of two viral groups presenting different evolutionary rates: bacteriophages of the Microviridae family and viruses of the Flavivirus genus. We obtained conserved models for the generic detection of any Microviridae or Flavivirus sequence, and profile HMMs that can specifically discriminate Microviridae subfamilies or Flavivirus species. In another application, we constructed Cas1 endonuclease-derived profile HMMs that can discriminate CRISPRs and casposons, two evolutionarily related transposable elements. We believe that the protocols described here, and implemented on TABAJARA, constitute a generic toolbox for generating profile HMMs for the highly sensitive and specific detection of sequence classes.
Collapse
Affiliation(s)
- Liliane S. Oliveira
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
| | - Alejandro Reyes
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
| | - Bas E. Dutilh
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
| | - Arthur Gruber
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
| |
Collapse
|
2
|
Abstract
Design of primers and probes is one of the most crucial factors affecting the success and quality of quantitative real-time PCR (qPCR) analyses, since an accurate and reliable quantification depends on using efficient primers and probes. Design of primers and probes should meet several criteria to find potential primers and probes for specific qPCR assays. The formation of primer-dimers and other non-specific products should be avoided or reduced. This factor is especially important when designing primers for SYBR(®) Green protocols but also in designing probes to ensure specificity of the developed qPCR protocol. To design primers and probes for qPCR, multiple software programs and websites are available being numerous of them free. These tools often consider the default requirements for primers and probes, although new research advances in primer and probe design should be progressively added to different algorithm programs. After a proper design, a precise validation of the primers and probes is necessary. Specific consideration should be taken into account when designing primers and probes for multiplex qPCR and reverse transcription qPCR (RT-qPCR). This chapter provides guidelines for the design of suitable primers and probes and their subsequent validation through the development of singlex qPCR, multiplex qPCR, and RT-qPCR protocols.
Collapse
|
3
|
Carmona R, Zafra A, Seoane P, Castro AJ, Guerrero-Fernández D, Castillo-Castillo T, Medina-García A, Cánovas FM, Aldana-Montes JF, Navas-Delgado I, Alché JDD, Claros MG. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome. FRONTIERS IN PLANT SCIENCE 2015; 6:625. [PMID: 26322066 PMCID: PMC4531244 DOI: 10.3389/fpls.2015.00625] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/21/2015] [Accepted: 07/28/2015] [Indexed: 05/18/2023]
Abstract
Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we presented an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. It results in a reproductive transcriptome comprising 72,846 contigs with average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species.
Collapse
Affiliation(s)
- Rosario Carmona
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones CientíficasGranada, Spain
- Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de MálagaMálaga, Spain
| | - Adoración Zafra
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones CientíficasGranada, Spain
| | - Pedro Seoane
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de MálagaMálaga, Spain
| | - Antonio J. Castro
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones CientíficasGranada, Spain
| | - Darío Guerrero-Fernández
- Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de MálagaMálaga, Spain
| | | | - Ana Medina-García
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de MálagaMálaga, Spain
| | - Francisco M. Cánovas
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de MálagaMálaga, Spain
| | - José F. Aldana-Montes
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de MálagaMálaga, Spain
| | - Ismael Navas-Delgado
- Departamento de Lenguajes y Ciencias de la Computación, Universidad de MálagaMálaga, Spain
| | - Juan de Dios Alché
- Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, Consejo Superior de Investigaciones CientíficasGranada, Spain
| | - M. Gonzalo Claros
- Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Universidad de MálagaMálaga, Spain
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de MálagaMálaga, Spain
- *Correspondence: M. Gonzalo Claros, Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Campus de Teatinos, 29071 Málaga, Spain,
| |
Collapse
|
4
|
Wu B, Gong J, Yuan S, Zhang Y, Wei T. Patterns of evolutionary selection pressure in the immune signaling protein TRAF3IP2 in mammals. Gene 2013; 531:403-10. [PMID: 24021976 DOI: 10.1016/j.gene.2013.08.074] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2013] [Revised: 08/21/2013] [Accepted: 08/22/2013] [Indexed: 01/08/2023]
Abstract
TRAF3 interacting protein 2 (TRAF3IP2) is important for immune responses to pathogens, inflammatory signals and autoimmunity in mammals. In the present study, we collected 19 mammalian TRAF3IP2 sequences and investigated the various types of selection pressure acting on them. Maximum likelihood estimations of nonsynonymous (dN) to synonymous (dS) substitution (dN/dS) ratios for the aligned coding sequences indicated that, as a whole, TRAF3IP2 has been subject to purifying selection. However, the N-terminus of the protein has been subject to higher selection pressure than the C-terminal domain. While eight amino acid residues within the N-terminus appear to have evolved under positive selection, no evidence for such selection was found in the C-terminus. The positively selected residues, which fall outside the currently known functional sites within TRAF3IP2, may have novel functions. The different selection pressures acting on the N- and C-terminal regions are consistent with their protein structures: the C-terminal structure is an ordered structure, whereas the N-terminus is disordered. Taken together with the results of previous studies, it is plausible that positive selection on the N-terminus of TRAF3IP2 may have occurred by competitive coevolution between mammalian hosts and viruses.
Collapse
Affiliation(s)
- Baojun Wu
- Department of Biological Sciences, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, USA
| | | | | | | | | |
Collapse
|
5
|
Fernández-Pozo N, Canales J, Guerrero-Fernández D, Villalobos DP, Díaz-Moreno SM, Bautista R, Flores-Monterroso A, Guevara MÁ, Perdiguero P, Collada C, Cervera MT, Soto A, Ordás R, Cantón FR, Avila C, Cánovas FM, Claros MG. EuroPineDB: a high-coverage web database for maritime pine transcriptome. BMC Genomics 2011; 12:366. [PMID: 21762488 PMCID: PMC3152544 DOI: 10.1186/1471-2164-12-366] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2011] [Accepted: 07/15/2011] [Indexed: 11/30/2022] Open
Abstract
Background Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches are hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases. Description EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. It can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information) and will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided. Conclusions The EuroPineDB with its integrated information can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge on the maritime pine transcriptome.
Collapse
Affiliation(s)
- Noé Fernández-Pozo
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Campus de Teatinos s/n, Universidad de Málaga, 29071 Málaga, Spain
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|