1
Benítez-Hidalgo A, Aldana-Montes JF, Navas-Delgado I, Roldán-García MDM. SALON ontology for the formal description of sequence alignments. BMC Bioinformatics 2023; 24:69. [PMID: 36849882 PMCID: PMC9972671 DOI: 10.1186/s12859-023-05190-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Accepted: 02/15/2023] [Indexed: 03/01/2023]
Abstract
BACKGROUND Information provided by high-throughput sequencing platforms allows the collection of content-rich data about biological sequences and their context. Sequence alignment is a bioinformatics approach to identifying regions of similarity in DNA, RNA, or protein sequences. However, there is no consensus about the specific common terminology and representation for sequence alignments. Thus, automatically linking the wide existing knowledge about the sequences with the alignments is challenging. RESULTS The Sequence Alignment Ontology (SALON) defines a helpful vocabulary for representing and semantically annotating pairwise and multiple sequence alignments. SALON is an OWL 2 ontology that supports automated reasoning for alignments validation and retrieving complementary information from public databases under the Open Linked Data approach. This will reduce the effort needed by scientists to interpret the sequence alignment results. CONCLUSIONS SALON defines a full range of controlled terminology in the domain of sequence alignments. It can be used as a mediated schema to integrate data from different sources and validate acquired knowledge.
Affiliation(s)
- Antonio Benítez-Hidalgo
- Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Málaga, Spain; University of Málaga, ITIS Software, Ada Byron Research Building, Málaga, Spain; Instituto de Investigación Biomédica de Málaga - IBIMA, Málaga, Spain
- José F. Aldana-Montes
- Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Málaga, Spain; University of Málaga, ITIS Software, Ada Byron Research Building, Málaga, Spain; Instituto de Investigación Biomédica de Málaga - IBIMA, Málaga, Spain
- Ismael Navas-Delgado
- Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Málaga, Spain; University of Málaga, ITIS Software, Ada Byron Research Building, Málaga, Spain; Instituto de Investigación Biomédica de Málaga - IBIMA, Málaga, Spain
- María del Mar Roldán-García
- Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Málaga, Spain; University of Málaga, ITIS Software, Ada Byron Research Building, Málaga, Spain; Instituto de Investigación Biomédica de Málaga - IBIMA, Málaga, Spain
2
Wong KY, Tan KY, Tan NH, Gnanathasan CA, Tan CH. Elucidating the Venom Diversity in Sri Lankan Spectacled Cobra (Naja naja) through De Novo Venom Gland Transcriptomics, Venom Proteomics and Toxicity Neutralization. Toxins (Basel) 2021; 13:558. [PMID: 34437429 PMCID: PMC8402536 DOI: 10.3390/toxins13080558] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/30/2021] [Revised: 08/01/2021] [Accepted: 08/05/2021] [Indexed: 01/18/2023]
Abstract
Inadequate effectiveness of Indian antivenoms in treating envenomation caused by the Spectacled Cobra/Indian Cobra (Naja naja) in Sri Lanka has been attributed to geographical variations in the venom composition. This study investigated the de novo venom-gland transcriptomics and venom proteomics of the Sri Lankan N. naja (NN-SL) to elucidate its toxin gene diversity and venom variability. The neutralization efficacy of a commonly used Indian antivenom product in Sri Lanka was examined against the lethality induced by NN-SL venom in mice. The transcriptomic study revealed high expression of 22 toxin genes families in NN-SL, constituting 46.55% of total transcript abundance. Three-finger toxins (3FTX) were the most diversely and abundantly expressed (87.54% of toxin gene expression), consistent with the dominance of 3FTX in the venom proteome (72.19% of total venom proteins). The 3FTX were predominantly S-type cytotoxins/cardiotoxins (CTX) and α-neurotoxins of long-chain or short-chain subtypes (α-NTX). CTX and α-NTX are implicated in local tissue necrosis and fatal neuromuscular paralysis, respectively, in envenomation caused by NN-SL. Intra-species variations in the toxin gene sequences and expression levels were apparent between NN-SL and other geographical specimens of N. naja, suggesting potential antigenic diversity that impacts antivenom effectiveness. This was demonstrated by limited potency (0.74 mg venom/ml antivenom) of the Indian polyvalent antivenom (VPAV) in neutralizing the NN-SL venom. A pan-regional antivenom with improved efficacy to treat N. naja envenomation is needed.
Affiliation(s)
- Kin Ying Wong
- Venom Research and Toxicology Laboratory, Department of Pharmacology, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
- Kae Yi Tan
- Protein and Interactomics Laboratory, Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
- Nget Hong Tan
- Protein and Interactomics Laboratory, Department of Molecular Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
- Choo Hock Tan
- Venom Research and Toxicology Laboratory, Department of Pharmacology, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
3
Parrot C, Moulinier L, Bernard F, Hashem Y, Dupuy D, Sissler M. Peculiarities of aminoacyl-tRNA synthetases from trypanosomatids. J Biol Chem 2021; 297:100913. [PMID: 34175310 PMCID: PMC8319005 DOI: 10.1016/j.jbc.2021.100913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 10/28/2022]
Abstract
Trypanosomatid parasites are responsible for various human diseases, such as sleeping sickness, animal trypanosomiasis, or cutaneous and visceral leishmaniases. The few available drugs to fight related parasitic infections are often toxic and present poor efficiency and specificity, and thus, finding new molecular targets is imperative. Aminoacyl-tRNA synthetases (aaRSs) are essential components of the translational machinery as they catalyze the specific attachment of an amino acid onto cognate tRNA(s). In trypanosomatids, one gene encodes both cytosolic- and mitochondrial-targeted aaRSs, with only three exceptions. We identify here a unique specific feature of aaRSs from trypanosomatids, which is that most of them harbor distinct insertion and/or extension sequences. Among the 26 identified aaRSs in the trypanosome Leishmania tarentolae, 14 contain an additional domain or a terminal extension, confirmed in mature mRNAs by direct cDNA nanopore sequencing. Moreover, these RNA-Seq data led us to address the question of aaRS dual localization and to determine splice-site locations and the 5'-UTR lengths for each mature aaRS-encoding mRNA. Altogether, our results provided evidence for at least one specific mechanism responsible for mitochondrial addressing of some L. tarentolae aaRSs. We propose that these newly identified features of trypanosomatid aaRSs could be developed as relevant drug targets to combat the diseases caused by these parasites.
Affiliation(s)
- Camila Parrot
- ARNA - UMR5320 CNRS - U1212 INSERM, Université de Bordeaux, IECB, Pessac, France
- Luc Moulinier
- CSTB Complex Systems and Translational Bioinformatics, ICube laboratory and Strasbourg Federation of Translational Medicine (FMTS), CNRS, Université de Strasbourg, Strasbourg, France
- Florian Bernard
- ARNA - UMR5320 CNRS - U1212 INSERM, Université de Bordeaux, IECB, Pessac, France
- Yaser Hashem
- ARNA - UMR5320 CNRS - U1212 INSERM, Université de Bordeaux, IECB, Pessac, France
- Denis Dupuy
- ARNA - UMR5320 CNRS - U1212 INSERM, Université de Bordeaux, IECB, Pessac, France
- Marie Sissler
- ARNA - UMR5320 CNRS - U1212 INSERM, Université de Bordeaux, IECB, Pessac, France
4
Lü X, Han SC, Li ZG, Li LY, Li J. Gene Characterization and Enzymatic Activities Related to Trehalose Metabolism of In Vitro Reared Trichogramma dendrolimi Matsumura (Hymenoptera: Trichogrammatidae) under Sustained Cold Stress. Insects 2020; 11:767. [PMID: 33171708 PMCID: PMC7694998 DOI: 10.3390/insects11110767] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/23/2020] [Revised: 10/28/2020] [Accepted: 11/04/2020] [Indexed: 11/18/2022]
Abstract
Simple Summary: Trehalose is a non-reducing disaccharide that is present in a wide variety of organisms, where it serves as an energy source or stress protectant. Trehalose is the most characteristic sugar of insect hemolymph and plays a crucial role in the regulation of insect growth and development. Trichogramma species are economically important egg parasitoids, which are being mass-produced for biological control programs worldwide. Many Trichogramma species can be mass reared on artificial media (not insect eggs) whose components include insect hemolymph and trehalose. These in vitro-reared parasitoid wasps were strongly affected by cold storage, but prepupae could be successfully stored at 13 °C for up to 4 weeks. The aims of the present study were to determine the role of trehalose and the relationship between trehalose and egg parasitoid stress resistance. Our study revealed that (1) trehalose regulated growth under sustained cold stress; (2) the prepupal stage is a critical developmental period and 13 °C is the cold tolerance threshold temperature; (3) in vitro reared Trichogramma dendrolimi could be reared at temperatures of 16 °C, 20 °C, and 23 °C to reduce rearing costs. This finding identifies a low cost, prolonged development rearing method for T. dendrolimi, which will facilitate improved mass rearing methods for biocontrol.
Abstract: Trichogramma spp. are important egg parasitoid wasps for biocontrol of agriculture and forestry insect pests. Trehalose serves as an energy source or stress protectant for insects. To study the potential role of trehalose in cold resistance in an egg parasitoid, cDNAs for trehalose-6-phosphate synthase (TPS) and soluble trehalase (TRE) from Trichogramma dendrolimi were cloned and characterized. Gene expression and enzyme activities of TdTPS and TdTRE were determined in larvae, prepupae, pupae, and adults at sustained low temperatures, 13 °C and 16 °C. TdTPS and TdTRE expression had similar patterns, with higher levels in prepupae at 13 °C and 16 °C. TdTPS enzyme activities increased with a decrease of temperature, and TdTRE activity in prepupae decreased sharply at these two low temperatures. In vitro reared T. dendrolimi could complete their entire development above 13 °C, and the development period was prolonged without cold injury. Results indicated trehalose might regulate growth and the metabolic process of cold tolerance. Moreover, 13 °C is the cold tolerance threshold temperature and the prepupal stage is a critical developmental period for in vitro reared T. dendrolimi. These findings identify a low cost, prolonged development rearing method, and the cold tolerance for T. dendrolimi, which will facilitate improved mass rearing methods for biocontrol.
Affiliation(s)
- Xin Lü
- Correspondence: (X.L.); (J.L.)
- Jun Li
- Correspondence: (X.L.); (J.L.)
5
Kress A, Lecompte O, Poch O, Thompson JD. PROBE: analysis and visualization of protein block-level evolution. Bioinformatics 2018; 34:3390-3392. [DOI: 10.1093/bioinformatics/bty367] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/24/2017] [Accepted: 05/04/2018] [Indexed: 11/13/2022]
Affiliation(s)
- Arnaud Kress
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Olivier Poch
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Julie D Thompson
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
6
Moulinier L, Ripp R, Castillo G, Poch O, Sissler M. MiSynPat: An integrated knowledge base linking clinical, genetic, and structural data for disease-causing mutations in human mitochondrial aminoacyl-tRNA synthetases. Hum Mutat 2017; 38:1316-1324. [PMID: 28608363 PMCID: PMC5638098 DOI: 10.1002/humu.23277] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 04/25/2017] [Revised: 06/02/2017] [Accepted: 06/06/2017] [Indexed: 11/25/2022]
Abstract
Numerous mutations in each of the mitochondrial aminoacyl‐tRNA synthetases (aaRSs) have been implicated in human diseases. The mutations are autosomal and recessive and lead mainly to neurological disorders, although with pleiotropic effects. The processes and interactions that drive the etiology of the disorders associated with mitochondrial aaRSs (mt‐aaRSs) are far from understood. The complexity of the clinical, genetic, and structural data requires concerted, interdisciplinary efforts to understand the molecular biology of these disorders. Toward this goal, we designed MiSynPat, a comprehensive knowledge base together with an ergonomic Web server designed to organize and access all pertinent information (sequences, multiple sequence alignments, structures, disease descriptions, mutation characteristics, original literature) on the disease‐linked human mt‐aaRSs. With MiSynPat, a user can also evaluate the impact of a possible mutation on sequence‐conservation‐structure in order to foster the links between basic and clinical researchers and to facilitate future diagnosis. The proposed integrated view, coupled with research on disease‐related mt‐aaRSs, will help to reveal new functions for these enzymes and to open new vistas in the molecular biology of the cell. The purpose of MiSynPat, freely available at http://misynpat.org, is to constitute a reference and a converging resource for scientists and clinicians.
Affiliation(s)
- Luc Moulinier
- CSTB Complex Systems and Translational Bioinformatics, ICube Laboratory and Strasbourg Federation of Translational Medicine (FMTS), CNRS, Université de Strasbourg, Strasbourg, France
- Raymond Ripp
- CSTB Complex Systems and Translational Bioinformatics, ICube Laboratory and Strasbourg Federation of Translational Medicine (FMTS), CNRS, Université de Strasbourg, Strasbourg, France
- Gaston Castillo
- CSTB Complex Systems and Translational Bioinformatics, ICube Laboratory and Strasbourg Federation of Translational Medicine (FMTS), CNRS, Université de Strasbourg, Strasbourg, France; Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, Strasbourg, France
- Olivier Poch
- CSTB Complex Systems and Translational Bioinformatics, ICube Laboratory and Strasbourg Federation of Translational Medicine (FMTS), CNRS, Université de Strasbourg, Strasbourg, France
- Marie Sissler
- Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, Strasbourg, France
7
Computational Identification of Post Translational Modification Regulated RNA Binding Protein Motifs. PLoS One 2015; 10:e0137696. [PMID: 26368004 PMCID: PMC4569568 DOI: 10.1371/journal.pone.0137696] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/25/2015] [Accepted: 08/19/2015] [Indexed: 11/19/2022]
Abstract
RNA and its associated RNA binding proteins (RBPs) mitigate a diverse array of cellular functions and phenotypes. The interactions between RNA and RBPs are implicated in many roles of biochemical processing by the cell such as localization, protein translation, and RNA stability. Recent discoveries of novel mechanisms that are of significant evolutionary advantage between RBPs and RNA include the interaction of the RBP with the 3’ and 5’ untranslated region (UTR) of target mRNA. These mechanisms are shown to function through interaction of a trans-factor (RBP) and a cis-regulatory element (3’ or 5’ UTR) by the binding of a RBP to a regulatory-consensus nucleic acid motif region that is conserved throughout evolution. Through signal transduction, regulatory RBPs are able to temporarily dissociate from their target sites on mRNAs and induce translation, typically through a post-translational modification (PTM). These small, regulatory motifs located in the UTR of mRNAs are subject to a loss-of-function due to single polymorphisms or other mutations that disrupt the motif and inhibit the ability to associate into the complex with RBPs. The identification of a consensus motif for a given RBP is difficult, time consuming, and requires a significant degree of experimentation to identify each motif-containing gene on a genomic scale. We have developed a computational algorithm to analyze high-throughput genomic arrays that contain differential binding induced by a PTM for a RBP of interest–RBP-PTM Target Scan (RPTS). We demonstrate the ability of this application to accurately predict a PTM-specific binding motif to an RBP that has no antibody capable of distinguishing the PTM of interest, negating the use of in-vitro exonuclease digestion techniques.
8
Khenoussi W, Vanhoutrève R, Poch O, Thompson JD. SIBIS: a Bayesian model for inconsistent protein sequence estimation. Bioinformatics 2014; 30:2432-9. [PMID: 24825613 DOI: 10.1093/bioinformatics/btu329] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. RESULTS We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. AVAILABILITY AND IMPLEMENTATION Source code, implemented in C on a Linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS.
Affiliation(s)
- Walyd Khenoussi
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, Strasbourg, F-67085, France
- Renaud Vanhoutrève
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, Strasbourg, F-67085, France
- Olivier Poch
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, Strasbourg, F-67085, France
- Julie D Thompson
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, Strasbourg, F-67085, France
9
Bermejo-Das-Neves C, Nguyen HN, Poch O, Thompson JD. A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics 2014; 15:111. [PMID: 24742296 PMCID: PMC4021375 DOI: 10.1186/1471-2105-15-111] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 12/20/2013] [Accepted: 04/09/2014] [Indexed: 11/10/2022]
Abstract
Background Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids. Results In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels. Conclusions We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease.
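The distinction the abstract draws between frameshifting and non-frameshifting Indels can be illustrated with a minimal sketch. This is not part of KD4i; the function name `classify_indel` and the reference/alternate allele representation are assumptions made for illustration only. It classifies a coding-region Indel purely by whether its net length change is a multiple of three nucleotides.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Classify a coding-region indel by its net length change.

    Hypothetical helper for illustration: `ref` and `alt` are the
    reference and alternate nucleotide strings of a single variant.
    """
    delta = len(alt) - len(ref)  # > 0 for insertions, < 0 for deletions
    if delta == 0:
        return "not an indel"  # same length, e.g. a substitution
    # A net change that is a multiple of three keeps the reading frame (NFS);
    # any other net change shifts it (FS).
    return "NFS" if delta % 3 == 0 else "FS"


if __name__ == "__main__":
    print(classify_indel("A", "ATTT"))  # three inserted nucleotides
    print(classify_indel("A", "AT"))    # one inserted nucleotide
```

The sketch ignores everything else the study considers (structural, functional and evolutionary parameters); it only encodes the multiple-of-three rule stated in the abstract.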
Affiliation(s)
- Julie D Thompson
- ICube Laboratory and Strasbourg Federation of Translational Medicine (FMTS), University of Strasbourg and CNRS, Strasbourg, France.
10
Schwenzer H, Scheper GC, Zorn N, Moulinier L, Gaudry A, Leize E, Martin F, Florentz C, Poch O, Sissler M. Released selective pressure on a structural domain gives new insights on the functional relaxation of mitochondrial aspartyl-tRNA synthetase. Biochimie 2013; 100:18-26. [PMID: 24120687 DOI: 10.1016/j.biochi.2013.09.027] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/19/2013] [Accepted: 09/30/2013] [Indexed: 10/26/2022]
Abstract
Mammalian mitochondrial aminoacyl-tRNA synthetases are nuclear-encoded enzymes that are essential for mitochondrial protein synthesis. Due to an endosymbiotic origin of the mitochondria, many of them share structural domains with homologous bacterial enzymes of the same specificity. This is also the case for human mitochondrial aspartyl-tRNA synthetase (AspRS), which shares the so-called bacterial insertion domain with bacterial homologs. The function of this domain in the mitochondrial proteins is unclear. Here, we show by bioinformatic analyses that the sequences coding for the bacterial insertion domain are less conserved in opisthokonts and protists than in bacteria and viridiplantae. The divergence suggests a loss of evolutionary pressure on this domain for non-plant mitochondrial AspRSs. This discovery is further connected with the herein described occurrence of alternatively spliced transcripts of the mRNAs coding for some mammalian mitochondrial AspRSs. Interestingly, the spliced transcripts alternately lack one of the four exons that code for the bacterial insertion domain. Although we showed that the human alternative transcript is present in all tested tissues, co-exists with the full-length form, possesses 5'- and 3'-UTRs and a poly-A tail, and is bound to polysomes, we were unable to detect the corresponding protein. The relaxed selective pressure combined with the occurrence of alternative splicing, involving a single structural sub-domain, favors the hypothesis of the loss of function of this domain for AspRSs of mitochondrial location. This evolutionary divergence is in line with other characteristics, established for the human mt-AspRS, that indicate a functional relaxation of non-viridiplantae mt-AspRSs when compared to bacterial and plant ones, despite their common ancestry.
Affiliation(s)
- Hagen Schwenzer
- Architecture et Réactivité de l'ARN, CNRS, Université de Strasbourg, IBMC - 15 rue René Descartes, F-67084 Strasbourg Cedex, France
- Gert C Scheper
- Department of Pediatrics and Child Neurology, VU University Medical Center, 1081 HV Amsterdam, The Netherlands
- Nathalie Zorn
- Laboratoire de Spectrométrie de Masse des Interactions et des Systèmes, Chimie de la Matière Complexe, 1 rue Blaise Pascal, F-67008 Strasbourg Cedex, France
- Luc Moulinier
- Laboratoire de Bioinformatique et de Génomique Intégratives, IGBMC, 1 rue Laurent Fries BP-10142, F-67404 Illkirch Cedex, France
- Agnès Gaudry
- Architecture et Réactivité de l'ARN, CNRS, Université de Strasbourg, IBMC - 15 rue René Descartes, F-67084 Strasbourg Cedex, France
- Emmanuelle Leize
- Laboratoire de Spectrométrie de Masse des Interactions et des Systèmes, Chimie de la Matière Complexe, 1 rue Blaise Pascal, F-67008 Strasbourg Cedex, France
- Franck Martin
- Architecture et Réactivité de l'ARN, CNRS, Université de Strasbourg, IBMC - 15 rue René Descartes, F-67084 Strasbourg Cedex, France
- Catherine Florentz
- Architecture et Réactivité de l'ARN, CNRS, Université de Strasbourg, IBMC - 15 rue René Descartes, F-67084 Strasbourg Cedex, France
- Olivier Poch
- Laboratoire de Bioinformatique et de Génomique Intégratives, IGBMC, 1 rue Laurent Fries BP-10142, F-67404 Illkirch Cedex, France
- Marie Sissler
- Architecture et Réactivité de l'ARN, CNRS, Université de Strasbourg, IBMC - 15 rue René Descartes, F-67084 Strasbourg Cedex, France
11
Ortuño FM, Valenzuela O, Pomares H, Rojas F, Florido JP, Urquiza JM, Rojas I. Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques. Nucleic Acids Res 2012; 41:e26. [PMID: 23066102 PMCID: PMC3592395 DOI: 10.1093/nar/gks919] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/17/2022]
Abstract
Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics to perform other outstanding tasks such as structure prediction, biological function analysis or next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignments become increasingly difficult when dealing with low similarity sequences. As widely known, these algorithms directly depend on specific features of the sequences, causing relevant influence on the alignment accuracy. Many MSA tools have been recently designed but it is not possible to know in advance which one is the most suitable for a particular set of sequences. In this work, we analyze some of the most used algorithms presented in the bibliography and their dependences on several features. A novel intelligent algorithm based on least square support vector machine is then developed to predict how accurate each alignment could be, depending on its analyzed features. This algorithm is performed with a dataset of 2180 MSAs. The proposed system first estimates the accuracy of possible alignments. The most promising methodologies are then selected in order to align each set of sequences. Since only one selected algorithm is run, the computational time is not excessively increased.
Affiliation(s)
- Francisco M Ortuño
- Department of Computer Architecture and Computer Technology, University of Granada, 18071 Granada, Spain.
12
Levasseur A, Paganini J, Dainat J, Thompson JD, Poch O, Pontarotti P, Gouret P. The chordate proteome history database. Evol Bioinform Online 2012; 8:437-47. [PMID: 22904610 PMCID: PMC3418167 DOI: 10.4137/ebo.s9186] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome, and in particular to detect genomic events and automatic functional searches. Firstly, phylogenetic analyses based on high quality multiple sequence alignments and a robust phylogenetic pipeline were performed for the whole protein and for each individual domain. Novel approaches were developed to identify orthologs/paralogs, and predict gene duplication/gain/loss events and the occurrence of new protein architectures (domain gains, losses and shuffling). These important genetic events were localized on the phylogenetic trees and on the genomic sequence. Secondly, the phylogenetic trees were enhanced by the creation of phylogroups, whereby groups of orthologous sequences created using OrthoMCL were corrected based on the phylogenetic trees; gene family size and gene gain/loss in a given lineage could be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed easily using biological objects: protein identifier, keyword or domain, but can also be based on events, eg, domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches along with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.
Affiliation(s)
- Anthony Levasseur
- INRA, UMR1163 Biotechnologie des Champignons Filamenteux, Aix Marseille Université, ESIL Polytech, 163 avenue de Luminy, CP 925, 13288 Marseille Cedex 09, France
|
13
|
Luu TD, Rusu A, Walter V, Linard B, Poidevin L, Ripp R, Moulinier L, Muller J, Raffelsberger W, Wicker N, Lecompte O, Thompson JD, Poch O, Nguyen H. KD4v: Comprehensible Knowledge Discovery System for Missense Variant. Nucleic Acids Res 2012; 40:W71-5. [PMID: 22641855 PMCID: PMC3394327 DOI: 10.1093/nar/gks474] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 12/02/2022] Open
Abstract
A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products. The KD4v (Comprehensible Knowledge Discovery System for Missense Variant) server makes it possible to characterize and predict the phenotypic effects (deleterious/neutral) of missense variants. The server provides a set of rules learned by Inductive Logic Programming (ILP) on a set of missense variants described by conservation, physico-chemical, functional and 3D structure predicates. These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation. The web server is available at http://decrypthon.igbmc.fr/kd4v.
Affiliation(s)
- Tien-Dao Luu
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire, 67404 Illkirch, France
|
14
|
Luu TD, Rusu AM, Walter V, Ripp R, Moulinier L, Muller J, Toursel T, Thompson JD, Poch O, Nguyen H. MSV3d: database of human MisSense Variants mapped to 3D protein structure. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2012; 2012:bas018. [PMID: 22491796 PMCID: PMC3317913 DOI: 10.1093/database/bas018] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/03/2023]
Abstract
The elucidation of the complex relationships linking genotypic and phenotypic variations to protein structure is a major challenge in the post-genomic era. We present MSV3d (Database of human MisSense Variants mapped to 3D protein structure), a new database that contains detailed annotation of missense variants of all human proteins (20 199 proteins). The multi-level characterization includes details of the physico-chemical changes induced by amino acid modification, as well as information related to the conservation of the mutated residue and its position relative to functional features in the available or predicted 3D model. Major releases of the database are automatically generated and updated regularly in line with the dbSNP (database of Single Nucleotide Polymorphism) and SwissVar releases, by exploiting the extensive Décrypthon computational grid resources. The database (http://decrypthon.igbmc.fr/msv3d) is easily accessible through a simple web interface coupled to a powerful query engine and a standard web service. The content is completely or partially downloadable in XML or flat file formats. Database URL:http://decrypthon.igbmc.fr/msv3d
Affiliation(s)
- Tien-Dao Luu
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire (UMR7104), 67404 Illkirch
|
15
|
Prosdocimi F, Linard B, Pontarotti P, Poch O, Thompson JD. Controversies in modern evolutionary biology: the imperative for error detection and quality control. BMC Genomics 2012; 13:5. [PMID: 22217008 PMCID: PMC3311146 DOI: 10.1186/1471-2164-13-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 06/30/2011] [Accepted: 01/04/2012] [Indexed: 12/03/2022] Open
Abstract
Background The data from high throughput genomics technologies provide unique opportunities for studies of complex biological systems, but also pose many new challenges. The shift to the genome scale in evolutionary biology, for example, has led to many interesting, but often controversial studies. It has been suggested that part of the conflict may be due to errors in the initial sequences. Most gene sequences are predicted by bioinformatics programs and a number of quality issues have been raised, concerning DNA sequencing errors or badly predicted coding regions, particularly in eukaryotes. Results We investigated the impact of these errors on evolutionary studies and specifically on the identification of important genetic events. We focused on the detection of asymmetric evolution after duplication, which has been the subject of controversy recently. Using the human genome as a reference, we established a reliable set of 688 duplicated genes in 13 complete vertebrate genomes, where significantly different evolutionary rates are observed. We estimated the rates at which protein sequence errors occur and are accumulated in the higher-level analyses. We showed that the majority of the detected events (57%) are in fact artifacts due to the putative erroneous sequences and that these artifacts are sufficient to mask the true functional significance of the events. Conclusions Initial errors are accumulated throughout the evolutionary analysis, generating artificially high rates of event predictions and leading to substantial uncertainty in the conclusions. This study emphasizes the urgent need for error detection and quality control strategies in order to efficiently extract knowledge from the new genome data.
Affiliation(s)
- Francisco Prosdocimi
- Department of Integrated Structural Biology, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire) CNRS/INSERM/Université de Strasbourg, 1 rue Laurent Fries, Illkirch, F-67404, France
|
16
|
Yuzawa Y, Nishihara H, Haraguchi T, Masuda S, Shimojima M, Shimoyama A, Yuasa H, Okada N, Ohta H. Phylogeny of galactolipid synthase homologs together with their enzymatic analyses revealed a possible origin and divergence time for photosynthetic membrane biogenesis. DNA Res 2011; 19:91-102. [PMID: 22210603 PMCID: PMC3276260 DOI: 10.1093/dnares/dsr044] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/04/2022] Open
Abstract
The photosynthetic membranes of cyanobacteria and chloroplasts of higher plants have remarkably similar lipid compositions. In particular, thylakoid membranes of both cyanobacteria and chloroplasts are composed of galactolipids, of which monogalactosyldiacylglycerol (MGDG) is the most abundant, although MGDG biosynthetic pathways are different in these organisms. Comprehensive phylogenetic analysis revealed that MGDG synthase (MGD) homologs of filamentous anoxygenic phototrophs Chloroflexi have a close relationship with MGDs of Viridiplantae (green algae and land plants). Furthermore, analyses for the sugar specificity and anomeric configuration of the sugar head groups revealed that one of the MGD homologs exhibited a true MGDG synthetic activity. We therefore presumed that higher plant MGDs are derived from this ancestral type of MGD genes, and genes involved in membrane biogenesis and photosystems have been already functionally associated at least at the time of Chloroflexi divergence. As MGD gene duplication is an important event during plastid evolution, we also estimated the divergence time of type A and B MGDs. Our analysis indicated that these genes diverged ∼323 million years ago, when Spermatophyta (seed plants) were appearing. Galactolipid synthesis is required to produce photosynthetic membranes; based on MGD gene sequences and activities, we have proposed a novel evolutionary model that has increased our understanding of photosynthesis evolution.
Affiliation(s)
- Yuichi Yuzawa
- Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259 B-65 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan
|
17
|
Linard B, Nguyen NH, Prosdocimi F, Poch O, Thompson JD. EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data. Evol Bioinform Online 2011; 8:61-77. [PMID: 22267905 PMCID: PMC3256995 DOI: 10.4137/ebo.s8814] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022] Open
Abstract
Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes.
Affiliation(s)
- Benjamin Linard
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
- Ngoc Hoan Nguyen
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
- Olivier Poch
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
- Julie D. Thompson
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
|
18
|
Overton IM, Barton GJ. Computational approaches to selecting and optimising targets for structural biology. Methods 2011; 55:3-11. [PMID: 21906678 PMCID: PMC3202631 DOI: 10.1016/j.ymeth.2011.08.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/27/2011] [Revised: 08/18/2011] [Accepted: 08/22/2011] [Indexed: 11/29/2022] Open
Abstract
Selection of protein targets for study is central to structural biology and may be influenced by numerous factors. A key aim is to maximise returns for effort invested by identifying proteins with the balance of biophysical properties that are conducive to success at all stages (e.g. solubility, crystallisation) in the route towards a high resolution structural model. Selected targets can be optimised through construct design (e.g. to minimise protein disorder), switching to a homologous protein, and selection of experimental methodology (e.g. choice of expression system) to prime for efficient progress through the structural proteomics pipeline. Here we discuss computational techniques in target selection and optimisation, with more detailed focus on tools developed within the Scottish Structural Proteomics Facility (SSPF); namely XANNpred, ParCrys, OB-Score (target selection) and TarO (target optimisation). TarO runs a large number of algorithms, searching for homologues and annotating the pool of possible alternative targets. This pool of putative homologues is presented in a ranked, tabulated format and results are also visualised as an automatically generated and annotated multiple sequence alignment. The target selection algorithms each predict the propensity of a selected protein target to progress through the experimental stages leading to diffracting crystals. This single predictor approach has advantages for target selection, when compared with an approach using two or more predictors that each predict for success at a single experimental stage. The tools described here helped SSPF achieve a high (21%) success rate in progressing cloned targets to diffraction-quality crystals.
Affiliation(s)
- Ian M Overton
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, United Kingdom.
|
19
|
Thompson JD, Linard B, Lecompte O, Poch O. A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One 2011; 6:e18093. [PMID: 21483869 PMCID: PMC3069049 DOI: 10.1371/journal.pone.0018093] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Received: 11/15/2010] [Accepted: 02/21/2011] [Indexed: 12/18/2022] Open
Abstract
Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.
Affiliation(s)
- Julie D Thompson
- Département de Biologie Structurale et Génomique, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), CNRS/INSERM/Université de Strasbourg, Illkirch, France.
|
20
|
Gagnière N, Jollivet D, Boutet I, Brélivet Y, Busso D, Da Silva C, Gaill F, Higuet D, Hourdez S, Knoops B, Lallier F, Leize-Wagner E, Mary J, Moras D, Perrodou E, Rees JF, Segurens B, Shillito B, Tanguy A, Thierry JC, Weissenbach J, Wincker P, Zal F, Poch O, Lecompte O. Insights into metazoan evolution from Alvinella pompejana cDNAs. BMC Genomics 2010; 11:634. [PMID: 21080938 PMCID: PMC3018142 DOI: 10.1186/1471-2164-11-634] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 02/09/2010] [Accepted: 11/16/2010] [Indexed: 11/29/2022] Open
Abstract
Background Alvinella pompejana is a representative of Annelids, a key phylum for evo-devo studies that is still poorly studied at the sequence level. A. pompejana inhabits deep-sea hydrothermal vents and is currently known as one of the most thermotolerant Eukaryotes in marine environments, withstanding the largest known chemical and thermal ranges (from 5 to 105°C). This tube-dwelling worm forms dense colonies on the surface of hydrothermal chimneys and can withstand long periods of hypo/anoxia and long phases of exposure to hydrogen sulphides. A. pompejana specifically inhabits chimney walls of hydrothermal vents on the East Pacific Rise. To survive, Alvinella has developed numerous adaptations at the physiological and molecular levels, such as an increase in the thermostability of proteins and protein complexes. It represents an outstanding model organism for studying adaptation to harsh physicochemical conditions and for isolating stable macromolecules resistant to high temperatures. Results We have constructed four full length enriched cDNA libraries to investigate the biology and evolution of this intriguing animal. Analysis of more than 75,000 high quality reads led to the identification of 15,858 transcripts and 9,221 putative protein sequences. Our annotation reveals a good coverage of most animal pathways and networks with a prevalence of transcripts involved in oxidative stress resistance, detoxification, anti-bacterial defence, and heat shock protection. Alvinella proteins seem to show a slow evolutionary rate and a higher similarity with proteins from Vertebrates compared to proteins from Arthropods or Nematodes. Their composition shows enrichment in positively charged amino acids that might contribute to their thermostability. 
The gene content of Alvinella reveals that an important pool of genes previously considered to be specific to Deuterostomes were in fact already present in the last common ancestor of the Bilaterian animals, but have been secondarily lost in model invertebrates. This pool is enriched in glycoproteins that play a key role in intercellular communication, hormonal regulation and immunity. Conclusions Our study starts to unravel the gene content and sequence evolution of a deep-sea annelid, revealing key features in eukaryote adaptation to extreme environmental conditions and highlighting the proximity of Annelids and Vertebrates.
Affiliation(s)
- Nicolas Gagnière
- Department of Structural Biology and Genomics, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CERBM F-67400 Illkirch, France
|
21
|
Wilke S, Krausze J, Gossen M, Groebe L, Jäger V, Gherardi E, van den Heuvel J, Büssow K. Glycoprotein production for structure analysis with stable, glycosylation mutant CHO cell lines established by fluorescence-activated cell sorting. Protein Sci 2010; 19:1264-71. [PMID: 20512979 DOI: 10.1002/pro.390] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
Stable mammalian cell lines are excellent tools for the expression of secreted and membrane glycoproteins. However, structural analysis of these molecules is generally hampered by the complexity of N-linked carbohydrate side chains. Cell lines with mutations are available that result in shorter and more homogenous carbohydrate chains. Here, we use preparative fluorescence-activated cell sorting (FACS) and site-specific gene excision to establish high-yield glycoprotein expression for structural studies with stable clones derived from the well-established Lec3.2.8.1 glycosylation mutant of the Chinese hamster ovary (CHO) cell line. We exemplify the strategy by describing novel clones expressing single-chain hepatocyte growth factor/scatter factor (HGF/SF, a secreted glycoprotein) and a domain of lysosome-associated membrane protein 3 (LAMP3d). In both cases, stable GFP-expressing cell lines were established by transfection with a genetic construct including a GFP marker and two rounds of cell sorting after 1 and 2 weeks. The GFP marker was subsequently removed by heterologous expression of Flp recombinase. Production of HGF/SF and LAMP3d was stable over several months. 1.2 mg HGF/SF and 0.9 mg LAMP3d were purified per litre of culture, respectively. Homogenous glycoprotein preparations were amenable to enzymatic deglycosylation under native conditions. Purified and deglycosylated LAMP3d protein was readily crystallized. The combination of FACS and gene excision described here constitutes a robust and fast procedure for maximizing the yield of glycoproteins for structural analysis from glycosylation mutant cell lines.
Affiliation(s)
- Sonja Wilke
- Division of Structural Biology, Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
|
22
|
Friedrich A, Garnier N, Gagnière N, Nguyen H, Albou LP, Biancalana V, Bettler E, Deléage G, Lecompte O, Muller J, Moras D, Mandel JL, Toursel T, Moulinier L, Poch O. SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences of missense mutations in proteins involved in human genetic diseases. Hum Mutat 2010; 31:127-35. [PMID: 19921752 DOI: 10.1002/humu.21155] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/19/2023]
Abstract
Understanding how genetic alterations affect gene products at the molecular level represents a first step in the elucidation of the complex relationships between genotypic and phenotypic variations, and is thus a major challenge in the postgenomic era. Here, we present SM2PH-db (http://decrypthon.igbmc.fr/sm2ph), a new database designed to investigate structural and functional impacts of missense mutations and their phenotypic effects in the context of human genetic diseases. A wealth of up-to-date interconnected information is provided for each of the 2,249 disease-related entry proteins (August 2009), including data retrieved from biological databases and data generated from a Sequence-Structure-Evolution Inference in Systems-based approach, such as multiple alignments, three-dimensional structural models, and multidimensional (physicochemical, functional, structural, and evolutionary) characterizations of mutations. SM2PH-db provides a robust infrastructure associated with interactive analysis tools supporting in-depth study and interpretation of the molecular consequences of mutations, with the more long-term goal of elucidating the chain of events leading from a molecular defect to its pathology. The entire content of SM2PH-db is regularly and automatically updated thanks to a computational grid data federation facilities provided in the context of the Decrypthon program.
Affiliation(s)
- Anne Friedrich
- Département de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (UMR7104), Centre National de la Recherche Scientifique/Institut National de la Santé et de la Recherche Médicale/Université de Strasbourg, Illkirch, France
|
23
|
Procter JB, Thompson J, Letunic I, Creevey C, Jossinet F, Barton GJ. Visualization of multiple alignments, phylogenies and gene family evolution. Nat Methods 2010; 7:S16-25. [PMID: 20195253 DOI: 10.1038/nmeth.1434] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 01/09/2023]
Abstract
Software for visualizing sequence alignments and trees is an essential tool for life scientists. In this review, we describe the major features and capabilities of a selection of stand-alone and web-based applications useful when investigating the function and evolution of a gene family. These range from simple viewers to systems that provide sophisticated editing and analysis functions. We conclude with a discussion of the challenges that these tools now face due to the flood of next generation sequence data and the increasingly complex network of bioinformatics information sources.
|
24
|
Gouret P, Thompson JD, Pontarotti P. PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees. BMC Bioinformatics 2009; 10:298. [PMID: 19765311 PMCID: PMC2759962 DOI: 10.1186/1471-2105-10-298] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 05/15/2009] [Accepted: 09/19/2009] [Indexed: 11/23/2022] Open
Abstract
Background To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted. Results Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user-defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other. Conclusion PhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However, any workflow that relies on phylogenetic tree analysis could be automated with PhyloPattern.
Affiliation(s)
- Philippe Gouret
- UMR 6632, Evolutionary Biology and Modeling, University of Provence, Marseille, France.
|
25
|
Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009; 25:1189-91. [PMID: 19151095 PMCID: PMC2672624 DOI: 10.1093/bioinformatics/btp033] [Citation(s) in RCA: 6663] [Impact Index Per Article: 444.2] [Received: 11/24/2008] [Revised: 11/24/2008] [Accepted: 01/08/2009] [Indexed: 12/11/2022] Open
Abstract
Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server. AVAILABILITY The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org.
Affiliation(s)
- Andrew M Waterhouse
- School of Life Sciences Research, College of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
|
26
|
Lassmann T, Frings O, Sonnhammer ELL. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res 2008; 37:858-65. [PMID: 19103665 PMCID: PMC2647288 DOI: 10.1093/nar/gkn1006] [Citation(s) in RCA: 179] [Impact Index Per Article: 11.2] [Indexed: 11/12/2022] Open
Abstract
In the growing field of genomics, multiple alignment programs are confronted with ever increasing amounts of data. To address this growing issue we have dramatically improved the running time and memory requirement of Kalign, while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows for external sequence annotation to be included into the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well to the best methods in the case of protein alignments while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve the alignment accuracy. Kalign2 is freely available for download from the Kalign web site (http://msa.sbc.su.se/).
Affiliation(s)
- Timo Lassmann
- Department of Cell and Molecular Biology, Karolinska Institutet, SE-17177 Stockholm, Sweden
|
27
|
Levasseur A, Pontarotti P, Poch O, Thompson JD. Strategies for reliable exploitation of evolutionary concepts in high throughput biology. Evol Bioinform Online 2008; 4:121-37. [PMID: 19204813 PMCID: PMC2614184 DOI: 10.4137/ebo.s597] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022] Open
Abstract
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Affiliation(s)
- Anthony Levasseur
- Phylogenomics Laboratory, EA 3781 Evolution Biologique, Université de Provence, 13331 Marseille, France
|
28
|
Perrodou E, Chica C, Poch O, Gibson TJ, Thompson JD. A new protein linear motif benchmark for multiple sequence alignment software. BMC Bioinformatics 2008; 9:213. [PMID: 18439277 PMCID: PMC2374782 DOI: 10.1186/1471-2105-9-213] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 11/29/2007] [Accepted: 04/25/2008] [Indexed: 12/14/2022]
Abstract
BACKGROUND: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.
RESULTS: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
CONCLUSION: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
Affiliation(s)
- Emmanuel Perrodou
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Department of Structural Biology and Genomics, F-67400 Illkirch, France.
|
29
|
Overton IM, van Niekerk CAJ, Carter LG, Dawson A, Martin DMA, Cameron S, McMahon SA, White MF, Hunter WN, Naismith JH, Barton GJ. TarO: a target optimisation system for structural biology. Nucleic Acids Res 2008; 36:W190-6. [PMID: 18385152 PMCID: PMC2447720 DOI: 10.1093/nar/gkn141] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023]
Abstract
TarO (http://www.compbio.dundee.ac.uk/taro) offers a single point of reference for key bioinformatics analyses relevant to selecting proteins or domains for study by structural biology techniques. The protein sequence is analysed by 17 algorithms and compared to 8 databases. TarO gathers putative homologues, including orthologues, and then obtains predictions of properties for these sequences including crystallisation propensity, protein disorder and post-translational modifications. Analyses are run on a high-performance computing cluster, the results integrated, stored in a database and accessed through a web-based user interface. Output is in tabulated format and in the form of an annotated multiple sequence alignment (MSA) that may be edited interactively in the program Jalview. TarO also simplifies the gathering of additional annotations via the Distributed Annotation System, both from the MSA in Jalview and through links to Dasty2. Routes to other information gateways are included, for example to relevant pages from UniProt, COG and the Conserved Domains Database. Open access to TarO is available from a guest account, with private accounts for academic use available on request. Future development of TarO will include further analysis steps and integration with the Protein Information Management System (PIMS), a sister project in the BBSRC ‘Structural Proteomics of Rational Targets’ initiative.
Affiliation(s)
- Ian M Overton
- School of Life Sciences Research, University of Dundee, Dow Street, Dundee, UK
|
30
|
Abstract
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA, USA
|
31
|
Caffrey DR, Dana PH, Mathur V, Ocano M, Hong EJ, Wang YE, Somaroo S, Caffrey BE, Potluri S, Huang ES. PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments. BMC Bioinformatics 2007; 8:381. [PMID: 17931421 PMCID: PMC2092438 DOI: 10.1186/1471-2105-8-381] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Received: 08/14/2007] [Accepted: 10/11/2007] [Indexed: 11/15/2022]
Abstract
Background: By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis.
Results: Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition.
Conclusion: PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function.
Affiliation(s)
- Daniel R Caffrey
- Pfizer Global Research and Development, 620 Memorial Drive, Cambridge, MA 02139, USA.
|
32
|
Lassmann T, Sonnhammer ELL. Automatic extraction of reliable regions from multiple sequence alignments. BMC Bioinformatics 2007; 8 Suppl 5:S9. [PMID: 17570868 PMCID: PMC1892097 DOI: 10.1186/1471-2105-8-s5-s9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Background: High quality multiple alignments are crucial in the transfer of annotation from one genome to another. Multiple alignment methods strive to achieve ever increasing levels of average accuracy on benchmark sets while the accuracy of individual alignments is often overlooked.
Results: We have previously developed a method to automatically assess the accuracy and overall difficulty of multiple alignments. This was achieved by a per-residue comparison between alternate alignments of the same sequences. Here we present a key extension to this method, an algorithm to extract similarly aligned regions from several alignments and merge them into a new consensus alignment.
Conclusion: We demonstrate that the fraction of correctly aligned residues within the resulting alignments is increased by 25–100 percent compared to the original input alignments, as only the most reliably aligned parts are considered.
Affiliation(s)
- Timo Lassmann
- Department of Cell and Molecular Biology, Karolinska Institutet, SE-171 77, Stockholm, Sweden
- Erik LL Sonnhammer
- Department of Cell and Molecular Biology, Karolinska Institutet, SE-171 77, Stockholm, Sweden
- Stockholm Bioinformatics Center, Stockholm University, S-106 91 Stockholm, Sweden
|
33
|
Garnier N, Friedrich A, Bolze R, Bettler E, Moulinier L, Geourjon C, Thompson JD, Deléage G, Poch O. MAGOS: multiple alignment and modelling server. Bioinformatics 2006; 22:2164-5. [PMID: 16820425 DOI: 10.1093/bioinformatics/btl349] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
MAGOS is a web server allowing automated protein modelling coupled to the creation of a hierarchical and annotated multiple alignment of complete sequences. MAGOS is designed for an interactive approach of structural information within the framework of the evolutionary relevance of mined and predicted sequence information.
AVAILABILITY: The web server is freely available at http://pig-pbil.ibcp.fr/magos.
Affiliation(s)
- N Garnier
- Institut de Biologie et Chimie des Protéines (IBCP UMR 5086),CNRS, Univ. Lyon1, IFR128 BioSciences Lyon-Gerland, 7, passage du Vercors, 69367 Lyon cedex 07, France
|