1
|
Torres-Tiji Y, Sethuram H, Gupta A, McCauley J, Dutra-Molino JV, Pathania R, Saxton L, Kang K, Hillson NJ, Mayfield SP. Bioinformatic Prediction and High Throughput In Vivo Screening to Identify Cis-Regulatory Elements for the Development of Algal Synthetic Promoters. ACS Synth Biol 2024; 13:2150-2165. [PMID: 38986010 PMCID: PMC11264317 DOI: 10.1021/acssynbio.4c00199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Algae biotechnology holds immense promise for revolutionizing the bioeconomy through the sustainable and scalable production of various bioproducts. However, its development has been hindered by the lack of advanced genetic tools. This study introduces a synthetic biology approach to develop such tools, focusing on the construction and testing of synthetic promoters. By analyzing conserved DNA motifs within the promoter regions of highly expressed genes across six different algal species, we identified cis-regulatory elements (CREs) associated with high transcriptional activity. Combining the algorithms POWRS, STREME, and PhyloGibbs, we predicted 1511 CREs and inserted them into a minimal synthetic promoter sequence in 1, 2, or 3 copies, resulting in 4533 distinct synthetic promoters. These promoters were evaluated in vivo for their capacity to drive the expression of a transgene in a high-throughput manner through next-generation sequencing post antibiotic selection and fluorescence-activated cell sorting. To validate our approach, we sequenced hundreds of transgenic lines showing high levels of GFP expression. Further, we individually tested 14 identified promoters, revealing substantial increases in GFP expression, up to nine times higher than the baseline synthetic promoter, with five matching or even surpassing the performance of the native AR1 promoter. As a result of this study, we identified a catalog of CREs that can now be used to build superior synthetic algal promoters. More importantly, here we present a validated pipeline to generate building blocks for innovative synthetic genetic tools applicable to any algal species with a sequenced genome and transcriptome data set.
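The library size quoted in the abstract follows from simple combinatorics: 1511 CREs, each inserted in 1, 2, or 3 copies, gives 1511 × 3 = 4533 promoters. A minimal sketch of that construction is shown below. The function name, the toy motifs, the toy minimal promoter, and the insertion position are all hypothetical; the abstract does not state where in the minimal promoter the CREs were placed or whether copies were separated by spacers.

```python
def build_promoter_library(cres, minimal_promoter, insert_at, copy_numbers=(1, 2, 3)):
    """Insert each CRE into a minimal promoter in several copy numbers.

    cres: dict mapping a CRE name to its motif sequence.
    insert_at: index in minimal_promoter where copies are inserted
               (hypothetical; the real insertion site is not given in the abstract).
    Returns a dict mapping "<name>_x<copies>" to the promoter variant.
    """
    if not 0 <= insert_at <= len(minimal_promoter):
        raise ValueError("insert_at must lie within the minimal promoter")
    library = {}
    for name, motif in cres.items():
        for copies in copy_numbers:
            # Tandem copies with no spacer: an assumption made for brevity.
            variant = (minimal_promoter[:insert_at]
                       + motif * copies
                       + minimal_promoter[insert_at:])
            library[f"{name}_x{copies}"] = variant
    return library


# Toy data, not taken from the study.
toy_cres = {"cre_a": "GGCCAT", "cre_b": "TTGACA"}
toy_library = build_promoter_library(toy_cres, "AAAATATAAAGGGG", insert_at=4)
print(len(toy_library))  # 2 CREs x 3 copy numbers = 6 variants
```

With 1511 input motifs and the default copy numbers, the same function yields 4533 entries, matching the count reported in the abstract.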
Affiliation(s)
- Y. Torres-Tiji
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - H. Sethuram
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - A. Gupta
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - J. McCauley
- Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- DOE Agile BioFoundry, Emeryville, California 94608, United States
| | - J.-V. Dutra-Molino
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - R. Pathania
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - L. Saxton
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - K. Kang
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| | - N. J. Hillson
- Biological Systems & Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- DOE Agile BioFoundry, Emeryville, California 94608, United States
| | - S. P. Mayfield
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, United States
| |
|
2
|
Rajczewski AT, Jagtap PD, Griffin TJ. An overview of technologies for MS-based proteomics-centric multi-omics. Expert Rev Proteomics 2022; 19:165-181. [PMID: 35466851 PMCID: PMC9613604 DOI: 10.1080/14789450.2022.2070476] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/07/2023]
Abstract
INTRODUCTION Mass spectrometry-based proteomics reveals dynamic molecular signatures underlying phenotypes reflecting normal and perturbed conditions in living systems. Although valuable on its own, the proteome provides only one level of molecular information, with the genome, epigenome, transcriptome, and metabolome all providing complementary information. Multi-omic analysis integrating information from one or more of these other domains with proteomic information provides a more complete picture of molecular contributors to dynamic biological systems. AREAS COVERED Here, we discuss the improvements to mass spectrometry-based technologies, focused on peptide-based, bottom-up approaches that have enabled deep, quantitative characterization of complex proteomes. These advances are facilitating the integration of proteomics data with other 'omic information, providing a more complete picture of living systems. We also describe the current state of bioinformatics software and approaches for integrating proteomics and other 'omics data, critical for enabling new discoveries driven by multi-omics. EXPERT COMMENTARY Multi-omics, centered on the integration of proteomics information with other 'omic information, has tremendous promise for biological and biomedical studies. Continued advances in approaches for generating deep, reliable proteomic data and bioinformatics tools aimed at integrating data across 'omic domains will ensure the discoveries offered by these multi-omic studies continue to increase.
Affiliation(s)
- Andrew T. Rajczewski
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
| | - Pratik D. Jagtap
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
| |
|
3
|
Rehman HA, Zafar K, Khan A, Imtiaz A. Multiple sequence alignment using enhanced bird swarm align algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-210055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
Discovering structural, functional and evolutionary information in biological sequences has been considered a core research area in bioinformatics. Multiple Sequence Alignment (MSA) aligns all sequences in a given query set to ease the annotation of new sequences. Traditional methods for finding the optimal alignment are computationally expensive in real time. This research presents an enhanced version of the Bird Swarm Algorithm (BSA), a bio-inspired optimization method. The Enhanced Bird Swarm Align Algorithm (EBSAA) is proposed for the multiple sequence alignment problem to determine the optimal alignment among different sequences. Twenty-one different datasets were used to compare the performance of EBSAA with the Genetic Algorithm (GA) and the Particle Swarm Align Algorithm (PSAA). The proposed technique produces better alignments than GA and PSAA in most cases.
Affiliation(s)
- Hafiz Asadul Rehman
- Department of Computer Science, National University of Computer and Emerging Science Lahore, Pakistan
| | - Kashif Zafar
- Department of Computer Science, National University of Computer and Emerging Science Lahore, Pakistan
| | - Ayesha Khan
- University of Management & Technology, Lahore, Pakistan
|
4
|
Zhao Y, Broholm SK, Wang F, Rijpkema AS, Lan T, Albert VA, Teeri TH, Elomaa P. TCP and MADS-Box Transcription Factor Networks Regulate Heteromorphic Flower Type Identity in Gerbera hybrida. PLANT PHYSIOLOGY 2020; 184:1455-1468. [PMID: 32900982 PMCID: PMC7608168 DOI: 10.1104/pp.20.00702] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/02/2020] [Accepted: 08/25/2020] [Indexed: 05/19/2023]
Abstract
The large sunflower family, Asteraceae, is characterized by compressed, flower-like inflorescences that may bear phenotypically distinct flower types. The CYCLOIDEA (CYC)/TEOSINTE BRANCHED1-like transcription factors (TFs) belonging to the TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR (TCP) protein family are known to regulate bilateral symmetry in single flowers. In Asteraceae, they function at the inflorescence level, and were recruited to define differential flower type identities. Here, we identified upstream regulators of GhCYC3, a gene that specifies ray flower identity at the flower head margin in the model plant Gerbera hybrida. We discovered a previously unidentified expression domain and functional role for the paralogous CINCINNATA-like TCP proteins. They function upstream of GhCYC3 and affect the developmental delay of marginal ray primordia during their early ontogeny. At the level of single flowers, the Asteraceae CYC genes show a unique function in regulating the elongation of showy ventral ligules that play a major role in pollinator attraction. We discovered that during ligule development, the E class MADS-box TF GRCD5 activates GhCYC3 expression. We propose that the C class MADS-box TF GAGA1 contributes to stamen development upstream of GhCYC3. Our data demonstrate how interactions among and between the conserved floral regulators, TCP and MADS-box TFs, contribute to the evolution of the elaborate inflorescence architecture of Asteraceae.
Affiliation(s)
- Yafei Zhao
- Department of Agricultural Sciences, Viikki Plant Science Centre, University of Helsinki, 00014 Helsinki, Finland
| | - Suvi K Broholm
- Department of Agricultural Sciences, Viikki Plant Science Centre, University of Helsinki, 00014 Helsinki, Finland
| | - Feng Wang
- Department of Agricultural Sciences, Viikki Plant Science Centre, University of Helsinki, 00014 Helsinki, Finland
| | - Anneke S Rijpkema
- Department of Agricultural Sciences, Viikki Plant Science Centre, University of Helsinki, 00014 Helsinki, Finland
| | - Tianying Lan
- Department of Biological Sciences, University at Buffalo, Buffalo, New York 14260
| | - Victor A Albert
- Department of Biological Sciences, University at Buffalo, Buffalo, New York 14260
| | - Teemu H Teeri
- Department of Agricultural Sciences, Viikki Plant Science Centre, University of Helsinki, 00014 Helsinki, Finland
| | - Paula Elomaa
- Department of Agricultural Sciences, Viikki Plant Science Centre, University of Helsinki, 00014 Helsinki, Finland
| |
|
5
|
Magnus representation of genome sequences. J Theor Biol 2019; 480:104-111. [DOI: 10.1016/j.jtbi.2019.08.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/26/2019] [Revised: 07/30/2019] [Accepted: 08/05/2019] [Indexed: 11/24/2022]
|
6
|
Leimeister CA, Dencker T, Morgenstern B. Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points. Bioinformatics 2019; 35:211-218. [PMID: 29992260 PMCID: PMC6330006 DOI: 10.1093/bioinformatics/bty592] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/23/2017] [Accepted: 07/09/2018] [Indexed: 01/30/2023]
Abstract
Motivation Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods. Results In this article, we use filtered spaced word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don't-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don't-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered-spaced-word matches are superior to alignments produced with the original Mugsy program where exact word matches are used to find anchor points. Availability and implementation http://spacedanchor.gobics.de. Supplementary information Supplementary data are available at Bioinformatics online.
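The spaced-word idea described in this abstract can be illustrated with a short sketch. The code below finds ungapped window pairs that agree at every match position of a binary pattern, then keeps those whose score exceeds a threshold. It is an illustration only: the function name, the +1/-1 scoring, and the default threshold are assumptions, and the X-drop extension step and the Mugsy integration are not shown.

```python
def spaced_word_matches(seq_a, seq_b, pattern, threshold=0):
    """Find filtered spaced-word matches between two sequences.

    pattern: string of '1' (match position) and '0' (don't-care position).
    Returns a list of (pos_in_a, pos_in_b, score) tuples whose score
    exceeds the threshold. Scoring is +1 per identical position and -1 per
    mismatch over the whole window (a simplification chosen for this sketch).
    """
    match_positions = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)

    # Index every window of seq_a by its characters at the match positions.
    index = {}
    for i in range(len(seq_a) - span + 1):
        key = "".join(seq_a[i + k] for k in match_positions)
        index.setdefault(key, []).append(i)

    hits = []
    for j in range(len(seq_b) - span + 1):
        key = "".join(seq_b[j + k] for k in match_positions)
        for i in index.get(key, []):
            score = sum(1 if seq_a[i + k] == seq_b[j + k] else -1
                        for k in range(span))
            if score > threshold:  # the "filtering" step
                hits.append((i, j, score))
    return hits


# Toy sequences: the window ACGT in seq_a matches ACCT in seq_b at the
# match positions of pattern 1101, with a mismatch at the don't-care position.
print(spaced_word_matches("ACGTACGT", "TTACCTAC", "1101"))
# [(0, 2, 2), (4, 2, 2)]
```

Raising the threshold discards weaker matches; with `threshold=2` the toy example returns no hits, since each match scores exactly 2.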
Affiliation(s)
| | - Thomas Dencker
- Department of Bioinformatics, Institute of Microbiology and Genetics
| | - Burkhard Morgenstern
- Department of Bioinformatics, Institute of Microbiology and Genetics; Center for Computational Sciences, University of Goettingen, Goettingen, Germany
| |
|
7
|
Abstract
The increasing importance of Next Generation Sequencing (NGS) techniques has highlighted the key role of multiple sequence alignment (MSA) in comparative structure and function analysis of biological sequences. MSA often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments, although many biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, to serve as a helpful guide or starting point for researchers who aim to construct a reliable MSA.
|
8
|
Ye Y, Lam TW, Ting HF. PnpProbs: a better multiple sequence alignment tool by better handling of guide trees. BMC Bioinformatics 2016; 17 Suppl 8:285. [PMID: 27585754 PMCID: PMC5009527 DOI: 10.1186/s12859-016-1121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND This paper describes a new MSA tool called PnpProbs, which constructs better multiple sequence alignments by better handling of guide trees. It classifies sequences into two types: normally related and distantly related. For normally related sequences, it uses an adaptive approach to construct the guide tree needed for progressive alignment; it first estimates the input's discrepancy by computing the standard deviation of their percent identities, and based on this estimate, it chooses the better method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree and uses instead some non-progressive alignment method to generate the alignment. RESULTS To evaluate PnpProbs, we have compared it with thirteen other popular MSA tools, and PnpProbs has the best alignment scores in all but one test. We have also used it for phylogenetic analysis, and found that the phylogenetic trees constructed from PnpProbs' alignments are closest to the model trees. CONCLUSIONS By combining the strength of the progressive and non-progressive alignment methods, we have developed an MSA tool called PnpProbs. We have compared PnpProbs with thirteen other popular MSA tools and our results showed that our tool usually constructed the best alignments.
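The adaptive step described in this abstract, estimating the input's discrepancy as the standard deviation of pairwise percent identities and choosing a guide-tree method accordingly, might be sketched as follows. This is not the PnpProbs implementation: the function names, the cutoff value, and the two method labels are hypothetical, and percent identity is computed here on equal-length, already aligned toy sequences to keep the sketch self-contained.

```python
from itertools import combinations
from statistics import pstdev


def percent_identity(a, b):
    """Percent of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def identity_spread(seqs):
    """Standard deviation of all pairwise percent identities."""
    identities = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return pstdev(identities)


def choose_guide_tree_method(seqs, cutoff=20.0):
    """Pick a guide-tree construction method from the identity spread.

    The cutoff and both method labels are placeholders for this sketch.
    """
    if identity_spread(seqs) > cutoff:
        return "method_for_high_spread"
    return "method_for_low_spread"


# Toy input with pairwise identities of 75%, 25% and 0%.
print(choose_guide_tree_method(["ACGT", "ACGA", "TTTT"]))
# method_for_high_spread
```

A set of identical sequences has a spread of zero and falls on the other branch.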
Affiliation(s)
- Yongtao Ye
- HKU-BGI Bioinformatics Algorithms & Core Technology Research Lab, Computer Science Department, University of Hong Kong, Hong Kong, China
| | - Tak-Wah Lam
- HKU-BGI Bioinformatics Algorithms & Core Technology Research Lab, Computer Science Department, University of Hong Kong, Hong Kong, China
| | - Hing-Fung Ting
- HKU-BGI Bioinformatics Algorithms & Core Technology Research Lab, Computer Science Department, University of Hong Kong, Hong Kong, China.
| |
|
9
|
Bérard S, Chateau A, Pompidor N, Guertin P, Bergeron A, Swenson KM. Aligning the unalignable: bacteriophage whole genome alignments. BMC Bioinformatics 2016; 17:30. [PMID: 26757899 PMCID: PMC4711071 DOI: 10.1186/s12859-015-0869-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/06/2015] [Accepted: 12/22/2015] [Indexed: 11/19/2022]
Abstract
Background In recent years, many studies have focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plot diagrams give a mostly qualitative assessment of the similarity/dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. Results In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressiveMauve aligner (which implements a partial order strategy, but whose alignments are linearized) shows a greatly improved interactive graphic display, while avoiding misalignments. Conclusions Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains.
A python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on bitbucket (https://bitbucket.org/thekswenson/alpha). Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0869-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sèverine Bérard
- ISEM, CNRS - Univ. Montpellier, Montpellier, France; LIRMM, CNRS - Univ. Montpellier, 161 rue Ada, Montpellier, 34392, France
| | - Annie Chateau
- LIRMM, CNRS - Univ. Montpellier, 161 rue Ada, Montpellier, 34392, France; IBC Institut de Biologie Computationnelle, Montpellier, France
| | - Nicolas Pompidor
- LIRMM, CNRS - Univ. Montpellier, 161 rue Ada, Montpellier, 34392, France
| | - Paul Guertin
- LaCIM, Université du Québec à Montréal, Montréal, Canada; Département de mathématiques, Collège André-Grasset, Montréal, Canada
| | - Anne Bergeron
- LaCIM, Université du Québec à Montréal, Montréal, Canada
| | - Krister M Swenson
- LIRMM, CNRS - Univ. Montpellier, 161 rue Ada, Montpellier, 34392, France; IBC Institut de Biologie Computationnelle, Montpellier, France
| |
|
10
|
Zemali EA, Boukra A. Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations. J Bioinform Comput Biol 2015; 13:1550016. [PMID: 26055803 DOI: 10.1142/s021972001550016x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity between a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography, named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we have introduced a new concept allowing better exploration of the search space. It consists of manipulating multiple populations, each with its own parameters. These parameters are used to build up progressive alignments, allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To test the performance of the proposed approach, we considered a set of datasets from BAliBASE 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
Affiliation(s)
- El-Amine Zemali
- LSI Laboratory, Department of Computer Science, University of Science and Technologies, Houari Boumediene (USTHB), BP 32 El Alia 16111, Bab Ezzouar, Algeria
| | - Abdelmadjid Boukra
- LSI Laboratory, Department of Computer Science, University of Science and Technologies, Houari Boumediene (USTHB), BP 32 El Alia 16111, Bab Ezzouar, Algeria
| |
|
11
|
Kumar M. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm. EXCLI JOURNAL 2015; 14:1232-55. [PMID: 27065770 PMCID: PMC4820728 DOI: 10.17179/excli2015-302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2015] [Accepted: 11/19/2015] [Indexed: 11/10/2022]
Abstract
One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic aim of the multiple sequence alignment problem is to determine the most biologically plausible alignments of protein or DNA sequences. In this paper, an alignment method using a genetic algorithm for multiple sequence alignment is proposed. Two different genetic operators, mainly crossover and mutation, were defined and implemented with the proposed method in order to follow the evolution of the population and the quality of the aligned sequences. The proposed method is assessed with a protein benchmark dataset, e.g., BALIBASE, by comparing the obtained results to those obtained with other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality.
Affiliation(s)
- Manish Kumar
- Department of Computer Science and Engineering, Indian School of Mines, Dhanbad, Jharkhand, India
| |
|
12
|
Abstract
DIALIGN is a software tool for multiple sequence alignment that combines global and local alignment features. It composes multiple alignments from local pairwise sequence similarities. This approach is particularly useful for discovering conserved functional regions in sequences that share only local homologies but are otherwise unrelated. An anchoring option allows the use of external information and expert knowledge in addition to primary-sequence similarity alone. The latest version of DIALIGN optionally uses matches to the PFAM database to detect weak homologies. Various versions of the program are available through Göttingen Bioinformatics Compute Server (GOBICS) at http://www.gobics.de/department/software.
|
13
|
Schulze S, Mallmann J, Burscheidt J, Koczor M, Streubel M, Bauwe H, Gowik U, Westhoff P. Evolution of C4 photosynthesis in the genus Flaveria: establishment of a photorespiratory CO2 pump. THE PLANT CELL 2013; 25:2522-35. [PMID: 23847152 PMCID: PMC3753380 DOI: 10.1105/tpc.113.114520] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 06/04/2013] [Revised: 06/20/2013] [Accepted: 06/28/2013] [Indexed: 05/18/2023]
Abstract
C4 photosynthesis is nature's most efficient answer to the dual activity of ribulose-1,5-bisphosphate carboxylase/oxygenase and the resulting loss of CO2 by photorespiration. Gly decarboxylase (GDC) is the key component of photorespiratory CO2 release in plants and is active in all photosynthetic tissues of C3 plants, but only in the bundle sheath cells of C4 plants. The restriction of GDC to the bundle sheath is assumed to be an essential and early step in the evolution of C4 photosynthesis, leading to a photorespiratory CO2 concentrating mechanism. In this study, we analyzed how the P-protein of GDC (GLDP) became restricted to the bundle sheath during the transition from C3 to C4 photosynthesis in the genus Flaveria. We found that C3 Flaveria species already contain a bundle sheath-expressed GLDP gene in addition to a ubiquitously expressed second gene, which became a pseudogene in C4 Flaveria species. Analyses of C3-C4 intermediate Flaveria species revealed that the photorespiratory CO2 pump was not established in one single step, but gradually. The knowledge gained by this study sheds light on the early steps in C4 evolution.
Affiliation(s)
- Stefanie Schulze
- Heinrich-Heine-Universität, Department Biologie, 40225 Duesseldorf, Germany
- Cluster of Excellence on Plant Sciences “From Complex Traits towards Synthetic Modules,” 40225 Duesseldorf, Germany
| | - Julia Mallmann
- Heinrich-Heine-Universität, Department Biologie, 40225 Duesseldorf, Germany
| | - Janet Burscheidt
- Heinrich-Heine-Universität, Department Biologie, 40225 Duesseldorf, Germany
| | - Maria Koczor
- Heinrich-Heine-Universität, Department Biologie, 40225 Duesseldorf, Germany
| | - Monika Streubel
- Heinrich-Heine-Universität, Department Biologie, 40225 Duesseldorf, Germany
| | - Hermann Bauwe
- Universität Rostock, Abteilung Pflanzenphysiologie, 18059 Rostock, Germany
| | - Udo Gowik
- Heinrich-Heine-Universität, Department Biologie, 40225 Duesseldorf, Germany
- Cluster of Excellence on Plant Sciences “From Complex Traits towards Synthetic Modules,” 40225 Duesseldorf, Germany
| | - Peter Westhoff
- Heinrich-Heine-Universität, Department Biologie, 40225 Duesseldorf, Germany
- Cluster of Excellence on Plant Sciences “From Complex Traits towards Synthetic Modules,” 40225 Duesseldorf, Germany
| |
|
14
|
Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns. Bioinformatics 2013; 29:2112-21. [DOI: 10.1093/bioinformatics/btt360] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 01/09/2023]
|
15
|
Al Ait L, Yamak Z, Morgenstern B. DIALIGN at GOBICS--multiple sequence alignment using various sources of external information. Nucleic Acids Res 2013; 41:W3-7. [PMID: 23620293 PMCID: PMC3692126 DOI: 10.1093/nar/gkt283] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 11/23/2022]
Abstract
DIALIGN is an established tool for multiple sequence alignment that is particularly useful to detect local homologies in sequences with low overall similarity. In recent years, various versions of the program have been developed, some of which are fully automated, whereas others are able to accept user-specified external information. In this article, we review some versions of the program that are available through ‘Göttingen Bioinformatics Compute Server’. In addition to previously described implementations, we present a new release of DIALIGN called ‘DIALIGN-PFAM’, which uses hits to the PFAM database for improved protein alignment. Our software is available through http://dialign.gobics.de/.
Affiliation(s)
- Layal Al Ait
- Department of Bioinformatics, University of Göttingen, Institute of Microbiology and Genetics, Goldschmidtstr. 1, 37077 Göttingen, Germany
|
16
|
Astakhova TV, Lobanov MN, Poverennaya IV, Roytberg MA, Yacovlev VV. Verification of the PREFAB alignment database. Biophysics (Nagoya-shi) 2012. [DOI: 10.1134/s0006350912020030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
17
|
Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment. BMC Bioinformatics 2011; 12:353. [PMID: 21867510 PMCID: PMC3180391 DOI: 10.1186/1471-2105-12-353] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 03/21/2011] [Accepted: 08/25/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. RESULTS In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. CONCLUSIONS The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research.
|
18
|
Castrillo G, Turck F, Leveugle M, Lecharny A, Carbonero P, Coupland G, Paz-Ares J, Oñate-Sánchez L. Speeding cis-trans regulation discovery by phylogenomic analyses coupled with screenings of an arrayed library of Arabidopsis transcription factors. PLoS One 2011; 6:e21524. [PMID: 21738689 PMCID: PMC3124521 DOI: 10.1371/journal.pone.0021524] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 05/16/2011] [Accepted: 05/31/2011] [Indexed: 01/27/2023] Open
Abstract
Transcriptional regulation is an important mechanism underlying gene expression and has played a crucial role in evolution. The number, position and interactions between cis-elements and transcription factors (TFs) determine the expression pattern of a gene. To identify functionally relevant cis-elements in gene promoters, a phylogenetic shadowing approach with a lipase gene (LIP1) was used. As a proof of concept, in silico analyses of several Brassicaceae LIP1 promoters identified a highly conserved sequence (LIP1 element) that is sufficient to drive strong expression of a reporter gene in planta. A collection of ca. 1,200 Arabidopsis thaliana TF open reading frames (ORFs) was arrayed in a 96-well format (RR library) and a convenient mating based yeast one hybrid (Y1H) screening procedure was established. We constructed an episomal plasmid (pTUY1H) to clone the LIP1 element and used it as bait for Y1H screenings. A novel interaction with an HD-ZIP (AtML1) TF was identified and abolished by a 2 bp mutation in the LIP1 element. A role of this interaction in transcriptional regulation was confirmed in planta. In addition, we validated our strategy by reproducing the previously reported interaction between a MYB-CC (PHR1) TF, a central regulator of phosphate starvation responses, with a conserved promoter fragment (IPS1 element) containing its cognate binding sequence. Finally, we established that the LIP1 and IPS1 elements were differentially bound by HD-ZIP and MYB-CC family members in agreement with their genetic redundancy in planta. In conclusion, combining in silico analyses of orthologous gene promoters with Y1H screening of the RR library represents a powerful approach to decipher cis- and trans-regulatory codes.
Affiliation(s)
- Gabriel Castrillo
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Cantoblanco, Madrid, Spain
|
19
|
Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, Taly JF, Notredame C. T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res 2011; 39:W13-7. [PMID: 21558174 PMCID: PMC3125728 DOI: 10.1093/nar/gkr245] [Citation(s) in RCA: 800] [Impact Index Per Article: 61.5] [Indexed: 01/05/2023] Open
Abstract
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. These three available template modes are Expresso for the alignment of protein with a known 3D-Structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10,000 residues and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.
Affiliation(s)
- Paolo Di Tommaso
- Centre For Genomic Regulation (Pompeu Fabra University), Carrer del Doctor Aiguader 88, 08003 Barcelona, Spain
|
20
|
Jin H, Kanthasamy A, Anantharam V, Rana A, Kanthasamy AG. Transcriptional regulation of pro-apoptotic protein kinase Cdelta: implications for oxidative stress-induced neuronal cell death. J Biol Chem 2011; 286:19840-59. [PMID: 21467032 DOI: 10.1074/jbc.m110.203687] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 01/09/2023] Open
Abstract
We previously demonstrated that protein kinase Cδ (PKCδ; PKC delta) is an oxidative stress-sensitive kinase that plays a causal role in apoptotic cell death in neuronal cells. Although PKCδ activation has been extensively studied, relatively little is known about the molecular mechanisms controlling PKCδ expression. To characterize the regulation of PKCδ expression, we cloned an ∼2-kbp 5'-promoter segment of the mouse Prkcd gene. Deletion analysis indicated that the noncoding exon 1 region contained multiple Sp sites, including four GC boxes and one CACCC box, which directed the highest levels of transcription in neuronal cells. In addition, an upstream regulatory region containing adjacent repressive and anti-repressive elements with opposing regulatory activities was identified within the region -712 to -560. Detailed mutagenesis studies revealed that each Sp site made a positive contribution to PKCδ promoter expression. Overexpression of Sp family proteins markedly stimulated PKCδ promoter activity without any synergistic transactivating effect. Furthermore, experiments in Sp-deficient SL2 cells indicated long isoform Sp3 as the essential activator of PKCδ transcription. Importantly, both PKCδ promoter activity and endogenous PKCδ expression in NIE115 cells and primary striatal cultures were inhibited by mithramycin A. The results from chromatin immunoprecipitation and gel shift assays further confirmed the functional binding of Sp proteins to the PKCδ promoter. Additionally, we demonstrated that overexpression of p300 or CREB-binding protein increases the PKCδ promoter activity. This stimulatory effect requires intact Sp-binding sites and is independent of p300 histone acetyltransferase activity. Finally, modulation of Sp transcriptional activity or protein level profoundly altered the cell death induced by oxidative insult, demonstrating the functional significance of Sp-dependent PKCδ gene expression. 
Collectively, our findings may have implications for development of new translational strategies against oxidative damage.
Affiliation(s)
- Huajun Jin
- Parkinson's Disorder Research Laboratory, Iowa Center for Advanced Neurotoxicology, Department of Biomedical Sciences, Iowa State University, Ames, Iowa 50011, USA
|
21
|
α-Synuclein negatively regulates protein kinase Cδ expression to suppress apoptosis in dopaminergic neurons by reducing p300 histone acetyltransferase activity. J Neurosci 2011; 31:2035-51. [PMID: 21307242 DOI: 10.1523/jneurosci.5634-10.2011] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.8] [Indexed: 12/12/2022] Open
Abstract
We recently demonstrated that protein kinase Cδ (PKCδ), an important member of the novel PKC family, is a key oxidative stress-sensitive kinase that can be activated by caspase-3-dependent proteolytic cleavage to induce dopaminergic neuronal cell death. We now report a novel association between α-synuclein (αsyn), a protein associated with the pathogenesis of Parkinson's disease, and PKCδ, in which αsyn negatively modulates the p300- and nuclear factor-κB (NFκB)-dependent transactivation to downregulate proapoptotic kinase PKCδ expression and thereby protects against apoptosis in dopaminergic neuronal cells. Stable expression of human wild-type αsyn at physiological levels in dopaminergic neuronal cells resulted in an isoform-dependent transcriptional suppression of PKCδ expression without changes in the stability of mRNA and protein or DNA methylation. The reduction in PKCδ transcription was mediated, in part, through the suppression of constitutive NFκB activity targeted at two proximal PKCδ promoter κB sites. This occurred independently of NFκB/IκBα (inhibitor of κBα) nuclear translocation but was associated with decreased NFκB-p65 acetylation. Also, αsyn reduced p300 levels and its HAT (histone acetyltransferase) activity, thereby contributing to diminished PKCδ transactivation. Importantly, reduced PKCδ and p300 expression also were observed within nigral dopaminergic neurons in αsyn-transgenic mice. These findings expand the role of αsyn in neuroprotection by modulating the expression of the key proapoptotic kinase PKCδ in dopaminergic neurons.
|
22
|
Bessadok A, Garcia E, Jacquet H, Martin S, Garrigues A, Loiseau N, André F, Orlowski S, Vivaudou M. Recognition of sulfonylurea receptor (ABCC8/9) ligands by the multidrug resistance transporter P-glycoprotein (ABCB1): functional similarities based on common structural features between two multispecific ABC proteins. J Biol Chem 2010; 286:3552-69. [PMID: 21098040 DOI: 10.1074/jbc.m110.155200] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 01/28/2023] Open
Abstract
ATP-sensitive K(+) (K(ATP)) channels are the target of a number of pharmacological agents, blockers like hypoglycemic sulfonylureas and openers like the hypotensive cromakalim and diazoxide. These agents act on the channel regulatory subunit, the sulfonylurea receptor (SUR), which is an ABC protein with homologies to P-glycoprotein (P-gp). P-gp is a multidrug transporter expressed in tumor cells and in some healthy tissues. Because these two ABC proteins both exhibit multispecific recognition properties, we have tested whether SUR ligands could be substrates of P-gp. Interaction with P-gp was assayed by monitoring ATPase activity of P-gp-enriched vesicles. The blockers glibenclamide, tolbutamide, and meglitinide increased ATPase activity, with a rank order of potencies that correlated with their capacity to block K(ATP) channels. P-gp ATPase activity was also increased by the openers SR47063 (a cromakalim analog), P1075 (a pinacidil analog), and diazoxide. Thus, these molecules bind to P-gp (although with lower affinities than for SUR) and are possibly transported by P-gp. Competition experiments among these molecules as well as with typical P-gp substrates revealed a structural similarity between drug binding domains in the two proteins. To rationalize the observed data, we addressed the molecular features of these proteins and compared structural models, computerized by homology from the recently solved structures of murine P-gp and bacterial ABC transporters MsbA and Sav1866. Considering the various residues experimentally assigned to be involved in drug binding, we uncovered several hot spots, which organized spatially in two main binding domains, selective for SR47063 and for glibenclamide, in matching regions of both P-gp and SUR.
Affiliation(s)
- Anis Bessadok
- Service de Bioénergétique, Biologie Structurale et Mécanismes, URA 2096 CNRS, iBiTec-S, Commissariat à l'Energie Atomique-Saclay, 91191 Gif-sur-Yvette Cedex, France
|
23
|
Simmons MP, Müller KF, Norton AP. Alignment of, and phylogenetic inference from, random sequences: the susceptibility of alternative alignment methods to creating artifactual resolution and support. Mol Phylogenet Evol 2010; 57:1004-16. [PMID: 20849963 DOI: 10.1016/j.ympev.2010.09.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 09/23/2009] [Revised: 04/05/2010] [Accepted: 09/06/2010] [Indexed: 10/19/2022]
Abstract
We used random sequences to determine which alignment methods are most susceptible to aligning sequences so as to create artifactual resolution and branch support in phylogenetic trees derived from those alignments. We compared four alignment methods (progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment, and direct optimization) to determine which methods are most susceptible to creating false positives in phylogenetic trees. Implied alignments created using direct optimization provided more artifactual support than progressive pairwise alignment methods, which in turn generally provided more artifactual support than simultaneous and local alignment methods. Artifactual support derived from base pairs was generally reinforced by the incorporation of gap characters for progressive pairwise alignment, local pairwise alignment, and implied alignments. The amount of artifactual resolution and support was generally greater for simulated nucleotide sequences than for simulated amino acid sequences. In the context of direct optimization, the differences between static and dynamic approaches to calculating support were extreme, ranging from maximal to nearly minimal support. When applied to highly divergent sequences, it is important that dynamic, rather than static, characters be used whenever calculating branch support using direct optimization. In contrast to the tree-based approaches to alignment, simultaneous alignment of sequences using the similarity criterion generally does not create alignments that are biased in favor of any particular tree topology.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523-1878, USA.
|
24
|
Bustos R, Castrillo G, Linhares F, Puga MI, Rubio V, Pérez-Pérez J, Solano R, Leyva A, Paz-Ares J. A central regulatory system largely controls transcriptional activation and repression responses to phosphate starvation in Arabidopsis. PLoS Genet 2010; 6:e1001102. [PMID: 20838596 PMCID: PMC2936532 DOI: 10.1371/journal.pgen.1001102] [Citation(s) in RCA: 462] [Impact Index Per Article: 33.0] [Received: 06/22/2009] [Accepted: 07/29/2010] [Indexed: 01/22/2023] Open
Abstract
Plants respond to different stresses by inducing or repressing transcription of partially overlapping sets of genes. In Arabidopsis, the PHR1 transcription factor (TF) has an important role in the control of phosphate (Pi) starvation stress responses. Using transcriptomic analysis of Pi starvation in phr1, and phr1 phr1-like (phl1) mutants and in wild type plants, we show that PHR1 in conjunction with PHL1 controls most transcriptional activation and repression responses to phosphate starvation, regardless of the Pi starvation specificity of these responses. Induced genes are enriched in PHR1 binding sequences (P1BS) in their promoters, whereas repressed genes do not show such enrichment, suggesting that PHR1(-like) control of transcriptional repression responses is indirect. In agreement with this, transcriptomic analysis of a transgenic plant expressing PHR1 fused to the hormone ligand domain of the glucocorticoid receptor showed that PHR1 direct targets (i.e., displaying altered expression after GR:PHR1 activation by dexamethasone in the presence of cycloheximide) corresponded largely to Pi starvation-induced genes that are highly enriched in P1BS. A minimal promoter containing a multimerised P1BS recapitulates Pi starvation-specific responsiveness. Likewise, mutation of P1BS in the promoter of two Pi starvation-responsive genes impaired their responsiveness to Pi starvation, but not to other stress types. Phylogenetic footprinting confirmed the importance of P1BS and PHR1 in Pi starvation responsiveness and indicated that P1BS acts in concert with other cis motifs. All together, our data show that PHR1 and PHL1 are partially redundant TF acting as central integrators of Pi starvation responses, both specific and generic. In addition, they indicate that transcriptional repression responses are an integral part of adaptive responses to stress. 
As sessile organisms, plants are often exposed to stress conditions, and have evolved adaptive responses to protect themselves from different types of stress. Some responses are stress type-specific whereas others are common to different stress types. Understanding how these responses are controlled is crucial for rational improvement of stress tolerance, a limiting factor in crop productivity. Here we examined the physiological and molecular responses to phosphate starvation and found that a single transcription factor family, represented by PHOSPHATE STARVATION RESPONSE REGULATOR 1 (PHR1), has a central role in the control of specific and shared phosphate starvation stress responses. In consonance with the importance of PHR1, we found that the PHR1-binding sequence, present in most PHR1 direct targets, is a crucial cis motif for Pi starvation responsiveness. An artificial promoter controlled by PHR1 recapitulates responsiveness to Pi starvation and to modulators of this response, qualifying PHR1 family members as central integrators in Pi starvation signalling. This central integrator system also controls most transcriptional repression responses to Pi starvation, indicating that they are an integral part of the adaptive response, and not a consequence of plant malfunction due to stress.
Affiliation(s)
- Regla Bustos
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- Gabriel Castrillo
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- Francisco Linhares
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- María Isabel Puga
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- Vicente Rubio
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- Julian Pérez-Pérez
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- Roberto Solano
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- Antonio Leyva
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
- Javier Paz-Ares
- Department of Plant Molecular Genetics, Centro Nacional de Biotecnología-CSIC, Madrid, Spain
|
25
|
Pitschi F, Devauchelle C, Corel E. Automatic detection of anchor points for multiple sequence alignment. BMC Bioinformatics 2010; 11:445. [PMID: 20813050 PMCID: PMC2942857 DOI: 10.1186/1471-2105-11-445] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/10/2009] [Accepted: 09/02/2010] [Indexed: 11/18/2022] Open
Abstract
Background Determining beforehand specific positions to align (anchor points) has proved valuable for the accuracy of automated multiple sequence alignment (MSA) software. This feature can be used manually to include biological expertise, or automatically, usually by pairwise similarity searches. Multiple local similarities are expected to be more adequate, as they are more biologically relevant. However, even good multiple local similarities can prove incompatible with the ordering of an alignment. Results We use a recently developed algorithm to detect multiple local similarities, which returns subsets of positions in the sequences sharing similar contexts of appearance. In this paper, we first describe how to obtain, with the help of this method, subsets of positions that could form partial columns in an alignment. We next introduce a graph-theoretic algorithm to detect (and remove) positions in the partial columns that are inconsistent with a multiple alignment. Partial columns can be used, for the time being, as guides by only a few MSA programs: ClustalW 2.0, DIALIGN 2 and T-Coffee. We perform tests on the effect of introducing these columns on the popular benchmark BAliBASE 3. Conclusions We show that the inclusion of our partial alignment columns, as anchor points, improves on the whole the accuracy of the aligner ClustalW on the benchmark BAliBASE 3.
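To make concrete what it means for anchor points to be incompatible with the ordering of an alignment, here is a deliberately simplified two-sequence sketch. It is not the graph-theoretic algorithm of the paper, which handles partial columns across many sequences. For two sequences, anchors `(i, j)` are mutually consistent only if they never cross, so this sketch keeps a largest non-crossing subset using a longest-increasing-subsequence search. The function name and the anchor representation are assumptions.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Anchor = Tuple[int, int]  # (position in sequence A, position in sequence B)


def consistent_anchors(anchors: List[Anchor]) -> List[Anchor]:
    """Return a largest subset of anchors that is strictly increasing in both
    coordinates, i.e. a set of anchors that one pairwise alignment can honor."""
    # Sort by i ascending and j descending, so two anchors sharing the same i
    # can never both appear in a strictly increasing chain of j values.
    pts = sorted(set(anchors), key=lambda p: (p[0], -p[1]))
    tails: List[int] = []      # tails[k]: smallest last j of a chain of length k + 1
    tail_idx: List[int] = []   # index into pts of the anchor that ends that chain
    prev: List[Optional[int]] = [None] * len(pts)
    for n, (_, j) in enumerate(pts):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_idx.append(n)
        else:
            tails[k] = j
            tail_idx[k] = n
        prev[n] = tail_idx[k - 1] if k > 0 else None
    # Walk back from the end of the longest chain to recover its anchors.
    chain: List[Anchor] = []
    n = tail_idx[-1] if tail_idx else None
    while n is not None:
        chain.append(pts[n])
        n = prev[n]
    return chain[::-1]


kept = consistent_anchors([(1, 2), (3, 1), (4, 5), (6, 4), (7, 8)])
print(kept)
```

In the example input, `(1, 2)` and `(3, 1)` cross each other, as do `(4, 5)` and `(6, 4)`, so at most one anchor from each crossing pair can be retained.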
Affiliation(s)
- Florian Pitschi
- Partner Institute for Computational Biology, CAS-MPG, 320 Yue Yang Rd, 200031 Shanghai, China
|
26
|
Finn S, Civetta A. Sexual selection and the molecular evolution of ADAM proteins. J Mol Evol 2010; 71:231-40. [PMID: 20730583 DOI: 10.1007/s00239-010-9382-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 05/14/2010] [Accepted: 08/09/2010] [Indexed: 12/12/2022]
Abstract
Rapid evolution has been identified for many reproductive genes and recent studies have combined phylogenetic tests and information on species mating systems to test sexual selection. Here we examined the molecular evolution of the ADAM gene family, a diverse group of 35 proteins capable of adhesion to and cleavage of other proteins, using sequence data from 25 mammalian genes. Out of the 25 genes analyzed, all those expressed in male reproductive tissue showed evidence of positive selection. Positively selected amino acids within the protein adhesion domain were only found in sperm surface ADAM proteins (ADAMs 1, 2, 3, 4, and 32), suggesting selection driven by male × female interactions. We tested heterogeneity in rates of evolution of the adhesion domain of ADAM proteins by using sequence data from Hominidae and macaques. The use of the branch and branch-site models (PAML) showed evidence of higher dN/dS and/or positive selection linked to branches experiencing high postmating selective pressures (chimpanzee and macaque) for Adams 2, 18, and 23. Moreover, we found a consistently higher proportion of nonsynonymous relative to synonymous and noncoding sequence substitutions in chimpanzee and/or macaque only for Adams 2, 18, and 23. Our results suggest that lineage-specific sexual selection bouts might have driven the evolution of the adhesion sperm protein surface domains of ADAMs 2 and 18 in primates. Adams 2 and 18 are localized in chromosome 8 of primates and adjacent to each other, so their evolution might have also been influenced by their common genome localization.
Affiliation(s)
- Scott Finn
- Department of Biology, University of Winnipeg, 515 Portage Ave., Winnipeg, MB, R3B 2E9, Canada
|
27
|
Frölich D, Giesecke C, Mei HE, Reiter K, Daridon C, Lipsky PE, Dörner T. Secondary immunization generates clonally related antigen-specific plasma cells and memory B cells. J Immunol 2010; 185:3103-10. [PMID: 20693426 DOI: 10.4049/jimmunol.1000911] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 11/19/2022]
Abstract
Rechallenge with T cell-dependent Ags induces memory B cells to re-enter germinal centers (GCs) and undergo further expansion and differentiation into plasma cells (PCs) and secondary memory B cells. It is currently not known whether the expanded population of memory B cells and PCs generated in secondary GCs are clonally related, nor has the extent of proliferation and somatic hypermutation of their precursors been delineated. In this study, after secondary tetanus toxoid (TT) immunization, TT-specific PCs increased 17- to 80-fold on days 6-7, whereas TT-specific memory B cells peaked (delayed) on day 14 with a 2- to 22-fold increase. Molecular analyses of V(H)DJ(H) rearrangements of individual cells revealed no major differences of gene usage and CDR3 length between TT-specific PCs and memory B cells, and both contained extensive evidence of somatic hypermutation with a pattern consistent with GC reactions. This analysis identified clonally related TT-specific memory B cells and PCs. Within clusters of clonally related cells, sequences shared a number of mutations but also could contain additional base pair changes. The data indicate that although following secondary immunization PCs can derive from memory B cells without further somatic hypermutation, in some circumstances, likely within GC reactions, asymmetric mutation can occur. These results suggest that after the fate decision to differentiate into secondary memory B cells or PCs, some committed precursors continue to proliferate and mutate their V(H) genes.
Affiliation(s)
- Daniela Frölich
- Department of Medicine, Rheumatology, and Clinical Immunology, Charité University Medicine Berlin, Germany
|
28
|
Simmons MP, Müller KF, Webb CT. The deterministic effects of alignment bias in phylogenetic inference. Cladistics 2010; 27:402-416. [DOI: 10.1111/j.1096-0031.2010.00333.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022] Open
|
29
|
Subramanian AR, Hiran S, Steinkamp R, Meinicke P, Corel E, Morgenstern B. DIALIGN-TX and multiple protein alignment using secondary structure information at GOBICS. Nucleic Acids Res 2010; 38:W19-22. [PMID: 20497995 PMCID: PMC2896137 DOI: 10.1093/nar/gkq442] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/12/2010] [Revised: 05/04/2010] [Accepted: 05/09/2010] [Indexed: 12/29/2022] Open
Abstract
We introduce web interfaces for two recent extensions of the multiple-alignment program DIALIGN. DIALIGN-TX combines the greedy heuristic previously used in DIALIGN with a more traditional 'progressive' approach for improved performance on locally and globally related sequence sets. In addition, we offer a version of DIALIGN that uses predicted protein secondary structures together with primary sequence information to construct multiple protein alignments. Both programs are available through 'Göttingen Bioinformatics Compute Server' (GOBICS).
Affiliation(s)
- Amarendran R. Subramanian
- Wilhelm-Schickard-Institut für Informatik, University of Tübingen, Sand 13, 72076 Tübingen, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany and Department of Mathematics, Indian Institute of Technology, Kharagpur, 721 302, India
- Suvrat Hiran
- Wilhelm-Schickard-Institut für Informatik, University of Tübingen, Sand 13, 72076 Tübingen, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany and Department of Mathematics, Indian Institute of Technology, Kharagpur, 721 302, India
- Rasmus Steinkamp
- Wilhelm-Schickard-Institut für Informatik, University of Tübingen, Sand 13, 72076 Tübingen, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany and Department of Mathematics, Indian Institute of Technology, Kharagpur, 721 302, India
- Peter Meinicke
- Wilhelm-Schickard-Institut für Informatik, University of Tübingen, Sand 13, 72076 Tübingen, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany and Department of Mathematics, Indian Institute of Technology, Kharagpur, 721 302, India
- Eduardo Corel
- Wilhelm-Schickard-Institut für Informatik, University of Tübingen, Sand 13, 72076 Tübingen, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany and Department of Mathematics, Indian Institute of Technology, Kharagpur, 721 302, India
- Burkhard Morgenstern
- Wilhelm-Schickard-Institut für Informatik, University of Tübingen, Sand 13, 72076 Tübingen, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany and Department of Mathematics, Indian Institute of Technology, Kharagpur, 721 302, India
|
30
|
A min-cut algorithm for the consistency problem in multiple sequence alignment. Bioinformatics 2010; 26:1015-21. [DOI: 10.1093/bioinformatics/btq082] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
|
31
|
Mount DW. Comparing programs and methods to use for global multiple sequence alignment. Cold Spring Harb Protoc 2010; 2009:pdb.ip61. [PMID: 20147201 DOI: 10.1101/pdb.ip61] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
It is difficult to find a global optimal alignment of more than two sequences (and, especially, more than three) that includes matches, mismatches, and gaps and that takes into account the degree of variation in all of the sequences at the same time. Thus, approximate methods are used, such as progressive global alignment, iterative global alignment, alignments based on locally conserved patterns found in the same order in the sequences, statistical methods that generate probabilistic models of the sequences, and multiple sequence alignments produced by graph-based methods. When 10 or more sequences are being compared, it is common to begin by determining sequence similarities between all pairs of sequences in the set. A variety of methods are then available to cluster the sequences into the most related groups or into a phylogenetic tree. This article discusses several of these methods and provides data that compare their utility under various conditions.
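The two-stage procedure mentioned in this abstract (score all pairs of sequences, then cluster them into the most related groups) can be sketched as follows. This is an illustrative toy under stated assumptions, not a method taken from the article: `difflib.SequenceMatcher.ratio` stands in for a real pairwise alignment score, average linkage is one of several possible clustering rules, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

Cluster = Tuple[int, ...]  # indices of the sequences in one cluster


def similarity(a: str, b: str) -> float:
    """Stand-in for a pairwise alignment score: matching-block ratio in [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def join_order(seqs: List[str]) -> List[Tuple[Cluster, Cluster]]:
    """Average-linkage agglomerative clustering on all-pairs similarities.
    Returns the merges in the order a progressive aligner could follow."""
    sim: Dict[Tuple[int, int], float] = {
        (i, j): similarity(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def linkage(c1: Cluster, c2: Cluster) -> float:
        # Mean similarity over every cross-cluster pair of sequences.
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters: List[Cluster] = [(i,) for i in range(len(seqs))]
    merges: List[Tuple[Cluster, Cluster]] = []
    while len(clusters) > 1:
        # Merge the two clusters that are currently most similar.
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2))
    return merges


for left, right in join_order(["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]):
    print(left, "+", right)
```

The sequence of merges plays the role of a guide tree: the closest sequences are joined first, and more distant groups are joined last.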
|
32
|
Kemena C, Notredame C. Upcoming challenges for multiple sequence alignment methods in the high-throughput era. Bioinformatics 2009; 25:2455-65. [PMID: 19648142 PMCID: PMC2752613 DOI: 10.1093/bioinformatics/btp452] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.0] [Received: 04/03/2009] [Revised: 06/24/2009] [Accepted: 07/16/2009] [Indexed: 12/22/2022] Open
Abstract
This review focuses on recent trends in multiple sequence alignment tools. It describes the latest algorithmic improvements including the extension of consistency-based methods to the problem of template-based multiple sequence alignments. Some results are presented suggesting that template-based methods are significantly more accurate than simpler alternative methods. The validation of existing methods is also discussed at length with the detailed description of recent results and some suggestions for future validation strategies. The last part of the review addresses future challenges for multiple sequence alignment methods in the genomic era, most notably the need to cope with very large sequences, the need to integrate large amounts of experimental data, the need to accurately align non-coding and non-transcribed sequences and finally, the need to integrate many alternative methods and approaches.
Affiliation(s)
- Carsten Kemena
- Centre for Genomic Regulation, Pompeu Fabra University, Carrer del Doctor Aiguader 88, 08003 Barcelona, Spain
|
33
|
Abstract
BACKGROUND Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate their underlying main characteristics/functions. This information is also used to generate phylogenetic trees. RESULTS This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which are analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population-based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles represent locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process gradually improves these chromosomes, yielding a set of mostly optimal answers for the MSA problem. CONCLUSION RBT-GA is tested with one of the well-known benchmark suites (BALiBASE 2.0) in this area. The obtained results show the superiority of the proposed technique even in the case of formidable sequences.
Affiliation(s)
- Javid Taheri
- School of Information Technologies, J12, The University of Sydney, Sydney, NSW 2006, Australia
- Albert Y Zomaya
- School of Information Technologies, J12, The University of Sydney, Sydney, NSW 2006, Australia
|
34
|
Abstract
We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/. Biological sequence alignment is one of the fundamental problems in comparative genomics, yet it remains unsolved. Over sixty sequence alignment programs are listed on Wikipedia, and many new programs are published every year. However, many popular programs suffer from pathologies such as aligning unrelated sequences and producing discordant alignments in protein (amino acid) and codon (nucleotide) space, casting doubt on the accuracy of the inferred alignments. 
Inaccurate alignments can introduce large and unknown systematic biases into downstream analyses such as phylogenetic tree reconstruction and substitution rate estimation. We describe a new program for multiple sequence alignment which can align protein, RNA and DNA sequence and improves on the accuracy of existing approaches on benchmarks of protein and RNA structural alignments and simulated mammalian and fly genomic alignments. Our approach, which seeks to find the alignment which is closest to the truth under our statistical model, leaves unrelated sequences largely unaligned and produces concordant alignments in protein and codon space. It is fast enough for difficult problems such as aligning orthologous genomic regions or aligning hundreds or thousands of proteins. It furthermore has a companion GUI for visualizing the estimated alignment reliability.
|
35
|
Misof B, Misof K. A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. Syst Biol 2009; 58:21-34. [PMID: 20525566 DOI: 10.1093/sysbio/syp006] [Citation(s) in RCA: 233] [Impact Index Per Article: 15.5] [Indexed: 11/14/2022]
Abstract
Random similarity of sequences or sequence sections can impede phylogenetic analyses or the identification of gene homologies. Additionally, randomly similar sequences or ambiguously aligned sequence sections can negatively interfere with the estimation of substitution model parameters. Phylogenomic studies have shown that biases in model estimation and tree reconstructions do not disappear even with large data sets. In fact, these biases can become pronounced with more data. It is therefore important to identify possible random similarity within sequence alignments in advance of model estimation and tree reconstructions. Different approaches have been already suggested to identify and treat problematic alignment sections. We propose an alternative method that can identify random similarity within multiple sequence alignments (MSAs) based on Monte Carlo resampling within a sliding window. The method infers similarity profiles from pairwise sequence comparisons and subsequently calculates a consensus profile. This consensus profile represents a summary of all calculated single similarity profiles. In consequence, consensus profiles identify dominating patterns of nonrandom similarity or randomness within sections of MSAs. We show that the approach clearly identifies randomness in simulated and real data. After the exclusion of putative random sections, node support drastically improves in tree reconstructions of both data. It thus appears to be a powerful tool to identify possible biases of tree reconstructions or gene identification. The method is currently restricted to nucleotide data but will be extended to protein data in the near future.
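The sliding-window Monte Carlo idea in this abstract can be sketched for a single pair of aligned sequences as follows. This is a simplified illustration, not the authors' scoring scheme: it assumes plain percent identity as the similarity measure, a null model built by drawing window-sized samples from the second sequence's own characters, and no gap handling. The name `randomness_profile` and all parameter defaults are hypothetical.

```python
import random


def identity(a, b):
    """Fraction of positions at which two equal-length windows agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def randomness_profile(seq_a, seq_b, window=8, n_resamples=200, seed=0):
    """Empirical p-value per window: how often a resampled window is
    at least as similar to seq_a's window as the observed one."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)
    pool = list(seq_b)  # composition used for the null model
    profile = []
    for start in range(len(seq_a) - window + 1):
        win_a = seq_a[start:start + window]
        observed = identity(win_a, seq_b[start:start + window])
        hits = sum(
            identity(win_a, rng.sample(pool, window)) >= observed
            for _ in range(n_resamples)
        )
        # Add-one correction keeps the estimate strictly above zero.
        profile.append((hits + 1) / (n_resamples + 1))
    return profile


aligned = "ACGTTGCAAGCTACGGTCAT"
print(randomness_profile(aligned, aligned))
```

Windows with high values behave like random similarity under this null model; the method described above additionally combines such profiles over all sequence pairs into a consensus profile, which this sketch omits.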
Affiliation(s)
- Bernhard Misof
- Zoologisches Forschungsmuseum Alexander Koenig, Molecular Biology Unit, Adenauerallee 160, 53113 Bonn, Germany.
|
36
|
Abstract
Multiple alignment of DNA sequences is an important step in various molecular biological analyses. As a large amount of sequence data is becoming available through genome and other large-scale sequencing projects, scalability, as well as accuracy, is currently required for a multiple sequence alignment (MSA) program. In this chapter, we outline the algorithms of an MSA program MAFFT and provide practical advice, focusing on several typical situations a biologist sometimes faces. For genome alignment, which is beyond the scope of MAFFT, we introduce two tools: TBA and MAUVE.
Affiliation(s)
- Kazutaka Katoh
- Digital Medicine Initiative, Kyushu University, Fukuoka, Japan
|
37
|
Wagner H, Morgenstern B, Dress A. Stability of multiple alignments and phylogenetic trees: an analysis of ABC-transporter proteins family. Algorithms Mol Biol 2008; 3:15. [PMID: 18990223 PMCID: PMC2637874 DOI: 10.1186/1748-7188-3-15] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/30/2008] [Accepted: 11/06/2008] [Indexed: 11/17/2022]
Abstract
Background Sequence-based phylogeny reconstruction is a fundamental task in Bioinformatics. Practically all methods for phylogeny reconstruction are based on multiple alignments. The quality and stability of the underlying alignments are therefore crucial for phylogenetic analysis. Results In this short report, we investigate alignments and alignment-based phylogenies constructed for a set of 22 ABC transporters using CLUSTAL W and DIALIGN. Comparing the 22 "one-out phylogenies" one can obtain for this sequence set, some intrinsic phylogenetic instability is observed, even if attention is restricted to branches with high bootstrapping frequencies, the so-called safe branches. We show that this instability arises because both CLUSTAL W and DIALIGN apparently get "confused" by sequence repeats in some of the ABC transporters. To deal with such problems, two new DIALIGN options are introduced that prove helpful in our context: the "exclude-fragment" (or "xfr") option and the "self-comparison" (or "sc") option. Conclusion "One-out strategies", known to be a useful tool for testing the stability of all sorts of data-analysis procedures, can also be used successfully to test alignment stability. If instabilities are observed, the sequences under consideration should be carefully checked for putative causes. If sequence repeats are suspected to be the cause, the new "sc" option can be used to detect such repeats, and the "xfr" option can help to resolve the resulting problems.
|
38
|
Banerjee N, Sarani R, Ranjani CV, Sowmiya G, Michael D, Balakrishnan N, Sekar K. Algorithm to find distant repeats in a single protein sequence. Bioinformation 2008; 3:28-32. [PMID: 19052663 PMCID: PMC2586129 DOI: 10.6026/97320630003028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/07/2008] [Accepted: 07/24/2008] [Indexed: 11/23/2022]
Abstract
Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of distant repeats would help establish a firm relation between the repeats and their function and three-dimensional structure over the course of evolution. Further, it sheds light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. Scores from the Point Accepted Mutation (PAM) matrix have been used to identify amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies.
Affiliation(s)
- Nirjhar Banerjee
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Rangarajan Sarani
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Govindaraj Sowmiya
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Daliah Michael
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Kanagaraj Sekar
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
|
39
|
Lu Y, Sze SH. Multiple Sequence Alignment Based on Profile Alignment of Intermediate Sequences. J Comput Biol 2008; 15:767-77. [PMID: 18662101 DOI: 10.1089/cmb.2007.0132] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Affiliation(s)
- Yue Lu
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Sing-Hoi Sze
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Computer Science, Texas A&M University, College Station, Texas
|
40
|
Abstract
We present a new method for multiple sequence alignment (MSA), which we call MSACSA. The method is based on the direct application of a global optimization method called conformational space annealing (CSA) to a consistency-based score function constructed from pairwise sequence alignments between the constituent sequences. We applied MSACSA to two MSA databases, the 82 families from the BAliBASE reference set 1 and the 366 families from the HOMSTRAD set. In all 450 cases, we obtained well-optimized alignments satisfying more pairwise constraints and, in consequence, producing more accurate alignments on average than the recent alignment method SPEM. One of the advantages of MSACSA is that it provides not just the global minimum alignment but also many distinct low-lying suboptimal alignments for a given objective function. This is because conformational space annealing can maintain conformational diversity while searching for conformations with low energies. This characteristic can help alleviate the problems that arise from using an inaccurate score function. The method was the key factor in our success in the recent blind protein structure prediction experiment.
|
41
|
Subramanian AR, Kaufmann M, Morgenstern B. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol 2008; 3:6. [PMID: 18505568 PMCID: PMC2430965 DOI: 10.1186/1748-7188-3-6] [Citation(s) in RCA: 139] [Impact Index Per Article: 8.7] [Received: 03/25/2008] [Accepted: 05/27/2008] [Indexed: 11/10/2022]
Abstract
Background DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach. Results Our new heuristic produces significantly better alignments, especially on globally related sequences, without excessively increasing CPU time and memory consumption. The new method is based on a guide tree; to detect possible spurious sequence similarities, it employs a vertex-cover approximation on a conflict graph. We performed benchmarking tests on a large set of nucleic acid and protein sequences. For protein benchmarks we used the benchmark database BALIBASE 3 and an updated release of the database IRMBASE 2 for assessing the quality on globally and locally related sequences, respectively. For alignment of nucleic acid sequences, we used BRAliBase II for global alignment and a newly developed database of locally related sequences called DIRMBASE 1. IRMBASE 2 and DIRMBASE 1 are constructed by implanting highly conserved motifs at random positions in long unalignable sequences. Conclusion On BALIBASE 3, our new program performs significantly better than the previous program DIALIGN-T and outperforms the popular global aligner CLUSTAL W, though it is still outperformed by programs that focus on global alignment like MAFFT, MUSCLE and T-COFFEE. On the locally related test sets in IRMBASE 2 and DIRMBASE 1, our method outperforms all other programs while MAFFT E-INSi is the only method that comes close to the performance of DIALIGN-TX.
|
42
|
Simossis V, Kleinjung J, Heringa J. An overview of multiple sequence alignment. Curr Protoc Bioinformatics 2008; Chapter 3:3.7.1-3.7.26. [PMID: 18428699 DOI: 10.1002/0471250953.bi0307s03] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
Multiple sequence alignment is perhaps the most commonly applied bioinformatics technique. It often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. In this unit, an overview of multiple sequence alignment techniques is presented, covering a history of nearly 30 years from the early pioneering methods to the current state-of-the-art techniques. Methodological and biological issues and end-user considerations, as well as alignment evaluation issues, are discussed.
Affiliation(s)
- Victor Simossis
- Integrative Bioinformatics Institute (IBIVU), Free University, Amsterdam, The Netherlands
|
43
|
Huang W, Nevins JR, Ohler U. Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools. Genome Biol 2008; 8:R225. [PMID: 17956628 PMCID: PMC2246299 DOI: 10.1186/gb-2007-8-10-r225] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 04/11/2007] [Revised: 10/20/2007] [Accepted: 10/24/2007] [Indexed: 02/07/2023]
Abstract
BACKGROUND The phenomenon of functional site turnover has important implications for the study of regulatory region evolution, such as for promoter sequence alignments and transcription factor binding site (TFBS) identification. At present, it remains difficult to estimate TFBS turnover rates on real genomic sequences, as reliable mappings of functional sites across related species are often not available. As an alternative, we introduce a flexible new simulation system, Phylogenetic Simulation of Promoter Evolution (PSPE), designed to study functional site turnovers in regulatory sequences. RESULTS Using PSPE, we study replacement turnover rates of different individual TFBSs and simple modules of two sites under neutral evolutionary functional constraints. We find that TFBS replacement turnover can happen rapidly in promoters, and turnover rates vary significantly among different TFBSs and modules. We assess the influence of different constraints such as insertion/deletion rate and translocation distances. Complementing the simulations, we give simple but effective mathematical models for TFBS turnover rate prediction. As one important application of PSPE, we also present a first systematic evaluation of multiple sequence aligners regarding their capability of detecting TFBSs in promoters with site turnovers. CONCLUSION PSPE allows researchers for the first time to investigate TFBS replacement turnovers in promoters systematically. The assessment of alignment tools points out the limitations of current approaches to identify TFBSs in non-coding sequences, where turnover events of functional sites may happen frequently, and where we are interested in assessing the similarity on the functional level. PSPE is freely available at the authors' website.
Affiliation(s)
- Weichun Huang
- Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA.
|
44
|
Simmons MP, Cappa JJ, Archer RH, Ford AJ, Eichstedt D, Clevinger CC. Phylogeny of the Celastreae (Celastraceae) and the relationships of Catha edulis (qat) inferred from morphological characters and nuclear and plastid genes. Mol Phylogenet Evol 2008; 48:745-57. [PMID: 18550389 DOI: 10.1016/j.ympev.2008.04.039] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 12/21/2007] [Revised: 04/29/2008] [Accepted: 04/30/2008] [Indexed: 11/16/2022]
Abstract
The phylogeny of Celastraceae tribe Celastreae, which includes about 350 species of trees and shrubs in 15 genera, was inferred in a simultaneous analysis of morphological characters together with nuclear (ITS and 26S rDNA) and plastid (matK, trnL-F) genes. A strong correlation was found between the geography of the species sampled and their inferred relationships. Species of Maytenus and Gymnosporia from different regions were resolved as polyphyletic groups. Maytenus was resolved in three lineages (New World, African, and Austral-Pacific), while Gymnosporia was resolved in two lineages (New World and Old World). Putterlickia was resolved as nested within the Old World Gymnosporia. Catha edulis (qat, khat) was resolved as sister to the clade of Allocassine, Cassine, Lauridia, and Maurocenia. Gymnosporia cassinoides, which is reportedly chewed as a stimulant in the Canary Islands, was resolved as a derived member of Gymnosporia and is more closely related to Lydenburgia and Putterlickia than it is to Catha. Therefore, all eight of these genera are candidates for containing cathinone- and/or cathine-related alkaloids.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, E106 Anatomy/Zoology Building, Fort Collins, CO 80523-1878, USA.
|
45
|
Abstract
Multiple sequence alignment (MSA) has assumed a key role in comparative structure and function analysis of biological sequences. It often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments. It should be stressed, however, that many complex biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, so that this chapter might constitute a helpful guide or starting point for researchers who aim to construct a reliable MSA.
Affiliation(s)
- Walter Pirovano
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
|
46
|
Abstract
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA, USA
|
47
|
Affiliation(s)
- Cédric Notredame
- Information Génomique et Structurale, CNRS UPR2589, Institute for Structural Biology and Microbiology, Parc Scientifique de Luminy, Marseille, France.
|
48
|
van Nimwegen E. Finding regulatory elements and regulatory motifs: a general probabilistic framework. BMC Bioinformatics 2007; 8 Suppl 6:S4. [PMID: 17903285 PMCID: PMC1995539 DOI: 10.1186/1471-2105-8-s6-s4] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]
Abstract
Over the last two decades a large number of algorithms has been developed for regulatory motif finding. Here we show how many of these algorithms, especially those that model binding specificities of regulatory factors with position specific weight matrices (WMs), naturally arise within a general Bayesian probabilistic framework. We discuss how WMs are constructed from sets of regulatory sites, how sites for a given WM can be discovered by scanning of large sequences, how to cluster WMs, and more generally how to cluster large sets of sites from different WMs into clusters. We discuss how 'regulatory modules', clusters of sites for subsets of WMs, can be found in large intergenic sequences, and we discuss different methods for ab initio motif finding, including expectation maximization (EM) algorithms, and motif sampling algorithms. Finally, we extensively discuss how module finding methods and ab initio motif finding methods can be extended to take phylogenetic relations between the input sequences into account, i.e. we show how motif finding and phylogenetic footprinting can be integrated in a rigorous probabilistic framework. The article is intended for readers with a solid background in applied mathematics, and preferably with some knowledge of general Bayesian probabilistic methods. The main purpose of the article is to elucidate that all these methods are not a disconnected set of individual algorithmic recipes, but that they are just different facets of a single integrated probabilistic theory.
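Two of the steps this abstract mentions, constructing a weight matrix (WM) from a set of known sites and scanning a longer sequence with it, can be sketched as below. This is an assumed minimal formulation (pseudocount-smoothed column frequencies and a log-odds score against a uniform background), not the full Bayesian treatment the article develops; `build_wm`, `scan`, and the example sites are hypothetical.

```python
import math


def build_wm(sites, pseudocount=1.0, alphabet="ACGT"):
    """Column-wise base probabilities estimated from equal-length sites."""
    width = len(sites[0])
    wm = []
    for pos in range(width):
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        wm.append({base: counts[base] / total for base in alphabet})
    return wm


def scan(seq, wm, background=0.25):
    """Log-odds score (bits) of the WM against every window of seq."""
    width = len(wm)
    scores = []
    for start in range(len(seq) - width + 1):
        score = sum(
            math.log2(wm[j][seq[start + j]] / background)
            for j in range(width)
        )
        scores.append((start, score))
    return scores


sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
wm = build_wm(sites)
best_start, best_score = max(scan("GGCGTATAATGCGC", wm), key=lambda h: h[1])
print(best_start, round(best_score, 2))
```

The pseudocount avoids zero probabilities for bases never observed at a position, so every window receives a finite score.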
Affiliation(s)
- Erik van Nimwegen
- Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Klingelbergstrasse 50/70, Basel, Switzerland.
|
49
|
Jangam S, Chakraborti N. A novel method for alignment of two nucleic acid sequences using ant colony optimization and genetic algorithms. Appl Soft Comput 2007. [DOI: 10.1016/j.asoc.2006.11.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
|
50
|
Chung YS, Lee WH, Tang CY, Lu CL. RE-MuSiC: a tool for multiple sequence alignment with regular expression constraints. Nucleic Acids Res 2007; 35:W639-44. [PMID: 17488842 PMCID: PMC1933182 DOI: 10.1093/nar/gkm275] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
RE-MuSiC is a web-based multiple sequence alignment tool that can incorporate biological knowledge about structure, function, or conserved patterns regarding the sequences of interest. It accepts amino acid or nucleic acid sequences and a set of constraints as inputs. The constraints are pattern descriptions, instead of exact positions of fragments to be aligned together. The output is an alignment where for each pattern (constraint), an occurrence on each sequence can be found aligned together with those on the other sequences, in a manner that the overall alignment is optimized. Its predecessor, MuSiC, has been found useful by researchers since its release in 2004. However, it is noticed in applications that the pattern formulation adopted in MuSiC, namely, plain strings allowing mismatches, is not expressive and flexible enough. The constraint formulation adopted in RE-MuSiC is therefore enhanced to be regular expressions, which is convenient in expressing many biologically significant patterns like those collected in the PROSITE database, or structural consensuses that often involve variable ranges between conserved parts. Experiments demonstrate that RE-MuSiC can be used to help predict important residues and locate phylogenetically conserved structural elements. RE-MuSiC is available on-line at http://140.113.239.131/RE-MUSIC.
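To make the idea of a regular-expression constraint concrete, the sketch below locates one pattern occurrence per sequence and pads the sequences so those occurrences start in the same column. It is only a crude illustration using Python's standard `re` module: it takes the first match in each sequence and does not optimize the overall alignment as the tool described above does. The function name `anchor_by_pattern`, the example sequences, and the pattern are hypothetical.

```python
import re


def anchor_by_pattern(seqs, pattern):
    """Find the first pattern match in each sequence and left-pad with
    gaps so that all matches begin in the same alignment column."""
    regex = re.compile(pattern)
    spans = {}
    for name, seq in seqs.items():
        match = regex.search(seq)
        if match is None:
            raise ValueError(f"pattern not found in {name}")
        spans[name] = match.span()
    column = max(start for start, _ in spans.values())
    padded = {
        name: "-" * (column - spans[name][0]) + seq
        for name, seq in seqs.items()
    }
    return spans, padded


seqs = {"s1": "MKCAACGH", "s2": "CTTTCGHW", "s3": "AAMCGGGCQ"}
# Two cysteines separated by a variable-length spacer of 2 to 4 residues.
spans, padded = anchor_by_pattern(seqs, r"C.{2,4}C")
for name in seqs:
    print(name, spans[name], padded[name])
```

The variable-length spacer `.{2,4}` is the kind of constraint that a plain string with mismatches cannot express, which is the motivation the abstract gives for moving to regular expressions.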
Affiliation(s)
- Yun-Sheng Chung
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan, Institute of Bioinformatics, National Chiao Tung University, Hsinchu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- Wei-Hsun Lee
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan, Institute of Bioinformatics, National Chiao Tung University, Hsinchu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- Chuan Yi Tang
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan, Institute of Bioinformatics, National Chiao Tung University, Hsinchu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- Chin Lung Lu
- Department of Computer Science, National Tsing Hua University, Hsinchu 300, Taiwan, Institute of Bioinformatics, National Chiao Tung University, Hsinchu 300, Taiwan and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- *To whom correspondence should be addressed. +886-3-5712121; +886-3-5729288
|