1
Rahman MT, Guan D, Chaminda Lakmal HH, Decker AM, Imler GH, Kerr AT, Harris DL, Jin C. Design, Synthesis, and Structure-Activity Relationship Studies of Novel GPR88 Agonists (4-Substituted-phenyl)acetamides Based on the Reversed Amide Scaffold. ACS Chem Neurosci 2024; 15:169-192. [PMID: 38086012 PMCID: PMC10843732 DOI: 10.1021/acschemneuro.3c00684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2024]
Abstract
The development of synthetic agonists for the orphan receptor GPR88 has recently attracted significant interest, given the promise of GPR88 as a novel drug target for psychiatric and neurodegenerative disorders. Examination of structure-activity relationships of two known agonist scaffolds 2-PCCA and 2-AMPP, as well as the recently resolved cryo-EM structure of 2-PCCA-bound GPR88, led to the design of a new scaffold based on the "reversed amide" strategy of 2-AMPP. A series of novel (4-substituted-phenyl)acetamides were synthesized and assessed in cAMP accumulation assays as GPR88 agonists, which led to the discovery of several compounds with better or comparable potencies to 2-AMPP. Computational docking studies suggest that these novel GPR88 agonists bind to the same allosteric site of GPR88 that 2-PCCA occupies. Collectively, our findings provide structural insight and SAR requirement at the allosteric site of GPR88 and a new scaffold for further development of GPR88 allosteric agonists.
Affiliation(s)
- Md Toufiqur Rahman
  - Center for Drug Discovery, Research Triangle Institute, Research Triangle Park, North Carolina 27709, United States
- Dongliang Guan
  - Center for Drug Discovery, Research Triangle Institute, Research Triangle Park, North Carolina 27709, United States
- Hetti Handi Chaminda Lakmal
  - Center for Drug Discovery, Research Triangle Institute, Research Triangle Park, North Carolina 27709, United States
- Ann M Decker
  - Center for Drug Discovery, Research Triangle Institute, Research Triangle Park, North Carolina 27709, United States
- Gregory H Imler
  - Center for Biomolecular Science and Engineering, Naval Research Laboratory, Code 6920, Washington, District of Columbia 20375, United States
- Andrew T Kerr
  - Center for Biomolecular Science and Engineering, Naval Research Laboratory, Code 6920, Washington, District of Columbia 20375, United States
- Danni L Harris
  - Center for Drug Discovery, Research Triangle Institute, Research Triangle Park, North Carolina 27709, United States
- Chunyang Jin
  - Center for Drug Discovery, Research Triangle Institute, Research Triangle Park, North Carolina 27709, United States

2
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Universidad Nacional de Colombia, Bogotá, Colombia
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, NM, USA

3
Gay EA, Harris DL, Wilson JW, Blough BE. The development of diphenyleneiodonium analogs as GPR3 agonists. Bioorg Med Chem Lett 2023; 94:129427. [PMID: 37541631 PMCID: PMC10631289 DOI: 10.1016/j.bmcl.2023.129427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 07/20/2023] [Accepted: 07/29/2023] [Indexed: 08/06/2023]
Abstract
G protein-coupled receptor 3 (GPR3) is an orphan receptor potentially involved in many important physiological processes such as drug abuse, neuropathic pain, and anxiety and depression related disorders. Pharmacological studies of GPR3 have been limited due to the restricted number of known agonists and inverse agonists for this constitutively active receptor. In this medicinal chemistry study, we report the discovery of GPR3 agonists based off the diphenyleneiodonium (DPI) scaffold. The most potent full agonist was the 3-trifluoromethoxy analog (32) with an EC50 of 260 nM and 90% efficacy compared to DPI. Investigation of a homology model of GPR3 from multiple sequence alignment resulted in the finding of a binding site rich in potential π-π and π-cation interactions stabilizing DPI-scaffold agonists. MMGBSA free energy analysis showed a good correlation with trends in observed EC50s. DPI analogs retained the same high receptor selectivity for GPR3 over GPR6 and GPR12 as observed with DPI. Collectively, the DPI analog series shows that order of magnitude improvements in potency with the scaffold were attainable; however, attempts to replace the iodonium ion to make the scaffold more druggable failed.
Affiliation(s)
- Elaine A Gay
  - Center for Drug Discovery, RTI International, Research Triangle Park, NC 27709, USA
- Danni L Harris
  - Center for Drug Discovery, RTI International, Research Triangle Park, NC 27709, USA
- Joseph W Wilson
  - Center for Drug Discovery, RTI International, Research Triangle Park, NC 27709, USA
- Bruce E Blough
  - Center for Drug Discovery, RTI International, Research Triangle Park, NC 27709, USA

4
Waldl M, Spicher T, Lorenz R, Beckmann IK, Hofacker IL, Löhneysen SV, Stadler PF. Local RNA folding revisited. J Bioinform Comput Biol 2023; 21:2350016. [PMID: 37522173 DOI: 10.1142/s0219720023500166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/01/2023]
Abstract
Most of the functional RNA elements located within large transcripts are local. Local folding therefore serves as a practically useful approximation to global structure prediction. Due to the sensitivity of RNA secondary structure prediction to the exact definition of sequence ends, accuracy can be increased by averaging local structure predictions over multiple, overlapping sequence windows. These averages can be computed efficiently by dynamic programming. Here we revisit the local folding problem, present a concise mathematical formalization that generalizes previous approaches, and show that correct Boltzmann samples can be obtained by local stochastic backtracing in McCaskill's algorithms but not from local folding recursions. Corresponding new features are implemented in the ViennaRNA package to improve the support of local folding. Applications include the computation of maximum expected accuracy structures from RNAplfold data and a mutual information measure to quantify the sensitivity of individual sequence positions.
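The window-averaging idea described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical callback, `pair_probs_in_window(start, end)`, that returns base-pair probabilities for a single window; this stands in for a real folding routine and is not the ViennaRNA API. Each pair's probability is averaged over every window that contains both positions. The naive loop below is for illustration only and is not the efficient dynamic-programming formulation the abstract mentions.

```python
from collections import defaultdict


def average_pair_probs(n, window, pair_probs_in_window):
    """Average base-pair probabilities over all overlapping windows.

    n: sequence length; window: window size (assumed window <= n).
    pair_probs_in_window: hypothetical callback (start, end) -> {(i, j): p}
    using 0-based positions with start <= i < j < end.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start in range(n - window + 1):
        end = start + window
        probs = pair_probs_in_window(start, end)
        # Every pair inside the window counts, even if its probability is 0 there.
        for i in range(start, end):
            for j in range(i + 1, end):
                counts[(i, j)] += 1
                sums[(i, j)] += probs.get((i, j), 0.0)
    return {pair: sums[pair] / counts[pair] for pair in sums if sums[pair] > 0.0}


def toy_predictor(start, end):
    # Toy stand-in for a folding routine: pair (2, 5) gets a
    # window-dependent probability and is absent in other windows.
    table = {0: 0.9, 1: 0.3}
    return {(2, 5): table[start]} if start in table else {}


averaged = average_pair_probs(n=8, window=6, pair_probs_in_window=toy_predictor)
print(averaged)
```

In the toy run, the pair (2, 5) lies inside all three windows, so its averaged value reflects both the windows that predict it and the one that does not.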
Affiliation(s)
- Maria Waldl
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
- Thomas Spicher
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
- Ronny Lorenz
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
- Irene K Beckmann
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
- Sarah Von Löhneysen
  - Institute of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Peter F Stadler
  - Institute of Computer Science and Interdisciplinary Center for Bioinformatics, Leipzig University, Härtelstraße 16-18, D-04107 Leipzig, Germany

5
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
  - Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA

6
Poulsen TM, Frith M. Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads. BMC Bioinformatics 2017; 18:299. [PMID: 28606054 PMCID: PMC5469136 DOI: 10.1186/s12859-017-1710-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/11/2016] [Accepted: 05/25/2017] [Indexed: 01/11/2023]
Abstract
Background: Genome sequencing provides a powerful tool for pathogen detection and can help resolve outbreaks that pose public safety and health risks. Mapping of DNA reads to genomes plays a fundamental role in this approach, where accurate alignment and classification of sequencing data is crucial. Standard mapping methods crudely treat bases as independent from their neighbors. Accuracy might be improved by using higher order paired hidden Markov models (HMMs), which model neighbor effects, but introduce design and implementation issues that have typically made them impractical for read mapping applications. We present a variable-order paired HMM that we term VarHMM, which addresses central issues involved with higher order modeling for sequence alignment. Results: Compared with existing alignment methods, VarHMM is able to model higher order distributions and quantify alignment probabilities with greater detail and accuracy. In a series of comparison tests, in which Ion Torrent sequenced DNA was mapped to similar bacterial strains, VarHMM consistently provided better strain discrimination than any of the other alignment methods that we compared with. Conclusions: Our results demonstrate the advantages of higher ordered probability distribution modeling and also suggest that further development of such models would benefit read mapping in a range of other applications as well.
Affiliation(s)
- Thomas M Poulsen
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Martin Frith
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26 Aomi, Koto-ku, Tokyo, 135-0064, Japan
  - Department of Computational Biology and Medical Sciences, University of Tokyo, Kashiwa, 277-8562, Japan
  - AIST-Waseda CBBD-OIL, Tokyo, 169-8555, Japan

7
Andreakis N, Høj L, Kearns P, Hall MR, Ericson G, Cobb RE, Gordon BR, Evans-Illidge E. Diversity of Marine-Derived Fungal Cultures Exposed by DNA Barcodes: The Algorithm Matters. PLoS One 2015; 10:e0136130. [PMID: 26308620 PMCID: PMC4550264 DOI: 10.1371/journal.pone.0136130] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/10/2014] [Accepted: 07/29/2015] [Indexed: 01/11/2023]
Abstract
Marine fungi are an understudied group of eukaryotic microorganisms characterized by unresolved genealogies and unstable classification. Whereas DNA barcoding via the nuclear ribosomal internal transcribed spacer (ITS) provides a robust and rapid tool for fungal species delineation, accurate classification of fungi is often arduous given the large number of partial or unknown barcodes and misidentified isolates deposited in public databases. This situation is perpetuated by a paucity of cultivable fungal strains available for phylogenetic research linked to these data sets. We analyze ITS barcodes produced from a subsample (290) of 1781 cultured isolates of marine-derived fungi in the Bioresources Library located at the Australian Institute of Marine Science (AIMS). Our analysis revealed high levels of under-explored fungal diversity. The majority of isolates were ascomycetes including representatives of the subclasses Eurotiomycetidae, Hypocreomycetidae, Sordariomycetidae, Pleosporomycetidae, Dothideomycetidae, Xylariomycetidae and Saccharomycetidae. The phylum Basidiomycota was represented by isolates affiliated with the genera Tritirachium and Tilletiopsis. BLAST searches revealed 26 unknown OTUs and 50 isolates corresponding to previously uncultured, unidentified fungal clones. This study makes a significant addition to the availability of barcoded, culturable marine-derived fungi for detailed future genomic and physiological studies. We also demonstrate the influence of commonly used alignment algorithms and genetic distance measures on the accuracy and comparability of estimating Operational Taxonomic Units (OTUs) by the automatic barcode gap finder (ABGD) method. Large scale biodiversity screening programs that combine datasets using algorithmic OTU delineation pipelines need to ensure compatible algorithms have been used because the algorithm matters.
Affiliation(s)
- Nikos Andreakis
  - Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Lone Høj
  - Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Philip Kearns
  - Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Michael R. Hall
  - Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Gavin Ericson
  - Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Rose E. Cobb
  - Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Benjamin R. Gordon
  - Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia

8
Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C. Use of ChIP-Seq data for the design of a multiple promoter-alignment method. Nucleic Acids Res 2012; 40:e52. [PMID: 22230796 PMCID: PMC3326335 DOI: 10.1093/nar/gkr1292] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023]
Abstract
We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only has this interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.
Affiliation(s)
- Ionas Erb
  - Bioinformatics and Genomics program, Centre for Genomic Regulation and UPF, 08003 Barcelona, Spain

9
Genea mexicana, sp. nov., and Geopora tolucana, sp. nov., new hypogeous Pyronemataceae from Mexico, and the taxonomy of Geopora reevaluated. Mycol Prog 2011. [DOI: 10.1007/s11557-011-0781-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/17/2022]

10
Bussotti G, Raineri E, Erb I, Zytnicki M, Wilm A, Beaudoing E, Bucher P, Notredame C. BlastR--fast and accurate database searches for non-coding RNAs. Nucleic Acids Res 2011; 39:6886-95. [PMID: 21624887 PMCID: PMC3167602 DOI: 10.1093/nar/gkr335] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 12/02/2022]
Abstract
We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.
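The recoding step described in this abstract can be sketched as follows. The abstract does not say how dinucleotides map to protein-like letters, or whether the dinucleotides overlap, so the overlapping scheme and the 16-letter alphabet below are illustrative assumptions and not BlastR's actual encoding.

```python
from itertools import product

NUCS = "ACGU"
# Hypothetical alphabet: 16 arbitrary amino-acid letters, one per dinucleotide.
AA_LETTERS = "ACDEFGHIKLMNPQRS"
DINUC_TO_AA = {
    a + b: aa for (a, b), aa in zip(product(NUCS, repeat=2), AA_LETTERS)
}


def recode(rna):
    """Recode an RNA string into a protein-like string.

    Assumes overlapping dinucleotides, so a sequence of length n
    yields n - 1 letters. DNA input is accepted by mapping T to U.
    """
    rna = rna.upper().replace("T", "U")
    return "".join(DINUC_TO_AA[rna[i:i + 2]] for i in range(len(rna) - 1))


print(recode("ACGU"))  # prints "CHN" under the hypothetical mapping above
```

Once sequences are in this form, a protein search tool can compare them with a 16x16 substitution matrix, which is the role the abstract assigns to BlosumR.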
Affiliation(s)
- Giovanni Bussotti
  - Bioinformatics and Genomics program, Center for Genomic Regulation (CRG) and UPF, Barcelona, C/ D. Aiguader, 88, 08003 Barcelona, Spain

11
Göker M, Grimm GW, Auch AF, Aurahs R, Kučera M. A Clustering Optimization Strategy for Molecular Taxonomy Applied to Planktonic Foraminifera SSU rDNA. Evol Bioinform Online 2010; 6:97-112. [PMID: 21037964 PMCID: PMC2964048 DOI: 10.4137/ebo.s5504] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Identifying species is challenging in the case of organisms for which primarily molecular data are available. Even if morphological features are available, molecular taxonomy is often necessary to revise taxonomic concepts and to analyze environmental DNA sequences. However, clustering approaches to delineate molecular operational taxonomic units often rely on arbitrary parameter choices. Also, distance calculation is difficult for highly alignment-ambiguous sequences. Here, we applied a recently described clustering optimization method to highly divergent planktonic foraminifera SSU rDNA sequences. We determined the distance function and the clustering setting that result in the highest agreement with morphological reference data. Alignment-free distance calculation, when adapted to the use with partly non-homologous sequences caused by distinct primer pairs, outperformed multiple sequence alignment. Clustering optimization offers new perspectives for the barcoding of species diversity and for environmental sequencing. It bridges the gap between traditional and modern taxonomic disciplines by specifically addressing the issue of how to optimally account for both genetic divergence and given species concepts.
Affiliation(s)
- Markus Göker
  - German Collection of Microorganisms and Cell Cultures (DSMZ), Inhoffenstraße 7B, 38124 Braunschweig, Germany
- Guido W. Grimm
  - Swedish Museum of Natural History, Box 50007, Stockholm, Sweden
- Alexander F. Auch
  - Center for Bioinformatics Tübingen, Eberhard Karls University of Tübingen, Sand 14, 72076 Tübingen, Germany
- Ralf Aurahs
  - Institute of Geosciences, Eberhard Karls University of Tübingen, Sigwartstraße 10, 72076 Tübingen, Germany
- Michal Kučera
  - Institute of Geosciences, Eberhard Karls University of Tübingen, Sigwartstraße 10, 72076 Tübingen, Germany

12
Sahraeian SME, Yoon BJ. PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Res 2010; 38:4917-28. [PMID: 20413579 PMCID: PMC2926610 DOI: 10.1093/nar/gkq255] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 12/23/2009] [Revised: 03/25/2010] [Accepted: 03/26/2010] [Indexed: 11/13/2022]
Abstract
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA constantly yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.
Affiliation(s)
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA

13
Hara T, Sato K, Ohya M. MTRAP: pairwise sequence alignment algorithm by a new measure based on transition probability between two consecutive pairs of residues. BMC Bioinformatics 2010; 11:235. [PMID: 20459682 PMCID: PMC2875243 DOI: 10.1186/1471-2105-11-235] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/18/2009] [Accepted: 05/08/2010] [Indexed: 01/03/2023]
Abstract
Background: Sequence alignment is one of the most important techniques for analyzing biological systems. However, alignment methods remain imperfect, and more accurate methods are still needed. In particular, the alignment of homologous sequences with low sequence similarity is not yet at a satisfactory level. Most recent methods for aligning protein sequences use an empirically determined measure. Typically, such a measure combines two quantities: (1) the sum of substitution scores between two residue segments and (2) the sum of gap penalties in insertion/deletion regions. Such a measure assumes that there is no intersite correlation in the sequences. In this paper, we improve the alignment by taking the correlation of consecutive residues into account. Results: We introduced a new alignment method, called MTRAP, based on a metric defined on compound systems of two sequences. In the benchmark tests with PREFAB 4.0 and HOMSTRAD, our pairwise alignment method gives higher accuracy than other methods such as ClustalW2, TCoffee, and MAFFT. Especially for sequences with sequence identity below 15%, our method improves the alignment accuracy significantly. Moreover, we also showed that our algorithm works well together with consistency-based progressive multiple alignment by modifying TCoffee to use our measure. Conclusions: Our method leads to a significant increase in alignment accuracy compared with other methods. The improvement is especially clear in the low-identity range of sequences. The source code is available at our web page, whose address is found in the section "Availability and requirements".
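The conventional measure that this abstract contrasts with MTRAP, quantities (1) and (2), can be sketched as a column-wise score. The match, mismatch, and affine gap values below are hypothetical placeholders rather than parameters from the paper, and the sketch deliberately does not implement MTRAP's transition-based measure, which the abstract does not specify.

```python
def conventional_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Score two aligned sequences of equal length ('-' marks a gap).

    All scoring parameters are illustrative assumptions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev_gap = None  # which sequence carried a gap in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            gap_in = "a" if x == "-" else "b"
            # Quantity (2): affine gap penalty, cheaper when extending a gap.
            score += gap_extend if prev_gap == gap_in else gap_open
            prev_gap = gap_in
        else:
            # Quantity (1): substitution score for this residue pair.
            score += match if x == y else mismatch
            prev_gap = None
    return score


print(conventional_score("AC-GT", "ACTGT"))
```

Each ungapped column is scored independently of its neighbors, which is the no-intersite-correlation assumption the abstract refers to.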
Affiliation(s)
- Toshihide Hara
  - Department of Information Sciences, Tokyo University of Science, 2641 Yamazaki, Noda City, Chiba, Japan

14
En route to a genome-based classification of Archaea and Bacteria? Syst Appl Microbiol 2010; 33:175-82. [PMID: 20409658 DOI: 10.1016/j.syapm.2010.03.003] [Citation(s) in RCA: 250] [Impact Index Per Article: 17.9] [Received: 10/06/2009] [Revised: 03/10/2010] [Accepted: 03/17/2010] [Indexed: 11/23/2022]
Abstract
Given the considerable promise whole-genome sequencing offers for phylogeny and classification, it is surprising that microbial systematics and genomics have not yet been reconciled. This might be due to the intrinsic difficulties in inferring reasonable phylogenies from genomic sequences, particularly in the light of the significant amount of lateral gene transfer in prokaryotic genomes. However, recent studies indicate that the species tree and the hierarchical classification based on it are still meaningful concepts, and that state-of-the-art phylogenetic inference methods are able to provide reliable estimates of the species tree to the benefit of taxonomy. Conversely, we suspect that the current lack of completely sequenced genomes for many of the major lineages of prokaryotes and for most type strains is a major obstacle in progress towards a genome-based classification of microorganisms. We conclude that phylogeny-driven microbial genome sequencing projects such as the Genomic Encyclopaedia of Archaea and Bacteria (GEBA) project are likely to rectify this situation.

15
Aurahs R, Göker M, Grimm GW, Hemleben V, Hemleben C, Schiebel R, Kucera M. Using the Multiple Analysis Approach to Reconstruct Phylogenetic Relationships among Planktonic Foraminifera from Highly Divergent and Length-polymorphic SSU rDNA Sequences. Bioinform Biol Insights 2009; 3:155-77. [PMID: 20140067 PMCID: PMC2808177 DOI: 10.4137/bbi.s3334] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/23/2022]
Abstract
The high sequence divergence within the small subunit ribosomal RNA gene (SSU rDNA) of foraminifera makes it difficult to establish the homology of individual nucleotides across taxa. Alignment-based approaches so far relied on time-consuming manual alignments and discarded up to 50% of the sequenced nucleotides prior to phylogenetic inference. Here, we investigate the potential of the multiple analysis approach to infer a molecular phylogeny of all modern planktonic foraminiferal taxa by using a matrix of 146 new and 153 previously published SSU rDNA sequences. Our multiple analysis approach is based on eleven different automated alignments, analysed separately under the maximum likelihood criterion. The high degree of congruence between the phylogenies derived from our novel approach, traditional manually homologized culled alignments and the fossil record indicates that poorly resolved nucleotide homology does not represent the most significant obstacle when exploring the phylogenetic structure of the SSU rDNA in planktonic foraminifera. We show that approaches designed to extract phylogenetically valuable signals from complete sequences show more promise to resolve the backbone of the planktonic foraminifer tree than attempts to establish strictly homologous base calls in a manual alignment.
Affiliation(s)
- Ralf Aurahs
  - Department of Micropaleontology, Institute of Geosciences, Eberhard Karls University of Tübingen, Sigwartstraße 10, 72076 Tübingen, Germany