1
Wonok W, Sudmoon R, Tanee T, Lee SY, Chaveerach A. Complete Chloroplast Genome of Four Thai Native Dioscorea Species: Structural, Comparative and Phylogenetic Analyses. Genes (Basel) 2023; 14:genes14030703. [PMID: 36980975 PMCID: PMC10048501 DOI: 10.3390/genes14030703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
The chloroplast genomes of Dioscorea brevipetiolata, D. depauperata, D. glabra, and D. pyrifolia are 153,370–153,503 bp in size. A total of 113 genes were predicted, including 79 protein-coding sequences (CDS), 30 tRNA, and four rRNA genes. The overall GC content for all four species was 37%. Only mono-, di-, and trinucleotides were present in the genome. Genes adjacent to the junction borders were similar in all species analyzed. Eight distinct indel variations were detected in the chloroplast genome alignment of 24 Dioscorea species. At a cut-off point of Pi = 0.03, a sliding window analysis based on 25 chloroplast genome sequences of Dioscorea species revealed three highly variable regions, which included three CDS (trnC, ycf1, and rpl32), as well as an intergenic spacer region, ndhF-rpl32. A phylogenetic tree based on the complete chloroplast genome sequence displayed an almost fully resolved relationship in Dioscorea. However, D. brevipetiolata, D. depauperata, and D. glabra were clustered together with D. alata, while D. pyrifolia was closely related to D. aspersa. As Dioscorea is a diverse genus, genome data generated in this study may contribute to a better understanding of the genetic identity of these species, which would be useful for future taxonomic work of Dioscorea.
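The GC content and sliding-window nucleotide diversity (Pi) mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the window and step sizes, and the rule of skipping sites with gaps or N are all assumptions made for illustration. It expects an alignment of at least two equal-length, uppercase sequences.

```python
from itertools import combinations


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    bases = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / bases if bases else 0.0


def pairwise_diff(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap or N are skipped.
    """
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def nucleotide_diversity(seqs):
    """Pi as the mean pairwise difference per site over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) for each window along the alignment.

    The window and step defaults are illustrative, not taken from the paper.
    """
    length = len(alignment[0])
    result = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        result.append((start + window // 2, nucleotide_diversity(chunk)))
    return result
```

Windows whose Pi exceeds a chosen cut-off (the abstract uses Pi = 0.03) would then be reported as highly variable regions.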
Affiliation(s)
- Warin Wonok
- Department of Biology, Faculty of Science, Khon Kaen University, Khon Kaen 40002, Thailand
- Tawatchai Tanee
- Faculty of Environment and Resource Studies, Mahasarakham University, Maha Sarakham 44150, Thailand
- Shiou Yih Lee
- Faculty of Health and Life Sciences, INTI International University, Nilai 71800, Negeri Sembilan, Malaysia
- Arunrat Chaveerach
- Department of Biology, Faculty of Science, Khon Kaen University, Khon Kaen 40002, Thailand
- Correspondence:
2
Singh J, Raina A, Sangwan N, Chauhan A, Avti PK. Structural, molecular hybridization and network based identification of miR-373-3p and miR-520e-3p as regulators of NR4A2 human gene involved in neurodegeneration. Nucleosides, Nucleotides & Nucleic Acids 2022; 41:419-443. [PMID: 35272569 DOI: 10.1080/15257770.2022.2048851] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/08/2023]
Abstract
MicroRNAs (miRNAs) are short non-coding RNAs, about 22 nucleotides long, that dock to the 3'UTR/5'UTR of a gene to regulate translation of its mRNA, and they play a vital role in neurodegenerative diseases. The Nuclear Receptor gene (NR4A2), a transcription factor and steroid-thyroid hormone-retinoid receptor, is involved in neural development, memory formation, dopaminergic neurotransmission, and cellular protection from inflammatory damage. Identifying the miRNAs that efficiently target the 3'UTR/5'UTR of the NR4A2 gene is therefore essential for regulating neurodegeneration. Highly stabilized top miRNA-mRNA hybridized structures, their homologs, and identification of the best structures based on their least free energy were evaluated using in silico techniques. The miR-gene and gene-gene networks, miR-disease associations, and transcription factor binding sites were also investigated. The results suggest 166 top miRNAs targeting the NR4A2 mRNA, of which 10 bind with 100% seed sequence identity (at both the 3' and 5'UTR) at the same position on the NR4A2 mRNA region. miR-373-3p and miR-520e-3p are considered the best candidate miRNAs, hybridizing with high efficiency at both the 3' and 5'UTR of NR4A2 mRNA. This could be due to the longest complementary seed sequence, supplementary pairing, and the absence of non-canonical base pairs. Furthermore, the miR-gene network, target gene-gene interaction analysis, and miR-disease association provide an understanding of the molecular, cellular, and biological processes involved in various pathways regulated by four transcription factors (PPARG, ZNF740, NRF1, and RREB1). Therefore, miR-373-3p, miR-520e-3p, and the four transcription factors can regulate the NR4A2 gene involved in the neurodegenerative process.
Affiliation(s)
- Jitender Singh
- Department of Biophysics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
- Ashvinder Raina
- Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
- Namrata Sangwan
- Department of Biophysics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
- Arushi Chauhan
- Department of Biophysics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
- Pramod K Avti
- Department of Biophysics, Postgraduate Institute of Medical Education and Research (PGIMER), Chandigarh, India
3
The intratumoral microbiome: Characterization methods and functional impact. Cancer Lett 2021; 522:63-79. [PMID: 34517085 DOI: 10.1016/j.canlet.2021.09.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/28/2021] [Revised: 09/01/2021] [Accepted: 09/06/2021] [Indexed: 12/24/2022]
Abstract
Live-pathogenic bacteria, which were identified inside tumors hundreds of years ago, are key elements in modern cancer research. As they have a relatively accessible genome, they offer a multitude of metabolic engineering opportunities that are useful in several clinical fields. A better understanding of the tumor microenvironment and its associated microbiome would help conceptualize new metabolically engineered species that trigger efficient therapeutic responses against cancer. Unfortunately, given the low microbial biomass of tumors, characterizing the tumor microbiome remains a challenge. Tumors have a high ratio of host to bacterial DNA, making it extremely complex to identify tumor-associated bacteria. Nevertheless, with the improvements in next-generation analytic tools, recent studies have demonstrated the existence of intratumor bacteria inside defined tumors. It is now proven that each cancer subtype has a unique microbiome, characterized by bacterial communities with specific metabolic functions. This review provides a brief overview of the main approaches used to characterize the tumor microbiome, and of the recently proposed functions of intracellular bacteria identified in oncological entities. The therapeutic aspects of live-pathogenic microbes are also discussed with regard to the tumor microenvironment of each cancer type.
4
Maiolo M, Gatti L, Frei D, Leidi T, Gil M, Anisimova M. ProPIP: a tool for progressive multiple sequence alignment with Poisson Indel Process. BMC Bioinformatics 2021; 22:518. [PMID: 34689750 PMCID: PMC8543915 DOI: 10.1186/s12859-021-04442-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/29/2020] [Accepted: 10/13/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Current alignment tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments (i.e., over-alignment) due to inconsistencies between the indel history and the phylogeny relating the input sequences. RESULTS We present a new progressive multiple sequence alignment tool, ProPIP. The process of insertions and deletions is described using an explicit evolutionary model, the Poisson Indel Process (PIP). The method is based on dynamic programming and is implemented in a frequentist framework. The algorithm is implemented in C++ as a standalone program, and the source code can be compiled on Linux, macOS and Microsoft Windows platforms. The source code is freely available on GitHub at https://github.com/acg-team/ProPIP and is distributed under the terms of the GNU GPL v3 license. CONCLUSIONS The use of an explicit indel evolution model makes it possible to avoid over-alignment, to infer gaps in a phylogenetically consistent way and to make inferences about the rates of insertions and deletions. Instead of arbitrary gap penalties, the parameters used by ProPIP are the insertion and deletion rates, which have a biological interpretation and are contextualized in a probabilistic environment. As a result, indel rate settings may be optimised in order to infer phylogenetically meaningful gap patterns.
Affiliation(s)
- Massimo Maiolo
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Schloss 1, Postfach, 8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, 1015, Lausanne, Switzerland
- Lorenzo Gatti
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Schloss 1, Postfach, 8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, 1015, Lausanne, Switzerland
- Diego Frei
- Institute of Information Systems and Networking, University of Applied Sciences and Arts of Southern Switzerland, Galleria 2, Via Cantonale 2c, 6928, Manno, Switzerland
- Tiziano Leidi
- Institute of Information Systems and Networking, University of Applied Sciences and Arts of Southern Switzerland, Galleria 2, Via Cantonale 2c, 6928, Manno, Switzerland
- Manuel Gil
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Schloss 1, Postfach, 8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, 1015, Lausanne, Switzerland
- Maria Anisimova
- Institute of Applied Simulation, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Schloss 1, Postfach, 8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Batiment Amphipole, 1015, Lausanne, Switzerland
5
Zhang X, Kaplow IM, Wirthlin M, Park TY, Pfenning AR. HALPER facilitates the identification of regulatory element orthologs across species. Bioinformatics 2020; 36:4339-4340. [PMID: 32407523 PMCID: PMC7520040 DOI: 10.1093/bioinformatics/btaa493] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/25/2020] [Revised: 04/19/2020] [Accepted: 05/08/2020] [Indexed: 01/09/2023]
Abstract
SUMMARY Diverse traits have evolved through cis-regulatory changes in genome sequence that influence the magnitude, timing and cell type-specificity of gene expression. Advances in high-throughput sequencing and regulatory genomics have led to the identification of regulatory elements in individual species, but these genomic regions remain difficult to align across taxonomic orders due to their lack of sequence conservation relative to protein coding genes. The groundwork for tracing the evolution of regulatory elements is provided by the recent assembly of hundreds of genomes, the generation of reference-free Cactus multiple sequence alignments of these genomes, and the development of the halLiftover tool for mapping regions across these alignments. We present halLiftover Post-processing for the Evolution of Regulatory Elements (HALPER), a tool for constructing contiguous regulatory element orthologs from the outputs of halLiftover. We anticipate that this tool will enable users to efficiently identify orthologs of regulatory elements across hundreds of species, providing novel insights into the evolution of traits that have evolved through gene expression. AVAILABILITY AND IMPLEMENTATION HALPER is implemented in python and available on github: https://github.com/pfenninglab/halLiftover-postprocessing. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Irene M Kaplow
- Department of Computational Biology; Neuroscience Institute
- Tae Yoon Park
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
6
Kim D, Han SK, Lee K, Kim I, Kong J, Kim S. Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites. Nucleic Acids Res 2019; 47:e94. [PMID: 31199866 PMCID: PMC6895274 DOI: 10.1093/nar/gkz536] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Revised: 05/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022]
Abstract
Genome-wide association studies have discovered a large number of genetic variants in human patients with the disease. Thus, predicting the impact of these variants is important for sorting disease-associated variants (DVs) from neutral variants. Current methods to predict the mutational impacts depend on evolutionary conservation at the mutation site, which is determined using homologous sequences and based on the assumption that variants at well-conserved sites have high impacts. However, many DVs at less-conserved but functionally important sites cannot be predicted by the current methods. Here, we present a method to find DVs at less-conserved sites by predicting the mutational impacts using evolutionary coupling analysis. Functionally important and evolutionarily coupled sites often have compensatory variants on cooperative sites to avoid loss of function. We found that our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, we identified DVs that were not identified using conservation-based methods. These newly identified DVs were frequently found at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means to identify less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.
Affiliation(s)
- Donghyo Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Seong Kyu Han
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Kwanghwan Lee
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Inhae Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- JungHo Kong
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Sanguk Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
7
Skutkova H, Vitek M, Bezdicek M, Brhelova E, Lengerova M. Advanced DNA fingerprint genotyping based on a model developed from real chip electrophoresis data. J Adv Res 2019; 18:9-18. [PMID: 30788173 PMCID: PMC6369143 DOI: 10.1016/j.jare.2019.01.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/19/2018] [Revised: 01/06/2019] [Accepted: 01/10/2019] [Indexed: 11/25/2022]
Abstract
Mapping chip electrophoresis distortion based on real data measurement. Determining the transformation function for the adaptive correction of band size deviation. Improving the ability to distinguish closely related DNA fingerprints. Using hierarchical clustering to adjust the global band position. Genotyping all DNA fingerprints from multiple runs at once.
Large-scale comparative studies of DNA fingerprints prefer automated chip capillary electrophoresis over conventional gel planar electrophoresis due to the higher precision of the digitalization process. However, the determination of band sizes is still limited by the device resolution and sizing accuracy. Band matching, therefore, remains the key step in DNA fingerprint analysis. Most current methods evaluate only the pairwise similarity of the samples, using heuristically determined constant thresholds to evaluate the maximum allowed band size deviation; unfortunately, that approach significantly reduces the ability to distinguish between closely related samples. This study presents a new approach based on global multiple alignments of bands of all samples, with an adaptive threshold derived from the detailed migration analysis of a large number of real samples. The proposed approach allows the accurate automated analysis of DNA fingerprint similarities for extensive epidemiological studies of bacterial strains, thereby helping to prevent the spread of dangerous microbial infections.
Key Words
- Automated chip capillary electrophoresis
- Band matching
- DBSCAN, density-based spatial clustering of applications with noise
- DNA fingerprinting
- DTW, dynamic time warping
- ESBL, extended spectrum beta-lactamases
- Gel sample distortion
- Genotyping
- KLPN, Klebsiella pneumoniae
- MALDI-TOF, matrix assisted laser desorption ionization – time of flight
- Pattern recognition
- R-square, ratio of the sum of squares
- RMSE, root mean squared error
- SD, standard deviation
- SLINK, single linkage
- SSE, sum of squares due to error
- UPGMA, unweighted pair group method with arithmetic mean
- rep-PCR, repetitive element palindromic polymerase chain reaction
Affiliation(s)
- Helena Skutkova
- Department of Biomedical Engineering, Brno University of Technology, Technicka 12, 616 00 Brno, Czech Republic
- Corresponding author.
- Martin Vitek
- Department of Biomedical Engineering, Brno University of Technology, Technicka 12, 616 00 Brno, Czech Republic
- Matej Bezdicek
- Department of Internal Medicine, Hematology and Oncology, Masaryk University and University Hospital Brno, Cernopolni 212/9, 662 63 Brno, Czech Republic
- Eva Brhelova
- Department of Internal Medicine, Hematology and Oncology, Masaryk University and University Hospital Brno, Cernopolni 212/9, 662 63 Brno, Czech Republic
- Martina Lengerova
- Department of Internal Medicine, Hematology and Oncology, Masaryk University and University Hospital Brno, Cernopolni 212/9, 662 63 Brno, Czech Republic
8
Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015; 17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022]
Abstract
This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method development. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.
9
Ma Q, Tian X, Jiang Z, Huang J, Liu Q, Lu X, Luo Q, Zhou R. Neutralizing epitopes mapping of human adenovirus type 14 hexon. Vaccine 2015; 33:6659-65. [PMID: 26546264 DOI: 10.1016/j.vaccine.2015.10.117] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/14/2015] [Revised: 10/16/2015] [Accepted: 10/19/2015] [Indexed: 11/16/2022]
Abstract
Human adenovirus 14 (HAdV-14) has caused several clusters of acute respiratory disease (ARD) outbreaks in both civilian and military settings. The identification of the neutralizing epitopes of HAdV-14 is important for the surveillance and control of infection. Since previous studies had indicated that adenovirus neutralizing epitopes were likely to be exposed on the surface of the hexon, four epitope peptides, A14R1 (residues 141-157), A14R2 (residues 181-189), A14R4 (residues 252-260) and A14R7 (residues 430-442), were predicted and mapped onto the 3D structure of the hexon by a homology modeling approach. The four peptides were then synthesized, and all four putative epitopes were identified as neutralizing epitopes by enzyme-linked immunosorbent assay (ELISA) and neutralization tests (NT). Finally, we incorporated the four epitopes into human adenovirus 3 (HAdV-3) vectors using the "antigen capsid-incorporation" strategy, and two chimeric adenoviruses, A14R2A3 and A14R4A3, were successfully obtained, which displayed A14R2 and A14R4 respectively on the hexon surface of HAdV-3 virions. Further analysis showed that antisera against the two chimeric viruses could neutralize both HAdV-14 and HAdV-3 infection. The neutralization titers of the anti-A14R4A3 group were significantly higher than those of the anti-KLH-A14R4 group (P=0.0442). These findings have important implications for the development of a peptide-based, broadly protective HAdV-14 and HAdV-3 bivalent vaccine.
Affiliation(s)
- Qiang Ma
- State Key Laboratory of Respiratory Disease, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou Medical University, Guangzhou 510230, China; Dongguan Institute of Pediatrics, Dongguan Children's Hospital, Dongguan 523325, China.
- Xingui Tian
- State Key Laboratory of Respiratory Disease, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou Medical University, Guangzhou 510230, China; Dongguan Institute of Pediatrics, Dongguan Children's Hospital, Dongguan 523325, China.
- Zaixue Jiang
- Dongguan Institute of Pediatrics, Dongguan Children's Hospital, Dongguan 523325, China.
- Junfeng Huang
- School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China.
- Qian Liu
- Dongguan Institute of Pediatrics, Dongguan Children's Hospital, Dongguan 523325, China.
- Xiaomei Lu
- Dongguan Institute of Pediatrics, Dongguan Children's Hospital, Dongguan 523325, China.
- Qingming Luo
- Dongguan Institute of Pediatrics, Dongguan Children's Hospital, Dongguan 523325, China.
- Rong Zhou
- State Key Laboratory of Respiratory Disease, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou Medical University, Guangzhou 510230, China.
10
Andreakis N, Høj L, Kearns P, Hall MR, Ericson G, Cobb RE, Gordon BR, Evans-Illidge E. Diversity of Marine-Derived Fungal Cultures Exposed by DNA Barcodes: The Algorithm Matters. PLoS One 2015; 10:e0136130. [PMID: 26308620 PMCID: PMC4550264 DOI: 10.1371/journal.pone.0136130] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/10/2014] [Accepted: 07/29/2015] [Indexed: 01/11/2023]
Abstract
Marine fungi are an understudied group of eukaryotic microorganisms characterized by unresolved genealogies and unstable classification. Whereas DNA barcoding via the nuclear ribosomal internal transcribed spacer (ITS) provides a robust and rapid tool for fungal species delineation, accurate classification of fungi is often arduous given the large number of partial or unknown barcodes and misidentified isolates deposited in public databases. This situation is perpetuated by a paucity of cultivable fungal strains available for phylogenetic research linked to these data sets. We analyze ITS barcodes produced from a subsample (290) of 1781 cultured isolates of marine-derived fungi in the Bioresources Library located at the Australian Institute of Marine Science (AIMS). Our analysis revealed high levels of under-explored fungal diversity. The majority of isolates were ascomycetes including representatives of the subclasses Eurotiomycetidae, Hypocreomycetidae, Sordariomycetidae, Pleosporomycetidae, Dothideomycetidae, Xylariomycetidae and Saccharomycetidae. The phylum Basidiomycota was represented by isolates affiliated with the genera Tritirachium and Tilletiopsis. BLAST searches revealed 26 unknown OTUs and 50 isolates corresponding to previously uncultured, unidentified fungal clones. This study makes a significant addition to the availability of barcoded, culturable marine-derived fungi for detailed future genomic and physiological studies. We also demonstrate the influence of commonly used alignment algorithms and genetic distance measures on the accuracy and comparability of estimating Operational Taxonomic Units (OTUs) by the automatic barcode gap finder (ABGD) method. Large scale biodiversity screening programs that combine datasets using algorithmic OTU delineation pipelines need to ensure compatible algorithms have been used because the algorithm matters.
Affiliation(s)
- Nikos Andreakis
- Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Lone Høj
- Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Philip Kearns
- Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Michael R. Hall
- Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Gavin Ericson
- Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Rose E. Cobb
- Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
- Benjamin R. Gordon
- Australian Institute of Marine Science, PMB 3, Townsville, Queensland, 4810, Australia
11
Skutkova H, Vitek M, Babula P, Kizek R, Provaznik I. Classification of genomic signals using dynamic time warping. BMC Bioinformatics 2013; 14 Suppl 10:S1. [PMID: 24267034 PMCID: PMC3750471 DOI: 10.1186/1471-2105-14-s10-s1] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022]
Abstract
BACKGROUND Classification methods for DNA most commonly compare differences in symbolic DNA records, which requires global multiple sequence alignment. This solution is often inappropriate, causing a number of imprecisions and requiring additional user intervention for exact alignment of similar segments. Similar segments in DNA represented as a signal are characterized by a similar shape of the curve. Alignment of genomic signals can adjust whole sections, not only individual symbols. Dynamic time warping (DTW) is suitable for this purpose and can replace the multiple alignment of symbolic sequences in applications such as phylogenetic analysis. METHODS The proposed method is composed of three main parts. The first part converts the symbolic representation of DNA sequences, a string of A, C, G, T symbols, into a signal representation in the form of the cumulated phase of complex components defined for each symbol. The next part adjusts the signal size using standard signal preprocessing methods: median filtering, detrending and resampling. The final part, necessary for comparing genomic signals, is the position and length alignment of genomic signals by dynamic time warping (DTW). RESULTS The application of DTW to a set of genomic signals was evaluated in dendrogram construction using cluster analysis. The resulting tree was compared with a classical phylogenetic tree reconstructed using multiple alignment. The classification of genomic signals using DTW is evolutionarily closer to the phylogeny of organisms. This method is more resistant to errors in the sequences and less dependent on the number of input sequences. CONCLUSIONS Classification of genomic signals using dynamic time warping is an adequate alternative to phylogenetic analysis using symbolic DNA sequence alignment; in addition, it is a robust, quick and more precise technique.
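The first and last steps of the method described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the complex value assigned to each nucleotide (`COMPLEX_MAP`) is a commonly used mapping chosen here, not one confirmed by the abstract, the preprocessing stage (median filtering, detrending, resampling) is omitted, and the DTW shown is the textbook dynamic-programming form with an absolute-difference local cost.

```python
import cmath
from itertools import accumulate

# Assumed nucleotide-to-complex mapping; the paper's exact values may differ.
COMPLEX_MAP = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def cumulated_phase(seq):
    """Convert a DNA string into a signal: the running sum of per-base phases.

    Symbols other than A, C, G, T raise KeyError in this sketch.
    """
    phases = (cmath.phase(COMPLEX_MAP[base]) for base in seq.upper())
    return list(accumulate(phases))


def dtw_distance(x, y):
    """Classic DTW cost between two numeric signals (no window constraint)."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

A pairwise matrix of `dtw_distance(cumulated_phase(a), cumulated_phase(b))` values could then feed a hierarchical clustering step to build a dendrogram, as the abstract describes.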
12
Moradi S, Azerang P, Khalaj V, Sardari S. Antifungal indole and pyrrolidine-2,4-Dione derivative peptidomimetic lead design based on in silico study of bioactive Peptide families. Avicenna J Med Biotechnol 2013; 5:42-53. [PMID: 23626876 PMCID: PMC3572706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2012] [Accepted: 08/22/2012] [Indexed: 11/30/2022]
Abstract
BACKGROUND The rise of opportunistic fungal infections highlights the need for development of new antimicrobial agents. Antimicrobial Peptides (AMPs) and Antifungal Peptides (AFPs) are among the agents against which minimal resistance has developed; therefore, they can be used as structural templates for the design of new antimicrobial agents. METHODS In the present study, four antifungal peptidomimetic structures named C1 to C4 were designed based on the plant defensin of Pisum sativum. Minimum inhibitory concentrations (MICs) for these structures were determined against Aspergillus niger N402, Candida albicans ATCC 10231, and Saccharomyces cerevisiae PTCC 5052. RESULTS C1 and C2 showed more potent antifungal activity against these fungal strains than C3 and C4. Structure C2 demonstrated potent antifungal activity among them and could be used as a template for future studies on antifungal peptidomimetic design. Sequence alignments led to the identification of an antifungal decapeptide (KTCENLADTY), named KTC-Y, whose MIC, determined on fungal protoplasts, was 25 µg/ml against Aspergillus fumigatus Af293. CONCLUSION The present approach appears to be a powerful way to design bioactive agents based on AMP mimetic identification.
Affiliation(s)
- Shoeib Moradi
- Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Parisa Azerang
- Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Vahid Khalaj
- Fungal Biotechnology Group, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Soroush Sardari
- Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran,Corresponding author: Soroush Sardari, Ph.D., Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran. Tel: +98 21 66480780. Fax: +98 21 66953311. E-mail:
13
Kolekar P, Kale M, Kulkarni-Kale U. Alignment-free distance measure based on return time distribution for sequence analysis: applications to clustering, molecular phylogeny and subtyping. Mol Phylogenet Evol 2012; 65:510-22. [PMID: 22820020 DOI: 10.1016/j.ympev.2012.07.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 01/28/2012] [Accepted: 07/08/2012] [Indexed: 11/30/2022]
Abstract
The data deluge in the post-genomic era demands development of novel data mining tools. Existing molecular phylogeny analyses (MPAs) developed for individual gene/protein sequences are alignment-based. However, the size of genomic data and the uncertainties associated with alignments necessitate development of alignment-free methods for MPA. Derivation of distances between sequences is an important step in both alignment-dependent and alignment-free methods. Various alignment-free distance measures based on oligonucleotide frequencies, information content, compression techniques, etc. have been proposed. However, these distance measures do not account for the relative order of components, viz. nucleotides or amino acids. A new distance measure, based on the concept of 'return time distribution' (RTD) of k-mers, is proposed, which accounts for both sequence composition and relative order. Statistical parameters of RTDs are used to derive a distance function. The resultant distance matrix is used for clustering and phylogeny using Neighbor-joining. Its performance for MPA and subtyping was evaluated using simulated data generated by block-bootstrap, receiver operating characteristics and leave-one-out cross validation methods. The proposed method was successfully applied for MPA of the family Flaviviridae and subtyping of Dengue viruses. It is observed that the method retains resolution for classification and subtyping of viruses at varying levels of sequence similarity and taxonomic hierarchy.
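The return-time idea in the abstract above can be illustrated with a minimal Python sketch. The abstract only says that "statistical parameters" of the RTDs feed a distance function; the choice of mean and population standard deviation, the Euclidean distance, the default k = 2, and the function names `rtd_vector` and `rtd_distance` are all assumptions made here for illustration, not details confirmed by the source.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def rtd_vector(seq, k=2, alphabet="ACGT"):
    """Return [mean, sd] of return times for every k-mer, in sorted k-mer order.

    A return time is the distance between two successive occurrences of the
    same k-mer. K-mers that occur fewer than twice contribute (0.0, 0.0),
    which is an assumption of this sketch.
    """
    positions = {"".join(p): [] for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in positions:  # skip k-mers containing ambiguous symbols
            positions[kmer].append(i)

    vector = []
    for kmer in sorted(positions):
        pos = positions[kmer]
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        if gaps:
            vector.extend([mean(gaps), pstdev(gaps)])
        else:
            vector.extend([0.0, 0.0])
    return vector


def rtd_distance(seq_a, seq_b, k=2):
    """Euclidean distance between the RTD summary vectors of two sequences."""
    va, vb = rtd_vector(seq_a, k), rtd_vector(seq_b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

A pairwise matrix of `rtd_distance` values over a set of sequences could then be passed to any neighbor-joining implementation, which is the clustering step the abstract names.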
|
14
|
Jeon J, Nam HJ, Choi YS, Yang JS, Hwang J, Kim S. Molecular evolution of protein conformational changes revealed by a network of evolutionarily coupled residues. Mol Biol Evol 2011; 28:2675-85. [PMID: 21470969 DOI: 10.1093/molbev/msr094] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 12/25/2022] Open
Abstract
An improved understanding of protein conformational changes has broad implications for elucidating the mechanisms of various biological processes and for the design of protein engineering experiments. Understanding rearrangements of residue interactions is a key component in the challenge of describing structural transitions. Evolutionary properties of protein sequences and structures are extensively studied; however, evolution of protein motions, especially with respect to interaction rearrangements, has yet to be explored. Here, we investigated the relationship between sequence evolution and protein conformational changes and discovered that structural transitions are encoded in amino acid sequences as coevolving residue pairs. Furthermore, we found that highly coevolving residues are clustered in the flexible regions of proteins and facilitate structural transitions by forming and disrupting their interactions cooperatively. Our results provide insight into the evolution of protein conformational changes and help to identify residues important for structural transitions.
Affiliation(s)
- Jouhyun Jeon
- Division of Molecular and Life Science, Pohang University of Science and Technology, Pohang, Korea
|
15
|
Kim J, Sinha S. Towards realistic benchmarks for multiple alignments of non-coding sequences. BMC Bioinformatics 2010; 11:54. [PMID: 20102627 PMCID: PMC2823711 DOI: 10.1186/1471-2105-11-54] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 08/07/2009] [Accepted: 01/26/2010] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND With the continued development of new computational tools for multiple sequence alignment, it is necessary today to develop benchmarks that aid the selection of the most effective tools. Simulation-based benchmarks have been proposed to meet this necessity, especially for non-coding sequences. However, it is not clear if such benchmarks truly represent real sequence data from any given group of species, in terms of the difficulty of alignment tasks. RESULTS We find that the conventional simulation approach, which relies on empirically estimated values for various parameters such as substitution rate or insertion/deletion rates, is unable to generate synthetic sequences reflecting the broad genomic variation in conservation levels. We tackle this problem with a new method for simulating non-coding sequence evolution, by relying on genome-wide distributions of evolutionary parameters rather than their averages. We then generate synthetic data sets to mimic orthologous sequences from the Drosophila group of species, and show that these data sets truly represent the variability observed in genomic data in terms of the difficulty of the alignment task. This allows us to make significant progress towards estimating the alignment accuracy of current tools in an absolute sense, going beyond only a relative assessment of different tools. We evaluate six widely used multiple alignment tools in the context of Drosophila non-coding sequences, and find the accuracy to be significantly different from previously reported values. Interestingly, the performance of most tools degrades more rapidly when there are more insertions than deletions in the data set, suggesting an asymmetric handling of insertions and deletions, even though none of the evaluated tools explicitly distinguishes these two types of events. 
We also examine the accuracy of two existing tools for annotating insertions versus deletions, and find their performance to be close to optimal in Drosophila non-coding sequences if provided with the true alignments. CONCLUSION We have developed a method to generate benchmarks for multiple alignments of Drosophila non-coding sequences, and shown it to be more realistic than traditional benchmarks. Apart from helping to select the most effective tools, these benchmarks will help practitioners of comparative genomics deal with the effects of alignment errors, by providing accurate estimates of the extent of these errors.
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
16
|
Prakash A, Tompa M. Assessing the discordance of multiple sequence alignments. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2009; 6:542-551. [PMID: 19875854 DOI: 10.1109/tcbb.2007.70271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
Multiple sequence alignments have wide applicability in many areas of computational biology, including comparative genomics, functional annotation of proteins, gene finding, and modeling evolutionary processes. Because of the computational difficulty of multiple sequence alignment and the availability of numerous tools, it is critical to be able to assess the reliability of multiple alignments. We present a tool called StatSigMA to assess whether multiple alignments of nucleotide or amino acid sequences are contaminated with one or more unrelated sequences. There are numerous applications for which StatSigMA can be used. Two such applications are to distinguish homologous sequences from nonhomologous ones and to compare alignments produced by various multiple alignment tools. We present examples of both types of applications.
Affiliation(s)
- Amol Prakash
- Biomarker Research Initiative in Mass Spectrometry Center, Thermo, 790 Memorial Drive, Suite 201, Cambridge, MA 02139, USA.
|
17
|
Yuan X, Qu Z, Wu X, Wang Y, Liu L, Wei F, Gao H, Shang L, Zhang H, Cui H, Zhao Y, Wu N, Tang Y, Qin L. Molecular modeling and epitopes mapping of human adenovirus type 3 hexon protein. Vaccine 2009; 27:5103-10. [PMID: 19573641 DOI: 10.1016/j.vaccine.2009.06.041] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 12/21/2008] [Revised: 04/26/2009] [Accepted: 06/10/2009] [Indexed: 11/29/2022]
Abstract
The hexon protein of human adenovirus (HAdV) possesses type-specific B-cell neutralizing epitopes. We developed a new, effective and reliable approach to map these epitopes on the hexon protein of HAdVs. A three-dimensional (3D) model of the HAdV3 hexon was obtained by homology modeling and refined by molecular mechanics and molecular dynamics simulations. A modified evolutionary trace (ET) analysis called reverse ET (RET) was used to predict the type-specific B-cell neutralizing epitopes. An epitope-screening algorithm, based on analyzing the solvent accessibility surface (SAS) area from the 3D model and calculating site homology using RET, was designed and implemented in the BioPerl script language. Five epitope polypeptide segments were predicted and mapped onto the 3D model. Finally, five polypeptides were synthesized and the predicted epitopes were identified by enzyme-linked immunosorbent assay (ELISA) and Neutralization Test (NT). It was found that the type-specific neutralizing epitopes of HAdV3 are located at the top surface of hexon tower regions (residue numbers: 135-146, 169-178, 237-251, 262-272, 420-434). This work is of great significance to the molecular design of a multivalent HAdVs vaccine.
Affiliation(s)
- Xiaohui Yuan
- Department of Hygienic Microbiology, Harbin Medical University, PR China
|
18
|
Löytynoja A, Goldman N. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 2008; 320:1632-5. [PMID: 18566285 DOI: 10.1126/science.1158395] [Citation(s) in RCA: 566] [Impact Index Per Article: 35.4] [Indexed: 12/11/2022]
Abstract
Genetic sequence alignment is the basis of many evolutionary and comparative studies, and errors in alignments lead to errors in the interpretation of evolutionary information in genomes. Traditional multiple sequence alignment methods disregard the phylogenetic implications of gap patterns that they create and infer systematically biased alignments with excess deletions and substitutions, too few insertions, and implausible insertion-deletion-event histories. We present a method that prevents these systematic errors by recognizing insertions and deletions as distinct evolutionary events. We show theoretically and practically that this improves the quality of sequence alignments and downstream analyses over a wide range of realistic alignment problems. These results suggest that insertions and sequence turnover are more common than is currently thought and challenge the conventional picture of sequence evolution and mechanisms of functional and structural changes.
Affiliation(s)
- Ari Löytynoja
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
|
19
|
Hall BG. How Well Does the HoT Score Reflect Sequence Alignment Accuracy? Mol Biol Evol 2008; 25:1576-80. [DOI: 10.1093/molbev/msn103] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
|
20
|
Wang AX, Ruzzo WL, Tompa M. How accurately is ncRNA aligned within whole-genome multiple alignments? BMC Bioinformatics 2007; 8:417. [PMID: 17963514 PMCID: PMC2206062 DOI: 10.1186/1471-2105-8-417] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 04/19/2007] [Accepted: 10/26/2007] [Indexed: 11/12/2022] Open
Abstract
Background Multiple alignment of homologous DNA sequences is of great interest to biologists since it provides a window into evolutionary processes. At present, the accuracy of whole-genome multiple alignments, particularly in noncoding regions, has not been thoroughly evaluated. Results We evaluate the alignment accuracy of certain noncoding regions using noncoding RNA alignments from Rfam as a reference. We inspect the MULTIZ 17-vertebrate alignment from the UCSC Genome Browser for all the human sequences in the Rfam seed alignments. In particular, we find 638 instances of chimeric and partial alignments to human noncoding RNA elements, of which at least 225 can be improved by straightforward means. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment. Conclusion MULTIZ does a fairly accurate job of aligning these genomes in these difficult regions. However, our experiments indicate that better alignments exist in some regions.
Affiliation(s)
- Adrienne X Wang
- Department of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA 98195, USA.
|
21
|
Abstract
Multi-sequence alignments of large genomic regions are at the core of many computational genome-annotation approaches aimed at identifying coding regions, RNA genes, regulatory regions, and other functional features. Such alignments also underlie many genome-evolution studies. Here we review recent computational advances in the area of multi-sequence alignment, focusing on methods suitable for aligning whole vertebrate genomes. We introduce the key algorithmic ideas in use today, and identify publicly available resources for computing, accessing, and visualizing genomic alignments. Finally, we describe the latest alignment-based approaches to identify and characterize various types of functional sequences. Key areas of research are identified and directions for future improvements are suggested.
Affiliation(s)
- Mathieu Blanchette
- McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada.
|
22
|
Colbourn CJ, Kumar S. Lower bounds on multiple sequence alignment using exact 3-way alignment. BMC Bioinformatics 2007; 8:140. [PMID: 17470273 PMCID: PMC1890300 DOI: 10.1186/1471-2105-8-140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/21/2006] [Accepted: 04/30/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Multiple sequence alignment is fundamental. Exponential growth in computation time appears to be inevitable when an optimal alignment is required for many sequences. Exact costs of optimum alignments are therefore rarely computed. Consequently much effort has been invested in algorithms for alignment that are heuristic, or explore a restricted class of solutions. These give an upper bound on the alignment cost, but it is equally important to determine the quality of the solution obtained. In the absence of an optimal alignment with which to compare, lower bounds may be calculated to assess the quality of the alignment. As more effort is invested in improving upper bounds (alignment algorithms), it is therefore important to improve lower bounds as well. Although numerous cost metrics can be used to determine the quality of an alignment, many are based on sum-of-pairs (SP) measures and their generalizations. RESULTS Two standard and two new methods are considered for using exact 2-way and 3-way alignments to compute lower bounds on total SP alignment cost; one new method fares well with respect to accuracy, while the other reduces the computation time. The first employs exhaustive computation of exact 3-way alignments, while the second employs an efficient heuristic to compute a much smaller number of exact 3-way alignments. Calculating all 3-way alignments exactly and computing their average improves lower bounds on sum of SP cost in v-way alignments. However judicious selection of a subset of all 3-way alignments can yield a further improvement with minimal additional effort. On the other hand, a simple heuristic to select a random subset of 3-way alignments (a random packing) yields accuracy comparable to averaging all 3-way alignments with substantially less computational effort. CONCLUSION Calculation of lower bounds on SP cost (and thus the quality of an alignment) can be improved by employing a mixture of 3-way and 2-way alignments.
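The simplest of the bounds discussed in the abstract above, the one built from exact 2-way alignments, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses unit-cost edit distance as the pairwise cost, scores gap-against-gap columns as zero, and does not attempt the 3-way methods the paper introduces. The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Optimal pairwise alignment cost with unit mismatch and gap costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # gap in b
                           cur[j - 1] + 1,              # gap in a
                           prev[j - 1] + (ca != cb)))   # match or mismatch
        prev = cur
    return prev[-1]


def pairwise_lower_bound(seqs):
    """Sum of optimal 2-way costs: no multiple alignment can have a lower SP cost."""
    return sum(edit_distance(a, b) for a, b in combinations(seqs, 2))


def sp_cost(alignment):
    """Sum-of-pairs cost of a given multiple alignment (rows of equal length).

    Any column position where two rows differ costs 1, so a residue against
    a gap costs 1 while gap against gap (equal symbols) costs 0.
    """
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        total += sum(1 for x, y in zip(row_a, row_b) if x != y)
    return total


seqs = ["ACGT", "AGT", "ACT"]
heuristic_alignment = ["ACGT", "A-GT", "AC-T"]  # an upper bound on the optimum
assert pairwise_lower_bound(seqs) <= sp_cost(heuristic_alignment)
```

The gap between the two numbers brackets the optimal SP cost; tightening the lower side of that bracket is what the 3-way approach in the abstract is aimed at.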
Affiliation(s)
- Charles J Colbourn
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, PO Box 875301, Tempe, AZ 85287-5301, USA
- School of Computing and Informatics, Arizona State University, PO Box 878809, Tempe, AZ 85287-8809, USA
- Sudhir Kumar
- Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, PO Box 875301, Tempe, AZ 85287-5301, USA
- School of Life Sciences, Arizona State University, PO Box 875301, Tempe, AZ 85287-5301, USA
|
23
|
Abstract
DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding additional sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.
Affiliation(s)
- Sudhir Kumar
- Center for Evolutionary Functional Genomics, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, Arizona 85287-5301, USA.
|
24
|
Ogden TH, Rosenberg MS. Alignment and Topological Accuracy of the Direct Optimization approach via POY and Traditional Phylogenetics via ClustalW + PAUP*. Syst Biol 2007; 56:182-93. [PMID: 17454974 DOI: 10.1080/10635150701281102] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 10/23/2022] Open
Abstract
Direct optimization frameworks for simultaneously estimating alignments and phylogenies have recently been developed. One such method, implemented in the program POY, is becoming more common for analyses of variable length sequences (e.g., analyses using ribosomal genes) and for combined evidence analyses (morphology + multiple genes). Simulation of sequences containing insertion and deletion events was performed in order to directly compare a widely used method of multiple sequence alignment (ClustalW) and subsequent parsimony analysis in PAUP* with direct optimization via POY. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (clocklike, non-clocklike, and ultrametric). Alignment accuracy scores for the implied alignments from POY and the multiple sequence alignments from ClustalW were calculated and compared. In almost all cases (99.95%), ClustalW produced more accurate alignments than POY-implied alignments, judged by the proportion of correctly identified homologous sites. Topological accuracy (distance to the true tree) for POY topologies and topologies generated under parsimony in PAUP* from the ClustalW alignments were also compared. In 44.94% of the cases, Clustal alignment tree reconstructions via PAUP* were more accurate than POY, whereas in 16.71% of the cases POY reconstructions were more topologically accurate (38.38% of the time they were equally accurate). Comparisons between POY hypothesized alignments and the true alignments indicated that, on average, as alignment error increased, topological accuracy decreased.
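The accuracy measure mentioned in the abstract above, the proportion of correctly identified homologous sites, can be sketched as a pairwise comparison between a true (simulated) alignment and an inferred one. The exact scoring used in the study is not given in the abstract, so the definition below (fraction of residue pairs aligned in the true alignment that are also aligned in the test alignment) and the function names are assumptions for illustration.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Set of residue pairs placed in the same column.

    Each residue is identified by (row index, ungapped position in that row),
    so the same residue can be recognised in two different alignments.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for column in zip(*alignment):
        residues = []
        for row, symbol in enumerate(column):
            if symbol != gap:
                residues.append((row, counters[row]))
                counters[row] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def alignment_accuracy(true_alignment, test_alignment):
    """Fraction of truly homologous residue pairs recovered by the test alignment."""
    true_pairs = aligned_pairs(true_alignment)
    if not true_pairs:
        return 0.0
    return len(true_pairs & aligned_pairs(test_alignment)) / len(true_pairs)


true_aln = ["AC-GT", "ACTGT"]
test_aln = ["ACG-T", "ACTGT"]  # the G is aligned against the wrong site
print(alignment_accuracy(true_aln, test_aln))
```

In the example, three of the four truly homologous pairs are recovered, so a misplaced gap of a single position already lowers the score.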
Affiliation(s)
- T Heath Ogden
- Department of Biological Sciences, Idaho State University, Idaho 83209, USA.
|
25
|
Nuin PAS, Wang Z, Tillier ERM. The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics 2006; 7:471. [PMID: 17062146 PMCID: PMC1633746 DOI: 10.1186/1471-2105-7-471] [Citation(s) in RCA: 151] [Impact Index Per Article: 8.4] [Received: 07/26/2006] [Accepted: 10/24/2006] [Indexed: 11/24/2022] Open
Abstract
Background There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it difficult to gauge the relative accuracy of the programs. Results We tested nine of the most often used protein alignment programs and compared their results using sequences generated with the simulation software Simprot which creates known alignments under realistic and controlled evolutionary scenarios. We have simulated more than 30000 alignment sets using various evolutionary histories in order to define strengths and weaknesses of each program tested. We found that alignment accuracy is extremely dependent on the number of insertions and deletions in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE and the results relative to BAliBASE- and Simprot-generated data sets were consistent in most cases. Conclusion Our results indicate that employing Simprot's simulated sequences allows the creation of a more flexible and broader range of alignment classes than the usual methods for alignment accuracy assessment. Simprot also allows for a quick and efficient analysis of a wider range of possible evolutionary histories that might not be present in currently available alignment sets. Among the nine programs tested, the iterative approach available in Mafft (L-INS-i) and ProbCons were consistently the most accurate, with Mafft being the faster of the two.
Affiliation(s)
- Paulo AS Nuin
- Division of Cancer Genomics and Proteomics, Ontario Cancer Institute, University Health Network, 101 College St, M5G 1L7, Toronto, Ontario, Canada
- Zhouzhi Wang
- Division of Cancer Genomics and Proteomics, Ontario Cancer Institute, University Health Network, 101 College St, M5G 1L7, Toronto, Ontario, Canada
- Elisabeth RM Tillier
- Division of Cancer Genomics and Proteomics, Ontario Cancer Institute, University Health Network, 101 College St, M5G 1L7, Toronto, Ontario, Canada
- Dept. Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
|
26
|
Ogden TH, Rosenberg MS. How should gaps be treated in parsimony? A comparison of approaches using simulation. Mol Phylogenet Evol 2006; 42:817-26. [PMID: 17011794 DOI: 10.1016/j.ympev.2006.07.021] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Received: 02/07/2006] [Revised: 07/07/2006] [Accepted: 07/22/2006] [Indexed: 10/24/2022]
Abstract
Simulation with indels was used to produce alignments where true site homologies in DNA sequences were known; the gaps from these datasets were removed and the sequences were then aligned to produce hypothesized alignments. Both alignments were then analyzed under three widely used methods of treating gaps during tree reconstruction under the maximum parsimony principle. With the true alignments, for many cases (82%), there was no difference in topological accuracy for the different methods of gap coding. However, in cases where a difference was present, coding gaps as a fifth state character or as separate presence/absence characters outperformed treating gaps as unknown/missing data nearly 90% of the time. For the hypothesized alignments, on average, all gap treatment approaches performed equally well. Data sets with higher sequence divergence and more pectinate tree shapes with variable branch lengths are more affected by gap coding than datasets associated with shallower non-pectinate tree shapes.
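The three gap treatments compared in the abstract above differ in what a parsimony program is allowed to count as a character state. The sketch below is only an illustration: the function names are hypothetical, and the presence/absence coding is a simplified scheme (one binary character per distinct gap interval) that may differ from the coding scheme actually used in the study.

```python
import re


def column_state_counts(alignment, gap_is_state, gap="-"):
    """Number of distinct states per column.

    With gap_is_state=True the gap counts as a fifth state; with False it is
    treated as unknown/missing and contributes no state of its own.
    """
    counts = []
    for column in zip(*alignment):
        states = set(column) if gap_is_state else set(column) - {gap}
        counts.append(len(states))
    return counts


def gap_intervals(row, gap="-"):
    """(start, end) positions of every run of gap symbols in one aligned row."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(gap) + "+", row)]


def presence_absence_matrix(alignment):
    """One binary character per distinct gap interval: 1 if the row has that gap."""
    intervals = sorted({iv for row in alignment for iv in gap_intervals(row)})
    matrix = []
    for row in alignment:
        own = set(gap_intervals(row))
        matrix.append("".join("1" if iv in own else "0" for iv in intervals))
    return intervals, matrix


alignment = ["AC--GT", "ACTTGT", "A---GT"]
fifth_state = column_state_counts(alignment, gap_is_state=True)
missing = column_state_counts(alignment, gap_is_state=False)
intervals, indel_chars = presence_absence_matrix(alignment)
```

In this toy alignment the gapped columns are variable only under fifth-state coding, while presence/absence coding moves the same signal into two extra binary characters appended to the matrix.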
Affiliation(s)
- T Heath Ogden
- Department of Biological Sciences, Idaho State University, Pocatello, ID 83209, USA.
|
27
|
Pollard DA, Moses AM, Iyer VN, Eisen MB. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments. BMC Bioinformatics 2006; 7:376. [PMID: 16904011 PMCID: PMC1613255 DOI: 10.1186/1471-2105-7-376] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 04/13/2006] [Accepted: 08/14/2006] [Indexed: 01/01/2023] Open
Abstract
Background Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. Results Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. Conclusion Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. 
These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.
Affiliation(s)
- Daniel A Pollard
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Alan M Moses
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Venky N Iyer
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Michael B Eisen
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, CA 94720, USA
- Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA
|
28
|
Abstract
Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian methods, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the lengths of a branch and of its neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.
Affiliation(s)
- T Heath Ogden
- Center for Evolutionary Functional Genomics, The Biodesign Institute, and the School of Life Sciences, Arizona State University, Tempe, Arizona 85287-4501, USA.
|
29
|
Dunbrack RL. Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006; 16:374-84. [PMID: 16713709 DOI: 10.1016/j.sbi.2006.05.006] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Received: 02/23/2006] [Revised: 03/22/2006] [Accepted: 05/08/2006] [Indexed: 10/24/2022]
Abstract
Sequence comparison is a major step in the prediction of protein structure from existing templates in the Protein Data Bank. The identification of potentially remote homologues to be used as templates for modeling target sequences of unknown structure and their accurate alignment remain challenges, despite many years of study. The most recent advances have been in combining as many sources of information as possible--including amino acid variation in the form of profiles or hidden Markov models for both the target and template families, known and predicted secondary structures of the template and target, respectively, the combination of structure alignment for distant homologues and sequence alignment for close homologues to build better profiles, and the anchoring of certain regions of the alignment based on existing biological data. Newer technologies have been applied to the problem, including the use of support vector machines to tackle the fold classification problem for a target sequence and the alignment of hidden Markov models. Finally, using the consensus of many fold recognition methods, whether based on profile-profile alignments, threading or other approaches, continues to be one of the most successful strategies for both recognition and alignment of remote homologues. Although there is still room for improvement in identification and alignment methods, additional progress may come from model building and refinement methods that can compensate for large structural changes between remotely related targets and templates, as well as for regions of misalignment.
Affiliation(s)
- Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA.
|