1
Xu P, Meng M, Wu F, Zhang J. A comparative plastome approach enhances the assessment of genetic variation in the Melilotus genus. BMC Genomics 2024; 25:556. [PMID: 38831327 PMCID: PMC11149310 DOI: 10.1186/s12864-024-10476-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 05/29/2024] [Indexed: 06/05/2024]
Abstract
BACKGROUND Melilotus, a member of the Fabaceae family, is an important forage crop that is widely cultivated in livestock-producing regions worldwide because of its high productivity and ability to withstand abiotic stress. However, the genetic attributes of the chloroplast genome and the evolutionary relationships among Melilotus species remain unresolved. RESULTS In this study, we compiled the chloroplast genomes of 18 Melilotus species and performed a comprehensive comparative analysis. Using protein-coding genes, we established a robust phylogenetic tree for these species, a topology further supported by the phylogeny derived from single-nucleotide polymorphisms (SNPs) across the entire chloroplast genome. Notably, M. infestus, M. siculus, M. sulcatus, and M. speciosus formed a distinct subgroup within the tree, and the chloroplast genomes of these four species share two inversions. Moreover, inverted repeats were found to have reemerged in six species within the IRLC (inverted repeat-lacking clade). The distribution patterns of SNPs and insertions/deletions (InDels) within protein-coding genes indicated that ycf1 and ycf2 accumulated nonconservative changes during evolution. Furthermore, analysis of the evolutionary rates of protein-coding genes revealed that rps18, rps7, and rpl16 underwent positive selection specifically in Melilotus. CONCLUSIONS We present a comparative analysis of the complete chloroplast genomes of Melilotus species. This study represents the most thorough and detailed exploration to date of evolution and variability within the genus Melilotus. Our study provides valuable chloroplast genomic information for improving phylogenetic reconstructions and making biogeographic inferences about Melilotus and other Papilionoideae species.
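The abstract names SNP and InDel distribution patterns and an SNP-based phylogeny but does not describe the pipeline behind them. As a minimal sketch, assuming a gapped multiple alignment of plastome sequences (the function names `classify_columns` and `p_distance` are hypothetical, not the authors' code), alignment columns can be classified as SNP or InDel sites, and a pairwise p-distance can serve as input to a distance-based tree:

```python
def classify_columns(alignment):
    """Count SNP and InDel columns in a gapped multiple alignment.

    A column containing a gap ('-') in any sequence is counted as an InDel
    column; an ungapped column with more than one distinct base is a SNP.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    snps = indels = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column}
        if "-" in bases:
            indels += 1
        elif len(bases) > 1:
            snps += 1
    return snps, indels


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


aln = ["ATGCA-T", "ATGTAGT", "ATGCAGT"]
print(classify_columns(aln))                   # (1, 1)
print(round(p_distance(aln[0], aln[1]), 3))    # 0.167
```

This sketch counts InDel columns rather than InDel events; a run of adjacent gap columns would usually be merged into a single event before reporting InDel counts per gene.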
Affiliation(s)
- Pan Xu
- State Key Laboratory of Grassland Agro-ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, College of Pastoral Agriculture Science and Technology, Ministry of Education, Lanzhou University, Lanzhou, 730000, China
- Minghui Meng
- State Key Laboratory of Grassland Agro-ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, College of Pastoral Agriculture Science and Technology, Ministry of Education, Lanzhou University, Lanzhou, 730000, China
- Fan Wu
- State Key Laboratory of Grassland Agro-ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, College of Pastoral Agriculture Science and Technology, Ministry of Education, Lanzhou University, Lanzhou, 730000, China
- Jiyu Zhang
- State Key Laboratory of Grassland Agro-ecosystems, Key Laboratory of Grassland Livestock Industry Innovation, Ministry of Agriculture and Rural Affairs, Engineering Research Center of Grassland Industry, College of Pastoral Agriculture Science and Technology, Ministry of Education, Lanzhou University, Lanzhou, 730000, China.
2
Zhang H, Yang MF, Zhang Q, Yan B, Jiang YL. Screening for broad-spectrum antimicrobial endophytes from Rosa roxburghii and multi-omic analyses of biosynthetic capacity. FRONTIERS IN PLANT SCIENCE 2022; 13:1060478. [PMID: 36466255 PMCID: PMC9709285 DOI: 10.3389/fpls.2022.1060478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Accepted: 10/28/2022] [Indexed: 06/17/2023]
Abstract
Plants with certain medicinal values are a good source for isolating function-specific endophytes. Rosa roxburghii Tratt. has been reported to be a botanical source of antimicrobial compounds and may therefore be a promising candidate for screening endophytic fungi with antimicrobial potential. In this study, 54 endophytes were isolated from R. roxburghii and identified molecularly. Preliminary screening using the plate confrontation method identified 15 endophytic strains showing at least one strong inhibition or three or more moderate inhibitions against the 12 tested strains. Re-screening based on the disc diffusion method showed that Epicoccum latusicollum HGUP191049 and Setophoma terrestris HGUP190028 had excellent antagonistic activity. Finally, the minimum inhibitory concentration (MIC) test of extracellular metabolites indicated that HGUP191049 had lower MIC values and a broader antimicrobial spectrum than HGUP190028. Genomic, non-targeted metabolomic, and comparative genomic studies were performed to understand the biosynthetic capacity of the selected endophytic fungus. Genome sequencing and annotation of HGUP191049 revealed a 33.24 megabase pair (Mbp) genome with 24 biosynthetic gene clusters (BGCs), in which the putative antimicrobial compounds oxyjavanicin, patulin, and squalestatin S1 are each encoded by a different BGC. In addition, the non-targeted metabolomic results showed that the strain contained approximately 120 structurally diverse antimicrobial secondary metabolites. Comparative genomics revealed differences in pathogenicity, virulence, and carbohydrate-active enzymes among the genomes of Epicoccum spp. Moreover, the comparative analyses suggested that Epicoccum is a promising source of antimicrobial terpenes, and that oxyjavanicin and squalestatin S1 are antimicrobial compounds shared by the genus. In conclusion, R. roxburghii and the endophyte HGUP191049 isolated from it are promising sources of broad-spectrum antimicrobial agents.
Affiliation(s)
- Hong Zhang
- Department of Plant Pathology, College of Agriculture, Guizhou University, Guiyang, China
- Guizhou Academy of Testing and Analysis, Guiyang, China
- Mao-Fa Yang
- Institute of Entomology, Guizhou University, Guiyang, China
- College of Tobacco Science, Guizhou University, Guiyang, China
- Qian Zhang
- Department of Plant Pathology, College of Agriculture, Guizhou University, Guiyang, China
- Bin Yan
- Institute of Entomology, Guizhou University, Guiyang, China
- Yu-Lan Jiang
- Department of Plant Pathology, College of Agriculture, Guizhou University, Guiyang, China
3
Ahmad M. Genomics and transcriptomics to protect rice (Oryza sativa. L.) from abiotic stressors: -pathways to achieving zero hunger. FRONTIERS IN PLANT SCIENCE 2022; 13:1002596. [PMID: 36340401 PMCID: PMC9630331 DOI: 10.3389/fpls.2022.1002596] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/25/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]
Abstract
More than half of the world's population depends on rice as a major food crop. Rice (Oryza sativa L.) is vulnerable to abiotic challenges including drought, cold, and salinity because it is grown in semi-aquatic, tropical, or subtropical settings. Abiotic stress resistance has been bred into rice plants since the earliest days of rice cultivation. Before the rice genome was sequenced, abiotic stress-related genes were identified using forward genetic methods, and abiotic stress-tolerant lines were developed using traditional breeding methods. The dynamic transcriptome represents the degree of gene expression in a specific cell, tissue, or organ of an individual organism at a specific point in its growth and development. Transcriptomics can reveal genome-wide expression under stress conditions, which helps in understanding the intricate regulatory network underlying the stress tolerance and adaptability of plants. Rice gene families have been identified genome-wide by comparison with the reference genome sequences of other plant species. Transcriptomics via gene expression profiling, which has recently been dominated by RNA-seq, complements genomic techniques. Together, these genomic and transcriptomic techniques have made it possible to identify numerous important QTLs, genes, promoter elements, transcription factors, and miRNAs involved in the rice response to abiotic stress. This review discusses the use of several genomic and transcriptomic methodologies to understand the ability of rice (Oryza sativa L.) to withstand abiotic stress.
Affiliation(s)
- Mushtaq Ahmad
- Visiting Scientist Plant Sciences, University of Nebraska, Lincoln, NE, United States
4
Liu C, Kenney T, Beiko RG, Gu H. The Community Coevolution Model with Application to the Study of Evolutionary Relationships between Genes based on Phylogenetic Profiles. Syst Biol 2022:6651862. [PMID: 35904761 DOI: 10.1093/sysbio/syac052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Revised: 07/15/2022] [Accepted: 07/19/2022] [Indexed: 11/13/2022]
Abstract
Organismal traits can evolve in a coordinated way, with correlated patterns of gains and losses reflecting important evolutionary associations. Discovering these associations can reveal important information about the functional and ecological linkages among traits. Phylogenetic profiles treat individual genes as traits distributed across sets of genomes and can provide a fine-grained view of the genetic underpinnings of evolutionary processes in a set of genomes. Phylogenetic profiling has been used to identify genes that are functionally linked, and to identify common patterns of lateral gene transfer in microorganisms. However, comparative analysis of phylogenetic profiles and other trait distributions should take into account the phylogenetic relationships among the organisms under consideration. Here we propose the Community Coevolution Model (CCM), a new coevolutionary model to analyze the evolutionary associations among traits, with a focus on phylogenetic profiles. In the CCM, traits are considered to evolve as a community with interactions, and the transition rate for each trait depends on the current states of other traits. Besides outperforming other comparative methods for pairwise trait analysis, CCM has the additional advantage of being able to examine multiple traits as a community to reveal more dependency relationships. We also develop a simulation procedure to generate phylogenetic profiles with correlated evolutionary patterns that can be used as benchmark data for evaluation purposes. A simulation study demonstrates that CCM is more accurate than other methods including the Jaccard Index and three tree-aware methods. The parameterization of CCM makes the relations between genes more directly interpretable, so that Darwin's scenario can be readily identified from the estimated parameters. We show that CCM is more efficient and fits real data better than other methods, yielding higher likelihood scores with fewer parameters.
An examination of 3786 phylogenetic profiles across a set of 659 bacterial genomes highlights linkages between genes with common functions, including many patterns that would not have been identified under a non-phylogenetic model of common distribution. We also applied the CCM to 44 proteins in the well-studied Mitochondrial Respiratory Complex I and recovered associations that mapped well onto the structural associations that exist in the complex.
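The Jaccard index is one of the baselines the abstract compares CCM against. The sketch below computes it for presence/absence phylogenetic profiles; it is not the CCM, and the gene names and profiles are hypothetical. Because it treats each genome as an independent sample, it ignores the shared ancestry that tree-aware methods such as CCM model, which is the limitation the abstract raises.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard index of two presence/absence profiles over the same genomes."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def rank_gene_pairs(profiles):
    """Return (gene_a, gene_b, score) tuples sorted by descending Jaccard index."""
    scored = [
        (a, b, jaccard(profiles[a], profiles[b]))
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Rows: genes; columns: presence (1) or absence (0) in five hypothetical genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [1, 0, 0, 1, 1],
}
for gene_a, gene_b, score in rank_gene_pairs(profiles):
    print(gene_a, gene_b, round(score, 2))
```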
Affiliation(s)
- Chaoyue Liu
- Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada
- Faculty of Computer Science, Dalhousie University, Halifax, B3H 4R2, Canada
- Toby Kenney
- Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada
- Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax, B3H 4R2, Canada
- Hong Gu
- Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada
5
Dvorak P, Leupen S, Soucek P. Functionally Significant Features in the 5' Untranslated Region of the ABCA1 Gene and Their Comparison in Vertebrates. Cells 2019; 8:cells8060623. [PMID: 31234415 PMCID: PMC6627321 DOI: 10.3390/cells8060623] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/03/2019] [Revised: 06/17/2019] [Accepted: 06/19/2019] [Indexed: 02/07/2023]
Abstract
Single nucleotide polymorphisms located in 5′ untranslated regions (5′UTRs) can regulate gene expression and have clinical impact. Recognizing functionally significant sequences within 5′UTRs is crucial in next-generation sequencing applications. Furthermore, information about the behavior of 5′UTRs during gene evolution is scarce. Using the ATP-binding cassette transporter A1 (ABCA1) gene (Tangier disease) as an example, we describe our algorithm for finding functionally significant sequences. 5′UTR features (upstream start and stop codons, open reading frames (ORFs), GC content, motifs, and secondary structures) were studied using freely available bioinformatics tools in 55 vertebrate orthologous genes obtained from Ensembl and UCSC. The most conserved sequences were proposed as hot spots. Exon and intron enhancers and silencers (sc35, ighg2 cgamma2, ctnt, gh-1, and fibronectin eda exon), transcription factors (TFIIA, TATA, NFAT1, NFAT4, and HOXA13), some of them cancer related, and a microRNA (hsa-miR-4474-3p) were localized to these regions. An upstream ORF, overlapping with the main ORF in primates and possibly coding for a small bioactive peptide, was also detected. Moreover, we showed several features of 5′UTRs, such as GC content variation, hairpin structure conservation, and 5′UTR segmentation, that are interesting from a phylogenetic point of view and can stimulate further evolution-oriented research.
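Several of the 5′UTR features listed above (upstream start codons, upstream ORFs, and GC content) are simple to compute. The sketch below is a minimal illustration, assuming a transcript written in the DNA alphabet and a known 0-based index for the main start codon; it is not the authors' algorithm or any of the tools they used, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_uorfs(transcript, main_start):
    """List upstream ORFs as (start, end, overlaps_main) tuples.

    transcript: mRNA written in the DNA alphabet (5'UTR followed by the CDS).
    main_start: 0-based index of the main ORF's start codon.
    end: exclusive end of the first in-frame stop codon, or None if none is found.
    overlaps_main: True when the uORF runs past the main start codon.
    """
    seq = transcript.upper()
    uorfs = []
    for start in range(main_start):
        if seq[start:start + 3] != "ATG":
            continue
        end = None
        # Walk codon by codon in the uORF's own reading frame.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        uorfs.append((start, end, end is None or end > main_start))
    return uorfs


utr = "CCATGAAATAGCC"
transcript = utr + "ATGGCCTAA"
print(find_uorfs(transcript, main_start=len(utr)))  # [(2, 11, False)]
print(round(gc_content(utr), 2))
```

A uORF whose stop codon lies beyond the main start codon, or which has no in-frame stop at all, is flagged as overlapping; this corresponds to the kind of uORF overlapping the main ORF that the abstract reports in primates.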
Affiliation(s)
- Pavel Dvorak
- Department of Biology, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300 Pilsen, Czech Republic.
- Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300 Pilsen, Czech Republic.
- Sarah Leupen
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA.
- Pavel Soucek
- Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300 Pilsen, Czech Republic.
- Toxicogenomics Unit, National Institute of Public Health, Srobarova 48, 100 42 Prague 10, Czech Republic.
6
Alotaibi H, Yaman E, Salvatore D, Di Dato V, Telkoparan P, Di Lauro R, Tazebay UH. Intronic elements in the Na+/I- symporter gene (NIS) interact with retinoic acid receptors and mediate initiation of transcription. Nucleic Acids Res 2010; 38:3172-85. [PMID: 20123735 PMCID: PMC2879507 DOI: 10.1093/nar/gkq023] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
Activity of the sodium/iodide symporter (NIS) in lactating breast is essential for iodide (I(-)) accumulation in milk. Significant NIS upregulation has also been reported in breast cancer, suggesting a potential use for radioiodide treatment. All-trans-retinoic acid (tRA) is a potent ligand that enhances NIS expression in a subset of breast cancer cell lines and in experimental breast cancer models. Indirect tRA stimulation of NIS in breast cancer cells is well documented; however, direct upregulation by tRA-activated nuclear receptors has not yet been identified. Aiming to uncover cis-acting elements that directly regulate NIS expression, we screened evolutionarily conserved non-coding genomic sequences for responsiveness to tRA in MCF-7 cells. Here, we report that a potent enhancer in the first intron of NIS mediates direct regulation by tRA-stimulated nuclear receptors. In vitro as well as in vivo DNA-protein interaction assays revealed direct association of retinoic acid receptor-alpha (RARalpha) and retinoid-X-receptor (RXR) with this enhancer. Moreover, using chromatin immunoprecipitation (ChIP), we uncovered early events of NIS transcription in response to tRA, which require the interaction of several novel intronic tRA-responsive elements. These findings indicate a complex interplay between nuclear receptors, RNA Pol-II, and multiple intronic RAREs in the NIS gene, and they establish a novel mechanistic model for tRA-induced gene transcription.
Affiliation(s)
- Hani Alotaibi
- Department of Molecular Biology and Genetics, Bilkent University, 06800 Bilkent, Ankara, Turkey
7
Lee J, Li Z, Brower-Sinning R, John B. Regulatory circuit of human microRNA biogenesis. PLoS Comput Biol 2007; 3:e67. [PMID: 17447837 PMCID: PMC1853126 DOI: 10.1371/journal.pcbi.0030067] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Received: 08/24/2006] [Accepted: 02/27/2007] [Indexed: 01/07/2023]
Abstract
miRNAs (microRNAs) are a class of endogenous small RNAs that are thought to negatively regulate protein production. Aberrant expression of many miRNAs is linked to cancer and other diseases. Little is known about the factors that regulate the expression of miRNAs. We have identified numerous regulatory elements upstream of miRNA genes that are likely to be essential to the transcriptional and posttranscriptional regulation of miRNAs. Newly identified regulatory motifs occur frequently and in multiple copies upstream of miRNAs. The motifs are highly enriched in G and C nucleotides relative to the nucleotide composition of miRNA upstream sequences. Although the motifs were predicted using sequences upstream of miRNAs, we find that 99% of the top-predicted motifs preferentially occur within the first 500 nucleotides upstream of the transcription start sites of protein-coding genes; this positional preference underscores the validity and importance of the motifs identified in this study. Our study also raises the possibility that a considerable number of well-characterized, disease-associated transcription factors (TFs) of protein-coding genes contribute to abnormal miRNA expression in diseases such as cancer. Further analysis of predicted miRNA–protein interactions leads us to hypothesize that TFs including c-Myb, NF-Y, Sp-1, MTF-1, and AP-2α are master regulators of miRNA expression. Our predictions are a solid starting point for systematically elucidating the causes of aberrant expression patterns of disease-related (e.g., cancer) miRNAs. Thus, we point out that focused studies of the TFs that regulate miRNAs will be paramount in developing cures for miRNA-related diseases. The identification of the miRNA regulatory motifs was facilitated by a new computational method, K-Factor. K-Factor predicts regulatory motifs in a set of functionally related sequences without relying on evolutionary conservation.
microRNAs (miRNAs) are unusually small RNAs that are thought to control the production of proteins in the cell. Recent studies have linked miRNAs to several types of cancer. Several studies strongly suggest that miRNAs could be useful as diagnostic and prognostic markers of various cancers. Thus, although miRNAs appear to have opened a new chapter in cancer biology, the fundamental question of why miRNAs are strongly associated with diseases such as cancer remains unanswered. Here, we endeavored to systematically identify the factors that regulate miRNA biogenesis. We first identified a large number of DNA sequence elements that are characteristic of miRNA genes, using a new computational method named K-Factor. The sequence elements were then matched to known protein binding sites to identify specific proteins (transcription factors (TFs)) that regulate miRNA biogenesis. Based on our observations, we put forward the hypothesis that a number of known TFs are primarily responsible for the aberrant regulation of miRNAs in cancer and other diseases.
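The positional check described above (whether motif occurrences fall within the first 500 nucleotides upstream of a transcription start site) can be sketched as follows. This is not K-Factor, which the abstract does not detail; it only locates exact matches of a given motif, assumes each upstream sequence is written 5'->3' and ends immediately before the TSS, and the function names are hypothetical.

```python
import re


def motif_positions_upstream(upstream, motif):
    """Distance (in nt) from each motif occurrence's first base to the TSS.

    `upstream` ends immediately before the TSS, so a match starting at index
    s lies len(upstream) - s nt upstream. Overlapping matches are found with
    a zero-width lookahead.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    upstream = upstream.upper()
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [len(upstream) - m.start() for m in pattern.finditer(upstream)]


def fraction_within(distances, window=500):
    """Fraction of occurrences lying at most `window` nt upstream of the TSS."""
    if not distances:
        return 0.0
    return sum(d <= window for d in distances) / len(distances)


# Hypothetical promoter: one motif copy far upstream, one close to the TSS.
promoter = "GGGC" + "A" * 600 + "GGGC" + "A" * 10
hits = motif_positions_upstream(promoter, "GGGC")
print(hits, fraction_within(hits))
```

Comparing this fraction against shuffled or randomly chosen upstream sequences would be one way to judge whether a positional preference is unexpected; that comparison is left out of the sketch.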
Affiliation(s)
- Ji Lee
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Department of Bioengineering, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Zhihua Li
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Rachel Brower-Sinning
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Bino John
- Department of Computational Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania, United States of America
8
Abstract
The present review considered: (a) the factors that conditioned the early transition from non-life to life; (b) genome structure and complexity in prokaryotes, eukaryotes, and organelles; (c) comparative human chromosome genomics; and (d) the Brazilian contribution to some of these studies. Understanding the dialectical conflict between freedom and organization is fundamental to giving meaning to the patterns and processes of organic evolution.
Affiliation(s)
- Francisco M Salzano
- Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Caixa Postal 15053, 91501-970 Porto Alegre, RS, Brazil.
9
Fujii Y, Itoh T, Sakate R, Koyanagi KO, Matsuya A, Habara T, Yamaguchi K, Kaneko Y, Gojobori T, Imanishi T. A web tool for comparative genomics: G-compass. Gene 2005; 364:45-52. [PMID: 16169162 DOI: 10.1016/j.gene.2005.05.043] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 01/31/2005] [Revised: 05/09/2005] [Accepted: 05/30/2005] [Indexed: 11/22/2022]
Abstract
To support progress in comparative genomics, we have developed a new web-based tool, named G-compass, for browsing and analyzing genome alignments. G-compass uses 829,311 human–mouse genome alignments that were produced specifically for this tool. The quality of the alignment set was evaluated using several statistics: it covers approximately 17% of the human genome and 82% of the annotated exons, and its average nucleotide sequence identity and average alignment length are 71.2% and 673.6 bp, respectively. Compared with public data, our alignment set appears more extensive and has greater genome coverage. G-compass incorporates unique functions such as window analysis of individual alignments. Furthermore, using G-compass together with H-InvDB, we found highly conserved genomic segments and a human-specific antisense transcript candidate, demonstrating that G-compass is useful for facilitating biological discoveries. G-compass is publicly accessible on the WWW at http://www.jbirc.aist.go.jp/g-compass/.
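The window analysis of individual alignments mentioned above is not specified further in the abstract. A minimal sketch, assuming a gapped pairwise alignment and defining identity as matches over ungapped columns (one of several common conventions; `window_identity` and its default window and step sizes are hypothetical):

```python
def window_identity(seq_a, seq_b, window=100, step=50):
    """Sequence identity of a pairwise alignment in sliding windows.

    Returns (start, identity) pairs; identity is the number of matches divided
    by the number of columns in the window where neither sequence has a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    last_start = max(len(seq_a) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk_a = seq_a[start:start + window].upper()
        chunk_b = seq_b[start:start + window].upper()
        pairs = [(a, b) for a, b in zip(chunk_a, chunk_b) if a != "-" and b != "-"]
        identity = sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0
        results.append((start, identity))
    return results


print(window_identity("AAAAAAAAAA", "AAAAATTTTT", window=5, step=5))
# [(0, 1.0), (5, 0.0)]
```

When the alignment length minus the window size is not a multiple of the step, the last few columns fall outside every window; a production tool would typically add a final window anchored at the end.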
Affiliation(s)
- Yasuyuki Fujii
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, AIST Bio-IT Research Building 7F, 2-42, Aomi, Koto-ku, Tokyo, Japan
10
Bininda-Emonds ORP. transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 2005; 6:156. [PMID: 15969769 PMCID: PMC1175081 DOI: 10.1186/1471-2105-6-156] [Citation(s) in RCA: 155] [Impact Index Per Article: 7.8] [Received: 04/14/2005] [Accepted: 06/22/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets. RESULTS transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). As comparative benchmarks derived from six mammalian protein-coding genes show, the strategy implemented in transAlign always improves the speed and usually the apparent accuracy of the alignment of protein-coding DNA sequences. CONCLUSION transAlign represents one of the few complete, cross-platform implementations of the concept of translated alignments.
Both the advantages accruing from performing a translated alignment and the suite of user-definable options available in the program mean that transAlign is ideally suited for large-scale automated alignments of very large and/or very numerous protein-coding DNA data sets. However, the good performance offered by the program also translates to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").
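The translate, align, and back-translate cycle described above can be sketched in a few lines. This is not transAlign (a Perl script) or its ClustalW step: the amino-acid alignment is assumed to come from an external aligner, only the standard genetic code and reading frame 1 are handled, and orientation and frame-shift checks are omitted. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate reading frame 1; codons with non-ACGT characters become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def back_translate(aligned_protein, dna):
    """Thread the original codons through a gapped amino-acid alignment."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    residues = [aa for aa in aligned_protein if aa != "-"]
    if len(residues) != len(codons):
        raise ValueError("protein and DNA lengths do not match")
    codon_iter = iter(codons)
    # Each amino-acid gap becomes a three-base gap, preserving the codon frame.
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)


dna = "ATGAAATTT"
print(translate(dna))                  # MKF
print(back_translate("M-KF", dna))     # ATG---AAATTT
```

Because gaps are inserted only in whole codons, the resulting DNA alignment never splits a codon, which is the property that makes translated alignments attractive for protein-coding sequences.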
Affiliation(s)
- Olaf R P Bininda-Emonds
- Lehrstuhl für Tierzucht, Technical University of Munich, Hochfeldweg 1, 85354 Freising-Weihenstephan, Germany.
11
Caicedo AL, Purugganan MD. Comparative plant genomics. Frontiers and prospects. PLANT PHYSIOLOGY 2005; 138:545-7. [PMID: 15955910 PMCID: PMC1150366 DOI: 10.1104/pp.104.900148] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 05/03/2023]
Affiliation(s)
- Ana L Caicedo
- Department of Genetics, North Carolina State University, Raleigh, North Carolina 27695, USA
12
Haubold B, Pierstorff N, Möller F, Wiehe T. Genome comparison without alignment using shortest unique substrings. BMC Bioinformatics 2005; 6:123. [PMID: 15910684 PMCID: PMC1166540 DOI: 10.1186/1471-2105-6-123] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.2] [Received: 11/09/2004] [Accepted: 05/23/2005] [Indexed: 01/24/2023]
Abstract
BACKGROUND Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees. RESULTS We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We derive an analytical expression for the null distribution of shortest unique substrings, given the GC-content of the query sequences. Furthermore, we apply our method to rapidly detect unique genomic regions in the genome of Staphylococcus aureus strain MSSA476 compared to four other staphylococcal genomes. CONCLUSION We combine a method to rapidly search for shortest unique substrings in DNA sequences and a derivation of their null distribution. We show that unique regions in an arbitrary sample of genomes can be efficiently detected with this method. The corresponding programs shustring (SHortest Unique subSTRING) and shulen are written in C and available at http://adenine.biz.fh-weihenstephan.de/shustring/.
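The idea of a shortest unique substring can be illustrated with a brute-force sketch. The method in the abstract uses generalized suffix trees and derives a null distribution; the naive k-mer counting below does neither, runs in roughly quadratic time, considers a single strand, and reports only the globally shortest unique substrings of one sequence. The function name is hypothetical.

```python
from collections import Counter


def shortest_unique_substrings(seq):
    """Return (k, substrings): the smallest k at which some k-mer occurs exactly once.

    Substrings are counted with overlaps on the given strand only.
    Returns (0, []) for an empty sequence.
    """
    seq = seq.upper()
    for k in range(1, len(seq) + 1):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        unique = sorted(s for s, n in counts.items() if n == 1)
        if unique:
            return k, unique
    return 0, []


print(shortest_unique_substrings("ACGTACGT"))  # (2, ['TA'])
```

For genome-scale input this brute-force loop is far too slow, which is why suffix-tree based approaches like the one described above are used in practice.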
Affiliation(s)
- Bernhard Haubold
- Department of Biotechnology & Bioinformatics, University of Applied Sciences, Weihenstephan, Germany
- Nora Pierstorff
- Institute of Genetics, Universität zu Köln, Zülpicher Straße 47, 50674 Köln, Germany
- Friedrich Möller
- Berlin Center for Genome Based Bioinformatics and Freie Universität, Berlin, Germany
- Thomas Wiehe
- Institute of Genetics, Universität zu Köln, Zülpicher Straße 47, 50674 Köln, Germany
13
Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res 2005; 33:1141-53. [PMID: 15728743 PMCID: PMC549432 DOI: 10.1093/nar/gki242] [Citation(s) in RCA: 299] [Impact Index Per Article: 15.0] [Received: 12/15/2004] [Revised: 01/10/2005] [Accepted: 01/23/2005] [Indexed: 12/21/2022]
Abstract
Among bacteria, many species have synonymous codon usage patterns that have been influenced by natural selection for those codons that are translated more accurately and/or efficiently. However, in other species selection appears to have been ineffective. Here, we introduce a population genetics-based model for quantifying the extent to which selection has been effective. The approach is applied to 80 phylogenetically diverse bacterial species for which whole genome sequences are available. The strength of selected codon usage bias, S, is found to vary substantially among species; in 30% of the genomes examined, there was no significant evidence that selection had been effective. Values of S are highly positively correlated with both the number of rRNA operons and the number of tRNA genes. These results are consistent with the hypothesis that species exposed to selection for rapid growth have more rRNA operons, more tRNA genes and more strongly selected codon usage bias. For example, Clostridium perfringens, the species with the highest value of S, can have a generation time as short as 7 min.
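As an illustration of how a selected-codon-usage statistic can be computed from codon counts, the sketch below assumes S is the natural log of an odds ratio: the ratio of C-ending to U-ending (T-ending in DNA) codons for one amino acid in highly expressed genes, divided by the same ratio in all genes. That formulation, the unweighted averaging across amino acids, the function names, and the example counts are all assumptions for illustration, not the paper's exact model or gene sets.

```python
import math


def selected_bias_s(c_high, u_high, c_all, u_all):
    """Log odds ratio of C- vs U-ending codons, highly expressed genes vs all genes.

    Positive values mean C-ending codons are relatively enriched in highly
    expressed genes (assumed formulation, see lead-in).
    """
    if min(c_high, u_high, c_all, u_all) <= 0:
        raise ValueError("all codon counts must be positive")
    return math.log((c_high / u_high) * (u_all / c_all))


def mean_s(counts_by_amino_acid):
    """Unweighted mean of S over amino acids, each given as (C_hi, U_hi, C_all, U_all)."""
    values = [selected_bias_s(*counts) for counts in counts_by_amino_acid.values()]
    return sum(values) / len(values)


# Hypothetical codon counts for two amino acids with C/U-ending synonymous codons.
counts = {"Phe": (90, 10, 50, 50), "Tyr": (50, 50, 50, 50)}
print(round(mean_s(counts), 3))  # 1.099
```

An S near zero, as for the hypothetical Tyr counts, means codon choice in highly expressed genes matches the genome-wide background, which is the situation the abstract describes as no significant evidence of effective selection.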
Affiliation(s)
- Paul M Sharp
- Institute of Genetics, University of Nottingham, Queens Medical Centre, Nottingham NG7 2UH, UK.