1
Sidorczuk K, Mackiewicz P, Pietluch F, Gagat P. Characterization of signal and transit peptides based on motif composition and taxon-specific patterns. Sci Rep 2023; 13:15751. [PMID: 37735485 PMCID: PMC10514287 DOI: 10.1038/s41598-023-42987-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 09/17/2023] [Indexed: 09/23/2023]
Abstract
Targeting peptides or presequences are N-terminal extensions of proteins that encode information about their cellular localization. They include signal peptides (SP), which target proteins to the endoplasmic reticulum, and transit peptides (TP) directing proteins to the organelles of endosymbiotic origin: chloroplasts and mitochondria. TPs were hypothesized to have evolved from antimicrobial peptides (AMPs), which are responsible for the host defence against microorganisms, including bacteria, fungi and viruses. In this study, we performed comprehensive bioinformatic analyses of amino acid motifs of targeting peptides and AMPs using a curated set of experimentally verified proteins. We identified motifs frequently occurring in each type of presequence showing specific patterns associated with their amino acid composition, and investigated their position within the presequence. We also compared motif patterns among different taxonomic groups and identified taxon-specific features, providing some evolutionary insights. Considering the functional relevance and many practical applications of targeting peptides and AMPs, we believe that our analyses will prove useful for their design, and better understanding of protein import mechanism and presequence evolution.
Affiliation(s)
- Katarzyna Sidorczuk
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
- Paweł Mackiewicz
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
- Filip Pietluch
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
- Przemysław Gagat
  - Department of Bioinformatics and Genomics, Faculty of Biotechnology, University of Wrocław, Wrocław, Poland
2
Christian RW, Hewitt SL, Nelson G, Roalson EH, Dhingra A. Plastid transit peptides-where do they come from and where do they all belong? Multi-genome and pan-genomic assessment of chloroplast transit peptide evolution. PeerJ 2020; 8:e9772. [PMID: 32913678 PMCID: PMC7456531 DOI: 10.7717/peerj.9772] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/09/2019] [Accepted: 07/30/2020] [Indexed: 01/22/2023]
Abstract
Subcellular relocalization of proteins determines an organism's metabolic repertoire and thereby its survival in unique evolutionary niches. In plants, the plastid and its various morphotypes import a large and varied number of nuclear-encoded proteins to orchestrate vital biochemical reactions in a spatiotemporal context. Recent comparative genomics analysis and high-throughput shotgun proteomics data indicate that there are a large number of plastid-targeted proteins that are either semi-conserved or non-conserved across different lineages. This implies that homologs are differentially targeted across different species, which is feasible only if proteins have gained or lost plastid targeting peptides during evolution. In this study, a broad, multi-genome analysis of 15 phylogenetically diverse genera and in-depth analyses of pangenomes from Arabidopsis and Brachypodium were performed to address the question of how proteins acquire or lose plastid targeting peptides. The analysis revealed that random insertions or deletions were the dominant mechanism by which novel transit peptides are gained by proteins. While gene duplication was not a strict requirement for the acquisition of novel subcellular targeting, 40% of novel plastid-targeted genes were found to be most closely related to a sequence within the same genome, and of these, 30.5% resulted from alternative transcription or translation initiation sites. Interestingly, analysis of the distribution of amino acids in the transit peptides of known and predicted chloroplast-targeted proteins revealed monocot and eudicot-specific preferences in residue distribution.
Affiliation(s)
- Ryan W. Christian
  - Molecular Plant Sciences, Washington State University, Pullman, WA, USA
- Seanna L. Hewitt
  - Molecular Plant Sciences, Washington State University, Pullman, WA, USA
- Grant Nelson
  - Molecular Plant Sciences, Washington State University, Pullman, WA, USA
- Eric H. Roalson
  - Molecular Plant Sciences, Washington State University, Pullman, WA, USA
  - School of Biological Sciences, Washington State University, Pullman, WA, USA
- Amit Dhingra
  - Molecular Plant Sciences, Washington State University, Pullman, WA, USA
  - Department of Horticulture, Washington State University, Pullman, WA, USA
3
Brain-related genes are specifically enriched with long phase 1 introns. PLoS One 2020; 15:e0233978. [PMID: 32470086 PMCID: PMC7259759 DOI: 10.1371/journal.pone.0233978] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/12/2020] [Accepted: 05/16/2020] [Indexed: 11/19/2022]
Abstract
Intronic gene regions are mostly considered in the scope of gene expression regulation, such as alternative splicing. However, relations between basic statistical properties of introns are rarely studied in detail, despite the vast amount of available data. In particular, little is known regarding the relationship between intron length and intron phase. Intron phase distribution is significantly different at different intron length thresholds. In this study, we performed GO enrichment analysis of gene sets with a particular intron phase at varying intron length thresholds using a list of 13,823 orthologous human-mouse gene pairs. We found a specific group of 153 genes with phase 1 introns longer than 50 kilobases that were specifically expressed in the brain, functionally related to synaptic signaling, and strongly associated with schizophrenia and other mental disorders. We propose that the prevalence of long phase 1 introns arises from the presence of the signal peptide sequence and is connected with 1–1 exon shuffling.
4
Wang X, Gao B, Zhu S. Exon Shuffling and Origin of Scorpion Venom Biodiversity. Toxins (Basel) 2016; 9:toxins9010010. [PMID: 28035955 PMCID: PMC5308243 DOI: 10.3390/toxins9010010] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/26/2016] [Revised: 12/13/2016] [Accepted: 12/21/2016] [Indexed: 12/01/2022]
Abstract
Scorpion venom is a complex combinatorial library of peptides and proteins with multiple biological functions. A combination of transcriptomic and proteomic techniques has revealed its enormous molecular diversity, as identified by the presence of a large number of ion channel-targeted neurotoxins with different folds, membrane-active antimicrobial peptides, proteases, and protease inhibitors. Although the biodiversity of scorpion venom has long been known, how it arises remains unsolved. In this work, we analyzed the exon-intron structures of an array of scorpion venom protein-encoding genes and unexpectedly found that nearly all of these genes possess a phase-1 intron (an intron located between the first and second nucleotides of a codon) near the cleavage site of a signal sequence, even though their mature peptides differ remarkably. This observation matches a theory of exon shuffling in the origin of new genes and suggests that recruitment of different folds into scorpion venom might be achieved via shuffling between body protein-coding genes and ancestral venom gland-specific genes that presumably contributed tissue-specific regulatory elements and secretory signal sequences.
Affiliation(s)
- Xueli Wang
  - Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects & Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China.
- Bin Gao
  - Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects & Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China.
- Shunyi Zhu
  - Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects & Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China.
5
Zhou K, Kuo A, Grigoriev IV. Reverse transcriptase and intron number evolution. Stem Cell Investig 2014; 1:17. [PMID: 27358863 DOI: 10.3978/j.issn.2306-9759.2014.08.01] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/15/2014] [Accepted: 08/04/2014] [Indexed: 11/14/2022]
Abstract
BACKGROUND Introns are universal in eukaryotic genomes and play important roles in transcriptional regulation, mRNA export to the cytoplasm, nonsense-mediated decay as both a regulatory and a splicing quality control mechanism, R-loop avoidance, alternative splicing, chromatin structure, and evolution by exon-shuffling. METHODS Sixteen complete fungal genomes were used, 13 of which were sequenced and annotated by JGI. Ustilago maydis, Cryptococcus neoformans, and Coprinus cinereus (also named Coprinopsis cinerea) were from the Broad Institute. Gene models from JGI-annotated genomes were taken from the GeneCatalog track that contained the best representative gene models. Varying fractions of the GeneCatalog were manually curated by external users. For clarity, we used the JGI unique database identifier. RESULTS The last common ancestor of eukaryotes (LECA) has an estimated 6.4 coding exons per gene (EPG) and evolved into the diverse eukaryotic life forms, which is recapitulated by the development of a stem cell. We found a parallel between the simulated reverse transcriptase (RT)-mediated intron loss and the comparative analysis of 16 fungal genomes that spanned a wide range of intron density. Although footprints of RT (RTF) were dynamic, relative intron location (RIL) to the 5'-end of mRNA faithfully traced RT-mediated intron loss and revealed 7.7 EPG for LECA. The mode of exon length distribution was conserved in simulated intron loss, which was exemplified by the shared mode of 75 nt between fungal and Chlamydomonas genomes. The dominant ancient exon length was corroborated by the average exon length of the most intron-rich genes in fungal genomes and consistent with ancient protein modules being ~25 aa. Combined with the conservation of a protein length of 400 aa, the earliest ancestor of eukaryotes could have had 16 EPG. During earlier evolution, Ascomycota's ancestor had significantly more 3'-biased RT-mediated intron loss that was followed by dramatic RTF loss. There was a downward trend of EPG from more conserved to less conserved genes. Moreover, species-specific genes have higher exon-densities, shorter exons, and longer introns when compared to genes conserved at the phylum level. However, intron length in species-specific genes became shorter than that of genes conserved in all species after genomes experienced drastic intron loss. The estimated EPG from the most frequent exon length is more than double that from the RIL method. CONCLUSIONS This implies significant intron loss during the very early period of eukaryotic evolution. De novo gene-birth contributes to shorter exons, longer introns, and higher exon-density in species-specific genes relative to conserved genes.
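The figures quoted in this abstract can be checked with simple arithmetic. The worked lines below are only a reading of the abstract, assuming (this is an assumption, not a statement from the paper) that a 400 aa protein is encoded entirely by exons of the modal 75 nt length and that one amino acid corresponds to 3 nt:

```latex
\[
\frac{75\ \text{nt}}{3\ \text{nt/aa}} = 25\ \text{aa per exon},
\qquad
\frac{400\ \text{aa} \times 3\ \text{nt/aa}}{75\ \text{nt/exon}}
  = \frac{1200}{75} = 16\ \text{EPG},
\qquad
\frac{16\ \text{EPG}}{7.7\ \text{EPG}} \approx 2.1
\]
```

Under these assumptions the modal exon matches the ~25 aa protein module, the exon count comes out at 16 EPG, and the ratio to the RIL-based estimate of 7.7 EPG is slightly above 2, in line with "more than double".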
Affiliation(s)
- Kemin Zhou
  - 1 Computational Genomics, Bristol-Myers Squibb, 311 Pennington Rocky Hill Road, Pennington, NJ 08534, USA; 2 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Alan Kuo
  - 1 Computational Genomics, Bristol-Myers Squibb, 311 Pennington Rocky Hill Road, Pennington, NJ 08534, USA; 2 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Igor V Grigoriev
  - 1 Computational Genomics, Bristol-Myers Squibb, 311 Pennington Rocky Hill Road, Pennington, NJ 08534, USA; 2 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
6
Zhu S, Peigneur S, Gao B, Umetsu Y, Ohki S, Tytgat J. Experimental conversion of a defensin into a neurotoxin: implications for origin of toxic function. Mol Biol Evol 2014; 31:546-59. [PMID: 24425781 DOI: 10.1093/molbev/msu038] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Indexed: 11/13/2022]
Abstract
Scorpion K(+) channel toxins and insect defensins share a conserved three-dimensional structure and related biological activities (defense against competitors or invasive microbes by disrupting their membrane functions), which provides an ideal system to study how functional evolution occurs in a conserved structural scaffold. Using an experimental approach, we show that the deletion of a small loop of a parasitoid venom defensin possessing the "scorpion toxin signature" (STS) can remove steric hindrance of peptide-channel interactions and result in a neurotoxin selectively inhibiting K(+) channels with high affinities. This insect defensin-derived toxin adopts a hallmark scorpion toxin fold with a common cysteine-stabilized α-helical and β-sheet motif, as determined by nuclear magnetic resonance analysis. Mutations of two key residues located in STS completely diminish or significantly decrease the affinity of the toxin on the channels, demonstrating that this toxin binds to K(+) channels in the same manner as scorpion toxins. Taken together, these results provide new structural and functional evidence supporting the predictability of toxin evolution. The experimental strategy is the first employed to establish an evolutionary relationship of two distantly related protein families.
Affiliation(s)
- Shunyi Zhu
- Group of Animal Innate Immunity, State Key Laboratory of Integrated Management of Pest Insects & Rodents, Institute of Zoology, Chinese Academy of Sciences, Chaoyang District, Beijing, China
7
Zhu S, Gao B. Evolutionary origin of β-defensins. Developmental and Comparative Immunology 2013; 39:79-84. [PMID: 22369779 DOI: 10.1016/j.dci.2012.02.011] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 02/12/2012] [Revised: 02/18/2012] [Accepted: 02/18/2012] [Indexed: 05/31/2023]
Abstract
β-Defensins are a group of vertebrate-specific antimicrobial peptides (AMPs) with microbicidal and immune regulatory functions. In spite of their conservation across the vertebrate lineage ranging from bony fish to human, the evolutionary origin of these molecules remains unsolved. We addressed this issue by comparing three-dimensional (3D) structure and genomic organization of β-defensins with those of big defensins, a family of invertebrate-derived β-defensin-related peptides with two distinct structural and functional domains. β-Defensins and the carboxyl-terminal domain of big defensins adopt a conserved β-sheet topology stabilized by three identical disulfide bridges. Genomic organization analysis revealed that the defensin domain of these two classes of molecules is encoded by a single exon with a positionally conserved phase-1 intron in its upstream. The genomic and 3D structural conservation provides convincing evidence for their evolutionary relationship, in which β-defensins emerged from an ancestral big defensin through exon shuffling or intronization of exonic sequences. The phylogenetic distribution of big defensins in Arthropoda, Mollusca and Cephalochordata suggests an early origin of the β-defensin domain, which can be traced to the common ancestor of bilateral metazoans.
Affiliation(s)
- Shunyi Zhu
- Group of Animal Innate Immunity, State Key Laboratory of Integrated Management of Pest Insects & Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, 100101 Beijing, China.
8
Abstract
FGLamide allatostatins are invertebrate neuropeptides which inhibit juvenile hormone biosynthesis in Dictyoptera and related orders and also show myomodulatory activity. The FGLamide allatostatin (AST) gene structure in Dictyoptera is intronless within the ORF, whereas in 9 species of Diptera, the FGLamide AST ORF has one intron. To investigate the evolutionary history of AST intron structure (intron-early versus intron-late hypothesis), all available Arthropoda FGLamide AST gene sequences were examined from genome databases with reference to intron presence and position/phase. Three types of FGLamide AST ORF organization were found: intronless in I. scapularis and P. humanus corporis; one intron in D. pulex, A. pisum, A. mellifera and five Drosophila spp.; two introns in N. vitripennis, B. mori strains, A. aegypti, A. gambiae and C. quinquefasciatus. The literature suggests that for the majority of genes examined, most introns exist between codons (phase 0), which may reflect an ancient function of introns to separate protein modules. Of the FGLamide AST ORF introns, 60% were between the first and second nucleotides within a codon (phase 1), 28% were between the second and third nucleotides within a codon (phase 2) and 12% were phase 0. As would be required for a correct intron splicing consensus sequence, 84% of introns were in codons starting with guanine. The positioning of introns was a maximum of 9 codons from a dibasic cleavage site. Our results suggest that the introns in the analyzed species support the intron-late model.
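The phase definitions used in this abstract (phase 0 between codons, phase 1 after the first nucleotide of a codon, phase 2 after the second) reduce to a remainder modulo 3. The following is a minimal sketch, not the authors' method; the function names and the input convention (counting coding nucleotides from the first base of the start codon) are assumptions made for illustration.

```python
from typing import List


def intron_phase(coding_nt_upstream: int) -> int:
    """Return the phase (0, 1 or 2) of an intron.

    coding_nt_upstream is the number of coding nucleotides that lie
    between the first base of the start codon and the intron
    (hypothetical input convention chosen for this sketch).
    """
    if coding_nt_upstream < 0:
        raise ValueError("coding_nt_upstream must be non-negative")
    # 0 -> intron falls between two codons,
    # 1 -> between the 1st and 2nd nucleotide of a codon,
    # 2 -> between the 2nd and 3rd nucleotide of a codon.
    return coding_nt_upstream % 3


def phases_from_exons(coding_exon_lengths: List[int]) -> List[int]:
    """Phases of the introns separating consecutive coding exons.

    coding_exon_lengths lists only the coding part of each exon, in
    transcript order; the last exon is followed by no intron.
    """
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:
        upstream += length
        phases.append(intron_phase(upstream))
    return phases


if __name__ == "__main__":
    # Made-up exon lengths, for illustration only.
    print(phases_from_exons([61, 120, 90]))
```

With the made-up lengths above, both introns interrupt a codon after its first nucleotide, so both are phase 1.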
9
Abstract
Although introns were first discovered almost 30 years ago, their evolutionary origin remains elusive. In this work, we used multispecies whole-genome alignments to map Drosophila melanogaster introns onto 10 other fully sequenced Drosophila genomes. We were able to find 1,944 sites where an intron was missing in one or more species. We show that for most (>80%) of these cases, there is no leftover intronic sequence or any missing exonic sequence, indicating exact intron loss or gain events. We used parsimony to classify these differences as 1,754 intron loss events and 213 gain events. We show that lost and gained introns are significantly shorter than average and flanked by longer than average exons. They also display quite distinct phase distributions and show greater than average similarity between the 5' splice site and its 3' partner splice site. Introns that have been lost in one or more species evolve faster than other introns, occur in slowly evolving genes, and are found adjacent to each other more often than would be expected for independent single losses. Our results support the cDNA recombination mechanism of intron loss, suggest that selective pressures affect site-specific loss rates, and show conclusively that intron gain has occurred within the Drosophila lineage, solidifying the "introns-middle" hypothesis and providing some hints about the gain mechanism.
10
Nielsen H, Wernersson R. An overabundance of phase 0 introns immediately after the start codon in eukaryotic genes. BMC Genomics 2006; 7:256. [PMID: 17034638 PMCID: PMC1626468 DOI: 10.1186/1471-2164-7-256] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 06/07/2006] [Accepted: 10/11/2006] [Indexed: 12/19/2022]
Abstract
BACKGROUND A knowledge of the positions of introns in eukaryotic genes is important for understanding the evolution of introns. Despite this, there has been relatively little focus on the distribution of intron positions in genes. RESULTS In proteins with signal peptides, there is an overabundance of phase 1 introns around the region of the signal peptide cleavage site. This has been described before. But in proteins without signal peptides, a novel phenomenon is observed: There is a sharp peak of phase 0 intron positions immediately following the start codon, i.e. between codons 1 and 2. This effect is seen in a wide range of eukaryotes: Vertebrates, arthropods, fungi, and flowering plants. Proteins carrying this start codon intron are found to comprise a special class of relatively short, lysine-rich and conserved proteins with an overrepresentation of ribosomal proteins. In addition, there is a peak of phase 0 introns at position 5 in Drosophila genes with signal peptides, predominantly representing cuticle proteins. CONCLUSION There is an overabundance of phase 0 introns immediately after the start codon in eukaryotic genes, which has been described before only for human ribosomal proteins. We give a detailed description of these start codon introns and the proteins that contain them.
Affiliation(s)
- Henrik Nielsen
  - Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
- Rasmus Wernersson
  - Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, 2800 Lyngby, Denmark