1
Patthy L. Exon Shuffling Played a Decisive Role in the Evolution of the Genetic Toolkit for the Multicellular Body Plan of Metazoa. Genes (Basel) 2021; 12:382. [PMID: 33800339 PMCID: PMC8001218 DOI: 10.3390/genes12030382] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/24/2021] [Revised: 03/01/2021] [Accepted: 03/04/2021] [Indexed: 11/30/2022] Open
Abstract
Division of labor and establishment of the spatial pattern of different cell types of multicellular organisms require cell type-specific transcription factor modules that control cellular phenotypes and proteins that mediate the interactions of cells with other cells. Recent studies indicate that, although constituent protein domains of numerous components of the genetic toolkit of the multicellular body plan of Metazoa were present in the unicellular ancestor of animals, the repertoire of multidomain proteins that are indispensable for the arrangement of distinct body parts in a reproducible manner evolved only in Metazoa. We have shown that the majority of the multidomain proteins involved in cell-cell and cell-matrix interactions of Metazoa have been assembled by exon shuffling, but there is no evidence for a similar role of exon shuffling in the evolution of proteins of metazoan transcription factor modules. A possible explanation for this difference in the intracellular and intercellular toolkits is that evolution of the transcription factor modules preceded the burst of exon shuffling that led to the creation of the proteins controlling spatial patterning in Metazoa. This explanation is in harmony with the temporal-to-spatial transition hypothesis of multicellularity that proposes that cell differentiation may have predated spatial segregation of cell types in animal ancestors.
Affiliation(s)
- Laszlo Patthy
- Institute of Enzymology, Research Centre for Natural Sciences, H-1117 Budapest, Hungary
2
Varga J, Dobson L, Tusnády GE. TOPDOM: database of conservatively located domains and motifs in proteins. Bioinformatics 2016; 32:2725-6. [PMID: 27153630 PMCID: PMC5013901 DOI: 10.1093/bioinformatics/btw193] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/16/2016] [Accepted: 04/04/2016] [Indexed: 11/14/2022] Open
Abstract
The TOPDOM database, originally created as a collection of domains and motifs located consistently on the same side of the membranes in α-helical transmembrane proteins, has been updated and extended by taking into consideration consistently localized domains and motifs in globular proteins, too. By taking advantage of the recently developed CCTOP algorithm to determine the type of a protein and predict topology in case of transmembrane proteins, and by applying a thorough search for domains and motifs as well as utilizing the most up-to-date version of all source databases, we managed to reach a 6-fold increase in the size of the whole database and a 2-fold increase in the number of transmembrane proteins. AVAILABILITY AND IMPLEMENTATION: The TOPDOM database is available at http://topdom.enzim.hu. The webpage utilizes the common Apache, PHP5 and MySQL software to provide the user interface for accessing and searching the database. The database itself is generated on a high performance computer. CONTACT: tusnady.gabor@ttk.mta.hu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Julia Varga
- 'Momentum' Membrane Protein Bioinformatics Research Group, Institute of Enzymology, RCNS, HAS, Budapest H-1518, Hungary
- László Dobson
- 'Momentum' Membrane Protein Bioinformatics Research Group, Institute of Enzymology, RCNS, HAS, Budapest H-1518, Hungary
- Gábor E Tusnády
- 'Momentum' Membrane Protein Bioinformatics Research Group, Institute of Enzymology, RCNS, HAS, Budapest H-1518, Hungary
3
Basu MK, Poliakov E, Rogozin IB. Domain mobility in proteins: functional and evolutionary implications. Brief Bioinform 2009; 10:205-16. [PMID: 19151098 DOI: 10.1093/bib/bbn057] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022] Open
Abstract
A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or 'promiscuous'). These promiscuous domains are typically involved in protein-protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently, we have begun to understand the diversity of protein domain architectures and the role the promiscuous domains play in evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain shrouded in mystery. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.
Affiliation(s)
- Malay Kumar Basu
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA.
4
Baertsch R, Diekhans M, Kent WJ, Haussler D, Brosius J. Retrocopy contributions to the evolution of the human genome. BMC Genomics 2008; 9:466. [PMID: 18842134 PMCID: PMC2584115 DOI: 10.1186/1471-2164-9-466] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.8] [Received: 03/17/2008] [Accepted: 10/08/2008] [Indexed: 02/06/2023] Open
Abstract
Background: Evolution via point mutations is a relatively slow process and is unlikely to completely explain the differences between primates and other mammals. By contrast, 45% of the human genome is composed of retroposed elements, many of which were inserted in the primate lineage. A subset of retroposed mRNAs (retrocopies) shows strong evidence of expression in primates, often yielding functional retrogenes. Results: To identify and analyze the relatively recently evolved retrogenes, we carried out BLASTZ alignments of all human mRNAs against the human genome and scored a set of features indicative of retroposition. Of over 12,000 putative retrocopy-derived genes that arose mainly in the primate lineage, 726 with strong evidence of transcript expression were examined in detail. These mRNA retroposition events fall into three categories: I) 34 retrocopies and antisense retrocopies that added potential protein coding space and UTRs to existing genes; II) 682 complete retrocopy duplications inserted into new loci; and III) an unexpected set of 13 retrocopies that contributed out-of-frame or antisense sequences in combination with other types of transposed elements (SINEs, LINEs, LTRs), or even unannotated sequence, to form potentially novel genes with no homologs outside primates. In addition to their presence in human, several of the gene candidates also had potentially viable ORFs in chimpanzee, orangutan, and rhesus macaque, underscoring their potential of function. Conclusion: mRNA-derived retrocopies provide raw material for the evolution of genes in a wide variety of ways, duplicating and amending the protein coding region of existing genes as well as generating the potential for new protein coding space, or non-protein coding RNAs, by unexpected contributions out of frame, in reverse orientation, or from previously non-protein coding sequence.
Affiliation(s)
- Robert Baertsch
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California 95064, USA.
5
Herpin A, Lelong C, Becker T, Favrel P, Cunningham C. A tolloid homologue from the Pacific oyster Crassostrea gigas. Gene Expr Patterns 2007; 7:700-8. [PMID: 17433792 DOI: 10.1016/j.modgep.2007.03.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/18/2007] [Revised: 02/23/2007] [Accepted: 03/01/2007] [Indexed: 10/23/2022]
Abstract
The genes governing mesoderm specification have been extensively studied in vertebrates, arthropods and nematodes. The latter two phyla belong to the Ecdysozoan clade, but little is understood of the role that these genes might play in the development of the other major protostomal clade, the Lophotrochozoa. As part of a wider project to analyze the functions associated with transforming growth factor beta superfamily members in Lophotrochozoa, we have cloned a gene encoding a tolloid homologue from the bivalve mollusc Crassostrea gigas. Tolloid is a key developmental protein that regulates the activity of bone morphogenetic proteins (BMPs). We have determined the intron-exon structure of the gene encoding C. gigas tolloid and have compared it with those of homologous genes from both protostomes and deuterostomes. To analyze the functionality of oyster tolloid, the zebrafish embryo was employed as a reporter organism, and we show that over-expression of this protein results in the ventralization of zebrafish embryos at 24 h post-fertilization. The expression of the C. gigas tolloid gene during embryonic and larval development as well as in adult tissues is also explored.
Affiliation(s)
- Amaury Herpin
- Sars International Centre for Marine Molecular Biology, High Technology Centre, Bergen, Norway
6
de Roos ADG. Conserved intron positions in ancient protein modules. Biol Direct 2007; 2:7. [PMID: 17288589 PMCID: PMC1800838 DOI: 10.1186/1745-6150-2-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 02/05/2007] [Accepted: 02/08/2007] [Indexed: 12/31/2022] Open
Abstract
Background: The timing of the origin of introns is of crucial importance for an understanding of early genome architecture. The Exon theory of genes proposed a role for introns in the formation of multi-exon proteins by exon shuffling and predicts the presence of conserved splice sites in ancient genes. In this study, large-scale analysis of potential conserved splice sites was performed using an intron-exon database (ExInt) derived from GenBank. Results: A set of conserved intron positions was found by matching identical splice site sequences from distantly-related eukaryotic kingdoms. Most amino acid sequences with conserved introns were homologous to consensus sequences of functional domains from conserved proteins including kinases, phosphatases, small GTPases, transporters and matrix proteins. These included ancient proteins that originated before the eukaryote-prokaryote split, for instance the catalytic domain of protein phosphatase 2A where a total of eleven conserved introns were found. Using an experimental setup in which the relation between a splice site and the ancientness of its surrounding sequence could be studied, it was found that the presence of an intron was positively correlated to the ancientness of its surrounding sequence. Intron phase conservation was linked to the conservation of the gene sequence and not to the splice site sequence itself. However, no apparent differences in phase distribution were found between introns in conserved versus non-conserved sequences. Conclusion: The data confirm an origin of introns deep in the eukaryotic branch and are in concordance with the presence of introns in the first functional protein modules in an 'Exon theory of genes' scenario. A model is proposed in which shuffling of primordial short exonic sequences led to the formation of the first functional protein modules, in line with hypotheses that see the formation of introns integral to the origins of genome evolution.
Reviewers: This article was reviewed by Scott Roy (nominated by Anthony Poole), Sandro de Souza (nominated by Manyuan Long), and Gáspár Jékely.
Affiliation(s)
- Albert D G de Roos
- Syncyte BioIntelligence, P.O. Box 600, 1000 AP, Amsterdam, The Netherlands.
7
Kim H, Sung S, Klein R. Expansion of symmetric exon-bordering domains does not explain evolution of lineage specific genes in mammals. Genetica 2006; 131:59-68. [PMID: 17082903 DOI: 10.1007/s10709-006-9113-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/20/2006] [Accepted: 09/26/2006] [Indexed: 10/24/2022]
Abstract
To examine the evolution of lineage-specific genes, we analyzed intron phase distributions and exon-bordering domains in primate- and rodent-specific genes. We found that the expansion of symmetric exon-bordering domains could not explain the evolution of lineage-specific genes. Rather, internal intron loss in a domain can partially explain the excess of class 1-1 intron phases in the lineage-specific genes. We suggest that the event that led to the excess of symmetric exons in lineage-specific genes had little bearing on shaping the phenotypes specific to the individual lineage. Instead, Kruppel-associated box (KRAB) proteins associated with zinc finger C2H2 (zf-C2H2) type are likely to be responsible for the lineage-specific function.
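The intron phase terminology used in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names and the example exon lengths are hypothetical, and the input is assumed to be coding-sequence lengths only (the first exon starting at the start codon, UTRs excluded). Phase is taken as the number of coding nucleotides upstream of an intron modulo 3, and an exon is called symmetric when both flanking introns share the same phase (for example class 1-1).

```python
from typing import List, Tuple


def intron_phases(coding_exon_lengths: List[int]) -> List[int]:
    """Return the phase of the intron that follows each exon except the last.

    Phase = coding nucleotides upstream of the intron, modulo 3:
    0 = intron lies between codons, 1 = after the first base of a codon,
    2 = after the second base of a codon.
    Assumption: lengths count coding nucleotides only (no UTR sequence).
    """
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


def internal_exon_classes(coding_exon_lengths: List[int]) -> List[Tuple[int, int]]:
    """Return (5' intron phase, 3' intron phase) for every internal exon."""
    phases = intron_phases(coding_exon_lengths)
    return [(phases[i], phases[i + 1]) for i in range(len(phases) - 1)]


def is_symmetric(exon_class: Tuple[int, int]) -> bool:
    """An exon is symmetric when both flanking introns have the same phase."""
    return exon_class[0] == exon_class[1]


if __name__ == "__main__":
    # Hypothetical four-exon gene; the values are illustrative only.
    lengths = [100, 120, 50, 90]
    for number, exon_class in enumerate(internal_exon_classes(lengths), start=2):
        label = "symmetric" if is_symmetric(exon_class) else "asymmetric"
        print(f"exon {number}: class {exon_class[0]}-{exon_class[1]} ({label})")
```

For the hypothetical lengths above, exon 2 comes out as class 1-1 (symmetric) and exon 3 as class 1-0 (asymmetric). Symmetric exons are the ones that can be inserted into, or deleted from, an intron of the matching phase without shifting the reading frame, which is why their frequency is used as a signature of exon shuffling.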
Affiliation(s)
- Heebal Kim
- Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, San 56-1, Sillim-dong, Gwanak-gu, Seoul 151-742, Korea.
8
Chabasse C, Bailly X, Sanchez S, Rousselot M, Zal F. Gene structure and molecular phylogeny of the linker chains from the giant annelid hexagonal bilayer hemoglobins. J Mol Evol 2006; 63:365-74. [PMID: 16838215 DOI: 10.1007/s00239-005-0198-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 08/17/2005] [Accepted: 03/31/2006] [Indexed: 10/24/2022]
Abstract
Giant extracellular hexagonal bilayer hemoglobin (HBL-Hb), found only in annelids, is an approximately 3500-kDa heteropolymeric structure involved in oxygen transport. The HBL-Hbs are comprised of globin and linker chains, the latter being required for the assembly of the quaternary structure. The linker chains, varying in size from 225 to 283 amino acids, have a conserved cysteine-rich domain within their N-terminal moiety that is homologous to the cysteine-rich modules constituting the ligand binding domain of the low-density lipoprotein receptor (LDLR) protein family found in many metazoans. We have investigated the gene structure of linkers from Arenicola marina, Alvinella pompejana, Nereis diversicolor, Lumbricus terrestris, and Riftia pachyptila. We found, contrary to the results obtained earlier with linker genes from N. diversicolor and L. terrestris, that in all of the foregoing cases, the linker LDL-A module is flanked by two phase 1 introns, as in the human LDLR gene, with two more introns in the 3' side whose positions varied with the species. In addition, we obtained 13 linker cDNAs that have been determined experimentally or found in the EST database LumbriBASE. A molecular phylogenetic analysis of the linker primary sequences demonstrated that they cluster into two distinct families of linker proteins. We propose that the common gene ancestor to annelid linker genes exhibited a four-intron and five-exon structure and gave rise to the two families subsequent to a duplication event.
Affiliation(s)
- Christine Chabasse
- Equipe Ecophysiologie, Adaptation et Evolution Moléculaires, UPMC-CNRS UMR 7144, Station Biologique, BP 74, 29682, Roscoff cedex, France.
9
Mason TA, McIlroy PJ, Shain DH. Structural model of an antistasin/notch-like fusion protein from the cocoon wall of the aquatic leech, Theromyzon tessulatum. J Mol Model 2006; 12:829-34. [PMID: 16523290 DOI: 10.1007/s00894-006-0107-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/29/2005] [Accepted: 01/11/2006] [Indexed: 11/26/2022]
Abstract
The aquatic leech, Theromyzon tessulatum, secretes a proteinaceous cocoon with extraordinary physical properties (e.g., proteolytic and thermal resiliency). The deduced amino acid sequence of a major protein (Tcp, Theromyzon cocoon protein) from the T. tessulatum cocoon wall has been used to model the endogenous structure of the Tcp protein. The Tcp protein sequence comprises six internal repeats, each containing 12 ordered Cys residues. Amino acid alignments suggest that the region Cys1→6 is homologous to antistasin, a leech anticoagulant, and Cys7→12 is homologous to an epidermal growth factor-like domain found in notch-class proteins, which play critical roles in development, signaling, and adhesion throughout the Animalia. Modeling of individual domains (i.e., antistasin and notch) positions multiple hydrophobic and charged residues on the surface. When the antistasin and notch domains were fused, hydrophobic pockets appeared that may facilitate a polymerization mechanism. Collectively, the predicted features of our Tcp model are consistent with the physical properties of the leech cocoon wall.
Affiliation(s)
- Tarin A Mason
- Biology Department, The State University of New Jersey, 315 Penn Street, Rutgers, Camden, NJ 08102, USA
10
Benito-Gutiérrez E, Garcia-Fernàndez J, Comella JX. Origin and evolution of the Trk family of neurotrophic receptors. Mol Cell Neurosci 2005; 31:179-92. [PMID: 16253518 DOI: 10.1016/j.mcn.2005.09.007] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Received: 07/08/2005] [Revised: 08/11/2005] [Accepted: 09/08/2005] [Indexed: 01/19/2023] Open
Abstract
Among the numerous tyrosine kinase receptors, those belonging to the Trk family are distinctively involved in the development of complex traits within the vertebrate nervous system. Until recently, the lack of a proper Nt/Trk system in invertebrates has led to the belief that they were a vertebrate innovation. Recent data, however, have challenged the field, and proved that bona fide Trk receptors do exist in invertebrates. Here, we review and discuss the evolutionary history of the Trk receptor family, and draw a comprehensive scenario that situates the origin of the Nt/Trk signalling prior to the origin of vertebrates. Probably, a ProtoTrk receptor was invented by means of domain and exon shuffling from pieces of ancient genes, generating the unique combination of domains found in extant Trk receptors. It is tempting to propose that subtle protein mutations, gene duplications, and co-options in particular territories of a primitive Nt/Trk system were instrumental to the development of a complex vertebrate nervous system.
Affiliation(s)
- Elia Benito-Gutiérrez
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Avinguda Diagonal 645, Barcelona E-08028, Spain.
11
Rádis-Baptista G, Kubo T, Oguiura N, Prieto da Silva ARB, Hayashi MAF, Oliveira EB, Yamane T. Identification of crotasin, a crotamine-related gene of Crotalus durissus terrificus. Toxicon 2004; 43:751-9. [PMID: 15284009 DOI: 10.1016/j.toxicon.2004.02.023] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 11/25/2003] [Accepted: 02/25/2004] [Indexed: 11/16/2022]
Abstract
Crotamine is a cationic peptide (4.9 kDa, pI 9.5) found in the venom of the South American rattlesnake Crotalus durissus terrificus. Its presence varies according to the subspecies or the geographical locality of a given species. At the genomic level, we observed the presence of a 1.8 kb gene, Crt-p1, in crotamine-positive specimens and its absence in crotamine-negative ones. In this work, we describe a crotamine-related 2.5 kb gene, crotasin (Cts-p2), isolated from crotamine-negative specimens. Reverse transcription coupled to polymerase chain reaction indicates that Cts-p2 is abundantly expressed in several snake tissues, but scarcely expressed in the venom gland. The genome of crotamine-positive specimens contains both Crt-p1 and Cts-p2 genes. The present data suggest that both crotamine and crotasin have evolved by duplication of a common ancestor gene, and the conservation of their three disulfide bonds indicates that they might adopt the same fold as beta-defensin. The physiological function of crotasin is not yet known.
Affiliation(s)
- G Rádis-Baptista
- Molecular Toxinology Laboratory, Butantan Institute, Av. Vital Brazil 1500, São Paulo 05503-900, Brazil.
12
Brosius J. The contribution of RNAs and retroposition to evolutionary novelties. Contemporary Issues in Genetics and Evolution 2003. [DOI: 10.1007/978-94-010-0229-5_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
13
Overall CM. Molecular determinants of metalloproteinase substrate specificity: matrix metalloproteinase substrate binding domains, modules, and exosites. Mol Biotechnol 2002; 22:51-86. [PMID: 12353914 DOI: 10.1385/mb:22:1:051] [Citation(s) in RCA: 357] [Impact Index Per Article: 16.2] [Indexed: 11/11/2022]
Abstract
The functions of ancillary domains and modules attached to the catalytic domain of multidomain proteases, such as the matrix metalloproteinases (MMPs), are not well understood. The importance of discrete MMP substrate binding sites termed exosites on domains located outside the catalytic domain was first demonstrated for native collagenolysis. The essential role of hemopexin carboxyl-domain exosites in the cleavage of noncollagenous substrates such as chemokines has also been recently revealed. This article updates a previous review of the role of substrate recognition by MMP exosites in both preparing complex substrates, such as collagen, for cleavage and for tethering noncollagenous substrates to MMPs for more efficient proteolysis. Exosite domain interaction and movements ("molecular tectonics") that are required for native collagen triple helicase activity are discussed. The potential role of collagen binding in regulating MMP-2 (gelatinase A) activation at the cell surface reveals unexpected consequences of substrate interactions that can lead to collagen cleavage and regulation of the activation and activity of downstream proteinases necessary to complete the collagenolytic cascade.
14
Li Y, Baldauf S, Lim EK, Bowles DJ. Phylogenetic analysis of the UDP-glycosyltransferase multigene family of Arabidopsis thaliana. J Biol Chem 2001; 276:4338-43. [PMID: 11042215 DOI: 10.1074/jbc.m007447200] [Citation(s) in RCA: 291] [Impact Index Per Article: 12.7] [Indexed: 11/06/2022] Open
Abstract
A class of UDP-glycosyltransferases (UGTs) defined by the presence of a C-terminal consensus sequence is found throughout the plant and animal kingdoms. Whereas mammalian enzymes use UDP-glucuronic acid, the plant enzymes typically use UDP-glucose in the transfer reactions. A diverse array of aglycones can be glucosylated by these UGTs. In plants, the aglycones include plant hormones, secondary metabolites involved in stress and defense responses, and xenobiotics such as herbicides. Glycosylation is known to regulate many properties of the aglycones such as their bioactivity, their solubility, and their transport properties within the cell and throughout the plant. As a means of providing a framework to start to understand the substrate specificities and structure-function relationships of plant UGTs, we have now applied a molecular phylogenetic analysis to the multigene family of 99 UGT sequences in Arabidopsis. We have determined the overall organization and evolutionary relationships among individual members with a surprisingly high degree of confidence. Through constructing a composite phylogenetic tree that also includes all of the additional plant UGTs with known catalytic activities, we can start to predict both the evolutionary history and substrate specificities of new sequences as they are identified. The tree already suggests that while the activities of some subgroups of the UGT family are highly conserved among different plant species, other subgroups shift substrate specificity with relative ease.
Affiliation(s)
- Y Li
- Department of Biology, University of York, P.O. Box 373, York YO10 5DD, United Kingdom
15
Binzak BA, Vockley JG, Jenkins RB, Vockley J. Structure and analysis of the human dimethylglycine dehydrogenase gene. Mol Genet Metab 2000; 69:181-7. [PMID: 10767172 DOI: 10.1006/mgme.2000.2980] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
Dimethylglycine dehydrogenase (DMGDH; E.C. 1.5.99.2) is an enzyme involved in the catabolism of choline, catalyzing the oxidative demethylation of dimethylglycine (DMG) to form sarcosine. Subsequently, sarcosine dehydrogenase (SDH; E.C. 1.5.99.1) converts sarcosine to glycine via a similar reaction. Both enzymes are found as monomers in the mitochondrial matrix, and both contain 1 mol of covalently bound flavin adenine dinucleotide. DMGDH and SDH also utilize a noncovalently bound folate coenzyme that receives the "1-carbon" groups that are removed by DMGDH and SDH, forming "active formaldehyde." We have recently described a new inborn error of metabolism of DMGDH characterized by an unusual fish-like body odor. To augment our study of this new disorder, we have isolated two human genomic clones that together contain 16 exons of coding sequence for the hDMGDH gene. Fluorescent in situ hybridization analysis of the hDMGDH gene indicates that it is found on chromosome 5q12.2-q12.3. In addition, several polymorphisms have been identified in the hDMGDH cDNA sequence. Population analysis of two Ser/Pro polymorphisms found 367 amino acids apart reveals a skew of alleles, with the haplotypes Ser/Pro or Pro/Ser (79%) overrepresented compared to the number of Ser/Ser or Pro/Pro alleles observed. Possible functional consequences of these findings are discussed. Characterization of the gene structure for hDMGDH will aid in the study of patients with inherited defects of this enzyme.
Affiliation(s)
- B A Binzak
- Department of Biochemistry and Molecular Biology, Mayo Medical and Graduate Schools, Rochester, Rochester, Minnesota 55905, USA
16
Abstract
Recent studies on the genomes of protists, plants, fungi and animals confirm that the increase in genome size and gene number in different eukaryotic lineages is paralleled by a general decrease in genome compactness and an increase in the number and size of introns. It may thus be predicted that exon-shuffling has become increasingly significant with the evolution of larger, less compact genomes. To test the validity of this prediction, we have analyzed the evolutionary distribution of modular proteins that have clearly evolved by intronic recombination. The results of this analysis indicate that modular multidomain proteins produced by exon-shuffling are restricted in their evolutionary distribution. Although such proteins are present in all major groups of metazoa from sponges to chordates, there is practically no evidence for the presence of related modular proteins in other groups of eukaryotes. The biological significance of this difference in the composition of the proteomes of animals, fungi, plants and protists is best appreciated when these modular proteins are classified with respect to their biological function. The majority of these proteins can be assigned to functional categories that are inextricably linked to multicellularity of animals, and are of absolute importance in permitting animals to function in an integrated fashion: constituents of the extracellular matrix, proteases involved in tissue remodelling processes, various proteins of body fluids, membrane-associated proteins mediating cell-cell and cell-matrix interactions, membrane-associated receptor proteins regulating cell-cell communications, etc. Although some basic types of modular proteins seem to be shared by all major groups of metazoa, there are also groups of modular proteins that appear to be restricted to certain evolutionary lineages. In summary, the results suggest that exon-shuffling acquired major significance at the time of metazoan radiation.
It is interesting to note that the rise of exon-shuffling coincides with a spectacular burst of evolutionary creativity: the Big Bang of metazoan radiation. It seems probable that modular protein evolution by exon-shuffling has contributed significantly to this accelerated evolution of metazoa, since it facilitated the rapid construction of multidomain extracellular and cell surface proteins that are indispensable for multicellularity.
Affiliation(s)
- L Patthy
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest.
17
Talts JF, Wirl G, Dictor M, Muller WJ, Fässler R. Tenascin-C modulates tumor stroma and monocyte/macrophage recruitment but not tumor growth or metastasis in a mouse strain with spontaneous mammary cancer. J Cell Sci 1999; 112 (Pt 12):1855-64. [PMID: 10341205 DOI: 10.1242/jcs.112.12.1855] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.4] [Indexed: 11/20/2022] Open
Abstract
The local growth of tumors and their ability to metastasize are crucially dependent on their interactions with the surrounding extracellular matrix. Tenascin-C (TNC) is an extracellular matrix protein which is highly expressed during development, tissue repair and cancer. Despite the high levels of TNC in the stroma of primary and metastatic tumors, the function of TNC is not known. In the present study we have crossed TNC-null mice with a mouse strain where both female and male mice spontaneously develop mammary tumors followed by metastatic disease in the lungs. We report that the absence of TNC had no effect on the temporal occurrence of mammary tumors and their metastatic dissemination in lungs. Furthermore, the number and size of tumors, the number and size of metastatic foci in the lungs, the proliferation rate and apoptosis of tumor cells and tumor angiogenesis were not altered in the absence of TNC. Histological examination revealed that the tumor organisation, however, was modulated by TNC. In the presence of TNC both primary as well as metastatic tumors were organised in large tumor cell nests surrounded by thick layers of extracellular matrix proteins. In the absence of TNC these tumor cell nests were smaller but still separated from each other by extracellular matrix proteins. In addition, the TNC-null stromal compartment contained significantly more monocytes/macrophages than tumor stroma from TNC wild-type mice. Using in vitro coculture experiments we show that TNC-null tumor cells were still able to activate the TNC gene in fibroblasts which express low basal levels of TNC. Altogether these data indicate that TNC has a very limited role during the spontaneous development and growth of mamary tumors and their metastasis to the lungs.
Affiliation(s)
- J F Talts
- Max-Planck-Institute of Biochemistry, Department of Protein Chemistry, Germany
|
18
|
Abstract
Cells of the immune system have a large number of protein receptors on their surfaces, with a wide range of binding functions. They are, however, constructed from a limited set of protein structural units, which are recognisable at the sequence level. The 3D structure of many of these domains, or modules, is now known. These modular units and their structures are reviewed here. The ways in which they are assembled into multidomain receptor chains and oligomeric complexes of receptors are also discussed.
Affiliation(s)
- I D Campbell
- Department of Biochemistry, University of Oxford, UK
|
19
|
Intron-exon structures. 1998. [DOI: 10.1016/s1067-5701(98)80020-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
20
|
The Atypical Serine Proteases of the Complement System (received for publication on October 7, 1997). Adv Immunol 1998. [DOI: 10.1016/s0065-2776(08)60609-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]
|
21
|
Wakasugi K, Ishimori K, Morishima I. 'Module'-substituted globins: artificial exon shuffling among myoglobin, hemoglobin alpha- and beta-subunits. Biophys Chem 1997; 68:265-73. [PMID: 9468623 DOI: 10.1016/s0301-4622(97)80556-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
Abstract
Based on the detailed structural analysis of proteins, Go [M. Go, Nature 291 (1981) 90-92] found that protein structures can be divided into some structural units, 'modules,' which correspond to peptides coded by exons. In the present study, to investigate functional and structural roles of modular structures in proteins, we have engineered eight chimera globins, in which the exons are shuffled among human myoglobin, human hemoglobin alpha- and beta-subunits, in addition to the chimera beta beta alpha-globin described previously [K. Wakasugi, K. Ishimori, K. Imai, Y. Wada, I. Morishima, J. Biol. Chem. 269 (1994) 18750-18756]. Although all of the chimera globins stoichiometrically bound the heme and their alpha-helical contents increased by heme incorporation as found for native globins, the alpha-helical contents of the chimera globins were significantly lower than those of native globins, suggesting that 'module' substitutions seriously affect the protein folding and stability in globins. The comparisons among several chimera globins demonstrated that such structural alterations are mainly attributed to loss of some key intermodular interactions for protein folding. By simultaneous substitution of the modules M1 and M4 from the same globin, the protein structure was stabilized, which indicates that the module packing between modules M1 and M4 would be one of the crucial interactions that stabilize the globin fold. Present results allow us to conclude that module substitutions would be available for designing and producing novel functional proteins if we can reproduce the stable modular packing in the 'module'-substituted proteins.
Affiliation(s)
- K Wakasugi
- Department of Molecular Engineering, Graduate School of Engineering, Kyoto University, Japan
|
22
|
Hegyi H, Bork P. On the classification and evolution of protein modules. J Protein Chem 1997; 16:545-51. [PMID: 9246642 DOI: 10.1023/a:1026382032119] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023]
Abstract
Our efforts to classify the functional units of many proteins, the modules, are reviewed. The data from the sequencing projects for various model organisms are extremely helpful in deducing the evolution of proteins and modules. For example, a dramatic increase of modular proteins can be observed from yeast to C. elegans in accordance with new protein functions that had to be introduced in multicellular organisms. Our sequence characterization of modules relies on sensitive similarity search algorithms and the collection of multiple sequence alignments for each module. To trace the evolution of modules and to further automate the classification, we have developed a sequence and a module alerting system that checks newly arriving sequence data for the presence of already classified modules. Using these systems, we were able to identify an unexpected similarity between extracellular C1Q modules with bacterial proteins.
|
23
|
Rzhetsky A, Ayala FJ, Hsu LC, Chang C, Yoshida A. Exon/intron structure of aldehyde dehydrogenase genes supports the "introns-late" theory. Proc Natl Acad Sci U S A 1997; 94:6820-5. [PMID: 9192649 PMCID: PMC21242 DOI: 10.1073/pnas.94.13.6820] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.0] [Indexed: 02/04/2023]
Abstract
Whether or not nuclear introns predate the divergence of bacteria and eukaryotes is the central argument between the proponents of the "introns-early" and "introns-late" theories. In this study we compared the goodness-of-fit of each theory with a probabilistic model of exon/intron evolution and multiple nonallelic genes encoding human aldehyde dehydrogenases (ALDHs). Using a reconstructed phylogenetic tree of ALDH genes, we computed the likelihood of obtaining the present-day ALDH sequences under the assumptions of each competing theory. Although on the grounds of its own assumptions each theory accounted for the ALDH data significantly better than its rival, the introns-early model required frequent intron slippage, and the estimated slippage rates were too high to be consistent with reported correlations between the boundaries of ancient protein modules and the ends of ancient exons. Because the molecular mechanisms proposed to explain intron slippage are incapable of providing such high rates and are incompatible with the observed distribution of introns in higher eukaryotes, the ALDH data support the introns-late theory.
Affiliation(s)
- A Rzhetsky
- Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
|
24
|
Tremousaygue D, Bardet C, Dabos P, Regad F, Pelese F, Nazer R, Gander E, Lescure B. Genome DNA sequencing around the EF-1 alpha multigene locus of Arabidopsis thaliana indicates a high gene density and a shuffling of noncoding regions. Genome Res 1997; 7:198-209. [PMID: 9074924 DOI: 10.1101/gr.7.3.198] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
In Arabidopsis thaliana, EF-1 alpha proteins are encoded by a multigene family of four members. Three of them are clustered at the same locus, which was positioned 24 cM from the top of chromosome 1. A region of DNA spanning 63 kb around this locus was sequenced and analyzed. One main characteristic of the locus is the mosaic organization of both genes and intergenic regions. Fourteen genes were identified, among which only four were already described, and other unidentified genes are most likely present. Functionally diverse genes are found at close intervals. Exon and intron distribution is highly variable at this locus, one gene being split into at least 20 introns. Several duplications were found within the sequenced segment both in coding and noncoding regions, including two gene families. Moreover, a sequence corresponding to the 5' noncoding region of the EF-1 alpha genes and harboring a 5' intervening sequence is duplicated and found upstream of several genes, suggesting that noncoding regions can be shuffled during evolution.
Affiliation(s)
- D Tremousaygue
- Laboratoire de Biologie Moleculaire des relations Plantes-Microorganismes, Centre National de la Recherche Scientifique (CNRS)-Institut National de la Recherche Agronomique (INRA), Castanet Tolosan, France.
|
25
|
Abstract
Thanks to recent improvements in techniques used for the detection of homologies, it is now clear that module exchange played a major role in protein evolution. Analysis of the genes of various modular proteins has identified a large number of cases where gene assembly was facilitated by intronic recombination, i.e., the proteins were formed by exon shuffling. Studies of the principles and mechanistic details of exon shuffling, however, revealed that this powerful evolutionary mechanism could become significant only after the appearance of spliceosomal introns typical of higher eukaryotes. Although exon shuffling is the most efficient way of constructing modular proteins, recent studies on the evolution of multidomain proteins of prokaryotes emphasize that intronic recombination is not an absolute prerequisite of module exchange.
Affiliation(s)
- L Patthy
- Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary
|
26
|
Long M, de Souza SJ, Rosenberg C, Gilbert W. Exon shuffling and the origin of the mitochondrial targeting function in plant cytochrome c1 precursor. Proc Natl Acad Sci U S A 1996; 93:7727-31. [PMID: 8755543 PMCID: PMC38815 DOI: 10.1073/pnas.93.15.7727] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.3] [Indexed: 02/02/2023]
Abstract
Since most of the examples of "exon shuffling" are between vertebrate genes, the view is often expressed that exon shuffling is limited to the evolutionarily recent lineage of vertebrates. Although exon shuffling in plants has been inferred from the analysis of intron phases of plant genes [Long, M., Rosenberg, C. & Gilbert, W. (1995) Proc. Natl. Acad. Sci. USA 92, 12495-12499] and from the comparison of two functionally unknown sunflower genes [Domon, C. & Steinmetz, A. (1994) Mol. Gen. Genet. 244, 312-317], clear cases of exon shuffling in plant genes remain to be uncovered. Here, we report an example of exon shuffling in two important nucleus-encoded plant genes: cytosolic glyceraldehyde-3-phosphate dehydrogenase (cytosolic GAPDH or GapC) and cytochrome c1 precursor. The intron-exon structures of the shuffled region indicate that the shuffling event took place at the DNA sequence level. In this case, we can establish a donor-recipient relationship for the exon shuffling. Three amino terminal exons of GapC have been donated to cytochrome c1, where, in a new protein environment, they serve as a source of the mitochondrial targeting function. This finding throws light upon an old important but unsolved question in gene evolution: the origin of presequences or transit peptides that generally exist in nucleus-encoded organelle genes.
Affiliation(s)
- M Long
- Department of Molecular and Cellular Biology, The Biological Laboratories, Harvard University, Cambridge, MA 02138, USA
|
27
|
Bork P, Downing AK, Kieffer B, Campbell ID. Structure and distribution of modules in extracellular proteins. Q Rev Biophys 1996; 29:119-67. [PMID: 8870072 DOI: 10.1017/s0033583500005783] [Citation(s) in RCA: 234] [Impact Index Per Article: 8.4] [Indexed: 02/02/2023]
Abstract
It has become standard practice to compare new amino-acid and nucleotide sequences with existing ones in the rapidly growing sequence databases. This has led to the recurring identification of certain sequence patterns, usually corresponding to less than 300 amino-acids in length. Many of these identifiable sequence regions have been shown to fold up to form a ‘domain’ structure; they are often called protein ‘modules’ (see definitions below). Proteins that contain such modules are widely distributed in biology, but they are particularly common in extracellular proteins.
Affiliation(s)
- P Bork
- Max-Delbrück-Center for Molecular Medicine, Berlin-Buch, Germany
|
28
|
Long M, Rosenberg C, Gilbert W. Intron phase correlations and the evolution of the intron/exon structure of genes. Proc Natl Acad Sci U S A 1995; 92:12495-9. [PMID: 8618928 PMCID: PMC40384 DOI: 10.1073/pnas.92.26.12495] [Citation(s) in RCA: 186] [Impact Index Per Article: 6.4] [Indexed: 01/31/2023]
Abstract
Two issues in the evolution of the intron/exon structure of genes are the role of exon shuffling and the origin of introns. Using a large data base of eukaryotic intron-containing genes, we have found that there are correlations between intron phases leading to an excess of symmetric exons and symmetric exon sets. We interpret these excesses as manifestations of exon shuffling and make a conservative estimate that at least 19% of the exons in the data base were involved in exon shuffling, suggesting an important role for exon shuffling in evolution. Furthermore, these excesses of symmetric exons appear also in those regions of eukaryotic genes that are homologous to prokaryotic genes: the ancient conserved regions. This last fact cannot be explained in terms of the insertional theory of introns but rather supports the concept that some of the introns were ancient, the exon theory of genes.
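The phase bookkeeping behind this kind of analysis can be illustrated with a minimal Python sketch. It assumes that each internal exon is described by the phases (0, 1 or 2) of its two flanking introns, calls an exon symmetric when the two phases are equal, and compares the observed symmetric fraction with the fraction expected if the 5' and 3' phases were independent. The function names and the toy data are hypothetical, and this is not the authors' statistical procedure.

```python
from collections import Counter


def symmetric_fraction(exons):
    """Fraction of exons whose 5' and 3' flanking intron phases are equal."""
    return sum(1 for p5, p3 in exons if p5 == p3) / len(exons)


def expected_symmetric_fraction(exons):
    """Symmetric fraction expected if 5' and 3' phases were independent."""
    n = len(exons)
    five = Counter(p5 for p5, _ in exons)
    three = Counter(p3 for _, p3 in exons)
    # Sum over phases of P(5' phase = p) * P(3' phase = p).
    return sum((five[p] / n) * (three[p] / n) for p in (0, 1, 2))


# Toy data (hypothetical): (5' intron phase, 3' intron phase) per internal exon.
toy_exons = [(1, 1), (1, 1), (0, 0), (2, 1), (1, 2), (1, 1), (0, 2), (1, 1)]
observed = symmetric_fraction(toy_exons)
expected = expected_symmetric_fraction(toy_exons)
print(f"observed={observed:.3f} expected={expected:.3f}")
```

An observed fraction well above the expected one is the kind of excess the abstract describes; a real analysis would also need a significance test and a much larger data set.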
Affiliation(s)
- M Long
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
|
29
|
Kwiatowski J, Krawczyk M, Kornacki M, Bailey K, Ayala FJ. Evidence against the exon theory of genes derived from the triose-phosphate isomerase gene. Proc Natl Acad Sci U S A 1995; 92:8503-6. [PMID: 7667319 PMCID: PMC41185 DOI: 10.1073/pnas.92.18.8503] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.1] [Indexed: 01/26/2023]
Abstract
The exon theory of genes proposes that the introns of protein-encoding nuclear genes are remnants of the DNA spacers between ancient minigenes. The discovery of an intron at a predicted position in the triose-phosphate isomerase (EC 5.3.1.1) gene of Culex mosquitoes has been hailed as an evidential pillar of the theory. We have found that that intron is also present in Aedes mosquitoes, which are closely related to Culex, but not in the phylogenetically more distant Anopheles, nor in the fly Calliphora vicina, nor in the moth Spodoptera littoralis. The presence of this intron in Culex and Aedes is parsimoniously explained as the result of an insertion in a recent common ancestor of these two species rather than as the remnant of an ancient intron. The absence of the intron in 19 species of very diverse organisms requires at least 10 independent evolutionary losses in order to be consistent with the exon theory.
|
30
|
Teller JK, Baker PJ, Britton KL, Engel PC, Rice DW, Stillman TJ. Correlation of intron-exon organisation with the three-dimensional structure in glutamate dehydrogenase. Biochim Biophys Acta 1995; 1247:231-8. [PMID: 7696313 DOI: 10.1016/0167-4838(94)00240-h] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023]
Abstract
The positions of the intron-exon boundaries in the genes for glutamate dehydrogenase from Chlorella sorokiniana, rat, and human have been located on the three-dimensional structure of the highly homologous enzyme from Clostridium symbiosum and analysed for their position in the protein structure. This analysis shows no correlation between the positions of these boundaries in the mammalian and Chlorella glutamate dehydrogenase genes and no correlation with units of function in the enzyme and suggests that the present day exons do not represent the protein modules of an ancestral glutamate dehydrogenase. There appears to be no clear preference for the residues at the splice junctions to be either buried or exposed to solvent. However, the frequency with which the introns appear in the loops linking elements of secondary structure, rather than in either the alpha-helical or beta-sheet segments, is higher than predicted on the basis of the proportion of residues in the loops. This is consistent with but not proof of a role for exon modification/exchange in protein evolution since changes at these positions are less likely to disturb the structure and hence maintain function.
Affiliation(s)
- J K Teller
- Krebs Institute for Biomolecular Research, Department of Molecular Biology and Biotechnology, University of Sheffield, UK
|
31
|
Strelets VB, Lim HA. Ancient splice junction shadows with relation to blocks in protein structure. Biosystems 1995; 36:37-41. [PMID: 8527694 DOI: 10.1016/0303-2647(95)01525-p] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
Splice junction shadows (ancient exon-exon junctions) presumably reflect the existence of amino acid primary blocks which were used in the course of evolution for the construction of new proteins. The lengths of such blocks (i.e. regions between splice junctions), as the lengths of corresponding inserted or duplicated ancient exons, should be divisible by three in order to store the preexisting coding frame in the course of evolution. In this paper, we will test the hypothesis of intron-mediated recombination in a model of block molecular evolution (exon shuffling) by revealing corresponding blocks in existing database-contained coding sequences. For this purpose, we use a weight matrix prediction of ancient splice junction shadows in coding regions of the nucleotide sequences in current databases. The usage of splice junction shadows allows us to test the block evolution hypothesis in better detail in comparison with previous methods which were based only on currently existing recent exons. Our result of block length distribution at the nucleotide level shows a clear tendency to be divisible by three. At the protein level, several unexpected favorable block lengths, which are six, nine, 12 and 15 amino acids in length, were observed. Further refinements in our method for revealing splice junction shadows (structural block boundaries) might reveal peptides which probably maintain stable folds in different structures. The latter can in turn be used for protein structure prediction.
Affiliation(s)
- V B Strelets
- Supercomputer Computations Research Institute, Florida State University, Tallahassee 32306-4052, USA
|
32
|
Eisenhaber F, Persson B, Argos P. Protein structure prediction: recognition of primary, secondary, and tertiary structural features from amino acid sequence. Crit Rev Biochem Mol Biol 1995; 30:1-94. [PMID: 7587278 DOI: 10.3109/10409239509085139] [Citation(s) in RCA: 97] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023]
Abstract
This review attempts a critical stock-taking of the current state of the science aimed at predicting structural features of proteins from their amino acid sequences. At the primary structure level, methods are considered for detection of remotely related sequences and for recognizing amino acid patterns to predict posttranslational modifications and binding sites. The techniques involving secondary structural features include prediction of secondary structure, membrane-spanning regions, and secondary structural class. At the tertiary structural level, methods for threading a sequence into a mainchain fold, homology modeling and assigning sequences to protein families with similar folds are discussed. A literature analysis suggests that, to date, threading techniques are not able to show their superiority over sequence pattern recognition methods. Recent progress in the state of ab initio structure calculation is reviewed in detail. The analysis shows that many structural features can be predicted from the amino acid sequence much better than just a few years ago and with attendant utility in experimental research. Best prediction can be achieved for new protein sequences that can be assigned to well-studied protein families. For single sequences without homologues, the folding problem has not yet been solved.
Affiliation(s)
- F Eisenhaber
- Institut für Biochemie der Charité, Medizinische Fakultät, Humboldt-Universität zu Berlin, Fed. Rep. Germany
|
33
|
Strelets VB, Shindyalov IN, Lim HA. Analysis of peptides from known proteins: clusterization in sequence space. J Mol Evol 1994; 39:625-30. [PMID: 7807551 DOI: 10.1007/bf00160408] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/27/2023]
Abstract
A combinatorial sequence space (CSS) model was introduced to represent sequences as a set of overlapping k-tuples of some fixed length which correspond to points in the CSS. The aim was to analyze clusterization of protein sequences in the CSS and to test various hypotheses about the possible evolutionary basis of this clusterization. The authors developed an easy-to-use technique which can reveal and analyze such a clusterization in a multidimensional CSS. Application of the technique led to an unexpectedly high clusterization of points in the CSS corresponding to k-tuples from known proteins. The clusterization could not be inferred from nonuniform amino acid frequencies or be explained by the influence of homologous data. None of the tested possible evolutionary and structural factors could explain the clusterization observed either. It looked as if certain protein sequence variations occurred and were fixed in the early course of evolution. Subsequent evolution (predominantly neutral) allowed only a limited number of changes and permitted new variants which led to preservation of certain k-tuples during the course of evolution. This was consistent with the theory of exon shuffling and protein block structure evolution. Possible applications of sequence space features found were also discussed.
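The representation of a sequence as overlapping k-tuples can be sketched in a few lines of Python. This is only an illustration of the decomposition step under simple assumptions (sequences as plain strings, a fixed k); the helper names and toy peptides are hypothetical, and the clustering analysis described in the abstract is not reproduced here.

```python
from collections import Counter


def k_tuples(sequence, k):
    """Return all overlapping k-tuples of a sequence, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def k_tuple_counts(sequences, k):
    """Count how often each k-tuple occurs across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(k_tuples(seq, k))
    return counts


# Toy peptides (hypothetical), decomposed into overlapping 3-tuples.
toy = ["MKVLAAGMKV", "AAGMKVL"]
counts = k_tuple_counts(toy, 3)
print(counts.most_common(3))
```

Each distinct k-tuple corresponds to one point in the sequence space, so the counts give a first view of how unevenly those points are occupied.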
Affiliation(s)
- V B Strelets
- Supercomputer Computations Research Institute, Florida State University, Tallahassee 32306-4052
|
34
|
Woessner JP, Molendijk AJ, van Egmond P, Klis FM, Goodenough UW, Haring MA. Domain conservation in several volvocalean cell wall proteins. Plant Mol Biol 1994; 26:947-960. [PMID: 8000007 DOI: 10.1007/bf00028861] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 05/22/2023]
Abstract
Based on our previous work demonstrating that (SerPro)x epitopes are common to extensin-like cell wall proteins in Chlamydomonas reinhardtii, we looked for similar proteins in the distantly related species C. eugametos. Using a polyclonal antiserum against a (SerPro)10 oligopeptide, we found distinct sets of stage-specific polypeptides immunoprecipitated from in vitro translations of C. eugametos RNA. Screening of a C. eugametos cDNA expression library with the antiserum led to the isolation of a cDNA (WP6) encoding a (SerPro)x-rich multidomain wall protein. Analysis of a similarly selected cDNA (VSP-3) from a C. reinhardtii cDNA expression library revealed that it also coded for a (SerPro)x-rich multidomain wall protein. The C-terminal rod domains of VSP-3 and WP6 are highly homologous, while the N-terminal domains are dissimilar; however, the N-terminal domain of VSP-3 is homologous to the globular domain of a cell wall protein from Volvox carteri. Exon shuffling might be responsible for this example of domain conservation over 350 million years of volvocalean cell wall protein evolution.
Affiliation(s)
- J P Woessner
- Department of Biology, Washington University, St. Louis, MO 63130
|
35
|
Stoltzfus A, Spencer DF, Zuker M, Logsdon JM, Doolittle WF. Testing the exon theory of genes: the evidence from protein structure. Science 1994; 265:202-7. [PMID: 8023140 DOI: 10.1126/science.8023140] [Citation(s) in RCA: 158] [Impact Index Per Article: 5.3] [Indexed: 01/28/2023]
Abstract
A tendency for exons to correspond to discrete units of protein structure in protein-coding genes of ancient origin would provide clear evidence in favor of the exon theory of genes, which proposes that split genes arose not by insertion of introns into unsplit genes, but from combinations of primordial mini-genes (exons) separated by spacers (introns). Although putative examples of such correspondence have strongly influenced previous debate on the origin of introns, a general correspondence has not been rigorously proved. Objective methods for detecting correspondences were developed and applied to four examples that have been cited previously as evidence of the exon theory of genes. No significant correspondence between exons and units of protein structure was detected, suggesting that the putative correspondence does not exist and that the exon theory of genes is untenable.
Affiliation(s)
- A Stoltzfus
- Department of Biochemistry, Dalhousie University, Halifax, Nova Scotia, Canada
|
36
|
|
37
|
White SH. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences. J Mol Evol 1994; 38:383-94. [PMID: 8007006 DOI: 10.1007/bf00163155] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 01/28/2023]
Abstract
This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79-95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. The goal of the present study was to investigate the possibility that the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 "super-family" proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: The gamma distribution with parameter 2 or the distribution for the sum of two exponential random independent variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had exponentially distributed random independent lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions which can be distinguished by the functional form of the dependence of protein stability on length. The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. 
This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079-2086 and 18: 6339-6345) which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377-1382).
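Why the two length distributions named in this abstract are "closely related" can be seen from a short derivation. The sketch below assumes that "parameter 2" means a gamma shape parameter of 2, and that the two independent exponential variables may have different rates; the symbols \lambda_1, \lambda_2 and \lambda are introduced here for illustration and do not come from the abstract.

```latex
% Density of L = X_1 + X_2, with X_i ~ Exp(\lambda_i) independent, for x >= 0:
\begin{align}
f_L(x) &= \int_0^x \lambda_1 e^{-\lambda_1 t}\,\lambda_2 e^{-\lambda_2 (x-t)}\,dt
        = \frac{\lambda_1 \lambda_2}{\lambda_1 - \lambda_2}
          \left(e^{-\lambda_2 x} - e^{-\lambda_1 x}\right),
        \qquad \lambda_1 \neq \lambda_2, \\
% Equal rates, \lambda_1 = \lambda_2 = \lambda: the gamma density with shape 2.
f_L(x) &= \int_0^x \lambda^2 e^{-\lambda x}\,dt = \lambda^2 x\, e^{-\lambda x}.
\end{align}
```

Under these assumptions the gamma distribution with shape 2 is the equal-rate special case of the sum of two independent exponential variables.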
Affiliation(s)
- S H White
- Department of Physiology and Biophysics, University of California, Irvine 92717
|
38
|
Affiliation(s)
- N J Dibb
- Department of Haematology, Royal Postgraduate Medical School, Hammersmith Hospital, London, UK
|
39
|
White SH, Jacobs RE. The evolution of proteins from random amino acid sequences. I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol 1993; 36:79-95. [PMID: 8433379 DOI: 10.1007/bf02407307] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.0] [Indexed: 01/30/2023]
Abstract
We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (ro) of Mood (1940, Ann. Math. Stat. 11, 367-392). The probability density of ro for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong alpha-helix propensity show a strong tendency to cluster whereas those with beta-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic "patterns" that can be detected by sliding-window, autocorrelation, and Fourier analyses. 
Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
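The binary encoding and run test described in this abstract can be illustrated with a minimal Python sketch. It assumes the textbook Wald-Wolfowitz form of the standardized runs statistic; the exact definition and sign convention of ro in Mood (1940) and in White and Jacobs may differ. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
import math


def binary_encode(sequence, residues_of_interest):
    """Map a protein sequence to 1 (residue of interest) or 0 (any other residue)."""
    return [1 if aa in residues_of_interest else 0 for aa in sequence]


def count_runs(bits):
    """Count maximal blocks of identical consecutive symbols."""
    if not bits:
        return 0
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def runs_z_score(bits):
    """Standardized runs statistic (assumed Wald-Wolfowitz form).

    Under random ordering the value is approximately N(0,1).
    With this sign convention, negative values mean fewer runs than
    expected (clustering) and positive values mean more runs than
    expected (even spacing).
    """
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both symbol classes must be present")
    mean = 2.0 * n1 * n0 / n + 1.0
    variance = 2.0 * n1 * n0 * (2.0 * n1 * n0 - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        raise ValueError("sequence too short for a meaningful statistic")
    return (count_runs(bits) - mean) / math.sqrt(variance)


# Toy sequences, for illustration only: methionines clustered vs. alternating.
clustered = binary_encode("MMMMMAAAAA", {"M"})
alternating = binary_encode("MAMAMAMAMA", {"M"})
z_clustered = runs_z_score(clustered)
z_alternating = runs_z_score(alternating)
```

In this sketch the clustered toy sequence yields a negative score and the alternating one a positive score of equal magnitude, which is the kind of contrast the abstract uses to separate clustered from evenly distributed residues.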
Affiliation(s)
- S H White
- Department of Physiology and Biophysics, University of California, Irvine 92717
|
40
|
Nolan KF, Kaluz S, Higgins JM, Goundis D, Reid KB. Characterization of the human properdin gene. Biochem J 1992; 287 (Pt 1):291-7. [PMID: 1417780 PMCID: PMC1133157 DOI: 10.1042/bj2870291] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.2] [Indexed: 12/26/2022]
Abstract
A cosmid clone containing the complete coding sequence of the human properdin gene has been characterized. The gene is located at one end of the approximately 40 kb cosmid insert and approximately 8.2 kb of the sequence data have been obtained from this region. Two discrepancies with the published cDNA sequence [Nolan, Schwaeble, Kaluz, Dierich & Reid (1991) Eur. J. Immunol. 21, 771-776] have been resolved. Properdin has previously been described as a modular protein, with the majority of its sequence composed of six tandem repeats of a sequence motif of approximately 60 amino acids which is related to the type-I repeat sequence (TSR), initially described in thrombospondin [Lawler & Hynes (1986) J. Cell Biol. 103, 1635-1648; Goundis & Reid (1988), Nature (London) 335, 82-85]. Analysis of the genomic sequence data indicates that the human properdin gene is organized into ten exons which span approximately 6 kb of the genome. TSRs 2-5 are coded for by discrete, symmetrical exons (phase 1-1), which supports the hypothesis that modular proteins evolved by a process involving exon shuffling. TSR1 is also coded for by a discrete exon, but the boundaries are asymmetrical (phase 2-1). The sequence coding for the sixth TSR is split across the final two exons of the gene with the first 38 amino acids of the repeat coded for by an asymmetric exon (phase 1-2). This split at the genomic level has been shown, by alignment analysis, to be reflected at the protein level with the division of repeat 6 into TSR-like and TSR-unlike sequences.
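The phase notation used in this abstract (1-1, 2-1, 1-2) can be made concrete with a short Python sketch. It assumes the usual convention that a phase 0 intron lies between codons, a phase 1 intron after the first nucleotide of a codon, and a phase 2 intron after the second. The function names are hypothetical. Under that convention an internal exon flanked by introns of phases p5 and p3 has a length congruent to (p3 - p5) modulo 3, so a symmetrical exon has a length divisible by 3 and can be inserted into an intron of the same phase without shifting the reading frame.

```python
def exon_length_mod3(phase_5prime, phase_3prime):
    """Length modulo 3 of an internal exon flanked by introns of the given phases."""
    return (phase_3prime - phase_5prime) % 3


def classify_exon(phase_5prime, phase_3prime):
    """Label an exon as symmetric or asymmetric from its flanking intron phases."""
    for phase in (phase_5prime, phase_3prime):
        if phase not in (0, 1, 2):
            raise ValueError("intron phase must be 0, 1 or 2")
    kind = "symmetric" if phase_5prime == phase_3prime else "asymmetric"
    return f"{kind} (phase {phase_5prime}-{phase_3prime})"


# Phase classes mentioned in the abstract.
for p5, p3 in [(1, 1), (2, 1), (1, 2)]:
    print(classify_exon(p5, p3), "-> length mod 3 =", exon_length_mod3(p5, p3))
```

The three phase pairs in the loop are the ones named in the abstract: 1-1 for the exons encoding TSRs 2-5, 2-1 for the TSR1 exon, and 1-2 for the first exon of the split sixth repeat.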
Affiliation(s)
- K F Nolan
- Department of Biochemistry, University of Oxford, U.K.
|
41
|
Abstract
Nonhomologous fully sequenced human protein-coding genes were studied. Three sets of exon-exon junctions were formed defined by the intron (shadow) position relative to the reading frame. For the analysis of intron shadow signals in exons, information content and discrimination energy approaches were used with the correction allowing one to ignore the influence of a protein-coding message. The corrected formulas allow one to define the consensuses for the three types of intron shadow signals as a AG/guwn, cAG/GUnn, and cAG/gunU, and provide better recognition than the original formulas. The analysis of the codon usage in the signal positions leads to the conclusion that the prevalence of some amino acids in corresponding protein sites is caused by the signal requirements and not vice versa. The distribution of potential intron shadow signals in exons contradicts the hypothesis of intron insertion into suitable preexisting sites. There exists a correlation between the intron types and/or the exon length modulo 3.
Affiliation(s)
- M S Gelfand
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region
|
42
|
Cavener DR. GMC oxidoreductases. A newly defined family of homologous proteins with diverse catalytic activities. J Mol Biol 1992; 223:811-4. [PMID: 1542121 DOI: 10.1016/0022-2836(92)90992-s] [Citation(s) in RCA: 212] [Impact Index Per Article: 6.6] [Indexed: 12/27/2022]
Abstract
Sequence comparison of Drosophila melanogaster glucose dehydrogenase, Escherichia coli choline dehydrogenase, Aspergillus niger glucose oxidase and Hansenula polymorpha methanol oxidase indicates that these four diverse flavoproteins are homologous, defining a new family of proteins named the GMC oxidoreductases. These enzymes contain a canonical ADP-binding beta alpha beta-fold close to their amino termini as found in other flavoenzymes. This domain is encoded by a single exon of the D. melanogaster glucose dehydrogenase gene.
Affiliation(s)
- D R Cavener
- Department of Molecular Biology, Vanderbilt University, Nashville, TN 37235
|
43
|
Abstract
The catalogue of mosaic proteins showing evidence of exon-shuffling continues to expand. The repeated use of exon modules suggests that current protein diversity could have been generated from a finite set of such exon modules, and that the size and character of this underlying exon universe can still be glimpsed in extant proteins.
Affiliation(s)
- R L Dorit
- Department of Biology, Yale University, Osborn Memorial Laboratories, New Haven, Connecticut 06511
|
44
|
|
45
|
Abstract
Accumulating evidence that introns are highly restricted in their phylogenetic distribution strongly supports the view that introns were inserted late in eukaryotic evolution into preformed genes and, hence, that exon-shuffling played no role in the assembly of primordial genes. Potential mechanisms of intron insertion and the possible evolution of nuclear introns and their splicing machinery from self-splicing group II introns are also discussed.
Affiliation(s)
- J D Palmer
- Department of Biology, Indiana University, Bloomington 47405
|
46
|
Bräuer C, Scheit KH. Characterization of the gene for the bovine seminal vesicle secretory protein SVSP109. Biochimica et Biophysica Acta 1991; 1090:259-60. [PMID: 1932121 DOI: 10.1016/0167-4781(91)90113-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022]
Abstract
As part of an attempt to understand androgen-regulation of SVSP109, a bovine seminal vesicle secretory protein of 109 amino acids, we have characterized the bovine SVSP109 gene. The 6.1 kb gene is organized in five exons and four introns. Regulatory sequences involved in regulation of transcription could not be identified by simple sequence homologies. The SVSP109 gene may provide an excellent example for functional properties of exons: exon 1 encodes the entire signal peptide and exon 4 the complete fibronectin type II-domain, responsible for protein-protein interactions.
Affiliation(s)
- C Bräuer
- Max-Planck-Institut für Biophysikalische Chemie, Department Molecular Biology, Göttingen, Germany
|
47
|
|