1
McConnell BS, Parker MW. Protein intrinsically disordered regions have a non-random, modular architecture. Bioinformatics 2023; 39:btad732. [PMID: 38039154 PMCID: PMC10719218 DOI: 10.1093/bioinformatics/btad732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 11/03/2023] [Accepted: 11/30/2023] [Indexed: 12/03/2023] Open
Abstract
MOTIVATION: Protein sequences can be broadly categorized into two classes: those which adopt stable secondary structure and fold into a domain (i.e. globular proteins), and those that do not. The sequences belonging to this latter class are conformationally heterogeneous and are described as being intrinsically disordered. Decades of investigation into the structure and function of globular proteins have resulted in a suite of computational tools that enable their sub-classification by domain type, an approach that has revolutionized how we understand and predict protein functionality. Conversely, it is unknown if sequences of disordered protein regions are subject to broadly generalizable organizational principles that would enable their sub-classification.
RESULTS: Here, we report the development of a statistical approach that quantifies linear variance in amino acid composition across a sequence. With multiple examples, we provide evidence that intrinsically disordered regions are organized into statistically non-random modules of unique compositional bias. Modularity is observed for both low and high-complexity sequences and, in some cases, we find that modules are organized in repetitive patterns. These data demonstrate that disordered sequences are non-randomly organized into modular architectures and motivate future experiments to comprehensively classify module types and to determine the degree to which modules constitute functionally separable units analogous to the domains of globular proteins.
AVAILABILITY AND IMPLEMENTATION: The source code, documentation, and data to reproduce all figures are freely available at https://github.com/MWPlabUTSW/Chi-Score-Analysis.git. The analysis is also available as a Google Colab Notebook (https://colab.research.google.com/github/MWPlabUTSW/Chi-Score-Analysis/blob/main/ChiScore_Analysis.ipynb).
Affiliation(s)
- Brendan S McConnell
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75235, United States
- Matthew W Parker
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75235, United States

2
Zhang L, Liu Y, Wei G, Lei T, Wu J, Zheng L, Ma H, He G, Wang N. POLLEN WALL ABORTION 1 is essential for pollen wall development in rice. Plant Physiology 2022; 190:2229-2245. [PMID: 36111856 PMCID: PMC9706457 DOI: 10.1093/plphys/kiac435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/26/2022] [Accepted: 08/15/2022] [Indexed: 06/15/2023]
Abstract
The integrity of pollen wall structures is essential for pollen development and maturity in rice (Oryza sativa L.). In this study, we isolated and characterized the rice male-sterile mutant pollen wall abortion 1 (pwa1), which exhibits a defective pollen wall (DPW) structure and has sterile pollen. Map-based cloning, genetic complementation, and gene knockout experiments revealed that PWA1 corresponds to the gene LOC_Os01g55094 encoding a coiled-coil domain-containing protein. PWA1 localized to the nucleus, and PWA1 was expressed in the tapetum and microspores. PWA1 interacted with the transcription factor TAPETUM DEGENERATION RETARDATION (TDR)-INTERACTING PROTEIN2 (TIP2, also named bHLH142) in vivo and in vitro. The tip2-1 mutant, which we obtained by clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9-mediated gene editing, showed delayed tapetum degradation, sterile pollen, and DPWs. We determined that TIP2/bHLH142 regulates PWA1 expression by binding to its promoter. Analysis of the phenotype of the tip2-1 pwa1 double mutant indicated that TIP2/bHLH142 functions upstream of PWA1. Further studies suggested that PWA1 has transcriptional activation activity and participates in pollen intine development through the β-glucosidase Os12BGlu38. Therefore, we identified a sterility factor, PWA1, and uncovered a regulatory network underlying the formation of the pollen wall and mature pollen in rice.
Affiliation(s)
- Lisha Zhang
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Yang Liu
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Gang Wei
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Ting Lei
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Jingwen Wu
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Lintao Zheng
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Honglei Ma
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Guanghua He
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China
- Nan Wang
  - Key Laboratory of Application and Safety Control of Genetically Modified Crops, College of Agronomy and Biotechnology, Southwest University, Chongqing 400715, China

3
Patthy L. Exon Shuffling Played a Decisive Role in the Evolution of the Genetic Toolkit for the Multicellular Body Plan of Metazoa. Genes (Basel) 2021; 12:382. [PMID: 33800339 PMCID: PMC8001218 DOI: 10.3390/genes12030382] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/24/2021] [Revised: 03/01/2021] [Accepted: 03/04/2021] [Indexed: 11/30/2022] Open
Abstract
Division of labor and establishment of the spatial pattern of different cell types of multicellular organisms require cell type-specific transcription factor modules that control cellular phenotypes and proteins that mediate the interactions of cells with other cells. Recent studies indicate that, although constituent protein domains of numerous components of the genetic toolkit of the multicellular body plan of Metazoa were present in the unicellular ancestor of animals, the repertoire of multidomain proteins that are indispensable for the arrangement of distinct body parts in a reproducible manner evolved only in Metazoa. We have shown that the majority of the multidomain proteins involved in cell-cell and cell-matrix interactions of Metazoa have been assembled by exon shuffling, but there is no evidence for a similar role of exon shuffling in the evolution of proteins of metazoan transcription factor modules. A possible explanation for this difference in the intracellular and intercellular toolkits is that evolution of the transcription factor modules preceded the burst of exon shuffling that led to the creation of the proteins controlling spatial patterning in Metazoa. This explanation is in harmony with the temporal-to-spatial transition hypothesis of multicellularity that proposes that cell differentiation may have predated spatial segregation of cell types in animal ancestors.
Affiliation(s)
- Laszlo Patthy
  - Institute of Enzymology, Research Centre for Natural Sciences, H-1117 Budapest, Hungary

4
Zhao Z, Heideman N, Hofmeyr MD. Codon-based analysis of selection pressure and genetic structure in the Psammobates tentorius (Bell, 1828) species complex, and phylogeny inferred from both codons and amino acid sequences. Afr J Ecol 2020. [DOI: 10.1111/aje.12840] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Affiliation(s)
- Zhongning Zhao
  - Department of Zoology and Entomology, University of the Free State, Bloemfontein, South Africa
- Neil Heideman
  - Department of Zoology and Entomology, University of the Free State, Bloemfontein, South Africa
- Margaretha D. Hofmeyr
  - Chelonian Biodiversity and Conservation, Department of Biodiversity and Conservation Biology, University of the Western Cape, Bellville, South Africa

5
A Silent Exonic Mutation in a Rice Integrin-α FG-GAP Repeat-Containing Gene Causes Male-Sterility by Affecting mRNA Splicing. Int J Mol Sci 2020; 21:ijms21062018. [PMID: 32188023 PMCID: PMC7139555 DOI: 10.3390/ijms21062018] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/12/2020] [Revised: 03/12/2020] [Accepted: 03/14/2020] [Indexed: 12/11/2022] Open
Abstract
Pollen development plays crucial roles in the life cycle of higher plants. Here we characterized a rice mutant with complete male-sterile phenotype, pollen-less 1 (pl1). pl1 exhibited smaller anthers with arrested pollen development, absent Ubisch bodies, necrosis-like tapetal hypertrophy, and smooth anther cuticular surface. Molecular mapping revealed a synonymous mutation in the fourth exon of PL1 co-segregated with the mutant phenotype. This mutation disrupts the exon-intron splice junction in PL1, generating aberrant mRNA species and truncated proteins. PL1 is highly expressed in the tapetal cells of developing anther, and its protein is co-localized with plasma membrane (PM) and endoplasmic reticulum (ER) signal. PL1 encodes an integrin-α FG-GAP repeat-containing protein, which has seven β-sheets and putative Ca2+-binding motifs and is broadly conserved in terrestrial plants. Our findings therefore provide insights into both the role of integrin-α FG-GAP repeat-containing protein in rice male fertility and the influence of exonic mutation on intronic splice donor site selection.

6
Schad E, Kalmar L, Tompa P. Exon-phase symmetry and intrinsic structural disorder promote modular evolution in the human genome. Nucleic Acids Res 2013; 41:4409-22. [PMID: 23460204 PMCID: PMC3632108 DOI: 10.1093/nar/gkt110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022] Open
Abstract
A key signature of module exchange in the genome is phase symmetry of exons, suggestive of exon shuffling events that occurred without disrupting translation reading frame. At the protein level, intrinsic structural disorder may be another key element because disordered regions often serve as functional elements that can be effectively integrated into a protein structure. Therefore, we asked whether exon-phase symmetry in the human genome and structural disorder in the human proteome are connected, signalling such evolutionary mechanisms in the assembly of multi-exon genes. We found an elevated level of structural disorder of regions encoded by symmetric exons and a preferred symmetry of exons encoding for mostly disordered regions (>70% predicted disorder). Alternatively spliced symmetric exons tend to correspond to the most disordered regions. The genes of mostly disordered proteins (>70% predicted disorder) tend to be assembled from symmetric exons, which often arise by internal tandem duplications. Preponderance of certain types of short motifs (e.g. SH3-binding motif) and domains (e.g. high-mobility group domains) suggests that certain disordered modules have been particularly effective in exon-shuffling events. Our observations suggest that structural disorder has facilitated modular assembly of complex genes in evolution of the human genome.
Affiliation(s)
- Eva Schad
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest 1113, Hungary

7
Abstract
Receptor tyrosine kinases (RTKs) are transmembrane proteins involved in the control of fundamental cellular processes in metazoans. RTKs possess a general structure that includes an extracellular domain, a transmembrane domain and a highly conserved tyrosine kinase domain. RTKs are classified according to their variable extracellular ligand-binding domain. Studies of human RTK members have yielded a wealth of information elucidating their importance. Improper functioning of these enzymes due to mutations, mainly in the kinase domain, is often manifested in various human diseases and is known to be involved in several types of cancer. Here we summarize most human RTKs, their cognate ligands, and related diseases, and discuss the eventual use of certain RTKs as new therapeutic targets.
Affiliation(s)
- Mouna Choura
  - Molecular and Cellular Diagnosis Processes, Centre of Biotechnology of Sfax, University of Sfax, Route Sidi Mansour, Sfax, Tunisia

8
Sawada R, Mitaku S. How are exons encoding transmembrane sequences distributed in the exon-intron structure of genes? Genes Cells 2010; 16:115-21. [PMID: 21143351 DOI: 10.1111/j.1365-2443.2010.01468.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
The exon-intron structure of eukaryotic genes raises a question about the distribution of transmembrane regions in membrane proteins. Were exons that encode transmembrane regions formed simply by inserting introns into preexisting genes or by some kind of exon shuffling? To answer this question, the exon-per-gene distribution was analyzed for all genes in 40 eukaryotic genomes with a particular focus on exons encoding transmembrane segments. In 21 higher multicellular eukaryotes, the percentage of multi-exon genes (those containing at least one intron) within all genes in a genome was high (>70%) and with a mean of 87%. When genes were grouped by the number of exons per gene in higher eukaryotes, good exponential distributions were obtained not only for all genes but also for the exons encoding transmembrane segments, leading to a constant ratio of membrane proteins independent of the exon-per-gene number. The positional distribution of transmembrane regions in single-pass membrane proteins showed that they are generally located in the amino or carboxyl terminal regions. This nonrandom distribution of transmembrane regions explains the constant ratio of membrane proteins to the exon-per-gene numbers because there are always two terminal (i.e., the amino and carboxyl) regions - independent of the length of sequences.
Affiliation(s)
- Ryusuke Sawada
  - Department of Computational Science and Engineering, Graduate School of Engineering, Nagoya University, Furocho, Chikusa-ku, Nagoya 464-8606, Japan.

9
Egesten A, Frick IM, Mörgelin M, Olin AI, Björck L. Binding of albumin promotes bacterial survival at the epithelial surface. J Biol Chem 2010; 286:2469-76. [PMID: 21098039 DOI: 10.1074/jbc.m110.148171] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022] Open
Abstract
Human serum albumin (HSA) is the dominating protein in human plasma. Many bacterial species, especially streptococci, express surface proteins that bind HSA with high specificity and affinity, but the biological consequences of these protein-protein interactions are poorly understood. Group G streptococci (GGS), carrying the HSA-binding protein G, colonize the skin and the mucosa of the upper respiratory tract, mostly without causing disease. In the case of bacterial invasion, pro-inflammatory cytokines are released that activate the epithelium to produce antibacterial peptides, in particular the chemokine MIG/CXCL9. In addition, the inflammation causes capillary leakage and extravasation of HSA and other plasma proteins, environmental changes at the epithelial surface to which the bacteria need to respond. In this study, we found that GGS adsorbed HSA from both saliva and plasma via binding to protein G and that HSA bound to protein G bound and inactivated the antibacterial MIG/CXCL9 peptide. Another surface protein of GGS, FOG, was found to mediate adherence of the bacteria to pharyngeal epithelial cells through interaction with glycosaminoglycans. This adherence was not affected by activation of the epithelium with a combination of IFN-γ and TNF-α, leading to the production of MIG/CXCL9. However, at the activated epithelial surface, adherent GGS were protected against killing by MIG/CXCL9 through protein G-dependent HSA coating. The findings identify a previously unknown bacterial survival strategy that helps to explain the evolution of HSA-binding proteins among bacterial species of the normal human microbiota.
Affiliation(s)
- Arne Egesten
  - Section for Respiratory Medicine and Allergology, Department of Clinical Sciences, Lund University and Lund University Hospital, SE-221 85 Lund, Sweden.

10
Mucin CYS domains are ancient and highly conserved modules that evolved in concert. Mol Phylogenet Evol 2009; 52:284-92. [DOI: 10.1016/j.ympev.2009.03.035] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 09/17/2008] [Revised: 03/17/2009] [Accepted: 03/27/2009] [Indexed: 11/22/2022]

11
Abstract
It has been known for more than 35 years that, during evolution, new proteins are formed by gene duplications, sequence and structural divergence and, in many cases, gene combinations. The genome projects have produced complete, or almost complete, descriptions of the protein repertoires of over 600 distinct organisms. Analyses of these data have dramatically increased our understanding of the formation of new proteins. At the present time, we can accurately trace the evolutionary relationships of about half the proteins found in most genomes, and it is these proteins that we discuss in the present review. Usually, the units of evolution are protein domains that are duplicated, diverge and form combinations. Small proteins contain one domain, and large proteins contain combinations of two or more domains. Domains descended from a common ancestor are clustered into superfamilies. In most genomes, the net growth of superfamily members means that more than 90% of domains are duplicates. In a section on domain duplications, we discuss the number of currently known superfamilies, their size and distribution, and superfamily expansions related to biological complexity and to specific lineages. In a section on divergence, we describe how sequences and structures diverge, the changes in stability produced by acceptable mutations, and the nature of functional divergence and selection. In a section on domain combinations, we discuss their general nature, the sequential order of domains, how combinations modify function, and the extraordinary variety of the domain combinations found in different genomes. We conclude with a brief note on other forms of protein evolution and speculations of the origins of the duplication, divergence and combination processes.

12
Kim H, Sung S, Klein R. Expansion of symmetric exon-bordering domains does not explain evolution of lineage specific genes in mammals. Genetica 2006; 131:59-68. [PMID: 17082903 DOI: 10.1007/s10709-006-9113-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/20/2006] [Accepted: 09/26/2006] [Indexed: 10/24/2022]
Abstract
To examine the evolution of lineage specific genes, we analyzed intron phase distributions and exon-bordering domains in primate and rodent specific genes. We found that the expansion of symmetric exon-bordering domains could not explain the evolution of lineage specific genes. Rather, internal intron loss of a domain can partially explain the excess of class 1-1 intron phases in the lineage specific genes. We suggest that the event that led to the excess of symmetric exons in lineage specific genes had little bearing on shaping the phenotypes specific to the individual lineage. Instead, Kruppel-associated box (KRAB) proteins associated with zinc finger C2H2 (zf-C2H2) type are likely to be responsible for the lineage specific function.
Affiliation(s)
- Heebal Kim
  - Laboratory of Bioinformatics and Population Genetics, Department of Agricultural Biotechnology, Seoul National University, San 56-1, Sillim-dong, Gwanak-gu, Seoul 151-742, Korea.

13
Froy O. Convergent evolution of invertebrate defensins and nematode antibacterial factors. Trends Microbiol 2005; 13:314-9. [PMID: 15914006 DOI: 10.1016/j.tim.2005.05.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 02/08/2005] [Revised: 04/20/2005] [Accepted: 05/06/2005] [Indexed: 10/25/2022]
Abstract
Antibacterial factors (ABFs) are secreted polypeptides that have an important role in the innate immune system of nematodes. Comparison of these polypeptides revealed similarity in bioactivity, protein sequence and 3D structure, suggesting that they originated from a common ancestor. Comparison of gene organization of nematode ABF genes revealed that all except one contain a Phase 0 intron at a conserved location. The intron phase and location are congruent with the postulated intron gain rules, suggesting a gain of intron before duplication and divergence of the ancestral gene. Although nematode ABFs display similarity in activity and structure to invertebrate (arthropod and mollusk) defensins, lack of sequence similarity and the different genomic organization suggest that these two polypeptide families exhibit convergent evolution.
Affiliation(s)
- Oren Froy
  - Institute of Biochemistry, Food Science and Nutrition, Faculty of Agricultural, Food and Environmental Quality, The Hebrew University of Jerusalem, P.O. Box 12, Rehovot 76100, Israel.

14
Haenisch C, Diekmann H, Klinger M, Gennarini G, Kuwada JY, Stuermer CAO. The neuronal growth and regeneration associated Cntn1 (F3/F11/Contactin) gene is duplicated in fish: expression during development and retinal axon regeneration. Mol Cell Neurosci 2005; 28:361-74. [PMID: 15691716 DOI: 10.1016/j.mcn.2004.04.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Received: 01/15/2004] [Revised: 04/05/2004] [Accepted: 04/08/2004] [Indexed: 01/06/2023] Open
Abstract
The Cntn1 (Contactin/F3/F11) cell adhesion molecule is involved in axon growth and guidance, fasciculation, synapse formation, and myelination in birds and mammals. We identified Cntn1 genes in goldfish, zebrafish, and fugu, and provide evidence for a fish-specific duplication leading to Cntn1a and Cntn1b. Our analyses suggest a subfunctionalization for the Cntn1 paralogs in zebrafish compared to other vertebrates which have a single Cntn1 gene. Similar to Cntn1a, Cntn1b transcripts are found in subsets of sensory and motor neurons. However, Cntn1b is detected later and more restricted than Cntn1a. This spatio-temporal expression pattern of the two zebrafish Cntn1 paralogs suggests functions related to those of mammalian Cntn1. In adult goldfish, Cntn1b is expressed in oligodendrocytes and is upregulated in retinal ganglion cells after optic nerve transection, which is consistent with an additional role during regeneration.

15
Abstract
Arthropod and mollusk defensins are secreted antibacterial proteins that exhibit similarity in sequence, mode of action and structure and are expressed ubiquitously. Comparison of the gene organization of a newly cloned scorpion defensin gene, with that of other arthropods and the mussel, revealed that all exons and introns, aside from the exon encoding the mature protein, differ widely in number, size and sequence. This variability suggests that the exon encoding the mature defensin has undergone exon-shuffling and integrated downstream of unrelated leader sequences during evolution. Unlike other exon-shuffling events, in which modules are added into existing proteins, arthropod and mollusk defensins represent the first instance of exon-shuffling of autonomous modules.
Affiliation(s)
- Oren Froy
  - Institute of Biochemistry, Food Science and Nutrition, Faculty of Agricultural, Food and Environmental Quality, The Hebrew University of Jerusalem, PO Box 12, Rehovot 76100, Israel.

16
Katju V, Lynch M. The Structure and Early Evolution of Recently Arisen Gene Duplicates in the Caenorhabditis elegans Genome. Genetics 2003; 165:1793-803. [PMID: 14704166 PMCID: PMC1462873 DOI: 10.1093/genetics/165.4.1793] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022] Open
Abstract
The significance of gene duplication in provisioning raw materials for the evolution of genomic diversity is widely recognized, but the early evolutionary dynamics of duplicate genes remain obscure. To elucidate the structural characteristics of newly arisen gene duplicates at infancy and their subsequent evolutionary properties, we analyzed gene pairs with ≤10% divergence at synonymous sites within the genome of Caenorhabditis elegans. Structural heterogeneity between duplicate copies is present very early in their evolutionary history and is maintained over longer evolutionary timescales, suggesting that duplications across gene boundaries in conjunction with shuffling events have at least as much potential to contribute to long-term evolution as do fully redundant (complete) duplicates. The median duplication span of 1.4 kb falls short of the average gene length in C. elegans (2.5 kb), suggesting that partial gene duplications are frequent. Most gene duplicates reside close to the parent copy at inception, often as tandem inverted loci, and appear to disperse in the genome as they age, as a result of reduced survivorship of duplicates located in proximity to the ancestral copy. We propose that illegitimate recombination events leading to inverted duplications play a disproportionately large role in gene duplication within this genome in comparison with other mechanisms.
Affiliation(s)
- Vaishali Katju
  - Department of Biology, Indiana University, Bloomington, Indiana 47405, USA.

17
Gauci C, Lightowlers MW. Molecular cloning of genes encoding oncosphere proteins reveals conservation of modular protein structure in cestode antigens. Mol Biochem Parasitol 2003; 127:193-8. [PMID: 12672528 DOI: 10.1016/s0166-6851(03)00005-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
Abstract
Recombinant oncosphere antigens have been found to be remarkably effective when used as vaccines against cysticercosis and hydatid disease. Comparison of the structural features of these proteins and their associated genes suggests common features between antigens. Here molecular cloning is used to complete comparisons of Taenia solium, Taenia saginata, Taenia ovis, Echinococcus granulosus and Echinococcus multilocularis oncosphere antigens and genes. The exon/intron structure of genes cloned from T. solium and T. ovis genomic DNA (tsol16 and to16, respectively) in this study was found to be highly conserved. Two closely related tsol16 genes were cloned from the T. solium genome. Their corresponding transcripts were cloned from T. solium oncospheres and a comparison of their deduced amino-acid sequence with that of the protein encoded by to16 indicates that these proteins are the most highly conserved oncosphere proteins identified so far. Cloning of another gene from T. solium (designated tsol18) and comparison with the homologous gene of T. saginata (tsa18) also revealed substantial conservation of gene structure. Comparisons of the genes cloned in this study with genes encoding oncosphere antigens from other taeniid cestodes identified striking conservation of exon structure. The highly conserved regions of the genes encode a putative secretory signal and fibronectin type III domain in each of the oncosphere proteins. The location of exon boundaries in relation to protein features identifies a clear modular structure among all members of these oncosphere antigens. Identification of structural conservation of genes encoding antigenic proteins across several taeniid species suggests that the encoded proteins play important roles in host infection and parasite survival.
Affiliation(s)
- Charles Gauci
  - Molecular Parasitology Laboratory, The University of Melbourne, Princes Highway, Werribee, 3030, Vic., Australia.

18

19
Abstract
Contradictory evidence surrounds the claim that sperm cells are able to introduce exogenous DNA into the oocyte at the time of fertilisation. Although strong natural barriers exist against sperm-mediated gene transfer, such barriers are unlikely to be absolutely inviolable. If sperm cells can act as vectors for exogenous DNA, it follows that the genome of sexually reproducing animals may be subject to alteration by exogenous DNA sequences carried by sperm cells. At present there are insufficient data to permit quantification of the rate of sperm-mediated gene transfer. The implications of sperm-mediated gene transfer are significant and include evolutionary effects on the mammalian genome and pathologies in humans from de novo mutations. Despite the absence of firm data, geneticists would be wise to be vigilant to the potential consequences of sperm-mediated gene transfer.
Affiliation(s)
- Kevin R Smith
  - Division of Molecular and Life Sciences, School of Science and Engineering, University of Abertay Dundee, Dundee, UK

20
Ponting CP, Russell RR. The natural history of protein domains. Annual Review of Biophysics and Biomolecular Structure 2002; 31:45-71. [PMID: 11988462 DOI: 10.1146/annurev.biophys.31.082901.134314] [Citation(s) in RCA: 193] [Impact Index Per Article: 8.8] [Indexed: 11/09/2022]
Abstract
Genome sequencing and structural genomics projects are providing new insights into the evolutionary history of protein domains. As methods for sequence and structure comparison improve, more distantly related domains are shown to be homologous. Thus there is a need for domain families to be classified within a hierarchy similar to Linnaeus' Systema Naturae, the classification of species. With such a hierarchy in mind, we discuss the evolution of domains, their combination into proteins, and evidence as to the likely origin of protein domains. We also discuss when and how analysis of domains can be used to understand details of protein function. Unconventional features of domain evolution such as intragenomic competition, domain insertion, horizontal gene transfer, and convergent evolution are seen as analogs of organismal evolutionary events. These parallels illustrate how the concept of domains can be applied to provide insights into evolutionary biology.
Affiliation(s)
- Chris P Ponting
- Department of Human Anatomy and Genetics, University of Oxford, MRC Functional Genetics Unit, South Parks Road, Oxford OX1 3QX, UK.
|
21
|
Hoffmann W, Jagla W. Cell type specific expression of secretory TFF peptides: colocalization with mucins and synthesis in the brain. Int Rev Cytol 2002; 213:147-81. [PMID: 11837892 DOI: 10.1016/s0074-7696(02)13014-2] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.3] [Indexed: 12/12/2022]
Abstract
The "TFF domain" is an ancient cysteine-rich shuffled module forming the basic unit for the family of secretory TFF peptides (formerly P-domain peptides and trefoil factors). It is also an integral component of mosaic proteins associated with mucous surfaces. Three mammalian TFF peptides are known (i.e., TFF1-TFF3); however, in Xenopus laevis the pattern is more complex (xP1, xP4.1, xP4.2, and xP2). TFF peptides are typical secretory products of a variety of mucin-producing epithelial cells (e.g., the conjunctiva, the salivary glands, the gastrointestinal tract, the respiratory tract, and the uterus). Each TFF peptide shows a unique expression pattern, and different mucin-producing cells are characterized by their specific TFF peptide/secretory mucin combinations. TFF peptides have a pivotal role in maintaining the surface integrity of mucous epithelia in vivo. They are typical constituents of mucus gels, they modulate rapid mucosal repair ("restitution") by their motogenic and their cell scattering activity, they have antiapoptotic effects, and they probably modulate inflammatory processes. Pathological expression of TFF peptides occurs as a result of chronic inflammatory diseases or certain tumors. TFF peptides are also found in the central nervous system, at least in mammals. In particular, TFF3 is synthesized by oxytocinergic neurons of the hypothalamus and is released from the posterior pituitary into the bloodstream.
Affiliation(s)
- Werner Hoffmann
- Institut für Molekularbiologie und Medizinische Chemie, Otto-von-Guericke-Universität, Magdeburg, Germany
|
22
|
Gough J, Karplus K, Hughey R, Chothia C. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol 2001; 313:903-19. [PMID: 11697912 DOI: 10.1006/jmbi.2001.5080] [Citation(s) in RCA: 854] [Impact Index Per Article: 37.1] [Indexed: 11/22/2022]
Abstract
Of the sequence comparison methods, profile-based methods perform with greater selectivity than those that use pairwise comparisons. Of the profile methods, hidden Markov models (HMMs) are apparently the best. The first part of this paper describes calculations that (i) improve the performance of HMMs and (ii) determine a good procedure for creating HMMs for sequences of proteins of known structure. For a family of related proteins, more homologues are detected using multiple models built from diverse single seed sequences than from one model built from a good alignment of those sequences. A new procedure is described for detecting and correcting those errors that arise at the model-building stage of the procedure. These two improvements greatly increase selectivity and coverage. The second part of the paper describes the construction of a library of HMMs, called SUPERFAMILY, that represent essentially all proteins of known structure. The sequences of the domains in proteins of known structure that have identities less than 95% are used as seeds to build the models. Using the current data, this gives a library with 4894 models. The third part of the paper describes the use of the SUPERFAMILY model library to annotate the sequences of over 50 genomes. The models match twice as many target sequences as are matched by pairwise sequence comparison methods. For each genome, close to half of the sequences are matched in all or in part and, overall, the matches cover 35% of eukaryotic genomes and 45% of bacterial genomes. On average roughly 15% of genome sequences are labelled as being hypothetical yet homologous to proteins of known structure. The annotations derived from these matches are available from a public web server at: http://stash.mrc-lmb.cam.ac.uk/SUPERFAMILY. This server also enables users to match their own sequences against the SUPERFAMILY model library.
Affiliation(s)
- J Gough
- Laboratory of Molecular Biology, MRC, Hills Road, Cambridge, CB2 2QH, UK.
|
23
|
Apic G, Gough J, Teichmann SA. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J Mol Biol 2001; 310:311-25. [PMID: 11428892 DOI: 10.1006/jmbi.2001.4776] [Citation(s) in RCA: 344] [Impact Index Per Article: 15.0] [Indexed: 11/22/2022]
Abstract
There is a limited repertoire of domain families that are duplicated and combined in different ways to form the set of proteins in a genome. Proteins are gene products, and at the level of genes, duplication, recombination, fusion and fission are the processes that produce new genes. We attempt to gain an overview of these processes by studying the evolutionary units in proteins, domains, in the protein sequences of 40 genomes. The domain and superfamily definitions in the Structural Classification of Proteins Database are used, so that we can view all pairs of adjacent domains in genome sequences in terms of their superfamily combinations. We find 783 out of the 859 superfamilies in SCOP in these genomes, and the 783 families occur in 1307 pairwise combinations. Most families are observed in combination with one or two other families, while a few families are very versatile in their combinatorial behaviour; 209 families do not make combinations with other families. This type of pattern can be described as a scale-free network. We also study the N to C-terminal orientation of domain pairs and domain repeats. The phylogenetic distribution of domain combinations is surveyed, to establish the extent of common and kingdom-specific combinations. Of the kingdom-specific combinations, significantly more combinations consist of families present in all three kingdoms than of families present in one or two kingdoms. Hence, we are led to conclude that recombination between common families, as compared to the invention of new families and recombination among these, has also been a major contribution to the evolution of kingdom-specific and species-specific functions in organisms in all three kingdoms. Finally, we compare the set of the domain combinations in the genomes to those in the RCSB Protein Data Bank, and discuss the implications for structural genomics.
Affiliation(s)
- G Apic
- Laboratory of Molecular Biology, MRC, Hills Road, Cambridge, CB2 2QH, UK.
|
24
|
Abstract
Evolution of eukaryotes is mediated by sexual recombination of parental genomes. Crossovers occur in random, but homologous, positions at a frequency that depends on DNA length. As exons occupy only 1% of the human genome and introns about 24%, by far most of the crossovers occur between exons, rather than inside. The natural process of creating new combinations of exons by intronic recombination is called exon shuffling. Our group is developing in vitro formats for exon shuffling and applying these to the directed evolution of proteins. Based on the splice frame junctions, nine classes of exons and three classes of introns can be distinguished. Splice frame diagrams of natural genes show how the splice frame rules govern exon shuffling. Here, we review various approaches to constructing libraries of exon-shuffled genes. For example, exon shuffling of human pharmaceutical proteins can generate libraries in which all of the sequences are fully human, without the point mutations that raise concerns about immunogenicity.
Affiliation(s)
- J A Kolkman
- Maxygen Inc., 515 Galveston Drive, Redwood City, CA 94063, USA
|
25
|
Abstract
A theoretical method is proposed to identify structural domains in proteins of known structures. It is based on the distribution of the local axes of the polypeptide chain. In particular, a statistical analysis is applied to the contributions of the local axes to the absolute writhing number, a topological property of a space curve resulting from the number of self-crossings in the curve projections onto a unit sphere. This finding supports the hypothesis that topological requirements should be satisfied in the process of protein folding and in the final organization of the tertiary structures.
Affiliation(s)
- C Anselmi
- Dipartimento di Chimica, Università La Sapienza, Ple A. Moro 5, I-00185 Roma, Italy
|
26
|
|
27
|
Bowles J, Schepers G, Koopman P. Phylogeny of the SOX family of developmental transcription factors based on sequence and structural indicators. Dev Biol 2000; 227:239-55. [PMID: 11071752 DOI: 10.1006/dbio.2000.9883] [Citation(s) in RCA: 693] [Impact Index Per Article: 28.9] [Indexed: 12/23/2022]
Abstract
Members of the SOX family of transcription factors are found throughout the animal kingdom, are characterized by the presence of a DNA-binding HMG domain, and are involved in a diverse range of developmental processes. Previous attempts to group SOX genes and deduce their structural, functional, and evolutionary relationships have relied largely on complete or partial HMG box sequence of a limited number of genes. In this study, we have used complete HMG domain sequence, full-length protein structure, and gene organization data to study the pattern of evolution within the family. For the first time, a substantial number of invertebrate SOX sequences have been included in the analysis. We find support for subdivision of the family into groups A-H, as has been suggested in some previous studies, and for the assignment of two new groups, I and J. For vertebrate genes, it appears that relatedness as suggested by HMG domain sequence is congruent with relatedness as indicated by overall structure of the full-length protein and intron-exon structure of the genes. Most of the SOX groups identified in vertebrates were represented by a single SOX sequence in each invertebrate species studied. We have named anonymous sequences and, where appropriate, have suggested systematic names for some previously identified sequences. In addition, we identify an HMG domain signature motif which may be considered representative of the SOX family. Based on our data, we propose a robust phylogeny of SOX genes that reflects their evolutionary history in metazoans.
Affiliation(s)
- J Bowles
- Institute for Molecular Bioscience, University of Queensland, Brisbane, 4072, Australia
|
28
|
Abstract
We present here HOBACGEN, a database system devoted to comparative genomics in bacteria. HOBACGEN contains all available protein genes from bacteria, archaea, and yeast, taken from SWISS-PROT/TrEMBL and classified into families. It also includes multiple alignments and phylogenetic trees built from these families. The database is organized under a client/server architecture with a client written in Java, which may run on any platform. This client integrates a graphical interface allowing users to select families according to various criteria and notably to select homologs common to a given set of taxa. This interface also allows users to visualize multiple alignments and trees associated with families. In tree displays, protein gene names are colored according to the taxonomy of the corresponding organisms. Users may access all information associated with sequences and multiple alignments by clicking on genes. This graphic tool thus gives rapid and simple access to all data required to interpret homology relationships between genes and distinguish orthologs from paralogs. Instructions for installation of the client or the server are available at http://pbil.univ-lyon1.fr/databases/hobacgen.html.
Affiliation(s)
- G Perrière
- Laboratoire de Biométrie et Biologie Evolutive, Unité Mixte de Recherche Centre National de la Recherche Scientifique (UMR CNRS) n° 5558, Université Claude Bernard-Lyon 1, 69622 Villeurbanne Cedex, France.
|
29
|
Berezovsky IN, Grosberg AY, Trifonov EN. Closed loops of nearly standard size: common basic element of protein structure. FEBS Lett 2000; 466:283-6. [PMID: 10682844 DOI: 10.1016/s0014-5793(00)01091-7] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.7] [Indexed: 11/23/2022]
Abstract
By screening the crystal protein structure database for close Cα-Cα contacts, a size distribution of the closed loops is generated. The distribution reveals a maximum at 27±5 residues, the same for eukaryotic and prokaryotic proteins. This is apparently a consequence of the polymer statistical properties of the protein chain trajectory. That is, closure into the loops depends on the flexibility (persistence length) of the chain. The observed preferential loop size is consistent with the theoretical optimal loop closure size. The mapping of the detected unit-size loops on the sequences of major typical folds reveals an almost regular compact consecutive arrangement of the loops. Thus, a novel basic element of protein architecture is discovered: structurally diverse closed loops of the particular size.
Affiliation(s)
- I N Berezovsky
- Department of Structural Biology, The Weizmann Institute of Science, Rehovot, Israel.
|
30
|
Abstract
Recent studies on the genomes of protists, plants, fungi and animals confirm that the increase in genome size and gene number in different eukaryotic lineages is paralleled by a general decrease in genome compactness and an increase in the number and size of introns. It may thus be predicted that exon-shuffling has become increasingly significant with the evolution of larger, less compact genomes. To test the validity of this prediction, we have analyzed the evolutionary distribution of modular proteins that have clearly evolved by intronic recombination. The results of this analysis indicate that modular multidomain proteins produced by exon-shuffling are restricted in their evolutionary distribution. Although such proteins are present in all major groups of metazoa from sponges to chordates, there is practically no evidence for the presence of related modular proteins in other groups of eukaryotes. The biological significance of this difference in the composition of the proteomes of animals, fungi, plants and protists is best appreciated when these modular proteins are classified with respect to their biological function. The majority of these proteins can be assigned to functional categories that are inextricably linked to multicellularity of animals, and are of absolute importance in permitting animals to function in an integrated fashion: constituents of the extracellular matrix, proteases involved in tissue remodelling processes, various proteins of body fluids, membrane-associated proteins mediating cell-cell and cell-matrix interactions, membrane-associated receptor proteins regulating cell-cell communications, etc. Although some basic types of modular proteins seem to be shared by all major groups of metazoa, there are also groups of modular proteins that appear to be restricted to certain evolutionary lineages. In summary, the results suggest that exon-shuffling acquired major significance at the time of metazoan radiation.
It is interesting to note that the rise of exon-shuffling coincides with a spectacular burst of evolutionary creativity: the Big Bang of metazoan radiation. It seems probable that modular protein evolution by exon-shuffling has contributed significantly to this accelerated evolution of metazoa, since it facilitated the rapid construction of multidomain extracellular and cell surface proteins that are indispensable for multicellularity.
Affiliation(s)
- L Patthy
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest.
|
31
|
Müller WE, Kruse M, Blumbach B, Skorokhod A, Müller IM. Gene structure and function of tyrosine kinases in the marine sponge Geodia cydonium: autapomorphic characters in Metazoa. Gene 1999; 238:179-93. [PMID: 10570996 DOI: 10.1016/s0378-1119(99)00226-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 10/17/2022]
Abstract
Porifera (sponges) represent the most ancient, extant metazoan phylum. They existed already prior to the 'Cambrian Explosion'. Based on the analysis of aa sequences of informative proteins, it is highly likely that all metazoan phyla evolved from only one common ancestor (monophyletic origin). As 'autapomorphic' proteins which are restricted to Metazoa only, integrin receptors, receptors with scavenger receptor cysteine-rich repeats, neuronal-like receptors and protein-tyrosine kinases (PTKs) have been identified in Porifera. From the marine sponge Geodia cydonium, a receptor tyrosine kinase (RTK) has been cloned that comprises the characteristic structural topology known from other metazoan RTKs; an extracellular domain, the transmembrane region, the juxtamembrane region and the TK domain. Only two introns, within the coding region of the RTK gene, could be found, which separate the two highly polymorphic immunoglobulin-like domains, found in the extracellular region of the enzyme. The functional role of this sponge RTK could be demonstrated both in situ (grafting experiments) and in vitro (increase of intracellular Ca2+ level). Upstream of this RTK gene, two further genes coding for tyrosine kinases (TK) have been identified. Both are intron-free. The deduced aa sequence of the first gene shows no transmembrane segment; from the second gene--so far--only half of its catalytic domain is known. A phylogenetic analysis with the TK domains from these sequences and a fourth, from a novel scavenger RTK (all domains comprise the signature for the TK class II receptors), showed that they are distantly related to the insulin and insulin-like receptors. The presented findings support the 'introns-late' hypothesis for such genes that encode 'metazoan' proteins. It is proposed that the TKs evolved from protein-serine/threonine kinases through modularization and subsequent exon shuffling. 
After formation of the ancestral TKs, the modules lost the framing introns to protect the evolutionary novelty. Since cell culture systems of sponges are now available, it can be expected that soon also those mechanisms that control the developmental programs will be unravelled.
Affiliation(s)
- W E Müller
- Institut für Physiologische Chemie, Abteilung Angewandte Molekularbiologie, Universität, Mainz, Germany.
|
32
|
Bányai L, Patthy L. The NTR module: domains of netrins, secreted frizzled related proteins, and type I procollagen C-proteinase enhancer protein are homologous with tissue inhibitors of metalloproteases. Protein Sci 1999; 8:1636-42. [PMID: 10452607 PMCID: PMC2144412 DOI: 10.1110/ps.8.8.1636] [Citation(s) in RCA: 137] [Impact Index Per Article: 5.5] [Indexed: 10/21/2022]
Abstract
Using homology search, structure prediction, and structural characterization methods we show that the C-terminal domains of (1) netrins, (2) complement proteins C3, C4, C5, (3) secreted frizzled-related proteins, and (4) type I procollagen C-proteinase enhancer proteins (PCOLCEs) are homologous with the N-terminal domains of (5) tissue inhibitors of metalloproteinases (TIMPs). The proteins harboring this netrin module (NTR module) fulfill diverse biological roles ranging from axon guidance, regulation of Wnt signaling, to the control of the activity of metalloproteases. With the exception of TIMPs, it is not known at present what role the NTR modules play in these processes. In view of the fact that the NTR modules of TIMPs are involved in the inhibition of matrixin-type metalloproteases and that the NTR module of PCOLCEs is involved in the control of the activity of the astacin-type metalloprotease BMP1, it seems possible that interaction with metzincins could be a shared property of NTR modules and could be critical for the biological roles of the host proteins.
Affiliation(s)
- L Bányai
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest
|
33
|
Berezovsky IN, Namiot VA, Tumanyan VG, Esipova NG. Hierarchy of the interaction energy distribution in the spatial structure of globular proteins and the problem of domain definition. J Biomol Struct Dyn 1999; 17:133-55. [PMID: 10496428 DOI: 10.1080/07391102.1999.10508347] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
An algorithm for determining protein domain structure is proposed. Domain structures resulting from application of the algorithm have been obtained and compared with available data. The method is based on an entirely physical model of van der Waals interactions that reflects, as illustrated in this work, the distribution of electron density. Various levels of hierarchy in the protein spatial structure are discerned by analysis of the interaction energy between structural units of different scales. Thus the level of energy hierarchy plays the role of the sole parameter, and the method obviates the use of complicated geometrical criteria with numerous fitting parameters. The algorithm readily and accurately locates domains formed by continuous segments of the protein chain as well as those comprising non-sequential segments, and sets no limit on the number of segments in a domain. We have analyzed 309 protein structures. Among 277 structures for which our results could be compared with the domain definitions made in other works, 243 showed complete or partial coincidence, and in only 34 cases did the domain structures prove substantially different. The domains delineated with our approach may coincide with the reference definition at different levels of the globule hierarchy. Along with defining the domain structure, our approach allows one to consider the protein spatial structure in terms of the spatial distribution of the interaction energy in order to establish the correspondence between the hierarchy of energy distribution and the hierarchy of structural elements.
Affiliation(s)
- I N Berezovsky
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow.
|
34
|
Seack J, Kruse M, Müller IM, Müller WE. Promoter and exon-intron structure of the protein kinase C gene from the marine sponge Geodia cydonium: evolutionary considerations and promoter activity. Biochim Biophys Acta 1999; 1444:241-53. [PMID: 10023072 DOI: 10.1016/s0167-4781(98)00275-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 10/16/2022]
Abstract
We report the gene structure of a key signaling molecule from a marine sponge, Geodia cydonium. The selected gene, which codes for a classical protein kinase C (cPKC), comprises 13 exons and 12 introns; the introns are, in contrast to those found in cPKC from higher Metazoa, small in size ranging from 93 nt to 359 nt. The complete gene has a length of 4229 nt and contains exons which encode the characteristic putative regulatory and catalytic domains of metazoan cPKCs. While in the regulatory domain only one intron is in phase 0, in the catalytic domain most introns are phase 0 introns, suggesting that the latter only rarely undergo module duplication. The 5'-flanking sequence of the sponge cPKC gene contains a TATA-box like motif which is located 35-26 nt upstream from the start of the longest sequenced cDNA. This 5'-flanking sequence was analyzed for promoter activity. The longest fragment (538 nt) was able to drive the expression of luciferase in transient transfections of NIH 3T3 fibroblasts; the strong activity of the sponge promoter was found to be half the one displayed by the SV40 reference promoter. Deletion analysis demonstrates that the AP4 site and the GC box which is most adjacent to the TATA box are the crucial elements for maximal promoter activity. The activity of the promoter is not changed in 3T3 cells which are kept serum starved or in the presence of a phorbol ester. In conclusion, these data present the phylogenetically oldest cPKC gene which contains in the 5'-flanking region a promoter functional in the heterologous mammalian cell system.
Affiliation(s)
- J Seack
- Institut für Physiologische Chemie, Abteilung Angewandte Molekularbiologie, Johannes Gutenberg-Universität, Duesbergweg 6, D-55099, Mainz, Germany
|
35
|
Gaboriaud C, Rossi V, Fontecilla-Camps JC, Arlaud GJ. Evolutionary conserved rigid module-domain interactions can be detected at the sequence level: the examples of complement and blood coagulation proteases. J Mol Biol 1998; 282:459-70. [PMID: 9735300 DOI: 10.1006/jmbi.1998.2008] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
Several extracellular modular proteins, including proteases of the complement and blood coagulation cascades, are shown here to exhibit conserved sequence patterns specific for a particular module-domain association. This was detected by comparative analysis of sequence variability in different multiple sequence alignments, which provides a new tool to investigate the evolution of modular proteins. A first example deals with the proteins featuring a common complement control protein (CCP) module-serine protease (SP) domain pattern at their C-terminal end, defined here as the CCP-SP sub-family. These proteins include the complement proteases C1r, C1s and MASPs, the Limulus clotting factor C, and the proteins of the haptoglobin family. A second example deals with blood coagulation factors VII, IX and X and protein C, all featuring a common epidermal growth factor (EGF)-SP C-terminal assembly. Highly specific motifs are found at the connection between the CCP or EGF module and the activation peptide of the SP domain: [P/A]-x-C-x-[P/A]-[I/V]-C-G-x-[P/S/K] in the case of the CCP-SP proteins, and C-x-[P/S]-x-x-x-[Y/F]-P-C-G in the case of the EGF-SP proteins. Each motif is strictly conserved in the whole sub-family and it is detected in no more than one other known protein sequence. Strikingly, most of the conserved residues specific to each sub-family appear to be clustered at the interface between the SP domain and the CCP or EGF module. We propose that a rigid module-domain interaction occurs in these proteins and has been conserved through evolution. The functional implications of these assemblies, underlined by such evolutionary constraints, are discussed.
Affiliation(s)
- C Gaboriaud
- Laboratoire de Cristallogenèse et Cristallographie des Protéines, Institut de Biologie Structurale, 38027 Grenoble Cedex 1, France.
|
36
|
Intron-exon structures. ACTA ACUST UNITED AC 1998. [DOI: 10.1016/s1067-5701(98)80020-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
37
|
The Atypical Serine Proteases of the Complement System (received for publication on October 7, 1997). Adv Immunol 1998. [DOI: 10.1016/s0065-2776(08)60609-4] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]
|
38
|
Müller WE, Müller IM. Transition from protozoa to metazoa: an experimental approach. Prog Mol Subcell Biol 1998; 19:1-22. [PMID: 15898186 DOI: 10.1007/978-3-642-48745-3_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 05/02/2023]
Affiliation(s)
- W E Müller
- Institut für Physiologische Chemie, Johannes Gutenberg-Universität, Abteilung Angewandte Molekularbiologie, Duesbergweg 6, 55099 Mainz, Germany
|
39
|
Villoutreix BO, García de Frutos P, Lövenklev M, Linse S, Fernlund P, Dahlbäck B. SHBG region of the anticoagulant cofactor protein S: Secondary structure prediction, circular dichroism spectroscopy, and analysis of naturally occurring mutations. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(199712)29:4<478::aid-prot8>3.0.co;2-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.1] [Indexed: 11/06/2022]
|
40
|
Bakker H, Van Tetering A, Agterberg M, Smit AB, Van den Eijnden DH, Van Die I. Deletion of two exons from the Lymnaea stagnalis β1→4-N-acetylglucosaminyltransferase gene elevates the kinetic efficiency of the encoded enzyme for both UDP-sugar donor and acceptor substrates. J Biol Chem 1997; 272:18580-5. [PMID: 9228024 DOI: 10.1074/jbc.272.30.18580] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023] Open
Abstract
Lymnaea stagnalis UDP-GlcNAc:GlcNAcβ-R β1→4-N-acetylglucosaminyltransferase (β4-GlcNAcT) is an enzyme with structural similarity to mammalian UDP-Gal:GlcNAcβ-R β1→4-galactosyltransferase (β4-GalT). Here, we report that the exon organization of the genes encoding these enzymes is also very similar. The β4-GlcNAcT gene (12.5 kilobase pairs, spanning 10 exons) contains four exons encompassing sequences that are absent in the β4-GalT gene. Two of these exons (exons 7 and 8) show a high sequence similarity to part of the preceding exon (exon 6), suggesting that they have originated by exon duplication. The exon in the β4-GalT gene corresponding to β4-GlcNAcT exon 6 encodes a region that has been proposed to be involved in the binding of UDP-Gal. The question therefore arose whether the repeating sequences encoded by exons 7 and 8 of the β4-GlcNAcT gene would determine the specificity of the enzyme for UDP-GlcNAc, or for the less preferred UDP-GalNAc. It was found that deletion of only the sequence encoded by exon 8 resulted in a completely inactive enzyme. By contrast, deletion of the amino acid residues encoded by exons 7 and 8 resulted in an enzyme with an elevated kinetic efficiency for both UDP-sugar donors, as well as for its acceptor substrates. These results suggest that at least part of the donor and acceptor binding domains of the β4-GlcNAcT are structurally linked and that the region encompassing the insertion contributes to acceptor recognition as well as to UDP-sugar binding and specificity.
Affiliation(s)
- H Bakker
- Department of Medical Chemistry, Vrije Universiteit, 1081 BT Amsterdam, The Netherlands
41
Plagge A, Brümmendorf T. The gene of the neural cell recognition molecule F11: conserved exon-intron arrangement in genes of neural members of the immunoglobulin superfamily. Gene 1997; 192:215-25. [PMID: 9224893 DOI: 10.1016/s0378-1119(97)00066-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
The chicken neural glycoprotein F11 is a cell recognition molecule implicated in neurohistogenesis, in particular in the context of neurite outgrowth and fasciculation. F11 is a glycosyl-phosphatidylinositol-linked member of the immunoglobulin superfamily that is also termed contactin or F3 in humans and rodents, respectively. In this study, we report the complete structure of the F11 gene. It is composed of 23 exons distributed over more than 100 kb of genomic DNA and each of the ten domains of the F11 protein is encoded by two exons. The sizes of the introns vary by two orders of magnitude, ranging from 150 bp to more than 15 kb. All interdomain introns are in phase one, i.e. are inserted after the first nucleotide of a codon, consistent with assembly of an F11 progenitor gene via exon shuffling. The intradomain introns are localized at variable sites within the domains and have different intron phases. This study reveals a remarkable similarity of the F11 gene with the gene of axonin-1, a related neural immunoglobulin superfamily member which is also implicated in neurite outgrowth and fasciculation. The intron positions with respect to the protein domain organization are found to be identical, strongly suggesting that both genes are derived from a common ancestor that already had this exon-intron structure.
Affiliation(s)
- A Plagge
- Max-Planck-Institut für Entwicklungsbiologie, Tübingen, Germany
42
Johansson MU, de Château M, Wikström M, Forsén S, Drakenberg T, Björck L. Solution structure of the albumin-binding GA module: a versatile bacterial protein domain. J Mol Biol 1997; 266:859-65. [PMID: 9086265 DOI: 10.1006/jmbi.1996.0856] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.5] [Indexed: 02/04/2023]
Abstract
The albumin-binding GA module is found in a family of surface proteins of different bacterial species. It comprises 45 amino acid residues and represents the first known example of contemporary module shuffling. Using 1H NMR spectroscopy we have determined the solution structure of the GA module from protein PAB, a protein of the anaerobic human commensal and pathogen Peptostreptococcus magnus. This structure, the first three-dimensional structure of an albumin-binding protein domain described, was shown to be composed of a left-handed three-helix bundle. Sequence differences between GA modules with different affinities for albumin indicated that a conserved region in the C-terminal part of the second helix and the flexible sequence between helices 2 and 3 could contribute to the albumin-binding activity. The effect on backbone amide proton exchange rates upon binding to albumin supports this assumption. The GA module has a fold that is strikingly similar to the immunoglobulin-binding domains of staphylococcal protein A, but it shows no resemblance to the fold shared by the immunoglobulin-binding domains of streptococcal protein G and peptostreptococcal protein L. When the gene sequences, binding properties and thermal stability of these four domains are analysed in relation to their global folds, an evolutionary pattern emerges. Thus, in the evolution of novel binding properties, mutations are allowed only as long as the energetically favourable global fold is maintained.
Affiliation(s)
- M U Johansson
- Department of Physical Chemistry, Lund University, Sweden
43
Abstract
Considerable advances have been made in our knowledge of the molecular structure of cell adhesion molecules, their binding sites, and adhesion complexes. For the cadherins, protein zero, and CD2, additional experimental data support the insights obtained from structural analysis of their domains and molecular models of their adhesion complexes. For neural cell adhesion molecules, L1, fibronectin, tenascin-C, integrins, and vascular cell adhesion molecules, the molecular structure of domains, and in most cases their binding sites, have been elucidated. The substrate recognition sites in some of these molecules possess rate constants for association and dissociation that permit both rapid cell migration and, through avidity, high-affinity cell-cell interactions.
Affiliation(s)
- C Chothia
- MRC Laboratory of Molecular Biology, Cambridge, England
44
Abstract
Thanks to recent improvements in techniques used for the detection of homologies, it is now clear that module exchange played a major role in protein evolution. Analysis of the genes of various modular proteins has identified a large number of cases where gene assembly was facilitated by intronic recombination, i.e., the proteins were formed by exon shuffling. Studies of the principles and mechanistic details of exon shuffling, however, revealed that this powerful evolutionary mechanism could become significant only after the appearance of spliceosomal introns typical of higher eukaryotes. Although exon shuffling is the most efficient way of constructing modular proteins, recent studies on the evolution of multidomain proteins of prokaryotes emphasize that intronic recombination is not an absolute prerequisite of module exchange.
Affiliation(s)
- L Patthy
- Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary
45
Joba W, Hoffmann W. Alternative splicing of repetitive units is responsible for the polydispersities of integumentary mucin B.1 (FIM-B.1) from Xenopus laevis. Glycoconj J 1996; 13:735-40. [PMID: 8910000 DOI: 10.1007/bf00702337] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 02/03/2023]
Abstract
Frog integumentary mucin B.1 (FIM-B.1) represents a polymorphic extracellular mosaic protein which contains tandemly arranged serine/threonine-rich modules as well as cysteine-rich domains. The latter are probably important for oligomerization of FIM-B.1 and have also been found in many proteins of the complement cascade as well as regions homologous to von Willebrand factor. The repetitive modules are targets for extensive O-glycosylation. Previous cDNA cloning experiments clearly established polydispersities within the same individual, which originate from deletions/insertions in the repetitive domain. Here, we analyse part of the corresponding genomic region. Each repetitive unit as well as the cysteine-rich domain is encoded by an individual class 1-1 exon typical of shuffled modules. Alternative splicing of these multiple cassettes creates the polydisperse FIM-B.1 transcripts.
Affiliation(s)
- W Joba
- Max-Planck-Institut für Psychiatrie, Abteilung Neurochemie, Martinsried, Germany
46
Molina F, Bouanani M, Pau B, Granier C. Characterization of the type-1 repeat from thyroglobulin, a cysteine-rich module found in proteins from different families. Eur J Biochem 1996; 240:125-33. [PMID: 8797845 DOI: 10.1111/j.1432-1033.1996.0125h.x] [Citation(s) in RCA: 70] [Impact Index Per Article: 2.5] [Indexed: 02/02/2023]
Abstract
The amino acid sequence of human thyroglobulin is known to contain cysteine-rich repetitive regions. In this study, we report the existence of an eleventh type-1 repeat within the human thyroglobulin sequence, and we characterize the thyroglobulin type-1 repeat as a protein module. The 11 thyroglobulin type-1 repeats possessed the same number of cysteine residues (six in type A, four in the two type B repeats), a fairly constant number of residues between cysteines and a conserved sequence pattern. By scanning protein sequence databases, 29 proteins belonging to six different families were found to contain at least one, and up to three, thyroglobulin type-1 repeats in their sequence. Although the repeat was present in numerous proteins possessing binding properties, an examination of the information available in the literature showed that a direct role of the repeat in protein-protein interaction has rarely been assessed. A distance analysis of the sequences indicated that all repeats segregate into four clusters of phylogenetically close sequences. A consensus sequence of type-1 repeats was derived from sequence similarity analysis; it comprised a central core of conserved residues including two highly conserved motifs, QC and CWCV. The type-1 repeat from thyroglobulin was found to differ from several previously described cysteine-rich modules, in particular from the epidermal-growth-factor-like module with which it has sometimes been confused. Therefore, our results provide a complete characterization of the repeats which will help in the detection of these repeats in newly characterized proteins, a necessary step for understanding the structural/biological role of this module.
Affiliation(s)
- F Molina
- CNRS UMR 9921, Faculté de Pharmacie, Montpellier, France
47
Johnson RR, Jiang X, Burkhalter A. Regional and laminar differences in synaptic localization of NMDA receptor subunit NR1 splice variants in rat visual cortex and hippocampus. J Comp Neurol 1996; 368:335-55. [PMID: 8725343 DOI: 10.1002/(sici)1096-9861(19960506)368:3<335::aid-cne2>3.0.co;2-6] [Citation(s) in RCA: 54] [Impact Index Per Article: 1.9] [Indexed: 02/01/2023]
Abstract
Changes in N-methyl-D-aspartate (NMDA) receptor expression may represent a molecular substrate for differences in synaptic plasticity between early postnatal and adult brains (Fox and Zahs [1994] Curr. Opinion Neurobiol. 4:112-119). We have, therefore, examined the regional and laminar distribution of NR1, the essential subunit of the NMDA receptor, in two regions in which synaptic plasticity has been most thoroughly studied: primary visual cortex and hippocampus. To study NR1 expression at the light and electron microscopic levels we have used a new antiserum (NR1-C1; Sheng et al. [1994] Nature 368:144-147) directed against a differentially spliced C-terminal exon ("C1"). The most striking result was that the pattern of NR1-C1 labeling in the adult was more restricted than that of previously published NR1-specific antibodies. Specifically, NR1-C1 did not label cells in the CA3, dentate gyrus or subicular regions of the hippocampus or in layer 4 of the visual cortex. Quantitative ultrastructural analysis revealed that these differences were paralleled by differential expression of NR1-C1 at synapses. In sharp contrast to the pattern in the adult, NR1-C1 immunoreactivity was distributed more widely in the developing brain. At postnatal day 11, NR1-C1 splice variants were expressed in all layers of the visual cortex and in all regions of the hippocampus. The transient expression of NR1-C1 splice variants in layer 4 of visual cortex suggests that NR1-C1 may play a role in determining the critical period for binocular plasticity. Continued expression of NR1-C1 in upper and lower layers of the adult cortex and in CA1 of the hippocampus may provide a substrate for plasticity in corticocortical connections and Schaffer collateral synapses beyond the critical period. In addition to abundant postsynaptic staining, NR1-C1 immunoreactivity was found in a large number of axon terminals in the dorsal subiculum, but in very few terminals in visual cortex. This strongly suggests that presynaptic NMDA receptors play a major role in neuronal processing of hippocampal output through the subiculum, but play a relatively minor role in visual processing.
Affiliation(s)
- R R Johnson
- Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, Missouri 63110, USA
48
Abstract
The lipoprotein Lp(a) is associated with increased risk of atherosclerosis and myocardial infarction in humans. Lp(a) is mostly confined to primate species, due to the limited phylogenetic distribution of its distinguishing protein component, apolipoprotein(a) which is a close homolog of plasminogen. The known properties of Lp(a) are reviewed here. Many of these derive from the ability of Lp(a) to bind to the same substrates as plasminogen. A possible new animal model of Lp(a) is the hedgehog, which contains an Lp(a)-like particle that is the apparent product of independent evolution of a multi-kringle, apolipoprotein(a)-like protein by duplication and modification of portions of the hedgehog plasminogen gene.
Affiliation(s)
- R M Lawn
- Falk Cardiovascular Research Center, Stanford University School of Medicine, CA 94305-5246, USA
49
Sudol M. Structure and function of the WW domain. Prog Biophys Mol Biol 1996; 65:113-32. [PMID: 9029943 DOI: 10.1016/s0079-6107(96)00008-9] [Citation(s) in RCA: 229] [Impact Index Per Article: 8.2] [Indexed: 02/03/2023]
Affiliation(s)
- M Sudol
- Mount Sinai School of Medicine, New York, NY 10029, USA
50
Affiliation(s)
- P Bork
- European Molecular Biology Laboratory, Heidelberg, Germany