151
Schuelke M, Loeffen J, Mariman E, Smeitink J, van den Heuvel L. Cloning of the human mitochondrial 51 kDa subunit (NDUFV1) reveals a 100% antisense homology of its 3'UTR with the 5'UTR of the gamma-interferon inducible protein (IP-30) precursor: is this a link between mitochondrial myopathy and inflammation? Biochem Biophys Res Commun 1998; 245:599-606. [PMID: 9571201 DOI: 10.1006/bbrc.1998.8486] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
We report the cloning of the genomic DNA and cDNA of the human 51 kDa subunit (NDUFV1) of mitochondrial complex I. The 6 kbp NDUFV1 gene is composed of 10 exons. All intron-exon boundaries comply with the consensus sequence for splice donor and acceptor sites. Within the 5' flanking region we identified a putative binding site for NRF-2, a GATA- and GC-box element. Canonical TATA- or CCAAT-boxes were absent; the transcriptional start site, however, lies within a CpG island, which is consistent with the "housekeeping" function of the gene. Within the coding sequence we detected consensus motifs for NADH, FMN, and iron-sulfur binding sites. The amino acid sequence homology between human and cow is 96.9%. Surprisingly, we found a 48 bp long complete antisense homology between the 3'UTR of the NDUFV1 mRNA and the 5'UTR of the mRNA for the gamma-interferon inducible protein precursor (IP-30). This finding is intriguing since the two genes lie on different chromosomes. The exact function of IP-30 is not yet known, but it may play a role in gamma-interferon mediated immune reactions. The NDUFV1 mRNA might act as an antisense suppressor, thus restraining translation of IP-30 in tissues with high energy demand. This finding could be a molecular link between complex I deficiency and inflammatory myopathy, which have repeatedly been described as occurring together.
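"Complete antisense homology" here means that a stretch of one transcript is the exact reverse complement of a stretch of the other. The following is a minimal Python sketch of how such a stretch could be measured. The function names and the toy sequences are illustrative assumptions; they are not the NDUFV1 or IP-30 sequences, and this is not the authors' method. Sequences are written in the DNA alphabet for simplicity.

```python
# Toy illustration only: helper names and sequences are hypothetical,
# not taken from the NDUFV1 or IP-30 records.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_antisense_match(a: str, b: str) -> int:
    """Length of the longest stretch of `a` that is perfectly
    antisense (reverse-complementary) to some stretch of `b`."""
    rc = reverse_complement(b)
    best = 0
    prev = [0] * (len(rc) + 1)
    # Longest common substring of `a` and reverse_complement(b),
    # computed with a rolling dynamic-programming row.
    for ch in a:
        cur = [0] * (len(rc) + 1)
        for j, rch in enumerate(rc, start=1):
            if ch == rch:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    utr3 = "TTTACGTGGA"   # made-up "3'UTR"
    utr5 = "CCACGT"       # made-up "5'UTR", antisense to ACGTGG
    print(longest_antisense_match(utr3, utr5))
```

On real transcripts, a result of 48 for the two UTRs would correspond to the 48 bp antisense region reported in the abstract.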
Affiliation(s)
- M Schuelke
- Department of Paediatrics, Nijmegen Center for Mitochondrial Disorders, University Hospital Nijmegen, The Netherlands
152
Halleck MS, Pradhan D, Blackman C, Berkes C, Williamson P, Schlegel RA. Multiple members of a third subfamily of P-type ATPases identified by genomic sequences and ESTs. Genome Res 1998; 8:354-61. [PMID: 9548971 DOI: 10.1101/gr.8.4.354] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
The Saccharomyces cerevisiae genome contains five P-type ATPases divergent from both of the well-known subfamilies of these membrane ion transporters. This newly recognized third subfamily can be further divided into four classes of genes with nearly equal relatedness to each other. Genes of this new subfamily are also present and expressed in multicellular organisms such as Caenorhabditis elegans and mammals; some, but not all, can be assigned to the classes identified in yeast. Different classes of genes and different genes within a class are expressed differentially in tissues of the mouse. The recently cloned gene for the mammalian aminophospholipid translocase belongs to this new subfamily, suggesting that other subfamily members may transport other lipids or lipid-like molecules from one leaflet of the membrane bilayer to the other.
Affiliation(s)
- M S Halleck
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania 16802, USA
153
Gy I, Aubourg S, Sherson S, Cobbett CS, Cheron A, Kreis M, Lecharny A. Analysis of a 14-kb fragment containing a putative cell wall gene and a candidate for the ARA1, arabinose kinase, gene from chromosome IV of Arabidopsis thaliana. Gene 1998; 209:201-10. [PMID: 9524266 DOI: 10.1016/s0378-1119(98)00049-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
Abstract
An Arabidopsis thaliana genomic DNA fragment of 14 kb has been characterized in the framework of the E.S.S.A. programme. Computational and molecular approaches identified three novel gene sequences coding, respectively, for a protein of unknown function, a putative membrane-anchored cell wall protein and an arabinose kinase gene corresponding to the locus ARA1. The latter two genes, named AtSEB1 and AtISA1, have been characterized in detail. They are very different in their organization, codon usage and level of expression. Homologues of AtSEB1 and AtISA1 have been identified. Sequence comparisons showed that the former genes contained a long 5' extension coding for an N-terminal domain probably specifying subcellular localization. Cloning and sequencing of the cognate cDNA for the AtISA1 homologue in A. thaliana, named GAL1, indicate that it encodes a galactokinase-like protein. Our results highlight the integrative outcome of a systematic sequencing project in which links between biochemically and genetically characterized mutants, ESTs and genomic sequence data are generated.
Affiliation(s)
- I Gy
- Institut de Biotechnologie des Plantes, Laboratoire de Biologie du Développement des Plantes, Bâtiment 630, Université de Paris-Sud, CNRS-ERS 569, F-91405, Orsay, Cedex, France
154
Pistillo D, Manzi A, Tino A, Boyl PP, Graziani F, Malva C. The Drosophila melanogaster lipase homologs: a gene family with tissue and developmental specific expression. J Mol Biol 1998; 276:877-85. [PMID: 9566193 DOI: 10.1006/jmbi.1997.1536] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
We report the molecular cloning of Drosophila genes encoding putative lipase homologs, Dm lip1, lip2 and lip3, the definition of their structure and their expression patterns during development. These Drosophila lipases are related to acid lipases, with a common GHSQG motif, within a more general consensus GXSXG, identified as the active site shared by all members of the lipase superfamily. The lip1 and lip3 genes are transcribed in different tissues and developmental stages, suggesting that they have different functions. The lip1 gene, coding for a protein similar to digestive lipases, is expressed in ovaries and early embryos and, with a different sized transcript, in all the other developmental stages. The lip3 gene, whose translation product is more similar to lysosomal acid lipases, is expressed only during the larval period. The lip2 gene seems non-functional. The Drosophila putative lipases do not show similarity with the Drosophila yolk proteins that are reported to have sequence similarity with lipoprotein lipases, but share a consistent similarity with lepidopteran proteins reported as egg specific or yolk proteins, probably corresponding to lipase homologs. The results reported here are discussed in relation to the evolution and functions of lipases within and between species.
Affiliation(s)
- D Pistillo
- Istituto Internazionale di Genetica e Biofisica, Napoli, Italy
155
Jiang J, Jacob HJ. EbEST: an automated tool using expressed sequence tags to delineate gene structure. Genome Res 1998; 8:268-75. [PMID: 9521930 PMCID: PMC310694 DOI: 10.1101/gr.8.3.268] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.3] [Received: 12/02/1997] [Accepted: 02/05/1998] [Indexed: 02/06/2023]
Abstract
Large numbers of expressed sequence tags (ESTs) continue to fill public and private databases with partial cDNA sequences. However, using this huge number of ESTs to facilitate gene finding in genomic sequence poses a challenge, especially to wet-lab scientists who often have limited computing resources. In an effort to consolidate the information hidden in the vast number of ESTs into a readable and manageable format, we have developed EbEST, a program that automates the process of using ESTs to help delineate gene structure in long stretches of genomic sequence. The EbEST program consists of three functional modules: the first module separates homologous ESTs into clusters and identifies the most informative ESTs within each cluster; the second module uses the informative ESTs to perform gapped alignment and to predict the exon-intron boundary; and the third module generates text file and graphic outputs that illustrate the orientation, exonic structure, and untranslated regions (UTRs) of putative genes in the genomic sequence being analyzed. Evaluation of EbEST with 176 human genes from the ALLSEQ set indicated that it performed in line with several existing gene finding programs, but was more tolerant to sequencing errors. Furthermore, when EbEST was challenged with query sequences that harbor more than one gene, it suffered only a slight drop in performance, whereas the performance of the other programs evaluated decreased more. EbEST may be used as a stand-alone tool to annotate human genomic sequences with EST-derived gene elements, or can be used in conjunction with computational gene-recognition programs to increase the accuracy of gene prediction. [EbEST is available at http://EbEST.ifrc.mcw.edu]
Affiliation(s)
- J Jiang
- Department of Physiology, Laboratory for Genetics Research, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
156
Kestilä M, Lenkkeri U, Männikkö M, Lamerdin J, McCready P, Putaala H, Ruotsalainen V, Morita T, Nissinen M, Herva R, Kashtan CE, Peltonen L, Holmberg C, Olsen A, Tryggvason K. Positionally cloned gene for a novel glomerular protein--nephrin--is mutated in congenital nephrotic syndrome. Mol Cell 1998; 1:575-82. [PMID: 9660941 DOI: 10.1016/s1097-2765(00)80057-x] [Citation(s) in RCA: 1308] [Impact Index Per Article: 50.3] [Indexed: 02/08/2023]
Abstract
Congenital nephrotic syndrome of the Finnish type (NPHS1) is an autosomal-recessive disorder, characterized by massive proteinuria in utero and nephrosis at birth. In this study, the 150 kb critical region of NPHS1 was sequenced, revealing the presence of at least 11 genes, the structures of 5 of which were determined. Four different mutations segregating with the disease were found in one of the genes in NPHS1 patients. The NPHS1 gene product, termed nephrin, is a 1241-residue putative transmembrane protein of the immunoglobulin family of cell adhesion molecules, which by Northern and in situ hybridization was shown to be specifically expressed in renal glomeruli. The results demonstrate a crucial role for this protein in the development or function of the kidney filtration barrier.
Affiliation(s)
- M Kestilä
- Department of Biochemistry, University of Oulu, Finland
157
Rzhetsky A, Kalachikov S, Ye X, Zhang P, Russo JJ. Tools for visualization and integration of intermediate sequencing results in large disease gene discovery projects. Gene 1998; 208:31-5. [PMID: 9479040 DOI: 10.1016/s0378-1119(97)00635-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
We describe two Java applets which are useful for insightful presentation of intermediate experimental data in gene discovery projects involving large scale sequencing. One of these applets provides a physical map of a genomic region and provides easy access to the second applet, which furnishes a detailed map of sequence contigs associated with clones on the physical map. In particular, the second applet displays all the known information about each contig, including the presence of exons, database homology 'hits', repetitive elements and other features; the graphics are linked to other World Wide Web pages, providing detailed information on each feature. These applets should be useful to other research groups working on large sequencing projects.
MESH Headings
- Chromosome Mapping
- Chromosomes, Human, Pair 13/genetics
- Computer Communication Networks
- Cosmids
- DNA, Complementary
- Databases, Factual
- Exons
- Genes
- Genetic Diseases, Inborn/genetics
- Humans
- Leukemia, Lymphocytic, Chronic, B-Cell/genetics
- Programming Languages
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA
- Software
Affiliation(s)
- A Rzhetsky
- Columbia Genome Center, Columbia University, 630 West 168th Street-BB 16-1611, New York, NY 10032, USA.
158
Brendel V, Kleffe J, Carle-Urioste JC, Walbot V. Prediction of splice sites in plant pre-mRNA from sequence properties. J Mol Biol 1998; 276:85-104. [PMID: 9514728 DOI: 10.1006/jmbi.1997.1523] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
Heterologous introns are often inaccurately or inefficiently processed in higher plants. The precise features that distinguish the process of pre-mRNA splicing in plants from splicing in yeast and mammals are unclear. One contributing factor is the prominent base compositional contrast between U-rich plant introns and flanking G + C-rich exons. Inclusion of this contrast factor in recently developed statistical methods for splice site prediction from sequence inspection significantly improved prediction accuracy. We applied the prediction tools to re-analyze experimental data on splice site selection and splicing efficiency for native and more than 170 mutated plant introns. In almost all cases, the experimentally determined preferred sites correspond to the highest scoring sites predicted by the model. In native genes, about 90% of splice sites are the locally highest scoring sites within the bounds of the flanking exon and intron. We propose that, in most cases, local context (about 50 bases upstream and downstream from a potential intron end) is sufficient to account for intrinsic splice site strength, and that competition for trans-acting factors determines splice site selection in vivo. We suggest that computer-aided splice site prediction can be a powerful tool for experimental design and interpretation.
Affiliation(s)
- V Brendel
- Department of Mathematics, Stanford University, CA 94305-2125, USA
159
Shiina T, Tamiya G, Oka A, Yamagata T, Yamagata N, Kikkawa E, Goto K, Mizuki N, Watanabe K, Fukuzumi Y, Taguchi S, Sugawara C, Ono A, Chen L, Yamazaki M, Tashiro H, Ando A, Ikemura T, Kimura M, Inoko H. Nucleotide sequencing analysis of the 146-kilobase segment around the IkBL and MICA genes at the centromeric end of the HLA class I region. Genomics 1998; 47:372-82. [PMID: 9480751 DOI: 10.1006/geno.1997.5114] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
To elucidate the complete gene structure and to identify new genes involved in the development of HLA class I antigen-associated diseases in the class I region of the human major histocompatibility complex on chromosome 6, a YAC clone (745D12) covering the 146-kb segment around the IkBL and MICA loci was isolated from a YAC library constructed from the B-cell line, BOLETH. A physical map of this region was constructed by isolation of overlapping cosmid clones derived from 745D12. Of these, five contiguous cosmids were chosen for DNA sequencing by the shotgun strategy to give a single contig of 146,601 bp from 2.8 kb telomeric of the IkBL gene to exon 6 of MICA. This region was confirmed to contain five known genes, IkBL, BAT1, MICB, P5-1, and HLA-X (class I fragment), from centromere to telomere, and their exon-intron organizations were determined. The 3.8-1 homologue gene (3.8-1-hom) showing 99.7% identity with the 3.8-1 cDNA clone, which was originally isolated using the 3.8-kb EcoRI fragment between the HLA-54/H and the HLA-G genes, was detected between MICA and MICB and was suggested to represent the cognate 3.8-1 genomic sequence from which the cDNA clone was derived. No evidence for the presence of expressed new genes could be obtained in this region by homology and EST searches or coding and exon prediction analyses. One TA microsatellite repeat spanning 2545 bases with as many as 913 repetitions was found on the centromeric side of the MICA gene and was indicated to be a potential hot spot for genetic recombination. The two segments of approximately 35 kb upstream of the MICA and MICB genes showed high sequence homology (about 85%) to each other, suggesting that segmental genome duplication including the MICA and MICB genes must have occurred during the evolution of the human MHC.
Affiliation(s)
- T Shiina
- Division of Molecular Life Science, Tokai University School of Medicine, Bohseidai, Isehara, Kanagawa, 259-11, Japan
160
Xu G, Sze SH, Liu CP, Pevzner PA, Arnheim N. Gene hunting without sequencing genomic clones: finding exon boundaries in cDNAs. Genomics 1998; 47:171-9. [PMID: 9479489 DOI: 10.1006/geno.1997.5072] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Abstract
We propose a new experimental protocol, ExonPCR, which is able to identify exon boundaries in a cDNA even in the absence of any genomic clones. ExonPCR can bypass the isolation, characterization, and DNA sequencing of subclones of genomic DNA to determine exon boundaries: a major effort in the process of positional cloning. Given a cDNA sequence, ExonPCR uses a series of "adaptive" steps to analyze the PCR products from cDNA and genomic DNA thereby revealing the approximate positions of "hidden" exon boundaries in the cDNA. The nucleotide sequence of adjacent intronic regions is determined by ligation-mediated PCR. Primers adjacent to the "hidden" exon boundaries are used to amplify genomic DNA followed by limited DNA sequencing of the PCR product. The method was successfully tested on the 3-kb hMSH2 cDNA with 16 known exons and the 9-kb PRDII-BF1 cDNA with a previously unknown number of exons. We subsequently developed the ExonPCR algorithm and software to direct the experimental protocol using a strategy that is analogous to that used in the game "Twenty Questions." Through the use of ExonPCR, the search for disease-causing mutations can be initiated almost immediately after cDNA clones in a genetically mapped region become available. This approach would be most valuable in gene discovery strategies that focus initially on cDNA isolation.
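The "Twenty Questions" analogy suggests a search driven by yes/no answers. The Python sketch below illustrates that general idea with plain interval bisection. It is an assumption-laden illustration, not the published ExonPCR algorithm: the function `locate_boundaries` and the oracle `has_boundary` are hypothetical, with the oracle standing in for one experiment that asks whether the genomic PCR product differs from the cDNA product between two primer positions.

```python
from typing import Callable, List


def locate_boundaries(
    cdna_length: int,
    has_boundary: Callable[[int, int], bool],
) -> List[int]:
    """Find exon boundaries by bisection on yes/no answers.

    A boundary at position p lies between cDNA bases p-1 and p, so the
    candidates are 1 .. cdna_length-1. `has_boundary(a, b)` is a
    hypothetical oracle (one experiment in practice) that reports
    whether any boundary lies in the candidate range a..b inclusive.
    """

    def search(a: int, b: int) -> List[int]:
        if a > b or not has_boundary(a, b):
            return []            # one "no" answer rules out the whole range
        if a == b:
            return [a]           # range narrowed to a single position
        mid = (a + b) // 2
        return search(a, mid) + search(mid + 1, b)

    return search(1, cdna_length - 1)


if __name__ == "__main__":
    true_boundaries = [5, 12]    # invented positions for a toy 20-base cDNA
    oracle = lambda a, b: any(a <= p <= b for p in true_boundaries)
    print(locate_boundaries(20, oracle))
```

A real protocol would stop at a coarser resolution than a single base, since the abstract describes recovering only approximate boundary positions before sequencing.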
Affiliation(s)
- G Xu
- Molecular Biology Program, University of Southern California, Los Angeles 90089-1340, USA
161
Abstract
As the Human Genome Project enters the large-scale sequencing phase, computational gene identification methods are becoming essential for the automatic analysis and annotation of large uncharacterized genomic sequences. Currently available computer programs relying mainly on sequence coding statistics are of great use in pin-pointing regions in genomic sequences containing exons. Such programs perform rather poorly, however, when the problem is to fully elucidate gene structure. For this problem, the DNA sequence signals involved in the specification of the genes--start sites and splice sites--carry a lot of information, and simple methods relying on such information can predict gene structure with an accuracy to some extent comparable to that of other more sophisticated computational methods.
Affiliation(s)
- R Guigó
- Departament d'Informàtica Mèdica, Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Spain.
162
Modeling dependencies in pre-mRNA splicing signals. COMPUTATIONAL METHODS IN MOLECULAR BIOLOGY 1998. [DOI: 10.1016/s0167-7306(08)60465-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 01/09/2023]
163
Decision trees and Markov chains for gene finding. COMPUTATIONAL METHODS IN MOLECULAR BIOLOGY 1998. [DOI: 10.1016/s0167-7306(08)60467-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
164
Abstract
Transcriptional repression in eukaryotes often involves tens or hundreds of kilobase pairs, two to three orders of magnitude more than the bacterial operator/repressor model does. Classical repression, represented by this model, was maintained over the whole span of evolution under different guises, and consists of repressor factors interacting primarily with promoters and, in later evolution, also with enhancers. The use of much larger amounts of DNA in the other mode of repression, here called the sectorial mode ('superrepression'), results in the conceptual transfer of so-called junk DNA to the domain of functional DNA. This contribution to the solution of the c-value paradox involves perhaps 15% of genomic 'junk,' and encompasses the bulk of the introns, thought to fill a stabilizing role in sectorially repressed chromatin structures. In the case of developmental genes, such structures appear to be heterochromatoid in character. However, solid clues regarding general structural features of superrepressed terminal differentiation genes remain elusive. The competition among superrepressible DNA sectors for sectorially binding factors offers, in principle, a molecular mechanism for developmental switches. Position effect variegation may be considered an abnormal manifestation of normal processes that underlie development and involve heterochromatoid sectorial repression, which is apparently required for local elimination or modulation of morphological features (morpholysis). Sectorial repression of genes participating either in development or in terminal differentiation is considered instrumental in establishing stable cell types, and provides a basis for the distinction between determination and cell type specification. The gamut of possible stable cell types may have been broadened by the appearance in evolution of heavy isochores. Additional types of relatively frequent GC-rich cis-acting DNA motifs may offer reiterated binding sites to factors endowed with a selective (though not individually strong) affinity for these motifs. The majority of sequence motifs thought to be used in superrepression need not be individually maintained by natural selection. It is re-emphasized that the dispensability of sequences is not an indicator of their nonfunctionality and that in many cases, along noncoding sequences, nucleotides tend to fill functions collectively, rather than individually.
Affiliation(s)
- E Zuckerkandl
- Institute of Molecular Medical Sciences, Palo Alto, CA 94306, USA
165
Inoue H, Ishii H, Alder H, Snyder E, Druck T, Huebner K, Croce CM. Sequence of the FRA3B common fragile region: implications for the mechanism of FHIT deletion. Proc Natl Acad Sci U S A 1997; 94:14584-9. [PMID: 9405656 PMCID: PMC25062 DOI: 10.1073/pnas.94.26.14584] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.4] [Indexed: 02/05/2023]
Abstract
The hypothesis that chromosomal fragile sites may be "weak links" that result in hot spots for cancer-specific chromosome rearrangements was supported by the discovery that numerous cancer cell homozygous deletions and a familial translocation map within the FHIT gene, which encompasses the common fragile site, FRA3B. Sequence analysis of 276 kb of the FRA3B/FHIT locus and 22 associated cancer cell deletion endpoints shows that this locus is a frequent target of homologous recombination between long interspersed nuclear element sequences resulting in FHIT gene internal deletions, probably as a result of carcinogen-induced damage at FRA3B fragile sites.
Affiliation(s)
- H Inoue
- Kimmel Cancer Institute, Jefferson Medical College, Philadelphia, PA 19107, USA
166
Schaefer L, Prakash S, Zoghbi HY. Cloning and characterization of a novel rho-type GTPase-activating protein gene (ARHGAP6) from the critical region for microphthalmia with linear skin defects. Genomics 1997; 46:268-77. [PMID: 9417914 DOI: 10.1006/geno.1997.5040] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.3] [Indexed: 02/05/2023]
Abstract
Microphthalmia with linear skin defects syndrome (MLS) is an X-linked dominant, male-lethal disorder associated with chromosomal rearrangements that result in deletions of the distal short arm of the X chromosome. In an effort to isolate expressed sequences from the 500-kb MLS critical region in Xp22.3, exons were trapped from 14 overlapping cosmids. Using exon connection followed by cDNA library screening, we identified a 2.4-kb cDNA contig spanning 170 kb of genomic sequence in the MLS deletion region. Northern analysis of this cDNA detected a prominent approximately 4.2-kb transcript and a less abundant approximately 6-kb transcript in all tissues examined, with additional transcripts in skeletal muscle. Sequence analysis revealed a coding region of 601 amino acids contained in 12 exons, with a splice variant isoform of 495 amino acids. The predicted protein sequence of the gene, named ARHGAP6, contains homology to the GTPase-activating (GAP) domain of the rhoGAP family of proteins, which has been implicated in the regulation of actin polymerization at the plasma membrane in several cellular processes. The possible role of the ARHGAP6 protein in the pathogenesis of MLS is discussed.
Affiliation(s)
- L Schaefer
- Department of Molecular and Human Genetics, Howard Hughes Medical Institute, Baylor College of Medicine, Houston, Texas 77030, USA
167
Royaux I, Lambert de Rouvroit C, D'Arcangelo G, Demirov D, Goffinet AM. Genomic organization of the mouse reelin gene. Genomics 1997; 46:240-50. [PMID: 9417911 DOI: 10.1006/geno.1997.4983] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.1] [Indexed: 02/05/2023]
Abstract
Reelin is the protein defective in reeler mice, an extensively studied model of brain development. The reelin gene (symbol Reln) codes for a protein of the extracellular matrix that contains eight successive repeats of 350 to 390 amino acids. In this work, we describe the genomic structure of the mouse reelin gene and the 5'-flanking genomic DNA sequences. The reelin gene is composed of 65 exons spread over approximately 450 kb of genomic DNA. We identified different reelin transcripts, formed by alternative splicing of a microexon as well as by use of two different polyadenylation sites. All splice sites conform to the GT-AG rule, except for the splice donor site of intron 30, which is GC instead of GT. A processed pseudogene is present in intron 42. Its nucleotide sequence is 86% identical to the sequence of the rat RDJ1 cDNA, which codes for a DnaJ-like protein of the Hsp40 family. Comparison of 8 intron positions in mouse and human reelin genes reveals a highly conserved genomic structure, suggesting a similar structure of the whole gene in both species. We identified two transcription start sites embedded within a CpG island. The promoter region contains putative recognition sites for the transcription factors Sp1 and AP2 but lacks TATA and CAAT boxes. The presence of tandemly repeated regions in the Reelin protein suggests that gene duplication events occurred during evolution. By comparison of the amino acid sequences of the eight repeats and the positions of introns, we suggest a model for the evolution of the repeat coding portion of the reelin gene from a putative ancestral minigene.
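The GT-AG rule mentioned above states that an intron typically begins with GT (splice donor) and ends with AG (splice acceptor); the intron 30 exception in this abstract is a GC donor. A minimal Python sketch of such a check follows. The function name `classify_intron` and the toy intron sequences are illustrative assumptions, not data from the reelin gene.

```python
def classify_intron(intron: str) -> str:
    """Classify an intron by its terminal dinucleotides.

    `intron` is the intron sequence only (DNA alphabet), read 5' to 3'.
    Returns "GT-AG", "GC-AG", or "non-canonical".
    """
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both splice sites")
    donor, acceptor = seq[:2], seq[-2:]
    if acceptor != "AG":
        return "non-canonical"
    if donor == "GT":
        return "GT-AG"
    if donor == "GC":
        return "GC-AG"       # the kind of exception reported for intron 30
    return "non-canonical"


if __name__ == "__main__":
    # Invented toy introns, for illustration only.
    for toy in ("GTAAGTTTTCAG", "GCAAGTTTTCAG", "ATAAGTTTTCAC"):
        print(toy, classify_intron(toy))
```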
Affiliation(s)
- I Royaux
- Department of Physiology, University of Namur School of Medicine, Belgium
168
Laurell H, Grober J, Vindis C, Lacombe T, Dauzats M, Holm C, Langin D. Species-specific alternative splicing generates a catalytically inactive form of human hormone-sensitive lipase. Biochem J 1997; 328 ( Pt 1):137-43. [PMID: 9359844 PMCID: PMC1218897 DOI: 10.1042/bj3280137] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.3] [Indexed: 02/05/2023]
Abstract
Hormone-sensitive lipase (HSL) catalyses the rate-limiting step of adipose tissue lipolysis. The enzyme is also expressed in steroidogenic tissues, mammary gland, muscle tissues and macrophages. A novel HSL mRNA termed hHSL-S, 228 bp shorter than the full-length HSL mRNA, was detected in human adipocytes. hHSL-S mRNA results from the in-frame skipping of exon 6, which encodes the serine residue of the catalytic triad. The corresponding 80 kDa protein was identified in human adipocytes after immunoprecipitation. The truncated protein expressed in COS cells showed neither lipase nor esterase activity but was phosphorylated by cAMP-dependent protein kinase. hHSL-S mRNA was found in all human tissues expressing HSL, except brown adipose tissue from newborns. It represented approx. 20% of total HSL transcripts in human subcutaneous adipocytes. No alternative splicing was detected in other mammals. Human and mouse three-exon HSL minigenes transfected into primate and rodent cell lines reproduced the splicing pattern of the endogenous HSL genes. Analysis of hybrid human/mouse minigenes transfected into human cell lines showed that cis-acting elements responsible for the skipping of human exon 6 were restricted to a 247 bp region including exon 6 and the first 19 nt of intron 6. Moreover, divergence in exonic splicing elements between mouse and human was shown to be critical for the species-specific alternative splicing.
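The skipping of exon 6 is described as in-frame because the 228 bp removed is a whole number of codons, so the reading frame downstream is preserved. A short Python sketch of that arithmetic is given below; the function names are illustrative, and only the 228 bp figure comes from the abstract.

```python
def is_in_frame(deleted_bp: int) -> bool:
    """A deletion keeps the reading frame when its length is a multiple of 3."""
    return deleted_bp % 3 == 0


def codons_removed(deleted_bp: int) -> int:
    """Number of whole codons lost to an in-frame deletion."""
    if not is_in_frame(deleted_bp):
        raise ValueError("deletion shifts the reading frame")
    return deleted_bp // 3


if __name__ == "__main__":
    exon6_bp = 228  # size difference between HSL and hHSL-S mRNA (from the abstract)
    print(is_in_frame(exon6_bp), codons_removed(exon6_bp))
```

For 228 bp this gives 76 codons, so the hHSL-S protein would be expected to lack 76 amino acids relative to full-length HSL.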
Affiliation(s)
- H Laurell
- Unité INSERM 317, Institut Louis Bugnard, Faculté de Médecine, Université Paul Sabatier, Toulouse, France
169
Aubourg S, Takvorian A, Chéron A, Kreis M, Lecharny A. Structure, organization and putative function of the genes identified within a 23.9-kb fragment from Arabidopsis thaliana chromosome IV. Gene 1997; 199:241-53. [PMID: 9358062 DOI: 10.1016/s0378-1119(97)00374-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]
Abstract
In the framework of the complete genome sequencing programme of the crucifer Arabidopsis thaliana, a 23.9-kb fragment from the long arm of chromosome IV has been analysed. This paper presents a methodological approach, integrating computerized predictions, database screening, the sequencing of cognate cDNAs and a PCR-based detection of expression, that allows the accumulation of a substantial amount of information from an anonymous sequence. This work revealed the organization of novel genes and the vestige of a copia-like retrotransposon. The gene AtRH1 encodes the first member of a new subfamily of the plant DEAD box RNA helicases. A recurrent and complete search of dbEST has been used to evaluate the number of different RNA helicases expressed in A. thaliana. Of the 18 discriminated members of the family, only a small number seem to be expressed at a relatively high level. The putative gene AtTS1 encodes a novel terpene synthase in A. thaliana, and the genes G14587-5 and G14587-6 encode unknown proteins. This study illustrates most of the situations that could be encountered during the analysis of an anonymous sequence from A. thaliana.
Affiliation(s)
- S Aubourg
- Laboratoire de Biologie du Développement des Plantes, Institut de Biotechnologie des Plantes, ERS/CNRS 569, Université de Paris-Sud, Orsay, France
|
170
|
Wilson CL, Thomsen J, Hoivik DJ, Wormke MT, Stanker L, Holtzapple C, Safe SH. Aryl hydrocarbon (Ah) nonresponsiveness in estrogen receptor-negative MDA-MB-231 cells is associated with expression of a variant arnt protein. Arch Biochem Biophys 1997; 346:65-73. [PMID: 9328285 DOI: 10.1006/abbi.1997.0289] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.2] [Indexed: 02/05/2023]
Abstract
Several studies have reported a correlation between expression of the estrogen receptor (ER) and aryl hydrocarbon (Ah) responsiveness in human breast cancer cell lines. MDA-MB-231 cells are ER-negative and Ah-nonresponsive; however, initial studies showed that 2,3,7,8-tetrachlorodibenzo-p-dioxin induced CYP1A1 mRNA levels (5.8-fold) and chloramphenicol acetyltransferase activity (2.6-fold) in high passage (Hp, >50 passages) cells transiently transfected with an Ah-responsive plasmid. In contrast, no induction responses were observed in low passage (Lp, <20 passages) cells. The Ah responsiveness of Hp compared to Lp MDA-MB-231 cells was associated with a >2-fold increased expression of the Ah receptor in Hp cells. Further analysis revealed that the apparent molecular weight of the Ah receptor mRNA transcript and immunoreactive protein were comparable in Lp MDA-MB-231 and Ah-responsive human HepG2 cells. In contrast, RT-PCR analysis of the Ah receptor nuclear translocator (Arnt) protein showed that HepG2 cells expressed the expected 2.6-kb transcript, whereas a 1.3-kb transcript was the major product in MDA-MB-231 cells. Western blot analysis confirmed that HepG2 cells primarily expressed a 97-kDa wild-type form of Arnt, whereas a dominant 36-kDa variant was expressed in MDA-MB-231 cells. Complete sequence analysis of the variant form of Arnt revealed a major deletion of the C-terminal region of the protein (aa 330 to 789). Like HepG2 cells, the wild-type 2.6-kb transcript was detected in ER-positive (Ah-responsive) MCF-7 cells, whereas the low-molecular-weight variant Arnt was dominant in ER-negative MDA-MB-231, MDA-MB-435, and Adriamycin-resistant MCF-7 cells. These results suggest that expression of this protein may be useful as a prognostic factor in breast cancer.
Affiliation(s)
- C L Wilson
- Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station 77843-4466, USA
|
171
|
Abstract
We present an improved splice site predictor for the genefinding program Genie. Genie is based on a generalized Hidden Markov Model (GHMM) that describes the grammar of a legal parse of a multi-exon gene in a DNA sequence. In Genie, probabilities are estimated for gene features by using dynamic programming to combine information from multiple content and signal sensors, including sensors that integrate matches to homologous sequences from a database. One of the hardest problems in genefinding is to determine the complete gene structure correctly. The splice site sensors are the key signal sensors that address this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. Using these novel sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. Experimental results in tests using a standard set of annotated genes showed that Genie identified 86% of coding nucleotides correctly with a specificity of 85%, versus 80% and 84% in the older system. In further splice site experiments, we also looked at correlations between splice site scores and intron and exon lengths, as well as at the effect of distance to the nearest splice site on false positive rates.
Affiliation(s)
- M G Reese
- Human Genome Informatics Group, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
|
172
|
Abstract
Computational methods for gene identification in genomic sequences typically have two phases: coding region recognition and gene parsing. While there are a number of effective methods for recognizing coding regions (exons), parsing the recognized exons into proper gene structures, to a large extent, remains an unsolved problem. We have developed a computer program which can automatically parse the recognized exons into gene models that are most consistent with the available Expressed Sequence Tags (ESTs) and a set of biological heuristics, derived empirically. The gene modeling algorithm used in this program provides a general framework for applying EST information so the modeling accuracy improves as the amount of available EST information increases. Based on preliminary tests on a number of large DNA sequences, using the dbEST database, we have observed that the algorithm can (1) accurately model complicated multiple gene structures, including embedded genes, (2) identify falsely-recognized exons and locate missed exons by the initial exon recognition phase, and (3) make more accurate exon boundary predictions, if the necessary EST information is available. We have extended this EST-based gene modeling algorithm to model genes on unfinished DNA contigs at the end of the shotgun sequencing. This extended version can automatically determine the orientations and the relative order of the DNA contigs (with gaps between them) using the available ESTs as reference models, before the gene modeling phase.
Affiliation(s)
- Y Xu
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Tennessee 37831-6364, USA.
|
173
|
Ji DD, Arnot DE. A Plasmodium falciparum homologue of the ATPase subunit of a multi-protein complex involved in chromatin remodelling for transcription. Mol Biochem Parasitol 1997; 88:151-62. [PMID: 9274876 DOI: 10.1016/s0166-6851(97)00089-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 02/05/2023]
Abstract
A Plasmodium falciparum homologue of one of the components of a chromatin-remodelling complex which controls binding of transcription factors to nucleosome core particles has been cloned and characterised. The gene encodes 1422 amino acids with an estimated molecular mass of 167 kDa. The protein, SNF2L, shares 60% amino acid identity in its conserved DNA-dependent ATPase domain with yeast transcription factors originally identified by characterising mating type switch mutants. It also contains sequences related to the so-called SWI3, ADA2, N-CoR and TFIIIB B" or SANT DNA binding domains which are characteristic of these transcriptional activation factors. The SNF2L gene has two short introns in the 3' region of the coding sequence and is transcribed into a single messenger RNA species of approximately 6.5 kb which is present throughout the asexual stages of the cell cycle. Southern blotting and pulsed field gel electrophoresis experiments show that SNF2L is a single copy gene located on P. falciparum chromosome 11.
Affiliation(s)
- D D Ji
- Institute of Cell, Animal and Population Biology, Ashworth Laboratory, University of Edinburgh, UK
|
174
|
Abstract
Familial Mediterranean fever (FMF) is an autosomal recessive disorder characterized by attacks of fever and serositis. In this paper, we define a minimal co-segregating region of 60 kb containing the FMF gene (MEFV) and identify four different transcript units within this region. One of these transcripts encodes a new protein (marenostrin) related to the ret-finger protein and to butyrophilin. Four conservative missense variations co-segregating with FMF have been found within the MEFV candidate gene in 85% of the carrier chromosomes. These variations, which cluster at the carboxy terminal domain of the protein, were not present in 308 control chromosomes, including 162 validated non-carriers. We therefore propose that the sequence alterations in the marenostrin protein are responsible for the FMF disease.
|
175
|
Chissoe SL, Marra MA, Hillier L, Brinkman R, Wilson RK, Waterston RH. Representation of cloned genomic sequences in two sequencing vectors: correlation of DNA sequence and subclone distribution. Nucleic Acids Res 1997; 25:2960-6. [PMID: 9224593 PMCID: PMC146865 DOI: 10.1093/nar/25.15.2960] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023] Open
Abstract
Representation of subcloned Caenorhabditis elegans and human DNA sequences in both M13 and pUC sequencing vectors was determined in the context of large scale genomic sequencing. In many cases, regions of subclone under-representation correlated with the occurrence of repeat sequences, and in some cases the under-representation was orientation specific. Factors which affected subclone representation included the nature and complexity of the repeat sequence, as well as the length of the repeat region. In some but not all cases, notable differences between the M13 and pUC subclone distributions existed. However, in all regions lacking one type of subclone (either M13 or pUC), an alternate subclone was identified in at least one orientation. This suggests that complementary use of M13 and pUC subclones would provide the most comprehensive subclone coverage of a given genomic sequence.
Affiliation(s)
- S L Chissoe
- Department of Genetics and Genome Sequencing Center, Washington University School of Medicine, St Louis, MO 63108, USA.
|
176
|
McCullough AJ, Berget SM. G triplets located throughout a class of small vertebrate introns enforce intron borders and regulate splice site selection. Mol Cell Biol 1997; 17:4562-71. [PMID: 9234714 PMCID: PMC232310 DOI: 10.1128/mcb.17.8.4562] [Citation(s) in RCA: 185] [Impact Index Per Article: 6.9] [Indexed: 02/04/2023] Open
Abstract
Splicing of small introns in lower eucaryotes can be distinguished from vertebrate splicing by the inability of such introns to be expanded and by the inability of splice site mutations to cause exon skipping, properties suggesting that the intron rather than the exon is the unit of recognition. Vertebrates do contain small introns. To see if they possess properties similar to small introns in lower eucaryotes, we studied the small second intron from the human alpha-globin gene. Mutation of the 5' splice site of this intron resulted in in vivo intron inclusion, not exon skipping, suggesting the presence of intron bridging interactions. The intron had an unusual base composition reflective of a sequence bias present in a collection of small human introns in which multiple G triplets stud the interior of the introns. Each G triplet represented a minimal sequence element additively contributing to maximal splicing efficiency and spliceosome assembly. More importantly, G triplets proximal to a duplicated splice site caused preferential utilization of the 5' splice site upstream of the triplets or the 3' splice site downstream of the triplets; i.e., sequences containing G triplets were preferentially used as introns when a choice was possible. Thus, G triplets internal to a small intron have the ability to affect splice site decisions at both ends of the intron. Each G triplet additively contributed to splice site selectivity. We suggest that G triplets are a common component of human 5' splice sites and aid in the definition of exon-intron borders as well as overall splicing efficiency. In addition, our data suggest that such intronic elements may be characteristic of small introns and represent an intronic equivalent to the exon enhancers that facilitate recognition of both ends of an exon during exon definition.
Affiliation(s)
- A J McCullough
- Verna and Marrs McLean Department of Biochemistry, Baylor College of Medicine, Houston, Texas 77030, USA.
|
177
|
Abstract
We present here a new algorithm for functional site analysis. It is based on four main assumptions: each variation of nucleotide composition makes a different contribution to the overall binding free energy of interaction between a functional site and another molecule; nonfunctioning site-like regions (pseudosites) are absent or rare in genomes; there may be errors in the sample of sites; and nucleotides of different site positions are considered to be mutually dependent. In this algorithm, the site set is divided into subsets, each described by a certain consensus. Donor splice sites of the human protein-coding genes were analyzed. Comparing the results with other methods of donor splice site prediction has demonstrated a more accurate prediction of consensus sequences AG/GU(A,G), G/GUnAG, /GU(A,G)AG, /GU(A,G)nGU, and G/GUA than is achieved by weight matrix and consensus (A,C)AG/GU(A,G)AGU with mismatches. The probability of the first type error, E1, for the obtained consensus set was about 0.05, and the probability of the second type error, E2, was 0.15. The analysis demonstrated that accuracy of the functional site prediction could be improved if one takes into account correlations between the site positions. The accuracy of prediction by using human consensus sequences was tested on sequences from different organisms. Some differences in consensus sequences for the plant Arabidopsis sp., the invertebrate Caenorhabditis sp., and the fungus Aspergillus sp. were revealed. For the yeast Saccharomyces sp. only one conservative consensus, /GUA(U,A,C)G(U,A,C), was revealed (E1 = 0.03, E2 = 0.03). Yeast is a very interesting model to use for analysis of molecular mechanisms of splicing.
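The consensus notation used in this abstract can be read as a small pattern language: `/` marks the exon-intron border, `(A,G)` lists allowed alternatives at one position, and `n` stands for any nucleotide. The sketch below only illustrates how such a consensus set could be matched against a candidate donor site. It is not the authors' algorithm, the function names are hypothetical, and the consensus strings are copied from the abstract.

```python
import re

# Consensus set quoted in the abstract for human donor splice sites.
CONSENSUSES = ["AG/GU(A,G)", "G/GUnAG", "/GU(A,G)AG", "/GU(A,G)nGU", "G/GUA"]


def to_regex(part: str) -> str:
    """Translate one side of a consensus, e.g. 'GU(A,G)nGU', into a regex."""
    # '(A,G)' becomes the character class '[AG]'.
    part = re.sub(r"\(([^)]*)\)", lambda m: "[" + m.group(1).replace(",", "") + "]", part)
    # Lower-case 'n' means any nucleotide.
    return part.replace("n", "[ACGU]")


def matches_any(exon_tail: str, intron_head: str, consensuses=CONSENSUSES) -> bool:
    """True if the exon end and intron start fit at least one consensus."""
    for consensus in consensuses:
        exon_part, intron_part = consensus.split("/")
        exon_ok = re.search(to_regex(exon_part) + "$", exon_tail) is not None
        intron_ok = re.match(to_regex(intron_part), intron_head) is not None
        if exon_ok and intron_ok:
            return True
    return False


print(matches_any("CAG", "GUAAGU"))  # exon ends in AG, intron starts with GUA
print(matches_any("CCC", "CUAAGU"))  # intron does not start with GU
```

In this simplified reading, a first type error would correspond to a real site that no consensus matches and a second type error to a non-site that some consensus accepts; the sketch does not estimate either rate.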
Affiliation(s)
- I B Rogozin
- Istituto di Tecnologie Biomediche Avanzate, Consiglio Nazionale Delle Ricerche, via Ampere 56, 20131 Milano, Italy
|
178
|
Abstract
This study describes a new Hidden Markov Model (HMM) system for segmenting uncharacterized genomic DNA sequences into exons, introns, and intergenic regions. Separate HMM modules were designed and trained for specific regions of DNA: exons, introns, intergenic regions, and splice sites. The models were then tied together to form a biologically feasible topology. The integrated HMM was trained further on a set of eukaryotic DNA sequences and tested by using it to segment a separate set of sequences. The resulting HMM system which is called VEIL (Viterbi Exon-Intron Locator), obtains an overall accuracy on test data of 92% of total bases correctly labelled, with a correlation coefficient of 0.73. Using the more stringent test of exact exon prediction, VEIL correctly located both ends of 53% of the coding exons, and 49% of the exons it predicts are exactly correct. These results compare favorably to the best previous results for gene structure prediction and demonstrate the benefits of using HMMs for this problem.
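The segmentation step named in the VEIL acronym is Viterbi decoding: given an HMM, find the single most probable state path for a sequence. The sketch below is a toy two-state version (exon, intron) in log space. All probabilities are invented for illustration and the topology is far simpler than the one described in the abstract, so it should be read as a demonstration of the decoding technique only.

```python
import math

STATES = ("exon", "intron")
# Made-up parameters: exons are GC-rich, introns AT-rich, states are "sticky".
START = {"exon": 0.5, "intron": 0.5}
TRANS = {"exon": {"exon": 0.9, "intron": 0.1},
         "intron": {"exon": 0.1, "intron": 0.9}}
EMIT = {"exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # Log score of the best path ending in each state at the first base.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict of back-pointers per later position
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


labels = viterbi("GCGCATATAT")
print(labels)
```

With these toy parameters the GC-rich prefix is labelled exon and the AT-rich remainder intron. Per-base accuracy figures like the 92% quoted above would come from comparing such labels with an annotated test set.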
Affiliation(s)
- J Henderson
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA.
|
179
|
Francis F, Strom TM, Hennig S, Böddrich A, Lorenz B, Brandau O, Mohnike KL, Cagnoli M, Steffens C, Klages S, Borzym K, Pohl T, Oudet C, Econs MJ, Rowe PS, Reinhardt R, Meitinger T, Lehrach H. Genomic organization of the human PEX gene mutated in X-linked dominant hypophosphatemic rickets. Genome Res 1997; 7:573-85. [PMID: 9199930 DOI: 10.1101/gr.7.6.573] [Citation(s) in RCA: 120] [Impact Index Per Article: 4.4] [Indexed: 02/04/2023]
Abstract
X-linked dominant hypophosphatemic rickets (HYP) is the most common form of hereditary rickets. Recently we have cloned the PEX gene and shown it to be mutated and deleted in HYP individuals. We have now completely sequenced a 243-kb genomic region containing PEX and have identified all intron–exon boundary sequences. We show that PEX, homologous to members of a neutral endopeptidase family, has an exon organization that is very similar to neprilysin. We have performed an extensive mutation analysis examining all 22 PEX coding exons in 29 familial and 14 sporadic cases of hypophosphatemia. Sequence changes include missense, frameshift, nonsense, and splice site mutations and intragenic deletions. A mutation was found in 25 (86%) of the 29 familial cases and 8 (57%) of the 14 sporadic cases. Our data provide the first evidence that most of the familial and also a large number of the sporadic cases of hypophosphatemia are caused by loss-of-function mutations in PEX. [The sequence data described in this paper have been submitted to GenBank under accession nos. Y08111–Y08132 and Y10196.]
Affiliation(s)
- F Francis
- Max-Planck Institut für Molekulare Genetik, Berlin, Germany.
|
180
|
Laporte J, Kioschis P, Hu LJ, Kretz C, Carlsson B, Poustka A, Mandel JL, Dahl N. Cloning and characterization of an alternatively spliced gene in proximal Xq28 deleted in two patients with intersexual genitalia and myotubular myopathy. Genomics 1997; 41:458-62. [PMID: 9169146 DOI: 10.1006/geno.1997.4662] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023]
Abstract
We have identified a novel human gene that is entirely deleted in two boys with abnormal genital development and myotubular myopathy (MTM1). The gene, F18, is located in proximal Xq28, approximately 80 kb centromeric to the recently isolated MTM1 gene. Northern analysis of mRNA showed a ubiquitous pattern and suggested high levels of expression in skeletal muscle, brain, and heart. A transcript of 4.6 kb was detected in a range of tissues, and additional alternate forms of 3.8 and 2.6 kb were present in placenta and pancreas, respectively. The gene extends over 100 kb and is composed of at least seven exons, of which two are noncoding. Sequence analysis of a 4.6-kb cDNA contig revealed two overlapping open reading frames (ORFs) that encode putative proteins of 701 and 424 amino acids, respectively. Two alternative spliced transcripts affecting the large open reading frame were identified that, together with the Northern blot results, suggest that distinct proteins are derived from the gene. No significant homology to other known proteins was detected, but segments of the first ORF encode polyglutamine tracts and proline-rich domains, which are frequently observed in DNA-binding proteins. The F18 gene is a strong candidate for being implicated in the intersexual genitalia present in the two MTM1-deleted patients. The gene also serves as a candidate for other disorders that map to proximal Xq28.
Affiliation(s)
- J Laporte
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, Illkirch, Strasbourg, France
|
181
|
Abstract
We introduce a general probabilistic model of the gene structure of human genomic sequences which incorporates descriptions of the basic transcriptional, translational and splicing signals, as well as length distributions and compositional features of exons, introns and intergenic regions. Distinct sets of model parameters are derived to account for the many substantial differences in gene density and structure observed in distinct C + G compositional regions of the human genome. In addition, new models of the donor and acceptor splice signals are described which capture potentially important dependencies between signal positions. The model is applied to the problem of gene identification in a computer program, GENSCAN, which identifies complete exon/intron structures of genes in genomic DNA. Novel features of the program include the capacity to predict multiple genes in a sequence, to deal with partial as well as complete genes, and to predict consistent sets of genes occurring on either or both DNA strands. GENSCAN is shown to have substantially higher accuracy than existing methods when tested on standardized sets of human and vertebrate genes, with 75 to 80% of exons identified exactly. The program is also capable of indicating fairly accurately the reliability of each predicted exon. Consistently high levels of accuracy are observed for sequences of differing C + G content and for distinct groups of vertebrates.
Affiliation(s)
- C Burge
- Department of Mathematics, Stanford University, CA 94305, USA
|
182
|
Ansari-Lari MA, Shen Y, Muzny DM, Lee W, Gibbs RA. Large-scale sequencing in human chromosome 12p13: experimental and computational gene structure determination. Genome Res 1997; 7:268-80. [PMID: 9074930 DOI: 10.1101/gr.7.3.268] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023]
Abstract
The detailed genomic organization of a gene-dense region at human chromosome 12p13, spanning 223 kb of contiguous sequence, was determined. This region is composed of 20 genes and several other expressed sequences. Experimental tools including RT-PCR and cDNA sequencing, combined with gene prediction programs, were utilized in the analysis of the sequence. Various computer software programs were employed for sequence similarity searches and functional predictions. The high number of genes with diverse functions and complex transcriptional patterns make this region ideal for addressing challenges of gene discovery and genomic characterization amenable to large-scale sequence analysis.
Affiliation(s)
- M A Ansari-Lari
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
|
183
|
Meller VH, Wu KH, Roman G, Kuroda MI, Davis RL. roX1 RNA paints the X chromosome of male Drosophila and is regulated by the dosage compensation system. Cell 1997; 88:445-57. [PMID: 9038336 DOI: 10.1016/s0092-8674(00)81885-1] [Citation(s) in RCA: 210] [Impact Index Per Article: 7.8] [Indexed: 02/03/2023]
Abstract
The Drosophila roX1 gene is X-linked and produces RNAs that are male-specific, somatic, and preferentially expressed in the central nervous system. These RNAs are retained in the nucleus and lack any significant open reading frame. Although all sexually dimorphic characteristics in Drosophila were thought to be controlled by the sex determination pathway through the gene transformer (tra), the expression of roX1 is independent of tra activity. Instead, the dosage compensation system is necessary and sufficient for the expression of roX1. Consistent with a potential function in dosage compensation, roX1 RNAs localize specifically to the male X chromosome. This localization occurs even when roX1 RNAs are expressed from autosomal locations in X-to-autosome translocations. The novel regulation and subnuclear localization of roX1 RNAs makes them candidates for an RNA component of the dosage compensation machinery.
Affiliation(s)
- V H Meller
- Department of Cell Biology, Baylor College of Medicine, Houston, Texas 77030, USA
|
184
|
Zhang MQ. Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc Natl Acad Sci U S A 1997; 94:565-8. [PMID: 9012824 PMCID: PMC19553 DOI: 10.1073/pnas.94.2.565] [Citation(s) in RCA: 198] [Impact Index Per Article: 7.3] [Received: 08/12/1996] [Accepted: 10/29/1996] [Indexed: 02/03/2023] Open
Abstract
A new method for predicting internal coding exons in genomic DNA sequences has been developed. This method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. Substantial improvements have been made (with only 9 discriminant variables) when compared with existing methods: HEXON [Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. (1994) Nucleic Acids Res. 22, 5156-5163] (based on linear discriminant analysis) and GRAIL2 [Uberbacher, E. C. & Mural, R. J. (1991) Proc. Natl. Acad. Sci. USA 88, 11261-11265] (based on neural networks). A computer program called MZEF is freely available to the genome community and allows users to adjust prior probability and to output alternative overlapping exons.
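For readers unfamiliar with the quadratic discriminant function mentioned above, the standard form scores a feature vector x under each class k as -0.5 log|Σk| - 0.5 (x - μk)ᵀ Σk⁻¹ (x - μk) + log πk and picks the class with the highest score. The sketch below applies that textbook formula to two made-up features and two made-up classes. It does not reproduce the nine discriminant variables or any trained parameters of MZEF; all names and numbers are assumptions for illustration.

```python
import math


def qda_score(x, mean, cov, prior):
    """Textbook quadratic discriminant score for a 2-feature vector."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)


def classify(x, classes):
    """Return the name of the class with the highest discriminant score."""
    return max(classes, key=lambda name: qda_score(x, *classes[name]))


# Hypothetical (mean, covariance, prior) per class; covariances differ,
# which is what makes the decision boundary quadratic rather than linear.
CLASSES = {
    "exon": ((2.0, 2.0), ((1.0, 0.0), (0.0, 1.0)), 0.5),
    "non-exon": ((0.0, 0.0), ((2.0, 0.0), (0.0, 2.0)), 0.5),
}

print(classify((2.0, 2.0), CLASSES))
print(classify((0.0, 0.0), CLASSES))
```

Changing the prior in a class tuple shifts the boundary toward or away from that class, which is the kind of adjustment the abstract says the program exposes to users.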
Affiliation(s)
- M Q Zhang
- Cold Spring Harbor Laboratory, NY 11724, USA
|
185
|
Tiso N, Rampoldi L, Pallavicini A, Zimbello R, Pandolfo D, Valle G, Lanfranchi G, Danieli GA. Fine mapping of five human skeletal muscle genes: alpha-tropomyosin, beta-tropomyosin, troponin-I slow-twitch, troponin-I fast-twitch, and troponin-C fast. Biochem Biophys Res Commun 1997; 230:347-50. [PMID: 9016781 DOI: 10.1006/bbrc.1996.5958] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
In this paper the chromosomal localization of the human skeletal muscle genes Troponin-I slow-twitch (TNNI1), Troponin-I fast-twitch (TNNI2), and Troponin-C fast (TNNC2) and the refinement of the position for alpha-Tropomyosin (TPM1) and beta-Tropomyosin (TPM2) are reported. By radiation hybrid mapping, TPM1 was assigned to chromosome 15q22.1, TPM2 to chromosome 9p13.2-p13.1, TNNI1 to chromosome 1q31.3, TNNI2 to chromosome 11p15.5, and TNNC2 to chromosome 20q12-q13.11. The genomic distribution of these genes is discussed, with particular emphasis on the cluster organization of the Troponin genes.
MESH Headings
- Base Sequence
- Chromosome Mapping
- Chromosomes, Human, Pair 1
- Chromosomes, Human, Pair 11
- Chromosomes, Human, Pair 15
- Chromosomes, Human, Pair 20
- Chromosomes, Human, Pair 9
- DNA Primers
- Humans
- Molecular Sequence Data
- Multigene Family
- Muscle Fibers, Fast-Twitch/metabolism
- Muscle Fibers, Slow-Twitch/metabolism
- Muscle, Skeletal/metabolism
- Polymerase Chain Reaction
- Tropomyosin/biosynthesis
- Tropomyosin/genetics
- Troponin C/biosynthesis
- Troponin C/genetics
- Troponin I/biosynthesis
- Troponin I/genetics
Affiliation(s)
- N Tiso
- Biology Department, University of Padova, Italy
|
186
|
Rescheleit DK, Rommerskirch WJ, Wiederanders B. Sequence analysis and distribution of two new human cathepsin L splice variants. FEBS Lett 1996; 394:345-8. [PMID: 8830671 DOI: 10.1016/0014-5793(96)00986-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023]
Abstract
Despite elevated cathepsin L mRNA levels in kidney tumors, cathepsin L protein/activity was scarcely detectable in these tumors. As a possible reason, we detected two new splice variants of human cathepsin L mRNAs not identical to those previously reported. Besides the normal 'full-length' mRNA (hCATL-A) there is one form lacking 27 nucleotides (hCATL-A I) and another form lacking 90 nucleotides (hCATL-A II) in exon I. The splice variants do not influence the amino acid sequence of the translational product. hCATL-A and hCATL-A I probably form a secondary structure at the 5' non-coding sequence not present in hCATL-A II.
Affiliation(s)
- D K Rescheleit
- Friedrich-Schiller-Universität Jena, Klinikum, Institut für Biochemie, Germany
|
187
|
Gelfand MS, Mironov AA, Pevzner PA. Gene recognition via spliced sequence alignment. Proc Natl Acad Sci U S A 1996; 93:9061-6. [PMID: 8799154 PMCID: PMC38595 DOI: 10.1073/pnas.93.17.9061] [Citation(s) in RCA: 192] [Impact Index Per Article: 6.9] [Indexed: 02/02/2023] Open
Abstract
Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
Affiliation(s)
- M S Gelfand
- Institute of Protein Research, Russian Academy of Sciences, Puschino, Moscow, Russia
|
188
|
Timmermans MC, Das OP, Messing J. Characterization of a meiotic crossover in maize identified by a restriction fragment length polymorphism-based method. Genetics 1996; 143:1771-83. [PMID: 8844163 PMCID: PMC1207438 DOI: 10.1093/genetics/143.4.1771] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023] Open
Abstract
Genetic map lengths do not correlate directly with genome size, suggesting that meiotic recombination is not uniform throughout the genome. Further, the abundance of repeated sequences in plant genomes requires that crossing over is restricted to particular genomic regions. We used a physical mapping approach to identify these regions without the bias introduced by phenotypic selection. This approach is based on the detection of nonparental polymorphisms formed by recombination between polymorphic alleles. In an F2 population of 48 maize plants, we identified a crossover at two of the seven restriction fragment length polymorphism loci tested. Characterization of one recombination event revealed that the crossover mapped within a 534-bp region of perfect homology between the parental alleles embedded in a 2773-bp unique sequence. No transcripts from this region could be detected. Sequences immediately surrounding the crossover site were not detectably methylated, except for an SstI site and at the flanking repetitive sequences were faithfully inherited by the recombinant allele. Our observations suggest that meiotic recombination in maize occurs between perfectly homologous sequences, within unmethylated, nonrepetitive regions of the genome.
Affiliation(s)
- M C Timmermans
- Waksman Institute, Rutgers University, Piscataway, New Jersey 08855-0759, USA
|
189
|
Pichon L, Hampe A, Giffon T, Carn G, Legall JY, David V. A new non-HLA multigene family associated with the PERB11 family within the MHC class I region. Immunogenetics 1996; 44:259-67. [PMID: 8753856 DOI: 10.1007/bf02602555] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
In an effort to initiate steps designed to characterize the idiopathic hemochromatosis disease gene, the HLA-A/HLA-F region, where this gene is in linkage disequilibrium with some polymorphic markers, has been covered by a yeast artificial chromosome (YAC) contig. In order to achieve the physical mapping of these YACs and of the corresponding genomic region, we subcloned one of the YACs involved. A computer-assisted analysis of the sequence of one subclone led to the isolation of a potential exon that proved to belong to a new expressed messenger named HCGIX. After Southern blot analysis, the corresponding cDNA clone was found to belong to a new multigene family whose members are dispersed throughout the HLA class I region and are closely associated with members of another recently described multigene family designated PERB11. The data reported here suggest that these two multigene families form a cluster that has been dispersed as a unit throughout the telomeric part of the major histocompatibility complex and has been involved in the genesis of this human class I region.
Affiliation(s)
- L Pichon
- Department of Biochemistry and Molecular Biology, UPR 41 CNRS "Recombinaisons Génétiques" Faculté de Médecine, 2 avenue du Professeur Léon Bernard, 35043 Rennes Cedex, France
|
190
|
Pichon L, Giffon T, Chauvel B, Carn G, Bouric P, El Kahloun A, Legall JY, David V. Physical map of the HLA-A/HLA-F subregion and identification of two new coding sequences. Immunogenetics 1996; 43:175-81. [PMID: 8575815 DOI: 10.1007/bf00587297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
As part of an effort to characterize the hemochromatosis gene, we selected three non-chimeric yeast artificial chromosomes (YACs) overlapping with the previously described YAC B30 and forming an 800 kilobase contig covering the HLA-A/HLA-F region. The precise physical map of these YACs and of the corresponding genomic region was established. Nine concentrated sites of CpG cutter elements, potentially HTF islands, were mapped. In addition, several probes have been generated as tools for mapping and examining transcripts produced in the region. This allowed for the characterization and localization of two new coding sequences, provisionally named HCG (for hemochromatosis candidate gene) and numbered VIII and IX.
MESH Headings
- Blotting, Northern
- Chromosomes, Artificial, Yeast
- Chromosomes, Human, Pair 6
- Cloning, Molecular
- DNA Fingerprinting
- DNA, Complementary/genetics
- Electrophoresis, Gel, Pulsed-Field
- Gene Library
- HLA Antigens/genetics
- HLA-A Antigens/genetics
- Hemochromatosis/genetics
- Histocompatibility Antigens Class I/genetics
- Humans
- Molecular Sequence Data
- Open Reading Frames
- Restriction Mapping
- Sequence Analysis, DNA
- Transcription, Genetic
Affiliation(s)
- L Pichon
- Department of Biochemistry and Molecular Biology, UPR 41 CNRS - "Recombinaisons génétiques" Faculté de Médecine, 2 avenue du Professeur Léon Bernard, 35043 Rennes Cedex, France
|
191
|
Laporte J, Hu LJ, Kretz C, Mandel JL, Kioschis P, Coy JF, Klauck SM, Poustka A, Dahl N. A gene mutated in X-linked myotubular myopathy defines a new putative tyrosine phosphatase family conserved in yeast. Nat Genet 1996; 13:175-82. [PMID: 8640223 DOI: 10.1038/ng0696-175] [Citation(s) in RCA: 455] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
X-linked recessive myotubular myopathy (MTM1) is characterized by severe hypotonia and generalized muscle weakness, with impaired maturation of muscle fibres. We have restricted the candidate region to 280 kb and characterized two candidate genes using positional cloning strategies. The presence of frameshift or missense mutations (of which two are new mutations) in seven patients proved that one of these genes is indeed implicated in MTM1. The protein encoded by the MTM1 gene is highly conserved in yeast, which is surprising for a muscle specific disease. The protein contains the consensus sequence for the active site of tyrosine phosphatases, a wide class of proteins involved in signal transduction. At least three other genes, one located within 100 kb distal from the MTM1 gene, encode proteins with very high sequence similarities and define, together with the MTM1 gene, a new family of putative tyrosine phosphatases in man.
Affiliation(s)
- J Laporte
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, B.P. 163, C.U. de Strasbourg, France
|
192
|
Campuzano V, Montermini L, Moltò MD, Pianese L, Cossée M, Cavalcanti F, Monros E, Rodius F, Duclos F, Monticelli A, Zara F, Cañizares J, Koutnikova H, Bidichandani SI, Gellera C, Brice A, Trouillas P, De Michele G, Filla A, De Frutos R, Palau F, Patel PI, Di Donato S, Mandel JL, Cocozza S, Koenig M, Pandolfo M. Friedreich's ataxia: autosomal recessive disease caused by an intronic GAA triplet repeat expansion. Science 1996; 271:1423-7. [PMID: 8596916 DOI: 10.1126/science.271.5254.1423] [Citation(s) in RCA: 1884] [Impact Index Per Article: 67.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
Friedreich's ataxia (FRDA) is an autosomal recessive, degenerative disease that involves the central and peripheral nervous systems and the heart. A gene, X25, was identified in the critical region for the FRDA locus on chromosome 9q13. This gene encodes a 210-amino acid protein, frataxin, that has homologs in distant species such as Caenorhabditis elegans and yeast. A few FRDA patients were found to have point mutations in X25, but the majority were homozygous for an unstable GAA trinucleotide expansion in the first X25 intron.
Affiliation(s)
- V Campuzano
- Department de Genetica, University of Valencia, Spain
|
193
|
Vignal L, d'Aubenton-Carafa Y, Lisacek F, Mephu Ngüifo E, Rouzé P, Quinqueton J, Thermes C. Exon prediction in eucaryotic genomes. Biochimie 1996; 78:327-34. [PMID: 8905152 DOI: 10.1016/0300-9084(96)84765-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
Two independent computer systems, NetPlantGene and AMELIE, dedicated to the identification of splice sites in plant and human genomes, respectively, are introduced here. Both methods were designed in relation to experimental work; they rely on automatically generated rules involving the nucleotide content of sequences regardless of the coding properties of exons. The specificity of plant sequences as considered in NetPlantGene is shown to enhance the quality of detection as opposed to general methods such as GRAIL. A scanning model of the acceptor site recognition is being simulated by AMELIE leading to a relatively accurate selection process of sites.
Affiliation(s)
- L Vignal
- Laboratoire d'Informatique de Robotique et Micro-électronique de Montpellier (LIRMM), France
|
194
|
Uberbacher EC, Xu Y, Mural RJ. Discovering and understanding genes in human DNA sequence using GRAIL. Methods Enzymol 1996; 266:259-81. [PMID: 8743689 DOI: 10.1016/s0076-6879(96)66018-2] [Citation(s) in RCA: 100] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Affiliation(s)
- E C Uberbacher
- Computer Sciences and Mathematics Division, Oak Ridge National Laboratory, Tennessee 37831, USA
|
195
|
Abstract
The identification of genes involved in human genetic disease is no longer the province of those who would make a career of 'not finding' a gene. Developments from the human genome initiative have vastly facilitated the process of localizing genetic intervals segregating mutations, as well as that of obtaining the physical reagents necessary for characterizing the region. In a few years' time, efforts aimed at the assignment of genes to the physical map, coupled with increasing quantities of sequence data from both cDNA and genomic sources, will provide numerous candidate genes for analysis, with consequences for the approaches used to define the gene and mutation(s) involved in the disease of interest.
Affiliation(s)
- D L Nelson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
|