1
Complete mitochondrial genomes confirm the generic placement of the plateau vole, Neodon fuscus. Biosci Rep 2019; 39:BSR20182349. [PMID: 31262975 PMCID: PMC6689105 DOI: 10.1042/bsr20182349] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/12/2018] [Revised: 05/25/2019] [Accepted: 05/31/2019] [Indexed: 11/22/2022]
Abstract
The plateau vole, Neodon fuscus, is endemic to China and is distributed mainly in Qinghai Province. It is of public health interest as a potential reservoir of Toxoplasma gondii and an intermediate host of Echinococcus multilocularis. However, genetic data for this species are lacking, and its name and taxonomy remain controversial. In the present study, we determined the nucleotide sequence of the entire mitochondrial (mt) genome of N. fuscus and analyzed its evolutionary relationships. The mitogenome was 16,328 bp in length and contained 13 protein-coding genes, 22 transfer RNA (tRNA) genes, two ribosomal RNA genes and two major noncoding regions (the OL region and the D-loop region). Most genes were located on the heavy strand. All tRNA genes had typical cloverleaf structures except tRNASer (GCU). The mt genome of N. fuscus was rich in A+T (58.45%). Maximum likelihood (ML) and Bayesian analyses of 33 mt genomes of Arvicolinae yielded phylogenetic trees in which N. fuscus formed a sister group with Neodon irene and Neodon sikimensis, to the exclusion of Microtus species and other members of the Arvicolinae. Further phylogenetic analyses (ML only) based on cytb gene sequences also showed that N. fuscus is closely related to N. irene. The complete mitochondrial genome was successfully assembled and annotated, providing the information needed for these phylogenetic analyses. Although the name Lasiopodomys fuscus was used in the book ‘Wilson & Reeder’s Mammal Species of the World’, our analysis of evolutionary relationships confirms that the appropriate name is N. fuscus.
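The A+T figure reported above is a simple base-composition ratio: the count of A and T divided by all scored bases. A minimal Python sketch, assuming a plain DNA string and ignoring ambiguity codes such as N (an illustrative choice, not necessarily the convention the authors followed); the function name `at_content` is hypothetical:

```python
def at_content(seq: str) -> float:
    """Return the A+T percentage of a DNA sequence.

    Only A, C, G and T are counted; ambiguity symbols (e.g. N) are ignored,
    which is an assumption made for this illustration.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


# 6 of the 8 bases are A or T, so this prints 75.00
print(f"{at_content('ATGCATTA'):.2f}")
```

Applied to a full mitogenome string, the same function would give the kind of whole-genome percentage quoted in the abstract.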
2
Dvorak P, Leupen S, Soucek P. Functionally Significant Features in the 5' Untranslated Region of the ABCA1 Gene and Their Comparison in Vertebrates. Cells 2019; 8:cells8060623. [PMID: 31234415 PMCID: PMC6627321 DOI: 10.3390/cells8060623] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/03/2019] [Revised: 06/17/2019] [Accepted: 06/19/2019] [Indexed: 02/07/2023]
Abstract
Single nucleotide polymorphisms located in 5′ untranslated regions (5′UTRs) can regulate gene expression and have clinical impact. Recognition of functionally significant sequences within 5′UTRs is crucial in next-generation sequencing applications. Furthermore, information about the behavior of 5′UTRs during gene evolution is scarce. Using the ATP-binding cassette transporter A1 (ABCA1) gene (Tangier disease) as an example, we describe our algorithm for finding functionally significant sequences. 5′UTR features (upstream start and stop codons, open reading frames (ORFs), GC content, motifs, and secondary structures) were studied using freely available bioinformatics tools in 55 vertebrate orthologous genes obtained from Ensembl and UCSC. The most conserved sequences were proposed as hot spots. Exon and intron enhancers and silencers (sc35, ighg2 cgamma2, ctnt, gh-1, and fibronectin eda exon), transcription factors (TFIIA, TATA, NFAT1, NFAT4, and HOXA13), some of them cancer related, and a microRNA (hsa-miR-4474-3p) were localized to these regions. An upstream ORF, overlapping with the main ORF in primates and possibly coding for a small bioactive peptide, was also detected. Moreover, we showed several features of 5′UTRs, such as GC content variation, hairpin structure conservation and 5′UTR segmentation, which are interesting from a phylogenetic point of view and may stimulate further evolution-oriented research.
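Among the 5′UTR features listed above, upstream start codons and upstream ORFs can be located with a straightforward scan. The Python sketch below is illustrative only: it assumes the 5′UTR is supplied in the DNA alphabet, treats ATG as the only start codon, and uses the hypothetical function name `find_uorfs`; it does not reproduce the authors' tool chain.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str):
    """List upstream ORFs that begin with ATG inside a 5' UTR.

    Returns (start, end, kind) tuples with a 0-based start and exclusive end.
    kind is "terminated" when an in-frame stop codon lies within the UTR;
    otherwise it is "runs_into_cds" and end is simply the UTR boundary,
    because the reading frame continues into the downstream sequence.
    """
    utr5 = utr5.upper()
    uorfs = []
    for start in range(len(utr5) - 2):
        if utr5[start:start + 3] != "ATG":
            continue
        end, kind = len(utr5), "runs_into_cds"
        # Walk codon by codon in the frame of this upstream ATG.
        for pos in range(start + 3, len(utr5) - 2, 3):
            if utr5[pos:pos + 3] in STOP_CODONS:
                end, kind = pos + 3, "terminated"
                break
        uorfs.append((start, end, kind))
    return uorfs


print(find_uorfs("CCATGAAATAGGCATGCC"))
# [(2, 11, 'terminated'), (13, 18, 'runs_into_cds')]
```

A uORF flagged "runs_into_cds" has no in-frame stop inside the UTR; whether it overlaps the main ORF (as described for primates above) or extends it at the N terminus depends on its frame relative to the main start codon, which this sketch does not check.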
Affiliation(s)
- Pavel Dvorak
- Department of Biology, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300 Pilsen, Czech Republic.
- Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300 Pilsen, Czech Republic.
- Sarah Leupen
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA.
- Pavel Soucek
- Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Alej Svobody 76, 32300 Pilsen, Czech Republic.
- Toxicogenomics Unit, National Institute of Public Health, Srobarova 48, 100 42 Prague 10, Czech Republic.
3
Szyda J, Frąszczak M, Mielczarek M, Giannico R, Minozzi G, Nicolazzi EL, Kamiński S, Wojdak-Maksymiec K. The assessment of inter-individual variation of whole-genome DNA sequence in 32 cows. Mamm Genome 2015; 26:658-65. [PMID: 26475143 PMCID: PMC4653241 DOI: 10.1007/s00335-015-9606-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 05/16/2015] [Accepted: 10/01/2015] [Indexed: 11/24/2022]
Abstract
Despite the growing number of sequenced bovine genomes, knowledge of population-wide sequence variation remains limited. In many studies, statistical methods were not applied to relate findings in the sequenced samples to the population level. Our goal was to assess population-wide variation in DNA sequence based on whole-genome sequences of 32 Holstein-Friesian cows. The number of SNPs varied significantly across individuals and increased with coverage, following a logarithmic curve. A total of 15,272,427 SNPs were identified, 99.16% of them bi-allelic. Missense SNPs were classified into three categories based on their genomic location: housekeeping genes, genes undergoing strong selection, and genes neutral to selection. The number of missense SNPs was significantly higher within genes neutral to selection than in the other two categories. The number of variants located within 3'UTR and 5'UTR regions also differed significantly across gene families. Moreover, the numbers of insertions and deletions differed significantly among cows, ranging from 261,712 to 330,103 insertions and from 271,398 to 343,649 deletions. The results not only demonstrate inter-individual variation in the number of SNPs and indels but also show that the number of missense SNPs differs across genes with different functional backgrounds.
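The abstract reports that SNP counts grew with sequencing coverage along a logarithmic curve. One common way to fit such a curve is ordinary least squares on the logarithm of coverage. The Python sketch below assumes the model form snps = a + b·ln(coverage), which is an interpretation of "logarithmic curve" rather than the authors' stated model; the function name `fit_log_curve` and the example numbers are hypothetical.

```python
import math


def fit_log_curve(coverage, snp_counts):
    """Least-squares fit of snps = a + b * ln(coverage); returns (a, b)."""
    if len(coverage) != len(snp_counts) or len(coverage) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(c) for c in coverage]  # coverage must be positive
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(snp_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("coverage values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, snp_counts))
    b = sxy / sxx          # slope with respect to ln(coverage)
    a = mean_y - b * mean_x  # intercept
    return a, b


# Synthetic, illustrative values only (not data from the study).
coverage = [5, 10, 20, 40]
snps = [1.0e6 + 2.0e6 * math.log(c) for c in coverage]
a, b = fit_log_curve(coverage, snps)
print(round(a), round(b))  # recovers a = 1e6 and b = 2e6 up to rounding
```

Fitting on ln(coverage) keeps the problem linear in the parameters, so no iterative optimizer is needed.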
Affiliation(s)
- Joanna Szyda
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland.
- Magdalena Frąszczak
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland.
- Magda Mielczarek
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland.
- Riccardo Giannico
- Fondazione Parco Tecnologico Padano, Via Einstein Albert, 26900, Lodi, LO, Italy.
- Giulietta Minozzi
- Fondazione Parco Tecnologico Padano, Via Einstein Albert, 26900, Lodi, LO, Italy.
- DIVET, Università di Milano, Via Celoria 10, 20133, Milan, Italy.
- Ezequiel L Nicolazzi
- Fondazione Parco Tecnologico Padano, Via Einstein Albert, 26900, Lodi, LO, Italy.
- Stanislaw Kamiński
- Institute of Animal Genetics, University of Warmia and Mazury, Oczapowskiego 2, Olsztyn, 10-719, Poland.
4
Gallego X, Cox RJ, Laughlin JR, Stitzel JA, Ehringer MA. Alternative CHRNB4 3'-UTRs mediate the allelic effects of SNP rs1948 on gene expression. PLoS One 2013; 8:e63699. [PMID: 23691088 PMCID: PMC3653846 DOI: 10.1371/journal.pone.0063699] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/03/2013] [Accepted: 04/05/2013] [Indexed: 11/29/2022]
Abstract
Common genetic factors contribute strongly to the use of both nicotine, the main addictive component of tobacco, and alcohol. Several lines of evidence point to nicotinic acetylcholine receptors as common sites of action for nicotine and alcohol. Specifically, rs1948, a single-nucleotide polymorphism (SNP) located in the CHRNB4 3′-untranslated region (UTR), has been associated with an early age of initiation of both alcohol and tobacco use. To determine the allelic effects of rs1948 on gene expression, two rs1948-containing sequences of different lengths corresponding to the CHRNB4 3′-UTR were cloned into pGL3-promoter luciferase reporter vectors. The data showed that the allelic effects of SNP rs1948 on luciferase expression are mediated by the length and species of the transcripts generated. In addition, miR-3157 increased overall luciferase expression, whereas miR-138, a microRNA known to play a role in neuroadaptation to drug abuse, decreased luciferase expression compared with basal conditions. These findings demonstrate the importance of SNP rs1948 in the regulation of CHRNB4 expression and provide the first evidence of CHRNB4 down-regulation by miR-138.
Affiliation(s)
- Xavier Gallego
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- Ryan J. Cox
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- James R. Laughlin
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- Jerry A. Stitzel
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, Colorado, United States of America
- Marissa A Ehringer
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- Department of Integrative Physiology, University of Colorado Boulder, Boulder, Colorado, United States of America
5
Bompfünewerer AF, Flamm C, Fried C, Fritzsch G, Hofacker IL, Lehmann J, Missal K, Mosig A, Müller B, Prohaska SJ, Stadler BMR, Stadler PF, Tanzer A, Washietl S, Witwer C. Evolutionary patterns of non-coding RNAs. Theory Biosci 2012; 123:301-69. [PMID: 18202870 DOI: 10.1016/j.thbio.2005.01.002] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Received: 12/22/2004] [Accepted: 01/24/2005] [Indexed: 01/04/2023]
Abstract
A plethora of new functions of non-coding RNAs (ncRNAs) have been discovered in the past few years. In fact, RNA is emerging as the central player in cellular regulation, taking on active roles in multiple regulatory layers, from transcription, RNA maturation, and RNA modification to translational regulation. Nevertheless, very little is known about the evolution of this "Modern RNA World" and its components. In this contribution, we attempt to provide at least a cursory overview of the diversity of ncRNAs and functional RNA motifs in non-translated regions of regular messenger RNAs (mRNAs), with an emphasis on evolutionary questions. This survey is complemented by an in-depth analysis of examples from different classes of RNAs, focusing mostly on their evolution in the vertebrate lineage. We present a survey of Y RNA genes in vertebrates and study the molecular evolution of the U7 snRNA, the snoRNAs E1/U17, E2, and E3, the Y RNA family, the let-7 microRNA (miRNA) family, and the mRNA-like evf-1 gene. We furthermore discuss the statistical distribution of miRNAs in metazoans, which suggests an explosive increase in the miRNA repertoire in vertebrates. The analysis of ncRNA transcription suggests that small RNAs in general are genetically mobile, in the sense that their association with a host gene (e.g., when transcribed from introns of an mRNA) can change on evolutionary time scales. The let-7 family demonstrates that even the mode of transcription (as intron or as exon) can change among paralogous ncRNAs.
6
Liu H, Yin J, Xiao M, Gao C, Mason AS, Zhao Z, Liu Y, Li J, Fu D. Characterization and evolution of 5' and 3' untranslated regions in eukaryotes. Gene 2012; 507:106-11. [PMID: 22846368 DOI: 10.1016/j.gene.2012.07.034] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 12/08/2011] [Revised: 06/26/2012] [Accepted: 07/18/2012] [Indexed: 01/21/2023]
Abstract
Untranslated regions (UTRs) in eukaryotes play a significant role in the regulation of translation and mRNA half-life and interact with specific RNA-binding proteins. However, UTRs receive less attention than elements regarded as more crucial, such as genes, and the basic structural and evolutionary characteristics of UTRs in different species, as well as the relationship of these UTRs to genome size and species gene number, are not well understood. To address these questions, we performed a comparative analysis of 5' and 3' untranslated regions of different species by analyzing the basic characteristics of 244,976 UTRs from three eukaryote kingdoms (Plantae, Fungi, and Protista). The results showed that UTR lengths and SSR frequencies in UTRs increased significantly with increasing species gene number, while the length and G+C content of 5' UTRs and different types of repetitive sequences in 3' UTRs increased with genome size. We also found that the sequence length of 5' UTRs was significantly positively correlated with the presence of transposons and SSRs, while the sequence length of 3' UTRs was significantly positively correlated with the presence of tandem repeat sequences. These results suggest that the evolution of species complexity from lower to higher organisms is accompanied by an increase in the regulatory complexity of UTRs, mediated by increasing UTR length, increasing G+C content of 5' UTRs, and insertion and expansion of repetitive sequences.
Affiliation(s)
- Honglei Liu
- Engineering Research Center of South Upland Agriculture of Ministry of Education, PR China, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
7
Hung CY, Holliday BM, Kaur H, Yadav R, Kittur FS, Xie J. Identification and characterization of selenate- and selenite-responsive genes in a Se-hyperaccumulator Astragalus racemosus. Mol Biol Rep 2012; 39:7635-46. [PMID: 22362314 DOI: 10.1007/s11033-012-1598-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/20/2011] [Accepted: 01/31/2012] [Indexed: 01/03/2023]
Abstract
Plants with the capacity to accumulate high levels of selenium (Se) are desirable for phytoremediation and biofortification. Plants of the genus Astragalus accumulate and tolerate high levels of Se, but their slow growth, low biomass and non-edible properties limit their direct use. Genetic engineering may offer an alternative way to produce edible or high-biomass Se-accumulating plants. The first step towards this goal is to isolate the genes responsible for Se accumulation and tolerance; these genes can later be introduced into other edible, high-biomass plants. In the present study, we applied fluorescent differential display to analyze the transcript profile of the Se-hyperaccumulator A. racemosus treated with 20 μM selenate (K₂SeO₄) for 2 weeks. Among 125 identified Se-responsive candidate genes, the expression levels of nine were induced or suppressed more than twofold by selenate treatment in two independent experiments, while 14 showed such changes when treated with selenite (K₂SeO₃). Six of them responded to both selenate and selenite treatments. A novel gene, CEJ367, was highly induced by both selenate (1,920-fold) and selenite (579-fold). Root- or shoot-preferential expression of nine genes was further investigated. These identified genes may allow us to create Se-enriched transgenic plants.
Affiliation(s)
- Chiu-Yueh Hung
- Department of Pharmaceutical Sciences, Biomanufacturing Research Institute & Technology Enterprise, North Carolina Central University, Durham, NC 27707, USA
8
The mitochondrial genome of the ascalaphid owlfly Libelloides macaronius and comparative evolutionary mitochondriomics of neuropterid insects. BMC Genomics 2011; 12:221. [PMID: 21569260 PMCID: PMC3115881 DOI: 10.1186/1471-2164-12-221] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Received: 07/20/2010] [Accepted: 05/10/2011] [Indexed: 11/18/2022]
Abstract
Background: The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully sequenced and one partly sequenced. Current knowledge of neuropteran mitochondrial genomes is limited, and new data are urgently needed. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known mitochondrial genomes of the neuropterid orders Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders.
Results: The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral insect arrangement by the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews vary over a broad range among the 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to substitution rate, which make them potential markers for population genetics and phylogenetic studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. The control region in Neuropterida, as in other insects, is a fast-evolving genomic region characterized by AT-rich motifs.
Conclusions: The new genome shares many features with known neuropteran genomes but differs in its low A+T content. Comparative analysis of neuropterid mitochondrial genes showed that they experienced distinct evolutionary patterns. Both tRNA families and ribosomal RNAs show composite substitution pathways. The neuropterid mitochondrial genome is characterized by a complex evolutionary history.
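The AT- and GC-skews compared across the 84 taxa are usually computed for one strand as (A − T)/(A + T) and (G − C)/(G + C), so a positive AT-skew means A outnumbers T. A minimal Python sketch using those commonly cited definitions (the function name `skews` is hypothetical, and the authors may have computed skews per gene or per region rather than per whole sequence):

```python
def skews(seq: str) -> tuple[float, float]:
    """Return (AT-skew, GC-skew) for one DNA strand.

    AT-skew = (A - T) / (A + T)
    GC-skew = (G - C) / (G + C)
    """
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    if a + t == 0 or g + c == 0:
        raise ValueError("skew is undefined when A+T or G+C is zero")
    return (a - t) / (a + t), (g - c) / (g + c)


print(skews("AAATGGCC"))  # (0.5, 0.0): A=3, T=1, G=2, C=2
```

Both values lie between −1 and 1, which makes them easy to compare across genomes of different length and A+T content.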
9
Mrinal N, Nagaraju J. Intron loss is associated with gain of function in the evolution of the gloverin family of antibacterial genes in Bombyx mori. J Biol Chem 2008; 283:23376-87. [PMID: 18524767 DOI: 10.1074/jbc.m801080200] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022]
Abstract
Gene duplication is a characteristic feature of eukaryotic genomes. Here we investigated the role of gene duplication in the evolution of the gloverin family of antibacterial genes (Bmglv1, Bmglv2, Bmglv3, and Bmglv4) in Bombyx mori. We observed two significant changes during the first duplication event: (i) loss of intronV, located in the 3'-untranslated region (UTR) of the ancestral gene Bmglv1, and (ii) a 12-bp deletion in exon3. We show that loss of intronV during the Bmglv1-to-Bmglv2 duplication was associated with embryonic expression of Bmglv2. Gel mobility shift, chromatin immunoprecipitation, and immunodepletion assays identified chorion factor 2, a zinc finger protein, as the repressor that bound to a 10-bp regulatory motif in intronV of Bmglv1 and repressed its transcription. Gloverin paralogs that lacked intronV were independent of chorion factor 2 regulation and were expressed in the embryo. These results suggest that a change in cis-regulation caused by intron loss resulted in embryonic expression of Bmglv2-4, a gain of function over Bmglv1. Studies on the significance of intron loss have focused on introns within coding sequences because of their potential effect on the open reading frame, whereas introns in the UTRs of genes have received little attention. This study emphasizes the regulatory function of the 3'-UTR intron. In addition, we studied the genomic deletion and show that the "in-frame" deletion of 12 nucleotides led to the loss of amino acids IHDF, generating a prepro-processing site in BmGlv2. As a result, the N-terminal pro-part of BmGlv2, but not of BmGlv1, is processed in an infection-dependent manner, suggesting that prepro-processing is an evolved feature in Gloverins.
Affiliation(s)
- Nirotpal Mrinal
- Laboratory of Molecular Genetics, Centre for DNA Fingerprinting and Diagnostics, Hyderabad-500076, India
10
Roy SW, Penny D, Neafsey DE. Evolutionary conservation of UTR intron boundaries in Cryptococcus. Mol Biol Evol 2007; 24:1140-8. [PMID: 17374879 DOI: 10.1093/molbev/msm045] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 01/21/2023]
Abstract
Despite significant progress, the general functional and evolutionary significance of the untranslated regions (UTRs) of eukaryotic transcripts remains mysterious. Particularly mysterious is the common occurrence of spliceosomal introns in transcript UTRs, because UTR splicing is not necessary to restore the transcript coding sequence. In general, it is not known to what extent such splicing performs an important function or merely represents spliceosomal "noise." We conducted the first analysis of the evolutionary conservation of UTR splicing. Among four species of the Cryptococcus neoformans species complex, we find high levels of conservation of UTR intron boundary sequences, strongly suggesting that UTR intron splicing is conserved by purifying selection. We estimate that 50-90% of splice boundaries are maintained by selection. Donor site sequences are more highly conserved than acceptor sequences, and splicing boundaries are more conserved in 5' UTRs than in 3' UTRs. In addition, we report a variety of differences between patterns of UTR splicing in Cryptococcus and corresponding patterns in animals and plants. These results focus attention on the functional roles of eukaryotic UTRs and deepen the mystery of UTR intron splicing.
Affiliation(s)
- Scott William Roy
- Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.
11
Norgren RB. Expression arrays for macaque monkeys. Transplant Rev (Orlando) 2006. [DOI: 10.1016/j.trre.2006.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
12
Spindel ER, Pauley MA, Jia Y, Gravett C, Thompson SL, Boyle NF, Ojeda SR, Norgren RB. Leveraging human genomic information to identify nonhuman primate sequences for expression array development. BMC Genomics 2005; 6:160. [PMID: 16288651 PMCID: PMC1314899 DOI: 10.1186/1471-2164-6-160] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 08/26/2005] [Accepted: 11/15/2005] [Indexed: 01/16/2023]
Abstract
Background: Nonhuman primates (NHPs) are essential for biomedical research because of their similarities to humans. The utility of NHPs will be greatly increased by the application of genomics-based approaches such as gene expression profiling. Sequence information from the 3' end of genes is the key resource needed to create oligonucleotide expression arrays.
Results: We have developed the algorithms and procedures necessary to quickly acquire sequence information from the 3' end of nonhuman primate orthologs of human genes. To accomplish this, we identified the terminal exons of over 15,000 human genes by aligning mRNA sequences with genomic sequence. We found the mean length of complete last exons to be approximately 1,400 bp, significantly longer than previous estimates. We designed primers to amplify genomic DNA fragments that included at least 300 bp of the terminal exon. We cloned and sequenced the PCR products representing over 5,500 Macaca mulatta (rhesus monkey) orthologs of human genes. This sequence information has been used to select probes for rhesus gene expression profiling. We have also tested 10 sets of primers with genomic DNA from Macaca fascicularis (cynomolgus monkey), Papio hamadryas (baboon), and Chlorocebus aethiops (African green monkey, vervet). The results indicate that the primers developed for this study will be useful for acquiring sequence from the 3' end of genes in other nonhuman primate species.
Conclusion: This study demonstrates that human genomic DNA sequence can be leveraged to obtain sequence from the 3' end of NHP orthologs and that this sequence can then be used to generate NHP oligonucleotide microarrays. Affymetrix and Agilent used sequences obtained with this approach in the design of their rhesus macaque oligonucleotide microarrays.
Affiliation(s)
- Eliot R Spindel
- Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA
- Mark A Pauley
- College of Information Science & Technology, University of Nebraska at Omaha, Omaha, NE, 68182 USA
- Yibing Jia
- Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA
- Courtney Gravett
- Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA
- Shaun L Thompson
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA
- Nicholas F Boyle
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA
- Sergio R Ojeda
- Division of Neuroscience, Oregon National Primate Research Center, Beaverton, OR 97006, USA
- Robert B Norgren
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA
13
Churbanov A, Rogozin IB, Babenko VN, Ali H, Koonin EV. Evolutionary conservation suggests a regulatory function of AUG triplets in 5'-UTRs of eukaryotic genes. Nucleic Acids Res 2005; 33:5512-20. [PMID: 16186132 PMCID: PMC1236974 DOI: 10.1093/nar/gki847] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022]
Abstract
By comparing sequences of human, mouse and rat orthologous genes, we show that in 5′-untranslated regions (5′-UTRs) of mammalian cDNAs but not in 3′-UTRs or coding sequences, AUG is conserved to a significantly greater extent than any of the other 63 nt triplets. This effect is likely to reflect, primarily, bona fide evolutionary conservation, rather than cDNA annotation artifacts, because the excess of conserved upstream AUGs (uAUGs) is seen in 5′-UTRs containing stop codons in-frame with the start AUG and many of the conserved AUGs are found in different frames, consistent with the location in authentic non-coding sequences. Altogether, conserved uAUGs are present in at least 20–30% of mammalian genes. Qualitatively similar results were obtained by comparison of orthologous genes from different species of the yeast genus Saccharomyces. Together with the observation that mammalian and yeast 5′-UTRs are significantly depleted in overall AUG content, these findings suggest that AUG triplets in 5′-UTRs are subject to the pressure of purifying selection in two opposite directions: the uAUGs that have no specific function tend to be deleterious and get eliminated during evolution, whereas those uAUGs that do serve a function are conserved. Most probably, the principal role of the conserved uAUGs is attenuation of translation at the initiation stage, which is often additionally regulated by alternative splicing in the mammalian 5′-UTRs. Consistent with this hypothesis, we found that open reading frames starting from conserved uAUGs are significantly shorter than those starting from non-conserved uAUGs, possibly, owing to selection for optimization of the level of attenuation.
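The comparison behind this result asks, for each of the 64 triplets, what fraction of its occurrences in one species' 5′-UTR is preserved at the aligned position in an ortholog. The Python sketch below is a simplified illustration for one pair of pre-aligned sequences: it assumes the alignment is already given ('-' marks gaps), counts a triplet as conserved only when the identical, gap-free triplet occupies the same alignment columns, and reports keys in the DNA alphabet. The function name `triplet_conservation` is hypothetical, and the authors' pipeline and statistical tests are not reproduced here.

```python
from collections import defaultdict


def triplet_conservation(aln_a: str, aln_b: str) -> dict[str, tuple[int, int]]:
    """Per-triplet conservation between two rows of one gapped alignment.

    Every gap-free triplet in row A is scored. It counts as conserved if row B
    carries the identical triplet in the same alignment columns.
    Returns {triplet: (conserved, total)}; U is converted to T.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    a = aln_a.upper().replace("U", "T")
    b = aln_b.upper().replace("U", "T")
    stats = defaultdict(lambda: [0, 0])  # triplet -> [conserved, total]
    for i in range(len(a) - 2):
        tri = a[i:i + 3]
        if "-" in tri:
            continue  # skip triplets interrupted by gaps in row A
        stats[tri][1] += 1
        if b[i:i + 3] == tri:
            stats[tri][0] += 1
    return {tri: (cons, tot) for tri, (cons, tot) in stats.items()}


result = triplet_conservation("CCAUGGCAUGA", "CCAUGGCGUGA")
print(result["ATG"])  # (1, 2): one of the two AUGs is conserved
```

Comparing the conserved fraction for ATG with that of the other 63 triplets, over many ortholog pairs, is the kind of contrast the abstract describes.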
Affiliation(s)
- Igor B. Rogozin
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA
- Vladimir N. Babenko
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V. Koonin
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA
- To whom correspondence should be addressed. Tel: +1 301 435 5913; Fax: +1 301 435 7794
14
Sivko GS, Sanford DC, Dearth LD, Tang D, DeWille JW. CCAAT/Enhancer binding protein delta (c/EBPdelta) regulation and expression in human mammary epithelial cells: II. Analysis of activating signal transduction pathways, transcriptional, post-transcriptional, and post-translational control. J Cell Biochem 2005; 93:844-56. [PMID: 15389878 DOI: 10.1002/jcb.20224] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 11/07/2022]
Abstract
CCAAT/enhancer binding protein delta (C/EBPdelta) plays a key role in mammary epithelial cell G0 growth arrest. C/EBPdelta gene expression is down-regulated in rodent mammary tumorigenesis and in human breast cancer, suggesting that "loss of function" alterations in C/EBPdelta gene expression are common in mammary gland malignancies. The goal of this study was to systematically investigate the mechanisms controlling C/EBPdelta gene expression in the MCF-10A and MCF-12A human mammary epithelial cell lines. The results demonstrate that G0 growth arrest conditions (i.e., serum and growth factor withdrawal or Oncostatin M (OSM) treatment) result in the activation of JAK1, JAK2, and Tyk2, members of the Janus kinase family of non-receptor tyrosine kinases, in MCF-10A and MCF-12A cells. Growth arrest or OSM treatment also specifically increases the levels of activated (phosphorylated) signal transducer and activator of transcription 3 (STAT3), demonstrating that STAT3, not STAT1 or STAT5, is the downstream target of the activated Janus kinases in MCF-10A and MCF-12A cells. Whole cell lysates from G0 growth-arrested (GA) and OSM-treated MCF-12A cells exhibit increased acute phase response element (APRE) binding compared with lysates from growing (GR) MCF-12A cells. Transient transfection using C/EBPdelta promoter-luciferase constructs demonstrated that the APRE (STAT3) consensus binding site is essential for growth arrest or OSM induction of the C/EBPdelta promoter. Mutation of the C/EBPdelta promoter STAT3 site or expression of a dominant negative STAT3 construct (STAT3delta) reduces C/EBPdelta promoter activity in response to growth arrest conditions. The human C/EBPdelta promoter also contains an Sp1 site at -61 bp (relative to the transcriptional start site) that is required for basal transcriptional activation. Mutation or deletion of the Sp1 site decreases promoter activity in response to growth arrest conditions.
Treatment with the transcriptional inhibitor actinomycin D demonstrated that the C/EBPdelta mRNA has a relatively short half-life (approximately 40 min). Similarly, treatment with the translational inhibitor anisomycin showed that the C/EBPdelta protein half-life is relatively short (approximately 160 min). These results indicate that the human C/EBPdelta gene is controlled at multiple levels, consistent with a role for C/EBPdelta in cell cycle control and/or cell fate determination.
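Half-lives such as the approximately 40 min (mRNA) and 160 min (protein) values above are usually read off a decay time course after synthesis is blocked. The sketch below assumes simple first-order decay, N(t) = N0·exp(-kt), fits ln N against time by least squares, and returns t½ = ln 2 / k. The helper `fit_half_life` is hypothetical and the noise-free time course is synthetic, generated from an assumed 40-min half-life. Neither is the authors' data or analysis method.

```python
import math


def fit_half_life(times_min, levels):
    """Estimate a half-life assuming first-order decay N(t) = N0 * exp(-k * t).

    Fits ln(level) = ln(N0) - k * t by ordinary least squares and
    returns t_1/2 = ln(2) / k (same time unit as times_min).
    """
    if len(times_min) != len(levels) or len(times_min) < 2:
        raise ValueError("need at least two paired time points")
    ys = [math.log(v) for v in levels]  # log-transform linearizes the decay
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    var = sum((t - mean_t) ** 2 for t in times_min)
    k = -cov / var  # the fitted slope equals -k
    if k <= 0:
        raise ValueError("levels do not decrease over time")
    return math.log(2) / k


# Synthetic, noise-free time course from an assumed 40-min half-life
# (illustration only; not data from the study).
times = [0, 20, 40, 60, 90, 120]
levels = [2 ** (-t / 40.0) for t in times]
print(round(fit_half_life(times, levels), 1))  # 40.0
```

With real measurements the points scatter around the line, so the fitted slope, and hence t½, carries an error that depends on the number and spacing of time points.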
Affiliation(s)
- G S Sivko
- Department of Veterinary Biosciences and Division of Molecular Biology and Cancer Genetics, Ohio State Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, USA
15
Kochetov AV, Sarai A, Rogozin IB, Shumny VK, Kolchanov NA. The role of alternative translation start sites in the generation of human protein diversity. Mol Genet Genomics 2005; 273:491-6. [PMID: 15959805 DOI: 10.1007/s00438-005-1152-7] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 10/18/2004] [Accepted: 03/29/2005] [Indexed: 11/29/2022]
Abstract
According to the scanning model, 40S ribosomal subunits initiate translation at the first (5' proximal) AUG codon they encounter. However, if the first AUG is in a suboptimal context, it may not be recognized, and translation can then initiate at downstream AUG(s). In this way, a single RNA can produce several variant products. Earlier experiments suggested that some of these additional protein variants might be functionally important. We have analysed human mRNAs that have AUG triplets in 5' untranslated regions and mRNAs in which the annotated translational start codon is located in a suboptimal context. It was found that 3% of human mRNAs have the potential to encode N-terminally extended variants of the annotated proteins and 12% could code for N-truncated variants. The predicted subcellular localizations of these protein variants were compared: 31% of the N-extended proteins and 30% of the N-truncated proteins were predicted to localize to subcellular compartments that differed from those targeted by the annotated protein forms. These results suggest that additional AUGs may frequently be exploited for the synthesis of proteins that possess novel functional properties.
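The scanning logic described above can be illustrated with a small sketch. It lists AUG codons in 5'→3' order and, as a simplifying assumption, calls a context "strong" only when position -3 is a purine (A/G) and position +4 is G, a reduced form of the Kozak consensus. Under leaky scanning, every AUG up to and including the first strong one is then treated as a candidate start. The functions `aug_contexts` and `candidate_starts` are hypothetical helpers, and the toy sequence is invented. This is not the method used in the study.

```python
def aug_contexts(mrna):
    """Return (position, context) for each AUG in 5'->3' scanning order.

    Simplified Kozak rule (assumption): 'strong' when the -3 base is a
    purine (A or G) and the +4 base is G; otherwise 'suboptimal'.
    Positions are 0-based indices of the A of AUG; DNA input (T) is accepted.
    """
    seq = mrna.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "AUG":
            continue
        minus3 = seq[i - 3] if i >= 3 else ""
        plus4 = seq[i + 3] if i + 3 < len(seq) else ""
        strong = minus3 in ("A", "G") and plus4 == "G"
        hits.append((i, "strong" if strong else "suboptimal"))
    return hits


def candidate_starts(mrna):
    """Leaky-scanning sketch: a scanning 40S subunit may bypass suboptimal AUGs,
    so every AUG up to and including the first strong one is a candidate."""
    starts = []
    for pos, context in aug_contexts(mrna):
        starts.append(pos)
        if context == "strong":
            break
    return starts


toy = "GCCAUGAGCGCCACCAUGGCU"
print(aug_contexts(toy))      # [(3, 'suboptimal'), (15, 'strong')]
print(candidate_starts(toy))  # [3, 15]
print((15 - 3) % 3 == 0)      # True: the downstream AUG is in frame
```

In the toy sequence the first AUG sits in a suboptimal context and the second is in frame with it. A ribosome that skips the first would therefore make an N-truncated variant, the class of product the abstract estimates for 12% of human mRNAs.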
Affiliation(s)
- Alex V Kochetov
- Institute of Cytology and Genetics, Novosibirsk 630090, Russia.
16
Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, Koyanagi KO, Barrero RA, Tamura T, Yamaguchi-Kabata Y, Tanino M, Yura K, Miyazaki S, Ikeo K, Homma K, Kasprzyk A, Nishikawa T, Hirakawa M, Thierry-Mieg J, Thierry-Mieg D, Ashurst J, Jia L, Nakao M, Thomas MA, Mulder N, Karavidopoulou Y, Jin L, Kim S, Yasuda T, Lenhard B, Eveno E, Suzuki Y, Yamasaki C, Takeda JI, Gough C, Hilton P, Fujii Y, Sakai H, Tanaka S, Amid C, Bellgard M, Bonaldo MDF, Bono H, Bromberg SK, Brookes AJ, Bruford E, Carninci P, Chelala C, Couillault C, de Souza SJ, Debily MA, Devignes MD, Dubchak I, Endo T, Estreicher A, Eyras E, Fukami-Kobayashi K, Gopinath GR, Graudens E, Hahn Y, Han M, Han ZG, Hanada K, Hanaoka H, Harada E, Hashimoto K, Hinz U, Hirai M, Hishiki T, Hopkinson I, Imbeaud S, Inoko H, Kanapin A, Kaneko Y, Kasukawa T, Kelso J, Kersey P, Kikuno R, Kimura K, Korn B, Kuryshev V, Makalowska I, Makino T, Mano S, Mariage-Samson R, Mashima J, Matsuda H, Mewes HW, Minoshima S, Nagai K, Nagasaki H, Nagata N, Nigam R, Ogasawara O, Ohara O, Ohtsubo M, Okada N, Okido T, Oota S, Ota M, Ota T, Otsuki T, Piatier-Tonneau D, Poustka A, Ren SX, Saitou N, Sakai K, Sakamoto S, Sakate R, Schupp I, Servant F, Sherry S, Shiba R, Shimizu N, Shimoyama M, Simpson AJ, Soares B, Steward C, Suwa M, Suzuki M, Takahashi A, Tamiya G, Tanaka H, Taylor T, Terwilliger JD, Unneberg P, Veeramachaneni V, Watanabe S, Wilming L, Yasuda N, Yoo HS, Stodolsky M, Makalowski W, Go M, Nakai K, Takagi T, Kanehisa M, Sakaki Y, Quackenbush J, Okazaki Y, Hayashizaki Y, Hide W, Chakraborty R, Nishikawa K, Sugawara H, Tateno Y, Chen Z, Oishi M, Tonellato P, Apweiler R, Okubo K, Wagner L, Wiemann S, Strausberg RL, Isogai T, Auffray C, Nomura N, Gojobori T, Sugano S. Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol 2004; 2:e162. [PMID: 15103394 PMCID: PMC393292 DOI: 10.1371/journal.pbio.0020162] [Citation(s) in RCA: 267] [Impact Index Per Article: 13.4] [Received: 12/19/2003] [Accepted: 04/01/2004] [Indexed: 01/08/2023] Open
Abstract
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, because the transcripts of a single gene can vary greatly. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also demonstrated the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes.
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
Affiliation(s)
- Tadashi Imanishi
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takeshi Itoh
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Bioinformatics Laboratory, Genome Research Department, National Institute of Agrobiological Sciences, Ibaraki, Japan
- Yutaka Suzuki
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Claire O'Donovan
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Satoshi Fukuchi
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Roberto A Barrero
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Takuro Tamura
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- BITS Company, Shizuoka, Japan
- Yumi Yamaguchi-Kabata
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Motohiko Tanino
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Kei Yura
- Quantum Bioinformatics Group, Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Kyoto, Japan
- Satoru Miyazaki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Kazuho Ikeo
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Keiichi Homma
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Arek Kasprzyk
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Tetsuo Nishikawa
- Reverse Proteomics Research Institute, Chiba, Japan
- Central Research Laboratory, Hitachi, Tokyo, Japan
- Mika Hirakawa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Jean Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Centre National de la Recherche Scientifique (CNRS), Laboratoire de Physique Mathematique, Montpellier, France
- Danielle Thierry-Mieg
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Centre National de la Recherche Scientifique (CNRS), Laboratoire de Physique Mathematique, Montpellier, France
- Jennifer Ashurst
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Libin Jia
- National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Mitsuteru Nakao
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Michael A Thomas
- Department of Biological Sciences, Idaho State University, Pocatello, Idaho, United States of America
- Nicola Mulder
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Youla Karavidopoulou
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Lihua Jin
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Sangsoo Kim
- Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
- Boris Lenhard
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Eric Eveno
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Yoshiyuki Suzuki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Chisato Yamasaki
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Jun-ichi Takeda
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Craig Gough
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Phillip Hilton
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Yasuyuki Fujii
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Hiroaki Sakai
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Tokyo Research Laboratories, Kyowa Hakko Kogyo Company, Tokyo, Japan
- Susumu Tanaka
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Clara Amid
- MIPS, Institute for Bioinformatics, GSF, National Research Center for Environment and Health, Neuherberg, Germany
- Matthew Bellgard
- Centre for Bioinformatics and Biological Computing, School of Information Technology, Murdoch University, Murdoch, Western Australia, Australia
- Maria de Fatima Bonaldo
- Medical Education and Biomedical Research Facility, University of Iowa, Iowa City, Iowa, United States of America
- Hidemasa Bono
- Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Susan K Bromberg
- Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Anthony J Brookes
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
- Elspeth Bruford
- HUGO Gene Nomenclature Committee, University College London, London, United Kingdom
- Claude Chelala
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Christine Couillault
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Marie-Anne Debily
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Inna Dubchak
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Toshinori Endo
- Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Eduardo Eyras
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Kaoru Fukami-Kobayashi
- Bioresource Information Division, RIKEN BioResource Center, RIKEN Tsukuba Institute, Ibaraki, Japan
- Gopal R Gopinath
- Genome Knowledgebase, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Esther Graudens
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Yoonsoo Hahn
- Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
- Michael Han
- MIPS, Institute for Bioinformatics, GSF, National Research Center for Environment and Health, Neuherberg, Germany
- Ze-Guang Han
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Chinese National Human Genome Center at Shanghai, Shanghai, China
- Kousuke Hanada
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideki Hanaoka
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Erimi Harada
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Katsuyuki Hashimoto
- Division of Genetic Resources, National Institute of Infectious Diseases, Tokyo, Japan
- Ursula Hinz
- Swiss Institute of Bioinformatics, Geneva, Switzerland
- Momoki Hirai
- Graduate School of Frontier Sciences, Department of Integrated Biosciences, University of Tokyo, Chiba, Japan
- Teruyoshi Hishiki
- Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Ian Hopkinson
- Department of Primary Care and Population Sciences, Royal Free University College Medical School, University College London, London, United Kingdom
- Clinical and Molecular Genetics Unit, The Institute of Child Health, London, United Kingdom
- Sandrine Imbeaud
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Hidetoshi Inoko
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Alexander Kanapin
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Yayoi Kaneko
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Takeya Kasukawa
- Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Janet Kelso
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Paul Kersey
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Bernhard Korn
- RZPD Resource Center for Genome Research, Heidelberg, Germany
- Vladimir Kuryshev
- Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Izabela Makalowska
- Pennsylvania State University, University Park, Pennsylvania, United States of America
- Takashi Makino
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Shuhei Mano
- Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Regine Mariage-Samson
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Jun Mashima
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideo Matsuda
- Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Hans-Werner Mewes
- MIPS, Institute for Bioinformatics, GSF, National Research Center for Environment and Health, Neuherberg, Germany
- Shinsei Minoshima
- Medical Photobiology Department, Photon Medical Research Center, Hamamatsu University School of Medicine, Shizuoka, Japan
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Hideki Nagasaki
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Naoki Nagata
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Rajni Nigam
- Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Osamu Ogasawara
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Masafumi Ohtsubo
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Norihiro Okada
- Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Kanagawa, Japan
- Toshihisa Okido
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Satoshi Oota
- Bioresource Information Division, RIKEN BioResource Center, RIKEN Tsukuba Institute, Ibaraki, Japan
- Motonori Ota
- Global Scientific Information and Computing Center, Tokyo Institute of Technology, Tokyo, Japan
- Toshio Ota
- Tokyo Research Laboratories, Kyowa Hakko Kogyo Company, Tokyo, Japan
- Tetsuji Otsuki
- Molecular Biology Laboratory, Medicinal Research Laboratories, Taisho Pharmaceutical Company, Saitama, Japan
- Annemarie Poustka
- Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Shuang-Xi Ren
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Chinese National Human Genome Center at Shanghai, Shanghai, China
- Naruya Saitou
- Department of Population Genetics, National Institute of Genetics, Shizuoka, Japan
- Katsunaga Sakai
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Shigetaka Sakamoto
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Ryuichi Sakate
- Graduate School of Frontier Sciences, Department of Integrated Biosciences, University of Tokyo, Chiba, Japan
- Ingo Schupp
- Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Florence Servant
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Stephen Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Rie Shiba
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Nobuyoshi Shimizu
- Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan
- Mary Shimoyama
- Medical College of Wisconsin, Milwaukee, Wisconsin, United States of America
- Bento Soares
- Medical Education and Biomedical Research Facility, University of Iowa, Iowa City, Iowa, United States of America
- Charles Steward
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Makiko Suwa
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Mami Suzuki
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Aiko Takahashi
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Gen Tamiya
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Department of Genetic Information, Division of Molecular Life Science, School of Medicine, Tokai University, Kanagawa, Japan
- Hiroshi Tanaka
- Department of Bioinformatics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Todd Taylor
- Human Genome Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Joseph D Terwilliger
- Columbia University and Columbia Genome Center, New York, New York, United States of America
- Per Unneberg
- Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden
- Vamsi Veeramachaneni
- Pennsylvania State University, University Park, Pennsylvania, United States of America
- Shinya Watanabe
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Laurens Wilming
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Norikazu Yasuda
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Integrated Database Group, Japan Biological Information Research Center, Japan Biological Informatics Consortium, Tokyo, Japan
- Hyang-Sook Yoo
- Korea Research Institute of Bioscience and Biotechnology, Taejeon, Korea
- Marvin Stodolsky
- Biology Division and Genome Task Group, Office of Biological and Environmental Research, United States Department of Energy, Washington, D.C., United States of America
- Wojciech Makalowski
- Pennsylvania State University, University Park, Pennsylvania, United States of America
- Mitiko Go
- Faculty of Bio-Science, Nagahama Institute of Bio-Science and Technology, Shiga, Japan
- Kenta Nakai
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Toshihisa Takagi
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Minoru Kanehisa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Yoshiyuki Sakaki
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Human Genome Research Group, Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- John Quackenbush
- Institute for Genomic Research, Rockville, Maryland, United States of America
- Yasushi Okazaki
- Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Yoshihide Hayashizaki
- Genome Exploration Research Group, RIKEN Genomic Sciences Center, RIKEN Yokohama Institute, Kanagawa, Japan
- Winston Hide
- South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Ranajit Chakraborty
- Center for Genome Information, Department of Environmental Health, University of Cincinnati, Cincinnati, Ohio, United States of America
- Ken Nishikawa
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Hideaki Sugawara
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Yoshio Tateno
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Zhu Chen
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Chinese National Human Genome Center at Shanghai, Shanghai, China
- State Key Laboratory of Medical Genomics, Shanghai Institute of Hematology, Rui-Jin Hospital, Shanghai Second Medical University, Shanghai, China
- Peter Tonellato
- PointOne Systems, Wauwatosa, Wisconsin, United States of America
- Rolf Apweiler
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Kousaku Okubo
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Lukas Wagner
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Stefan Wiemann
- Molecular Genome Analysis, German Cancer Research Center-DKFZ, Heidelberg, Germany
- Robert L Strausberg
- National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Takao Isogai
- Reverse Proteomics Research Institute, Chiba, Japan
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Charles Auffray
- Genexpress, CNRS, Functional Genomics and Systemic Biology for Health, Villejuif Cedex, France
- Sino-French Laboratory in Life Sciences and Genomics, Shanghai, China
- Nobuo Nomura
- Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takashi Gojobori
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Shizuoka, Japan
- Department of Genetics, Graduate University for Advanced Studies, Shizuoka, Japan
- Sumio Sugano
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Functional Genomics Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
17
Abstract
For decades, researchers have focused most of their attention on protein-coding genes and proteins. With the completion of the human and mouse genomes and the accumulation of data on the mammalian transcriptome, the focus now shifts to non-coding DNA sequences, RNA-coding genes and their transcripts. Many non-coding transcribed sequences are proving to have important regulatory roles, but the functions of the majority remain mysterious.
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
18
Shabalina SA, Ogurtsov AY, Rogozin IB, Koonin EV, Lipman DJ. Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res 2004; 32:1774-82. [PMID: 15031317 PMCID: PMC390323 DOI: 10.1093/nar/gkh313] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Indexed: 11/15/2022] Open
Abstract
Sequencing of multiple, nearly complete eukaryotic genomes creates opportunities for detecting previously unnoticed, subtle functional signals in non-coding regions. A genome-wide comparative analysis of orthologous sets of mammalian and yeast mRNAs revealed distinct patterns of evolutionary conservation at the boundaries of the untranslated regions (UTRs) and the coding region (CDS). Elevated sequence conservation was detected in approximately 30-nt regions around the start codon. There seems to be a complementary relationship between sequence conservation in the approximately 30-nt region of the 5'-UTR immediately upstream of the start codon and that in the synonymous positions of the 5'-terminal 30 nt of the CDS: in mammalian mRNAs, the 5'-UTR shows greater conservation than the CDS, whereas the opposite trend holds for yeast mRNAs. Unexpectedly, an approximately 30-nt region downstream of the stop codon shows a substantially lower level of sequence conservation than the downstream portions of the 3'-UTRs. However, the sequence in this poorly conserved 30-nt portion of the 3'-UTR is non-random in that it has a higher GC content than the rest of the UTR. It is hypothesized that the elevated sequence conservation in the region immediately upstream of the start codon is related to the requirement for initiation factor binding during pre-initiation ribosomal scanning. In contrast, the poorly conserved region downstream of the stop codon could be involved in post-termination scanning and dissociation of the ribosomes from the mRNA, which requires only the mRNA-ribosome interaction. Additionally, it was found that the choice of the stop codon in mammals, but not in yeasts, and the context in the immediate vicinity of the stop codons in both mammals and yeasts are subject to strong selection.
Thus, genome-wide analysis of orthologous gene sets allows detection of previously unrecognized patterns of sequence conservation, which are likely to reflect hidden functional signals, such as ribosomal filters that could regulate translation by modulating the interaction between the mRNA and ribosomes.
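The GC-content comparison described in this abstract can be illustrated with a short sketch. It assumes the input string is a single 3'-UTR that begins at the first nucleotide after the stop codon; the function names are hypothetical, and the sketch does not reproduce the authors' orthologous-alignment pipeline. It only splits the UTR into the proximal 30 nt window and the remainder and reports the GC fraction of each.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive); 0.0 for an empty string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def compare_proximal_gc(utr3: str, window: int = 30) -> tuple[float, float]:
    """Return (GC of the first `window` nt after the stop codon, GC of the rest).

    Assumption: utr3 starts immediately after the stop codon.
    """
    proximal, rest = utr3[:window], utr3[window:]
    return gc_content(proximal), gc_content(rest)


# Toy sequence: a GC-rich 30 nt proximal window followed by an AU-rich remainder.
toy_utr = "GC" * 15 + "AU" * 20
print(compare_proximal_gc(toy_utr))  # (1.0, 0.0)
```

Applied across many orthologous UTRs, a consistently higher value for the first element of the tuple would correspond to the pattern the abstract reports.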
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
19
Casadei R, Strippoli P, D'Addabbo P, Canaider S, Lenzi L, Vitale L, Giannone S, Frabetti F, Facchin F, Carinci P, Zannotti M. mRNA 5′ region sequence incompleteness: a potential source of systematic errors in translation initiation codon assignment in human mRNAs. Gene 2003; 321:185-93. [PMID: 14637006 DOI: 10.1016/s0378-1119(03)00835-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 12/01/2022]
Abstract
The amino acid sequence of gene products is routinely deduced from the nucleotide sequence of the corresponding cloned cDNA, according to the rules for recognition of the start codon (first-AUG rule, optimal sequence context) and the genetic code. Most subsequent types of product analysis stem from this prediction, although all standard methods for cDNA cloning are affected by a potential inability to clone the 5' region of mRNA effectively. Revision, by bioinformatics and cloning methods, of 109 known genes located on human chromosome 21 (HC 21) shows that 60 mRNAs lack any in-frame stop codon upstream of the first AUG, and that in five cases (DSCR1, KIAA0184, KIAA0539, SON, and TFF3) the coding region at the 5' end was incompletely characterized in the original descriptions. We describe the respective consequences for genomic annotation, domain and ortholog identification, and the design of functional experiments. We have also analyzed the sequences of 13,124 human mRNAs (RefSeq databank), discovering that in 6448 cases (49%), an in-frame stop codon is present upstream of the initiation codon, while in the other 6676 mRNAs (51%), identification of additional bases at the mRNA 5' region could well reveal new upstream in-frame AUG codons in the optimal context. Extrapolating from the HC 21 data, about 550 known human genes might thus be affected by this 5' end mRNA artifact.
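The screening criterion in this abstract (is there an in-frame stop codon upstream of the first AUG?) can be sketched as below. This is an illustrative check, not the authors' actual procedure: it assumes the standard genetic code's stop codons, accepts DNA or RNA letters, and ignores the Kozak "optimal context" part of the rule. The function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code


def normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA letters (T) to RNA letters (U)."""
    return seq.upper().replace("T", "U")


def first_aug(mrna: str) -> int:
    """Index of the first AUG (first-AUG rule), or -1 if there is none."""
    return normalize(mrna).find("AUG")


def has_upstream_inframe_stop(mrna: str, start: int) -> bool:
    """True if a stop codon occurs upstream of `start` in the same reading frame."""
    seq = normalize(mrna)
    # Step back one codon at a time so only codons in the start codon's frame are tested.
    for i in range(start - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


# "UAA" at position 0 is in frame with the AUG at position 6.
print(has_upstream_inframe_stop("UAAGCCAUGGCC", first_aug("UAAGCCAUGGCC")))  # True
# Here "UAA" sits at position 1, out of frame, so it does not count.
print(has_upstream_inframe_stop("AUAAGCAUG", first_aug("AUAAGCAUG")))  # False
```

A False result is the case the abstract flags: without an in-frame upstream stop, the 5' end may be incomplete and the true initiation codon could lie further upstream.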
Collapse
Affiliation(s)
- Raffaella Casadei
- Center for Research into Molecular Genetics Fondazione CARISBO, Institute of Histology and General Embryology, University of Bologna, Via Belmeloro 8, 40126 Bologna, Italy
20
Shabalina SA, Ogurtsov AY, Lipman DJ, Kondrashov AS. Patterns in interspecies similarity correlate with nucleotide composition in mammalian 3'UTRs. Nucleic Acids Res 2003; 31:5433-9. [PMID: 12954780 PMCID: PMC203331 DOI: 10.1093/nar/gkg751] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
Post-transcriptional regulation and the formation of mRNA 3' ends are crucial for gene expression in eukaryotes. Interspecies conservation of many sequences within 3'UTRs reveals selective constraint due to similar function. To study the pattern of conservation within 3'UTRs, we compiled and aligned 50 sets of complete orthologous 3'UTRs from four orders of mammals. We observed a mosaic pattern of conservation, with alternating regions of high (phylogenetic footprints) and low similarity. Conservation in 3'UTRs correlates with their base composition and also with the synonymous substitution rate in the corresponding coding regions. The non-uniform distribution of conservation is more pronounced for 3'UTRs with a moderate or low level of overall conservation, where invariant nucleotides are more numerous and their runs of length 4-7 occur more frequently than expected if conservation were random. Many runs of invariant nucleotides are AU-rich or pyrimidine-rich. Some of these runs coincide with known functional cis-elements of eukaryotic mRNAs, such as the U-rich upstream element, the polyadenylation signal and the DICE regulatory signal. More divergent regions of multiple alignments of 3'UTRs are often more G- and/or C-rich. Our results provide evidence for the importance of moderately conserved regions in 3'UTRs and suggest that regulatory functions of 3'UTRs might utilize gene-specific information in these regions.
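The "runs of invariant nucleotides" statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes a gapped multiple alignment given as equal-length strings and treats a column as invariant only when every sequence has the same non-gap base. The function names are hypothetical, and the sketch does not reproduce the study's comparison against a random-conservation expectation.

```python
from collections import Counter


def invariant_columns(alignment: list[str]) -> list[bool]:
    """Flag each alignment column in which all sequences share the same non-gap base."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have equal length")
    flags = []
    for column in zip(*alignment):
        bases = {c.upper() for c in column}
        flags.append(len(bases) == 1 and "-" not in bases)
    return flags


def run_length_counts(flags: list[bool]) -> Counter:
    """Count maximal runs of consecutive invariant columns, keyed by run length."""
    counts: Counter = Counter()
    run = 0
    for flag in flags + [False]:  # trailing sentinel closes a run at the alignment end
        if flag:
            run += 1
        elif run:
            counts[run] += 1
            run = 0
    return counts


toy_alignment = ["AUUUACGGCA", "AUUUACGCCA", "AUUUAGGCCA"]
counts = run_length_counts(invariant_columns(toy_alignment))
print(sorted(counts.items()))                 # [(1, 1), (2, 1), (5, 1)]
print(sum(counts[k] for k in range(4, 8)))    # runs of length 4-7: 1
```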
Collapse
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Institutes of Health, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA.