1
Ghosh S, Sati S, Sengupta S, Scaria V. Distinct patterns of epigenetic marks and transcription factor binding sites across promoters of sense-intronic long noncoding RNAs. J Genet 2016; 94:17-25. [PMID: 25846873 DOI: 10.1007/s12041-015-0484-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Long noncoding RNAs (lncRNAs) are a new class of noncoding RNAs that have been extensively studied in the recent past as regulators of gene expression, including modulation of epigenetic regulation. The lncRNA class encompasses a number of subclasses, classified based on their genomic loci and relation to protein-coding genes. Functional differences between subclasses have been increasingly studied in recent years, though the regulation of expression and biogenesis of lncRNAs has been poorly studied. The availability of genome-scale datasets of epigenetic marks has motivated us to understand the patterns and processes of epigenetic regulation of lncRNAs. Here we analysed the occurrence of expressive and repressive histone marks at the transcription start site (TSS) of lncRNAs and their subclasses, and compared these profiles with those of the protein-coding regions. We observe distinct differences in the density of histone marks across the TSS of a few lncRNA subclasses. The sense-intronic lncRNA subclass showed a paucity of mapped histone marks across the TSS, which differed significantly from all the lncRNAs and protein-coding genes in most cases. A similar pattern was also observed for the density of transcription factor binding sites (TFBS). These observations were generally consistent across cell and tissue types. The differences in density across the promoter were significantly associated with the expression level of the genes, but the differences between the densities across long noncoding and protein-coding gene promoters were consistent irrespective of the expression levels. Apart from suggesting general differences in epigenetic regulatory marks across long noncoding RNA promoters, our analysis suggests a possible alternative mechanism of regulation and/or biogenesis of sense-intronic lncRNAs.
Affiliation(s)
- Sourav Ghosh
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology, Mall Road, New Delhi 110 007, India.
2
Philippe N, Bou Samra E, Boureux A, Mancheron A, Rufflé F, Bai Q, De Vos J, Rivals E, Commes T. Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome. Nucleic Acids Res 2013; 42:2820-32. [PMID: 24357408 PMCID: PMC3950697 DOI: 10.1093/nar/gkt1300] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]
Abstract
Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
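The tag categorization described in this abstract (protein-coding, antisense, intronic, intergenic) can be illustrated with a minimal Python sketch. It is not the DigitagCT implementation: the `Gene` record, the `classify_tag` function, and the rule that the first overlapping gene decides the category are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    strand: str                          # "+" or "-"
    exons: Tuple[Tuple[int, int], ...]   # (start, end) pairs


def classify_tag(chrom: str, pos: int, strand: str, genes: Iterable[Gene]) -> str:
    """Assign a uniquely mapped tag to one of four categories.

    Simplifying assumption: the first gene overlapping the tag position
    decides the category; overlapping annotations are not reconciled.
    """
    for gene in genes:
        if gene.chrom != chrom or not (gene.start <= pos <= gene.end):
            continue
        if gene.strand != strand:
            return "antisense"          # inside a gene, opposite strand
        if any(s <= pos <= e for s, e in gene.exons):
            return "protein-coding"     # same strand, inside an exon
        return "intronic"               # same strand, between exons
    return "intergenic"                 # outside every annotated gene


if __name__ == "__main__":
    gene = Gene("chr1", 100, 500, "+", ((100, 200), (400, 500)))
    for pos, strand in [(150, "+"), (300, "+"), (150, "-"), (600, "+")]:
        print(pos, strand, classify_tag("chr1", pos, strand, [gene]))
```

A production pipeline would use an interval index rather than a linear scan and would need an explicit policy for tags that overlap several annotations.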
Affiliation(s)
- Nicolas Philippe
- Transcriptomics, bioinformatics and myeloid leukaemia, INSERM, U1040, Institute for Research in Biotherapy, Montpellier F-34197, France, Université Montpellier 2, Montpellier, France, Institut de Biologie Computationnelle, Maison de la modélisation, Université Montpellier 2, France, LIRMM, MAB, CNRS UMR 5506, Université Montpellier 2, Montpellier, France and Genomic instability of pluripotent stem cells, INSERM, U1040, Institute for Research in Biotherapy, Montpellier F-34197, France
3
Sun J, Zhou M, Mao ZT, Hao DP, Wang ZZ, Li CX. Systematic analysis of genomic organization and structure of long non-coding RNAs in the human genome. FEBS Lett 2013; 587:976-82. [PMID: 23454638 DOI: 10.1016/j.febslet.2013.02.036] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 01/09/2013] [Revised: 02/17/2013] [Accepted: 02/19/2013] [Indexed: 12/28/2022]
Abstract
The genomic architecture of several functional elements in animals and plants, such as microRNAs and tRNAs, has been well characterized. As yet, very little is known about the genomic organization and structure of lncRNAs in animals and plants. Here, we conducted a genome-wide systematic computational analysis of the genomic architecture of lncRNAs, and further provided a more comprehensive comparative view of genomic organization between lncRNAs and several other functional elements in the human genome. Our study not only provides comprehensive knowledge for further studies into the correlations between the genomic architecture of lncRNAs and their important functional roles in diverse cellular processes and in disease, but also will be valuable for understanding the origin and evolution of lncRNAs.
Affiliation(s)
- Jie Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
4
Jia H, Osak M, Bogu GK, Stanton LW, Johnson R, Lipovich L. Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA (New York, N.Y.) 2010; 16:1478-1487. [PMID: 20587619 DOI: 10.1261/rna.1951310.4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/20/2023]
Abstract
Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined complementary DNAs (cDNAs) lacking any positive-strand open reading frames (ORFs) longer than 30 amino acids, as well as cDNAs lacking any evidence of interspecies conservation of their longer-than-30-amino acid ORFs, as noncoding. We have identified 5446 lncRNA genes in the human genome from approximately 24,000 full-length cDNAs, using our new ORF-prediction pipeline. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. In an effort to distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our catalog of lncRNAs according to the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping with lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog, increased its accuracy through manual annotation of cDNA-to-genome alignments, and revealed that a large set of hypothetical-protein genes in GenBank lacks protein-coding capacity. In addition, we have developed, independently of existing NCBI tools, command-line programs with high-throughput ORF-finding and BLASTP-parsing functionality, suitable for future automated assessments of protein-coding capacity of novel transcripts.
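The length criterion stated in this abstract (no positive-strand ORF longer than 30 amino acids) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, an ORF is taken to run from the first in-frame ATG to the next in-frame stop codon, and the interspecies-conservation criterion that the abstract also mentions is not modelled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf_aa(seq: str) -> int:
    """Return the length, in amino acids (stop codon excluded), of the
    longest ATG-to-stop ORF in the three forward reading frames.

    Assumption: an ORF that has no in-frame stop codon is not counted.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_putatively_noncoding(seq: str, max_aa: int = 30) -> bool:
    """True when no forward-strand ORF exceeds max_aa amino acids."""
    return longest_forward_orf_aa(seq) <= max_aa


if __name__ == "__main__":
    short_orf = "ATG" + "GCT" * 10 + "TAA"   # encodes 11 amino acids
    long_orf = "ATG" + "GCT" * 40 + "TAA"    # encodes 41 amino acids
    print(is_putatively_noncoding(short_orf), is_putatively_noncoding(long_orf))
```

The 30-residue threshold is taken from the abstract; everything else, including the choice to ignore ORFs without a stop codon, is a design decision of this sketch.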
Affiliation(s)
- Hui Jia
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48202, USA
5
Jia H, Osak M, Bogu GK, Stanton LW, Johnson R, Lipovich L. Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA (New York, N.Y.) 2010; 16:1478-87. [PMID: 20587619 PMCID: PMC2905748 DOI: 10.1261/rna.1951310] [Citation(s) in RCA: 293] [Impact Index Per Article: 20.9] [Indexed: 05/08/2023]
Abstract
Experimental evidence suggests that half or more of the mammalian transcriptome consists of noncoding RNA. Noncoding RNAs are divided into short noncoding RNAs (including microRNAs) and long noncoding RNAs (lncRNAs). We defined complementary DNAs (cDNAs) lacking any positive-strand open reading frames (ORFs) longer than 30 amino acids, as well as cDNAs lacking any evidence of interspecies conservation of their longer-than-30-amino acid ORFs, as noncoding. We have identified 5446 lncRNA genes in the human genome from approximately 24,000 full-length cDNAs, using our new ORF-prediction pipeline. We combined them nonredundantly with lncRNAs from four published sources to derive 6736 lncRNA genes. In an effort to distinguish standalone and antisense lncRNA genes from database artifacts, we stratified our catalog of lncRNAs according to the distance between each lncRNA gene candidate and its nearest known protein-coding gene. We concurrently examined the protein-coding capacity of known genes overlapping with lncRNAs. Remarkably, 62% of known genes with "hypothetical protein" names actually lacked protein-coding capacity. This study has greatly expanded the known human lncRNA catalog, increased its accuracy through manual annotation of cDNA-to-genome alignments, and revealed that a large set of hypothetical-protein genes in GenBank lacks protein-coding capacity. In addition, we have developed, independently of existing NCBI tools, command-line programs with high-throughput ORF-finding and BLASTP-parsing functionality, suitable for future automated assessments of protein-coding capacity of novel transcripts.
Affiliation(s)
- Hui Jia
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48202, USA
6
Zhang Y, Liu J, Jia C, Li T, Wu R, Wang J, Chen Y, Zou X, Chen R, Wang XJ, Zhu D. Systematic identification and evolutionary features of rhesus monkey small nucleolar RNAs. BMC Genomics 2010; 11:61. [PMID: 20100322 PMCID: PMC2832892 DOI: 10.1186/1471-2164-11-61] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 10/19/2009] [Accepted: 01/25/2010] [Indexed: 12/12/2022]
Abstract
BACKGROUND Recent studies have demonstrated that non-protein-coding RNAs (npcRNAs/ncRNAs) play important roles during eukaryotic development, species evolution, and in the etiology of disease. Rhesus macaques are the most widely used primate model in both biomedical research and primate evolutionary studies. However, most reports on these animals focus on the functional roles of protein-coding sequences, whereas very little is known about macaque ncRNAs. RESULTS In the present study, we performed the first systematic profiling of intermediate-size ncRNAs (50 to 500 nt) from the rhesus monkey by constructing a cDNA library. We identified 117 rhesus monkey ncRNAs, including 80 small nucleolar RNAs (snoRNAs), 29 other types of known RNAs (snRNAs, Y RNA, and others), and eight unclassified ncRNAs. Comparative genomic analysis and northern blot hybridizations demonstrated that some snoRNAs were lineage- or species-specific. Paralogous sequences were found for most rhesus monkey snoRNAs, the expression of which might be attributable to extensive duplication within the rhesus monkey genome. Further investigation of snoRNA flanking sequences showed that some rhesus monkey snoRNAs are retrogenes derived from L1-mediated integration. Finally, phylogenetic analysis demonstrated that birds and primates share some snoRNAs and host genes thereof, suggesting that both the relevant host genes and the snoRNAs contained therein may be inherited from a common ancestor. However, some rhesus monkey snoRNAs hosted by non-ribosome-related genes appeared after the evolutionary divergence between birds and mammals. CONCLUSIONS We provide the first experimentally-derived catalog of rhesus monkey ncRNAs and uncover some interesting genomic and evolutionary features. These findings provide important information for future functional characterization of snoRNAs during primate evolution.
Affiliation(s)
- Yong Zhang
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, PR China
7
Zhang Y, Wang J, Huang S, Zhu X, Liu J, Yang N, Song D, Wu R, Deng W, Skogerbø G, Wang XJ, Chen R, Zhu D. Systematic identification and characterization of chicken (Gallus gallus) ncRNAs. Nucleic Acids Res 2009; 37:6562-74. [PMID: 19720738 PMCID: PMC2770669 DOI: 10.1093/nar/gkp704] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/25/2022]
Abstract
Recent studies have demonstrated that non-coding RNAs (ncRNAs) play important roles during development and evolution. Chicken, the first genome-sequenced non-mammalian amniote, possesses unique features for developmental and evolutionary studies. However, apart from microRNAs, information on chicken ncRNAs has mainly been obtained from computational predictions without experimental validation. In the present study, we performed a systematic identification of intermediate-size ncRNAs (50–500 nt) by ncRNA library construction and identified 125 chicken ncRNAs. Importantly, through bioinformatics and expression analysis, we found that the chicken ncRNAs have several novel features: (i) comparative genomic analysis against 18 sequenced vertebrate genomes revealed that the majority of the newly identified ncRNA candidates are not conserved and most are potentially bird/chicken specific, suggesting that ncRNAs play roles in lineage/species specification during evolution. (ii) The expression pattern analysis of intronic snoRNAs and their host genes suggested coordinated expression between snoRNAs and their host genes. (iii) Several spatio-temporal specific expression patterns suggest involvement of ncRNAs in tissue development. Together, these findings provide new clues for future functional study of ncRNAs during development and evolution.
Affiliation(s)
- Yong Zhang
- National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100005, China
8
Weikard R, Goldammer T, Eberlein A, Kuehn C. Novel transcripts discovered by mining genomic DNA from defined regions of bovine chromosome 6. BMC Genomics 2009; 10:186. [PMID: 19393061 PMCID: PMC2681481 DOI: 10.1186/1471-2164-10-186] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/01/2008] [Accepted: 04/24/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Linkage analyses strongly suggest a number of QTL for production, health and conformation traits in the middle part of bovine chromosome 6 (BTA6). The identification of the molecular background underlying the genetic variation at the QTL and subsequent functional studies require a well-annotated gene sequence map of the critical QTL intervals. To complete the sequence map of the defined subchromosomal regions on BTA6 poorly covered with comparative gene information, we focused on targeted isolation of transcribed sequences from bovine bacterial artificial chromosome (BAC) clones mapped to the QTL intervals. RESULTS Using the method of exon trapping, 92 unique exon trapping sequences (ETS) were discovered in a chromosomal region of poor gene coverage. Sequence identity to the current NCBI sequence assembly for BTA6 was detected for 91% of unique ETS. Comparative sequence similarity search revealed that 11% of the isolated ETS displayed high similarity to genomic sequences located on the syntenic chromosomes of the human and mouse reference genome assemblies. Nearly a third of the ETS identified similar equivalent sequences in genomic sequence scaffolds from the alternative Celera-based sequence assembly of the human genome. Screening gene, EST, and protein databases detected 17% of ETS with identity to known transcribed sequences. Expression analysis of a subset of the ETS showed that most ETS (84%) displayed a distinctive expression pattern in a multi-tissue panel of a lactating cow verifying their existence in the bovine transcriptome. CONCLUSION The results of our study demonstrate that the exon trapping method based on region-specific BAC clones is very useful for targeted screening for novel transcripts located within a defined chromosomal region being deficiently endowed with annotated gene information. 
The majority of identified ETS represent unknown noncoding sequences in intergenic regions on BTA6 displaying a distinctive tissue-specific expression profile. However, their definite regulatory function has to be analyzed in further studies. The novel transcripts will add new sequence information to annotate a complete bovine genome sequence assembly, contribute to establishing a detailed transcription map for targeted BTA6 regions and will also help to dissect the molecular and regulatory background of the QTL detected on BTA6.
Affiliation(s)
- Rosemarie Weikard
- Forschungsinstitut für die Biologie Landwirtschaftlicher Nutztiere (FBN), Dummerstorf, Germany.
9
Song D, Yang Y, Yu B, Zheng B, Deng Z, Lu BL, Chen X, Jiang T. Computational prediction of novel non-coding RNAs in Arabidopsis thaliana. BMC Bioinformatics 2009; 10 Suppl 1:S36. [PMID: 19208137 PMCID: PMC2648795 DOI: 10.1186/1471-2105-10-s1-s36] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022]
Abstract
Background Non-coding RNA (ncRNA) genes do not encode proteins but produce functional RNA molecules that play crucial roles in many key biological processes. Recent genome-wide transcriptional profiling studies using tiling arrays in organisms such as human and Arabidopsis have revealed a great number of transcripts, a large portion of which have little or no capability to encode proteins. This unexpected finding suggests that the currently known repertoire of ncRNAs may only represent a small fraction of ncRNAs of the organisms. Thus, efficient and effective prediction of ncRNAs has become an important task in bioinformatics in recent years. Among the available computational methods, the comparative genomic approach seems to be the most powerful to detect ncRNAs. The recent completion of the sequencing of several major plant genomes has made the approach possible for plants. Results We have developed a pipeline to predict novel ncRNAs in the Arabidopsis (Arabidopsis thaliana) genome. It starts by comparing the expressed intergenic regions of Arabidopsis as provided in two whole-genome high-density oligo-probe arrays from the literature with the intergenic nucleotide sequences of all completely sequenced plant genomes including rice (Oryza sativa), poplar (Populus trichocarpa), grape (Vitis vinifera), and papaya (Carica papaya). By using multiple sequence alignment, a popular ncRNA prediction program (RNAz), wet-bench experimental validation, protein-coding potential analysis, and stringent screening against various ncRNA databases, the pipeline resulted in 16 families of novel ncRNAs (with a total of 21 ncRNAs). Conclusion In this paper, we undertake a genome-wide search for novel ncRNAs in the genome of Arabidopsis by a comparative genomics approach. The identified novel ncRNAs are evolutionarily conserved between Arabidopsis and other recently sequenced plants, and may perform interesting novel biological functions.
Affiliation(s)
- Dandan Song
- Department of Computer Science and Technology, Tsinghua University, Beijing 100084, PR China.
10
Abstract
Non-protein-coding sequences increasingly dominate the genomes of multicellular organisms as their complexity increases, in contrast to protein-coding genes, which remain relatively static. Most of the mammalian genome and indeed that of all eukaryotes is expressed in a cell- and tissue-specific manner, and there is mounting evidence that much of this transcription is involved in the regulation of differentiation and development. Different classes of small and large noncoding RNAs (ncRNAs) have been shown to regulate almost every level of gene expression, including the activation and repression of homeotic genes and the targeting of chromatin-remodeling complexes. ncRNAs are involved in developmental processes in both simple and complex eukaryotes, and we illustrate this in the latter by focusing on the animal germline, brain, and eye. While most have yet to be systematically studied, the emerging evidence suggests that there is a vast hidden layer of regulatory ncRNAs that constitutes the majority of the genomic programming of multicellular organisms and plays a major role in controlling the epigenetic trajectories that underlie their ontogeny.
11
Mallardo M, Poltronieri P, D'Urso OF. Non-protein coding RNA biomarkers and differential expression in cancers: a review. Journal of Experimental & Clinical Cancer Research: CR 2008; 27:19. [PMID: 18631387 PMCID: PMC2490676 DOI: 10.1186/1756-9966-27-19] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 05/27/2008] [Accepted: 07/16/2008] [Indexed: 01/03/2023]
Abstract
Background In recent years, a huge number of human transcripts that do not code for proteins, named non-protein-coding RNAs (npcRNAs), have been found. In most cases, small (miRNAs, snoRNAs) and long RNAs (antisense RNA, dsRNA, and long RNA species) have many roles, functioning as regulators of other mRNAs at the transcriptional and post-transcriptional levels, and controlling protein ubiquitination and degradation. Various species of npcRNAs have been found to be differentially expressed in different types of cancer. This review discusses the published data and new results on the expression of a subset of npcRNAs. Conclusion These results underscore the complexity of the RNA world and provide further evidence on the involvement of functional RNAs in cancer cell growth control.
Affiliation(s)
- Massimo Mallardo
- University of Napoli Federico II, Department of Biochemistry and Medical Biotechnologies, Via S. Pansini 5, Napoli, Italy.
12
Lee TM, Lipovich L. Structural differences of orthologous genes: insights from human-primate comparisons. Genomics 2008; 92:134-43. [PMID: 18606524 DOI: 10.1016/j.ygeno.2008.05.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/26/2007] [Revised: 04/16/2008] [Accepted: 05/02/2008] [Indexed: 01/15/2023]
Abstract
The genomic basis of phenotypic distinctions between humans and nonhuman primates remains insufficiently explained. We hypothesized that interspecies structural differences of orthologous genes can cause such distinctions and searched protein-coding genes conserved between humans and nonhuman primates for species-specific initial and terminal exons. We inferred gene structure differences from genomic locations where portions of primate transcripts aligned with the human genome outside of any human exons. Of 22,466 high-confidence FANTOM3 human transcriptional units, 7424 (33%) had nonhuman primate full-length cDNA support. One hundred eighty-three of the loci contained 68,424 bp of sequence exonic in nonhuman primates but not humans. Fifty-four of 183 included species-specific portions of protein-coding regions. Six genes had evidence of intergenic splicing in a nonhuman primate but not in human. It is imperative that primate transcriptome projects be accelerated on par with genome projects to understand better interspecies gene structure distinctions.
Affiliation(s)
- Tuan Meng Lee
- School of Computer Engineering, Nanyang Technological University, Singapore
13
Warden CD, Kim SH, Yi SV. Predicted functional RNAs within coding regions constrain evolutionary rates of yeast proteins. PLoS One 2008; 3:e1559. [PMID: 18270559 PMCID: PMC2216430 DOI: 10.1371/journal.pone.0001559] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 08/01/2007] [Accepted: 12/30/2007] [Indexed: 11/25/2022]
Abstract
Functional RNAs (fRNAs) are being recognized as an important regulatory component in biological processes. Interestingly, recent computational studies suggest that the number and biological significance of functional RNAs within coding regions (coding fRNAs) may have been underestimated. We hypothesized that such coding fRNAs will impose additional constraint on sequence evolution because the DNA primary sequence has to simultaneously code for functional RNA secondary structures on the messenger RNA in addition to the amino acid codons for the protein sequence. To test this prediction, we first utilized computational methods to predict conserved fRNA secondary structures within multiple species alignments of Saccharomyces sensu stricto genomes. We predict that as much as 5% of the genes in the yeast genome contain at least one functional RNA secondary structure within their protein-coding region. We then analyzed the impact of coding fRNAs on the evolutionary rate of protein-coding genes because a decrease in evolutionary rate implies constraint due to biological functionality. We found that our predicted coding fRNAs have a significant influence on evolutionary rates (especially at synonymous sites), independent of other functional measures. Thus, coding fRNAs may play a role in sequence evolution. Given that coding regions of humans and flies contain many more predicted coding fRNAs than yeast, the impact of coding fRNAs on sequence evolution may be substantial in genomes of higher eukaryotes.
Affiliation(s)
- Charles D. Warden
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Seong-Ho Kim
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Soojin V. Yi
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
14
Korbel JO, Urban AE, Grubert F, Du J, Royce TE, Starr P, Zhong G, Emanuel BS, Weissman SM, Snyder M, Gerstein MB. Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. Proc Natl Acad Sci U S A 2007; 104:10110-5. [PMID: 17551006 PMCID: PMC1891248 DOI: 10.1073/pnas.0703834104] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 11/18/2022]
Abstract
Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, "active" approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.
Affiliation(s)
- Jan O. Korbel
- Departments of Molecular Biophysics and Biochemistry
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany
- Alexander Eckehart Urban
- Genetics, Yale University School of Medicine, New Haven, CT 06520
- Departments of Molecular, Cellular, and Developmental Biology
- Fabian Grubert
- Genetics, Yale University School of Medicine, New Haven, CT 06520
- Jiang Du
- Computer Science, Yale University, New Haven, CT 06520
- Peter Starr
- Departments of Molecular Biophysics and Biochemistry
- Guoneng Zhong
- Departments of Molecular Biophysics and Biochemistry
- Beverly S. Emanuel
- Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104
- Michael Snyder
- Departments of Molecular, Cellular, and Developmental Biology
- Mark B. Gerstein
- Departments of Molecular Biophysics and Biochemistry
- Computer Science, Yale University, New Haven, CT 06520
15
Philippe H, Blanchette M. Proceedings of the First International Conference on Phylogenomics. March 15-19, 2006. Quebec, Canada. BMC Evol Biol 2007; 7 Suppl 1:S1-16. [PMID: 17288567 PMCID: PMC1796603 DOI: 10.1186/1471-2148-7-s1-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
The First Phylogenomics Conference was held in Ste-Adèle (Québec, Canada) in March 2006. Selected papers appear in this special issue of BMC Evolutionary Biology. Here, we give an introduction to the field and provide an overview of the articles presented in this issue.
Affiliation(s)
- Hervé Philippe
- Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de Biochimie, Université de Montréal, 2900 Boulevard Édouard-Montpetit, Montréal, Québec, H3T 1J4, Canada
- Mathieu Blanchette
- McGill Centre for Bioinformatics, McGill University, 3775 University Street, Montréal, Québec, H3A 2B4, Canada