1
Wu X, Zeng Y, Guan J, Ji G, Huang R, Li QQ. Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana. BMC Genomics 2015; 16:511. [PMID: 26155789 PMCID: PMC4568572 DOI: 10.1186/s12864-015-1691-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/06/2015] [Accepted: 06/05/2015] [Indexed: 12/22/2022]
Abstract
Background Messenger RNA polyadenylation is an essential step for the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3’-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants. Results Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole genome level. Based on the distributions of distance from IPACs to nearby sense and antisense genes, these IPACs were classified into three categories. About 70% of them were from previously unannotated 3’-UTR extensions to known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3’-ends, with a mean extension of 134 nucleotides. A total of 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein coding transcripts. A further 2957 IPACs corresponded to antisense transcripts for genes on the reverse strand, which might affect 2265 protein coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The remaining IPACs could have originated from transcriptional read-through or gene mis-annotations. Conclusions The identified IPACs corresponding to novel transcripts, 3’-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. Comprehensive characterization of IPACs from this study provides insights into alternative polyadenylation and antisense transcription in plants. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1691-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaohui Wu
- Department of Automation, Xiamen University, Xiamen, Fujian, China.
- Yong Zeng
- Department of Automation, Xiamen University, Xiamen, Fujian, China.
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, Fujian, China.
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian, China.
- Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, Fujian, China.
- Rongting Huang
- Department of Automation, Xiamen University, Xiamen, Fujian, China.
- Qingshun Q Li
- Key Laboratory of the Ministry of Education on Coastal Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China.
- Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, USA.
- Rice Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian, China.
2
Nolan DJ, Ginsberg M, Israely E, Palikuqi B, Poulos MG, James D, Ding BS, Schachterle W, Liu Y, Rosenwaks Z, Butler JM, Xiang J, Rafii A, Shido K, Rabbany SY, Elemento O, Rafii S. Molecular signatures of tissue-specific microvascular endothelial cell heterogeneity in organ maintenance and regeneration. Dev Cell 2013; 26:204-19. [PMID: 23871589 DOI: 10.1016/j.devcel.2013.06.017] [Citation(s) in RCA: 468] [Impact Index Per Article: 39.0] [Received: 01/22/2013] [Revised: 04/01/2013] [Accepted: 06/18/2013] [Indexed: 02/08/2023]
Abstract
Microvascular endothelial cells (ECs) within different tissues are endowed with distinct but as yet unrecognized structural, phenotypic, and functional attributes. We devised EC purification, cultivation, profiling, and transplantation models that establish tissue-specific molecular libraries of ECs devoid of lymphatic ECs or parenchymal cells. These libraries identify attributes that confer ECs with their organotypic features. We show that clusters of transcription factors, angiocrine growth factors, adhesion molecules, and chemokines are expressed in unique combinations by ECs of each organ. Furthermore, ECs respond distinctly in tissue regeneration models, hepatectomy, and myeloablation. To test the data set, we developed a transplantation model that employs generic ECs differentiated from embryonic stem cells. Transplanted generic ECs engraft into regenerating tissues and acquire features of organotypic ECs. Collectively, we demonstrate the utility of informational databases of ECs toward uncovering the extravascular and intrinsic signals that define EC heterogeneity. These factors could be exploited therapeutically to engineer tissue-specific ECs for regeneration.
Affiliation(s)
- Daniel J Nolan
- Department of Genetic Medicine, Howard Hughes Medical Institute, Weill Cornell Medical College, New York, NY 10065, USA
3
Naidoo N, Pawitan Y, Soong R, Cooper DN, Ku CS. Human genetics and genomics a decade after the release of the draft sequence of the human genome. Hum Genomics 2012; 5:577-622. [PMID: 22155605 PMCID: PMC3525251 DOI: 10.1186/1479-7364-5-6-577] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Indexed: 02/06/2023]
Abstract
Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Affiliation(s)
- Nasheen Naidoo
- Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
4
Cooper DN, Chen JM, Ball EV, Howells K, Mort M, Phillips AD, Chuzhanova N, Krawczak M, Kehrer-Sawatzki H, Stenson PD. Genes, mutations, and human inherited disease at the dawn of the age of personalized genomics. Hum Mutat 2010; 31:631-55. [PMID: 20506564 DOI: 10.1002/humu.21260] [Citation(s) in RCA: 117] [Impact Index Per Article: 7.8] [Indexed: 12/12/2022]
Abstract
The number of reported germline mutations in human nuclear genes, either underlying or associated with inherited disease, has now exceeded 100,000 in more than 3,700 different genes. The availability of these data has both revolutionized the study of the morbid anatomy of the human genome and facilitated "personalized genomics." With approximately 300 new "inherited disease genes" (and approximately 10,000 new mutations) being identified annually, it is pertinent to ask how many "inherited disease genes" there are in the human genome, how many mutations reside within them, and where such lesions are likely to be located? To address these questions, it is necessary not only to reconsider how we define human genes but also to explore notions of gene "essentiality" and "dispensability." Answers to these questions are now emerging from recent novel insights into genome structure and function and through complete genome sequence information derived from multiple individual human genomes. However, a change in focus toward screening functional genomic elements as opposed to genes sensu stricto will be required if we are to capitalize fully on recent technical and conceptual advances and identify new types of disease-associated mutation within noncoding regions remote from the genes whose function they disrupt.
Affiliation(s)
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom.
5
Gerber HP, Senter PD, Grewal IS. Antibody drug-conjugates targeting the tumor vasculature: Current and future developments. MAbs 2010; 1:247-53. [PMID: 20069754 DOI: 10.4161/mabs.1.3.8515] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Indexed: 11/19/2022]
Abstract
Reducing the blood supply of tumors is one modality to combat cancer. Monoclonal antibodies are now established as a key therapeutic approach for a range of diseases. Owing to the ability of antibodies to selectively target endothelial cells within the tumor vasculature, vascular targeting programs have become a mainstay in oncology drug development. However, the antitumor activity of single agent administration of conventional anti-angiogenic compounds is limited and the improvements in patient survival are most prominent in combinations with chemotherapy. Furthermore, prolonged treatment with conventional anti-angiogenic drugs is associated with toxicity and drug resistance. These circumstances provide a strong rationale for novel approaches to enhance the efficacy of mAbs targeting tumor vasculature such as antibody-drug conjugates (ADCs). Here, we review trends in the development of ADCs targeting tumor vasculature with the aim of informing future research and development of this class of therapeutics.
Affiliation(s)
- Hans-Peter Gerber
- Department of Pre-Clinical Therapeutics, Seattle Genetics, Inc., Bothell, WA 98021, USA
6
Grinchuk OV, Jenjaroenpun P, Orlov YL, Zhou J, Kuznetsov VA. Integrative analysis of the human cis-antisense gene pairs, miRNAs and their transcription regulation patterns. Nucleic Acids Res 2009; 38:534-47. [PMID: 19906709 PMCID: PMC2811022 DOI: 10.1093/nar/gkp954] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 12/20/2022]
Abstract
Cis-antisense gene pairs (CASGPs) can transcribe mRNAs from an opposite strand of a given locus. To classify and understand diverse CASGP phenomena in the human we compiled a genome-wide catalog of CASGPs and integrated these sequences with microarray, SAGE and miRNA data. Using the concept of overlapping regions and clustering of SA transcripts by chromosome coordinates, we identified up to 9000 overlapping antisense loci. Four thousand three hundred and seventy-four of these CASGPs form 1759 complex gene architectures. We found that ∼35% (6347/18160) of RefSeq genes are overlapped with the antisense transcripts. About 30% of Affymetrix U133 microarray initial sequences map transcripts of ∼35% CASGPs and reveal mostly concordant expression in CASGPs. We found strong significant overrepresentation of human miRNA genes in loci of CASGPs. We developed a data-driven model of cross-talk between co-expressed CASGPs and DICER1-mediated miRNA pathway in normal spermatogenesis and in severe teratozoospermia. Specifically, we revealed complex SA structural–functional gene module composing the protein-coding genes, WDR6, DALRD3, NDUFAF3 and ncRNA precursors, mir-425 and mir-191, which could provide downregulation of ncRNA pathway via direct targeting DICER1 and basonuclin 2 transcripts by mir-425 and mir-191 in normal spermatogenesis, but this mechanism is switched off in severe teratozoospermia. The database is available from http://globalisland.bii.a-star.edu.sg/~jiangtao/sas/index3.php?link=about
Affiliation(s)
- Oleg V Grinchuk
- Bioinformatics Institute, 30 Biopolis Street #07-01, Singapore 138672, Singapore
7
Xu WJ, Wang ZX, Qiao ZD. Modified PCR methods for 3' end amplification from serial analysis of gene expression (SAGE) tags. FEBS J 2009; 276:2657-68. [PMID: 19459930 DOI: 10.1111/j.1742-4658.2009.06981.x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/26/2023]
Abstract
Serial analysis of gene expression (SAGE) is a powerful technique to study gene expression at the genome level. However, a disadvantage of the shortness of SAGE tags is that it prevents further study of SAGE library data, thus limiting extensive application of the SAGE method in gene expression studies. However, this problem can be solved by extension of the SAGE tags to 3' cDNAs. Therefore, several methods based on PCR have been developed to generate a 3' longer fragment cDNA corresponding to a SAGE tag. The list of modified methods is extensive, and includes rapid RT-PCR analysis of unknown SAGE tags (RAST-PCR), generation of longer cDNA fragments from SAGE tags for gene identification (GLGI), a high-throughput GLGI procedure, reverse SAGE (rSAGE), two-step analysis of unknown SAGE tags (TSAT-PCR), etc. These procedures are constantly being updated because they have characteristics and advantages that can be shared. Development of these methods has promoted the widespread use of the SAGE technique, and has accelerated the speed of studies of large-scale gene expression.
Affiliation(s)
- Wang-Jie Xu
- College of Life Science and Technology, Bio-X Research Center, Key Laboratory of Developmental Genetics and Neuropsychiatric Diseases, Ministry of Education, Shanghai Jiao Tong University, China
8
Weikard R, Goldammer T, Eberlein A, Kuehn C. Novel transcripts discovered by mining genomic DNA from defined regions of bovine chromosome 6. BMC Genomics 2009; 10:186. [PMID: 19393061 PMCID: PMC2681481 DOI: 10.1186/1471-2164-10-186] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 09/01/2008] [Accepted: 04/24/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Linkage analyses strongly suggest a number of QTL for production, health and conformation traits in the middle part of bovine chromosome 6 (BTA6). The identification of the molecular background underlying the genetic variation at the QTL and subsequent functional studies require a well-annotated gene sequence map of the critical QTL intervals. To complete the sequence map of the defined subchromosomal regions on BTA6 poorly covered with comparative gene information, we focused on targeted isolation of transcribed sequences from bovine bacterial artificial chromosome (BAC) clones mapped to the QTL intervals. RESULTS Using the method of exon trapping, 92 unique exon trapping sequences (ETS) were discovered in a chromosomal region of poor gene coverage. Sequence identity to the current NCBI sequence assembly for BTA6 was detected for 91% of unique ETS. Comparative sequence similarity search revealed that 11% of the isolated ETS displayed high similarity to genomic sequences located on the syntenic chromosomes of the human and mouse reference genome assemblies. Nearly a third of the ETS identified similar equivalent sequences in genomic sequence scaffolds from the alternative Celera-based sequence assembly of the human genome. Screening gene, EST, and protein databases detected 17% of ETS with identity to known transcribed sequences. Expression analysis of a subset of the ETS showed that most ETS (84%) displayed a distinctive expression pattern in a multi-tissue panel of a lactating cow verifying their existence in the bovine transcriptome. CONCLUSION The results of our study demonstrate that the exon trapping method based on region-specific BAC clones is very useful for targeted screening for novel transcripts located within a defined chromosomal region being deficiently endowed with annotated gene information. 
The majority of identified ETS represent unknown noncoding sequences in intergenic regions on BTA6 displaying a distinctive tissue-specific expression profile. However, their definite regulatory function has to be analyzed in further studies. The novel transcripts will add new sequence information to annotate a complete bovine genome sequence assembly, contribute to establishing a detailed transcription map for targeted BTA6 regions and will also be helpful to dissect the molecular and regulatory background of the QTL detected on BTA6.
Affiliation(s)
- Rosemarie Weikard
- Forschungsinstitut für die Biologie Landwirtschaftlicher Nutztiere (FBN), Dummerstorf, Germany.
9
He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW. The antisense transcriptomes of human cells. Science 2008; 322:1855-7. [PMID: 19056939 PMCID: PMC2824178 DOI: 10.1126/science.1163853] [Citation(s) in RCA: 413] [Impact Index Per Article: 24.3] [Indexed: 12/17/2022]
Abstract
Transcription in mammalian cells can be assessed at a genome-wide level, but it has been difficult to reliably determine whether individual transcripts are derived from the plus or minus strands of chromosomes. This distinction can be critical for understanding the relationship between known transcripts (sense) and the complementary antisense transcripts that may regulate them. Here, we describe a technique that can be used to (i) identify the DNA strand of origin for any particular RNA transcript, and (ii) quantify the number of sense and antisense transcripts from expressed genes at a global level. We examined five different human cell types and in each case found evidence for antisense transcripts in 2900 to 6400 human genes. The distribution of antisense transcripts was distinct from that of sense transcripts, was nonrandom across the genome, and differed among cell types. Antisense transcripts thus appear to be a pervasive feature of human cells, which suggests that they are a fundamental component of gene regulation.
Affiliation(s)
- Yiping He
- The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute at The Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Bert Vogelstein
- The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute at The Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Victor E. Velculescu
- The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute at The Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Nickolas Papadopoulos
- The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute at The Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
- Kenneth W. Kinzler
- The Ludwig Center for Cancer Genetics and Therapeutics and The Howard Hughes Medical Institute at The Johns Hopkins Kimmel Cancer Center, Baltimore, MD 21231, USA
10
Mallardo M, Poltronieri P, D'Urso OF. Non-protein coding RNA biomarkers and differential expression in cancers: a review. J Exp Clin Cancer Res 2008; 27:19. [PMID: 18631387 PMCID: PMC2490676 DOI: 10.1186/1756-9966-27-19] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 05/27/2008] [Accepted: 07/16/2008] [Indexed: 01/03/2023]
Abstract
Background In recent years, a huge number of human transcripts that do not code for proteins, named non-protein coding RNAs (npcRNAs), have been found. In most cases, small (miRNAs, snoRNAs) and long RNAs (antisense RNA, dsRNA, and long RNA species) have many roles, functioning as regulators of other mRNAs at the transcriptional and post-transcriptional levels, and controlling protein ubiquitination and degradation. Various species of npcRNAs have been found to be differentially expressed in different types of cancer. This review discusses the published data and new results on the expression of a subset of npcRNAs. Conclusion These results underscore the complexity of the RNA world and provide further evidence on the involvement of functional RNAs in cancer cell growth control.
Affiliation(s)
- Massimo Mallardo
- University of Napoli Federico II, Department of Biochemistry and Medical Biotechnologies, Via S. Pansini 5, Napoli, Italy.
11
Guerfali FZ, Laouini D, Guizani-Tabbane L, Ottones F, Ben-Aissa K, Benkahla A, Manchon L, Piquemal D, Smandi S, Mghirbi O, Commes T, Marti J, Dellagi K. Simultaneous gene expression profiling in human macrophages infected with Leishmania major parasites using SAGE. BMC Genomics 2008; 9:238. [PMID: 18495030 PMCID: PMC2430024 DOI: 10.1186/1471-2164-9-238] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Received: 10/05/2007] [Accepted: 05/21/2008] [Indexed: 01/16/2023]
Abstract
Background Leishmania (L) are intracellular protozoan parasites that are able to survive and replicate within the harsh and potentially hostile phagolysosomal environment of mammalian mononuclear phagocytes. A complex interplay then takes place between the macrophage (MΦ) striving to eliminate the pathogen and the parasite struggling for its own survival. To investigate this host-parasite conflict at the transcriptional level, in the context of monocyte-derived human MΦs (MDM) infection by L. major metacyclic promastigotes, the quantitative technique of serial analysis of gene expression (SAGE) was used. Results After extracting mRNA from resting human MΦs, Leishmania-infected human MΦs and L. major parasites, three SAGE libraries were constructed and sequenced generating up to 28,173; 57,514 and 33,906 tags respectively (corresponding to 12,946; 23,442 and 9,530 unique tags). Using computational data analysis and direct comparison to 357,888 publicly available experimental human tags, the parasite and the host cell transcriptomes were then simultaneously characterized from the mixed cellular extract, confidently discriminating host from parasite transcripts. This procedure led us to reliably assign 3,814 tags to MΦs' and 3,666 tags to L. major parasites transcripts. We focused on these, showing significant changes in their expression that are likely to be relevant to the pathogenesis of parasite infection: (i) human MΦs genes, belonging to key immune response proteins (e.g., IFNγ pathway, S100 and chemokine families) and (ii) a group of Leishmania genes showing a preferential expression at the parasite's intra-cellular developing stage. Conclusion Dual SAGE transcriptome analysis provided a useful, powerful and accurate approach to discriminating genes of human or parasitic origin in Leishmania-infected human MΦs. 
The findings presented in this work suggest that the Leishmania parasite modulates key transcripts in human MΦs that may be beneficial for its establishment and survival. Furthermore, these results provide an overview of gene expression at two developmental stages of the parasite, namely metacyclic promastigotes and intracellular amastigotes and indicate a broad difference between their transcriptomic profiles. Finally, our reported set of expressed genes will be useful in future rounds of data mining and gene annotation.
Affiliation(s)
- Fatma Z Guerfali
- Laboratoire d'Immuno-Pathologie, Vaccinologie et Génétique Moléculaire, WHO Collaborating Center for Research and Training in Leishmaniasis, Institut Pasteur de Tunis, 13 place Pasteur, BP 74, 1002 Tunis-Belvédère, Tunisia.
12
Replogle K, Arnold AP, Ball GF, Band M, Bensch S, Brenowitz EA, Dong S, Drnevich J, Ferris M, George JM, Gong G, Hasselquist D, Hernandez AG, Kim R, Lewin HA, Liu L, Lovell PV, Mello CV, Naurin S, Rodriguez-Zas S, Thimmapuram J, Wade J, Clayton DF. The Songbird Neurogenomics (SoNG) Initiative: community-based tools and strategies for study of brain gene function and evolution. BMC Genomics 2008; 9:131. [PMID: 18366674 PMCID: PMC2329646 DOI: 10.1186/1471-2164-9-131] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.2] [Received: 10/15/2007] [Accepted: 03/18/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Songbirds hold great promise for biomedical, environmental and evolutionary research. A complete draft sequence of the zebra finch genome is imminent, yet a need remains for application of genomic resources within a research community traditionally focused on ethology and neurobiological methods. In response, we developed a core set of genomic tools and a novel collaborative strategy to probe gene expression in diverse songbird species and natural contexts. RESULTS We end-sequenced cDNAs from zebra finch brain and incorporated additional sequences from community sources into a database of 86,784 high quality reads. These assembled into 31,658 non-redundant contigs and singletons, which we annotated via BLAST search of chicken and human databases. The results are publicly available in the ESTIMA:Songbird database. We produced a spotted cDNA microarray with 20,160 addresses representing 17,214 non-redundant products of an estimated 11,500-15,000 genes, validating it by analysis of immediate-early gene (zenk) activation following song exposure and by demonstrating effective cross hybridization to genomic DNAs of other songbird species in the Passerida Parvorder. Our assembly was also used in the design of the "Lund-zfa" Affymetrix array representing approximately 22,000 non-redundant sequences. When the two arrays were hybridized to cDNAs from the same set of male and female zebra finch brain samples, both arrays detected a common set of regulated transcripts with a Pearson correlation coefficient of 0.895. To stimulate use of these resources by the songbird research community and to maintain consistent technical standards, we devised a "Community Collaboration" mechanism whereby individual birdsong researchers develop experiments and provide tissues, but a single individual in the community is responsible for all RNA extractions, labelling and microarray hybridizations.
CONCLUSION Immediately, these results set the foundation for a coordinated set of 25 planned experiments by 16 research groups probing fundamental links between genome, brain, evolution and behavior in songbirds. Energetic application of genomic resources to research using songbirds should help illuminate how complex neural and behavioral traits emerge and evolve.
Affiliation(s)
- Kirstin Replogle
- Cell & Developmental Biology, Univ. of Illinois, Urbana, IL, USA
- Institute for Genomic Biology, Univ. of Illinois, Urbana, IL, USA
- Gregory F Ball
- Psychological & Brain Sci., Johns Hopkins Univ., Baltimore, MD, USA
- Mark Band
- W.M. Keck Center for Comparative & Functional Genomics, Univ. of Illinois, Urbana, IL, USA
- Eliot A Brenowitz
- Psychology, Biology, and Bloedel Hearing Research Center, Univ. of Washington, Seattle, WA, USA
- Shu Dong
- Cell & Developmental Biology, Univ. of Illinois, Urbana, IL, USA
- Jenny Drnevich
- W.M. Keck Center for Comparative & Functional Genomics, Univ. of Illinois, Urbana, IL, USA
- Julia M George
- Mol. & Integrative Physiology, Univ. of Illinois, Urbana, IL, USA
- George Gong
- W.M. Keck Center for Comparative & Functional Genomics, Univ. of Illinois, Urbana, IL, USA
- Alvaro G Hernandez
- W.M. Keck Center for Comparative & Functional Genomics, Univ. of Illinois, Urbana, IL, USA
- Ryan Kim
- W.M. Keck Center for Comparative & Functional Genomics, Univ. of Illinois, Urbana, IL, USA
- Harris A Lewin
- Institute for Genomic Biology, Univ. of Illinois, Urbana, IL, USA
- Animal Sciences, Univ. of Illinois, Urbana, IL, USA
- Lei Liu
- W.M. Keck Center for Comparative & Functional Genomics, Univ. of Illinois, Urbana, IL, USA
- Peter V Lovell
- Neurological Sci. Inst., Oregon Hlth. Sci. Univ., Beaverton, OR, USA
- Claudio V Mello
- Neurological Sci. Inst., Oregon Hlth. Sci. Univ., Beaverton, OR, USA
- Sara Naurin
- Animal Ecology, Lund University, S-223 62 Lund, Sweden
- Jyothi Thimmapuram
- W.M. Keck Center for Comparative & Functional Genomics, Univ. of Illinois, Urbana, IL, USA
- Juli Wade
- Psychology, Zoology & Neuroscience, Michigan State Univ., East Lansing, MI, USA
- David F Clayton
- Cell & Developmental Biology, Univ. of Illinois, Urbana, IL, USA
- Institute for Genomic Biology, Univ. of Illinois, Urbana, IL, USA
- Neuroscience Program, Univ. of Illinois, Urbana, IL, USA
13
Abeel T, Saeys Y, Bonnet E, Rouzé P, Van de Peer Y. Generic eukaryotic core promoter prediction using structural features of DNA. Genome Res 2008; 18:310-23. [PMID: 18096745 PMCID: PMC2203629 DOI: 10.1101/gr.6991408] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.8] [Received: 08/03/2007] [Accepted: 11/14/2007] [Indexed: 11/24/2022]
Abstract
Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions is important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black box models that output predictions that are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
Affiliation(s)
- Thomas Abeel
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Yvan Saeys
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Eric Bonnet
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Pierre Rouzé
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
- Laboratoire Associé de l’INRA (France), Ghent University, 9052 Gent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
- Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
14
Abstract
While less than 1.5% of the mammalian genome encodes proteins, it is now evident that the vast majority is transcribed, mainly into non-protein-coding RNAs. This raises the question of what fraction of the genome is functional, i.e., composed of sequences that yield functional products, are required for the expression (regulation or processing) of these products, or are required for chromosome replication and maintenance. Many of the observed noncoding transcripts are differentially expressed, and, while most have not yet been studied, increasing numbers are being shown to be functional and/or trafficked to specific subcellular locations, as well as exhibit subtle evidence of selection. On the other hand, analyses of conservation patterns indicate that only approximately 5% (3%-8%) of the human genome is under purifying selection for functions common to mammals. However, these estimates rely on the assumption that reference sequences (usually ancient transposon-derived sequences) have evolved neutrally, which may not be the case, and if so would lead to an underestimate of the fraction of the genome under evolutionary constraint. These analyses also do not detect functional sequences that are evolving rapidly and/or have acquired lineage-specific functions. Indeed, many regulatory sequences and known functional noncoding RNAs, including many microRNAs, are not conserved over significant evolutionary distances, and recent evidence from the ENCODE project suggests that many functional elements show no detectable level of sequence constraint. Thus, it is likely that much more than 5% of the genome encodes functional information, and although the upper bound is unknown, it may be considerably higher than currently thought.
Affiliation(s)
- Michael Pheasant
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland 4072, Australia