1
Tullemans BM, Karel MF, Léopold V, ten Brink MS, Baaten CC, Maas SL, de Vos AF, Eble JA, Nijziel MR, van der Vorst EP, Cosemans JM, Heemskerk JW, Claushuis TA, Kuijpers MJ. Comparison of inhibitory effects of irreversible and reversible Btk inhibitors on platelet function. eJHaem 2021; 2:685-699. [PMID: 35845214 PMCID: PMC9175945 DOI: 10.1002/jha2.269] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/16/2021] [Revised: 07/13/2021] [Accepted: 07/13/2021] [Indexed: 12/11/2022]
Abstract
All irreversible Bruton tyrosine kinase (Btk) inhibitors, including ibrutinib and acalabrutinib, induce platelet dysfunction and increased bleeding risk. New reversible Btk inhibitors, such as MK-1026, have been developed. The mechanism underlying increased bleeding tendency with Btk inhibitors remains unclear. We investigated the effects of ibrutinib, acalabrutinib and MK-1026 on platelet function in healthy volunteers, patients and Btk-deficient mice, together with off-target effects on tyrosine kinase phosphorylation. All inhibitors suppressed GPVI- and CLEC-2-mediated platelet aggregation, activation and secretion in a dose-dependent manner. Only ibrutinib inhibited thrombus formation on vWF-co-coated surfaces, while on collagen this was not affected. In blood from Btk-deficient mice, collagen-induced thrombus formation under flow was reduced, but preincubation with either inhibitor was without additional effects. MK-1026 showed fewer off-target effects upon GPVI-induced TK phosphorylation as compared to ibrutinib and acalabrutinib. In ibrutinib-treated patients, GPVI-stimulated platelet activation and adhesion on vWF-co-coated surfaces were inhibited, while CLEC-2 stimulation induced variable responses. The dual inhibition of GPVI and CLEC-2 signalling by Btk inhibitors might account for the increased bleeding tendency, with ibrutinib causing more high-grade bleedings due to additional inhibition of platelet-vWF interaction. As MK-1026 showed fewer off-target effects and only affected activation of isolated platelets, it might be promising for future treatment.
Affiliation(s)
- Bibian M.E. Tullemans
  - Department of Biochemistry, Cardiovascular Research Institute Maastricht, Maastricht University, Maastricht, The Netherlands
- Mieke F.A. Karel
  - Department of Biochemistry, Cardiovascular Research Institute Maastricht, Maastricht University, Maastricht, The Netherlands
- Valentine Léopold
  - Center for Experimental and Molecular Medicine, Amsterdam University Medical Centres, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
  - Hopital Lariboisiere, Department of Anaesthesiology and Critical Care, Paris, France
- Marieke S. ten Brink
  - Center for Experimental and Molecular Medicine, Amsterdam University Medical Centres, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Constance C.F.M.J. Baaten
  - Department of Biochemistry, Cardiovascular Research Institute Maastricht, Maastricht University, Maastricht, The Netherlands
  - Institute for Molecular Cardiovascular Research (IMCAR), University Hospital Aachen, Aachen, Germany
- Sanne L. Maas
  - Institute for Molecular Cardiovascular Research (IMCAR), University Hospital Aachen, Aachen, Germany
  - Interdisciplinary Center for Clinical Research (IZKF), RWTH Aachen University, Aachen, Germany
- Alex F. de Vos
  - Center for Experimental and Molecular Medicine, Amsterdam University Medical Centres, Academic Medical Centre, University of Amsterdam, Amsterdam, The Netherlands
- Johannes A. Eble
  - Institute of Physiological Chemistry and Pathobiochemistry, University of Münster, Münster, Germany
- Marten R. Nijziel
  - Department of Haematology, Catharina Hospital Eindhoven, Eindhoven, The Netherlands
- Emiel P.C. van der Vorst
  - Institute for Molecular Cardiovascular Research (IMCAR), University Hospital Aachen, Aachen, Germany
  - Interdisciplinary Center for Clinical Research (IZKF), RWTH Aachen University, Aachen, Germany
  - Department of Pathology, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University Medical Centre, Maastricht, Netherlands
  - Institute for Cardiovascular Prevention (IPEK), Ludwig‐Maximilians‐University Munich, Munich, Germany
- Judith M.E.M. Cosemans
  - Department of Biochemistry, Cardiovascular Research Institute Maastricht, Maastricht University, Maastricht, The Netherlands
- Johan W.M. Heemskerk
  - Department of Biochemistry, Cardiovascular Research Institute Maastricht, Maastricht University, Maastricht, The Netherlands
- Marijke J.E. Kuijpers
  - Department of Biochemistry, Cardiovascular Research Institute Maastricht, Maastricht University, Maastricht, The Netherlands
  - Thrombosis Expertise Centre, Heart and Vascular Centre, Maastricht University Medical Centre, Maastricht, The Netherlands
2
Seymour BJ, Singh S, Certo HM, Sommer K, Sather BD, Khim S, Clough C, Hale M, Pangallo J, Ryu BY, Khan IF, Adair JE, Rawlings DJ. Effective, safe, and sustained correction of murine XLA using a UCOE-BTK promoter-based lentiviral vector. Molecular Therapy - Methods & Clinical Development 2021; 20:635-651. [PMID: 33718514 PMCID: PMC7907679 DOI: 10.1016/j.omtm.2021.01.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/10/2020] [Accepted: 01/14/2021] [Indexed: 02/06/2023]
Abstract
X-linked agammaglobulinemia (XLA) is an immune disorder caused by mutations in Bruton’s tyrosine kinase (BTK). BTK is expressed in B and myeloid cells, and its deficiency results in a lack of mature B cells and protective antibodies. We previously reported a lentivirus (LV) BTK replacement therapy that restored B cell development and function in Btk and Tec double knockout mice (a phenocopy of human XLA). In this study, with the goal of optimizing both the level and lineage specificity of BTK expression, we generated LV incorporating the proximal human BTK promoter. Hematopoietic stem cells from Btk−/−Tec−/− mice transduced with this vector rescued lineage-specific expression and restored B cell function in Btk−/−Tec−/− recipients. Next, we tested addition of candidate enhancers and/or ubiquitous chromatin opening elements (UCOEs), as well as codon optimization to improve BTK expression. An Eμ enhancer improved B cell rescue, but increased immunoglobulin G (IgG) autoantibodies. Addition of the UCOE avoided autoantibody generation while improving B cell development and function and reducing vector silencing. An optimized vector containing a truncated UCOE upstream of the BTK promoter and codon-optimized BTK cDNA resulted in stable, lineage-regulated BTK expression that mirrored endogenous BTK, making it a strong candidate for XLA therapy.
Affiliation(s)
- Brenda J Seymour
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Swati Singh
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Hannah M Certo
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Karen Sommer
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Blythe D Sather
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Socheath Khim
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Courtnee Clough
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Malika Hale
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Joseph Pangallo
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Byoung Y Ryu
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Iram F Khan
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Jennifer E Adair
  - Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
  - Department of Medical Oncology, University of Washington, Seattle, WA 98195, USA
- David J Rawlings
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA 98101, USA
  - Departments of Pediatrics and Immunology, University of Washington, Seattle, WA 98195, USA
3
Gao Z, Zhao R, Ruan J. A genome-wide cis-regulatory element discovery method based on promoter sequences and gene co-expression networks. BMC Genomics 2013; 14 Suppl 1:S4. [PMID: 23368633 PMCID: PMC3549801 DOI: 10.1186/1471-2164-14-s1-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/08/2023]
Abstract
Background: Deciphering cis-regulatory networks has become an attractive yet challenging task. This paper presents a simple method for cis-regulatory network discovery which aims to avoid some of the common problems of previous approaches. Results: Using promoter sequences and gene expression profiles as input, rather than clustering the genes by the expression data, our method utilizes co-expression neighborhood information for each individual gene, thereby overcoming the disadvantages of current clustering based models which may miss specific information for individual genes. In addition, rather than using a motif database as an input, it implements a simple motif count table for each enumerated k-mer for each gene promoter sequence. Thus, it can be used for species where previous knowledge of cis-regulatory motifs is unknown and has the potential to discover new transcription factor binding sites. Applications on Saccharomyces cerevisiae and Arabidopsis have shown that our method has a good prediction accuracy and outperforms a phylogenetic footprinting approach. Furthermore, the top ranked gene-motif regulatory clusters are evidently functionally co-regulated, and the regulatory relationships between the motifs and the enriched biological functions can often be confirmed by literature. Conclusions: Since this method is simple and gene-specific, it can be readily utilized for insufficiently studied species or flexibly used as an additional step or data source for previous transcription regulatory networks discovery models.
Affiliation(s)
- Zhen Gao
  - Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
4
Kane NC, Barker MS, Zhan SH, Rieseberg LH. Molecular Evolution across the Asteraceae: Micro- and Macroevolutionary Processes. Mol Biol Evol 2011; 28:3225-35. [DOI: 10.1093/molbev/msr166] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 01/02/2023]
5
Sather BD, Ryu BY, Stirling BV, Garibov M, Kerns HM, Humblet-Baron S, Astrakhan A, Rawlings DJ. Development of B-lineage predominant lentiviral vectors for use in genetic therapies for B cell disorders. Mol Ther 2010; 19:515-25. [PMID: 21139568 DOI: 10.1038/mt.2010.259] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
Abstract
Sustained, targeted, high-level transgene expression in primary B lymphocytes may be useful for gene therapy in B cell disorders. We developed several candidate B-lineage predominant self-inactivating lentiviral vectors (LV) containing alternative enhancer/promoter elements including: the immunoglobulin β (Igβ) (B29) promoter combined with the immunoglobulin µ enhancer (EµB29); and the endogenous BTK promoter with or without Eµ (EµBtkp or Btkp). LV-driven enhanced green fluorescent protein (eGFP) reporter expression was evaluated in cell lines and primary cells derived from human or murine hematopoietic stem cells (HSC). In murine primary cells, EµB29 and EµBtkp LV-mediated high-level expression in immature and mature B cells compared with all other lineages. Expression increased with B cell maturation and was maintained in peripheral subsets. Expression in T and myeloid cells was much lower in percentage and intensity. Similarly, both EµB29 and EµBtkp LV exhibited high-level activity in human primary B cells. In contrast to EµB29, Btkp and EµBtkp LV also exhibited modest activity in myeloid cells, consistent with the expression profile of endogenous Bruton's tyrosine kinase (Btk). Notably, EµB29 and EµBtkp activity was superior in all expression models to an alternative, B-lineage targeted vector containing the EµS.CD19 enhancer/promoter. In summary, EµB29 and EµBtkp LV comprise efficient delivery platforms for gene expression in B-lineage cells.
Affiliation(s)
- Blythe D Sather
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, Washington 98101, USA
6
Tran DA, Wong TC, Schep AN, Drewell RA. Characterization of an Ultra-Conserved Putative cis-Regulatory Module at the Mammalian Telomerase Reverse Transcriptase Gene. DNA Cell Biol 2010; 29:499-508. [DOI: 10.1089/dna.2009.0994] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022]
Affiliation(s)
- Diana A. Tran
  - Department of Biology, Harvey Mudd College, Claremont, California
- Terence C. Wong
  - Department of Biology, Harvey Mudd College, Claremont, California
- Alicia N. Schep
  - Department of Biology, Harvey Mudd College, Claremont, California
7
Xu Y, Zhang M, Wang Y, Kadambi P, Dave V, Lu LJ, Whitsett JA. A systems approach to mapping transcriptional networks controlling surfactant homeostasis. BMC Genomics 2010; 11:451. [PMID: 20659319 PMCID: PMC3091648 DOI: 10.1186/1471-2164-11-451] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 10/05/2009] [Accepted: 07/26/2010] [Indexed: 12/15/2022]
Abstract
Background: Pulmonary surfactant is required for lung function at birth and throughout life. Lung lipid and surfactant homeostasis requires regulation among multi-tiered processes, coordinating the synthesis of surfactant proteins and lipids, their assembly, trafficking, and storage in type II cells of the lung. The mechanisms regulating these interrelated processes are largely unknown. Results: We integrated mRNA microarray data with array independent knowledge using Gene Ontology (GO) similarity analysis, promoter motif searching, protein interaction and literature mining to elucidate genetic networks regulating lipid related biological processes in lung. A transcription factor (TF) - target gene (TG) similarity matrix was generated by integrating data from different analytic methods. A scoring function was built to rank the likely TF-TG pairs. Using this strategy, we identified and verified critical components of a transcriptional network directing lipogenesis, lipid trafficking and surfactant homeostasis in the mouse lung. Conclusions: Within the transcriptional network, SREBP, CEBPA, FOXA2, ETSF, GATA6 and IRF1 were identified as regulatory hubs displaying high connectivity. SREBP, FOXA2 and CEBPA together form a common core regulatory module that controls surfactant lipid homeostasis. The core module cooperates with other factors to regulate lipid metabolism and transport, cell growth and development, cell death and cell mediated immune response. Coordinated interactions of the TFs influence surfactant homeostasis and regulate lung function at birth.
Affiliation(s)
- Yan Xu
  - Division of Pulmonary Biology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati College of Medicine, Cincinnati, OH, USA
8
Shen X, Walsh B, Li JJ, Pang HX, Wang WJ, Tao SH. The correlations of the function and positional distribution of the cis-elements CArG around the TSS in the genes of Mus musculus. Genome 2009; 52:217-21. [PMID: 19234549 DOI: 10.1139/g08-117] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
While many studies of cis-elements CArG bound by serum response factor (SRF) are in progress, little is known about the positional distribution of the functional CArG elements around the transcription start site (TSS) of genes that they influence. We use a validated CArG data set to calculate the distance distribution of functional CArG elements around the TSS. Distances between adjacent CArGs were also analyzed. We compare these distributions with those derived using a control set of randomly selected CArGs (that were not experimentally validated for function). Our results show that most functional CArG elements (108 of 152, 71%) exist upstream of the annotated TSS, with copy number increasing as one moves closer to the TSS. Moreover, the average number of the CArG elements in the CArG-containing genes is significantly more than that in the control genes. Our study extends earlier bioinformatic analyses of functional CArG elements and provides an application of comparative sequence data to the identification of transcription factor binding sites.
Affiliation(s)
- Xia Shen
  - Bioinformatics Center, Northwest A&F University, 712100 Yangling, Shaanxi, China
9
Genomic promoter analysis predicts functional transcription factor binding. Adv Bioinformatics 2008; 2008:369830. [PMID: 19865592 PMCID: PMC2768302 DOI: 10.1155/2008/369830] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/31/2008] [Revised: 05/15/2008] [Accepted: 07/17/2008] [Indexed: 02/02/2023]
Abstract
Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of those pairs, 9107 genes contained conserved TFBS in the 3 kb proximal promoter and first intron. To attempt to predict in vivo occupancy of transcription factor binding sites, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering and predicted the in vivo occupancy of two transcription factors with an overall accuracy of 84%. Conclusion. Our analyses show that integration of chromatin immunoprecipitation data with conserved TFBS analysis can be used to generate accurate predictions of functional TFBS. They also show that TFBS cooccurrence can be used to predict transcription factor binding to promoters in vivo.
10
Elgar G, Vavouri T. Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends Genet 2008; 24:344-52. [PMID: 18514361 DOI: 10.1016/j.tig.2008.04.005] [Citation(s) in RCA: 129] [Impact Index Per Article: 7.6] [Received: 02/07/2008] [Revised: 04/14/2008] [Accepted: 04/14/2008] [Indexed: 01/25/2023]
11
Elnitski L, Riemer C, Schwartz S, Hardison R, Miller W. PipMaker: a World Wide Web server for genomic sequence alignments. Curr Protoc Bioinformatics 2008; Chapter 10:Unit 10.2. [PMID: 18428692 DOI: 10.1002/0471250953.bi1002s00] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 01/22/2023]
Abstract
PipMaker is a World-Wide Web site used to compare two long genomic sequences and identify conserved segments between them. This unit describes the use of the PipMaker server and explains the resulting output files. PipMaker provides an efficient method of aligning genomic sequences and returns a compact, but easy-to-interpret form of output, the percent identity plot (pip). For each aligning segment between two sequences the pip shows both the position relative to the first sequence and the degree of similarity. Optional annotations on the pip provide additional information to assist in the interpretation of the alignment. The default parameters of the underlying blastz alignment program are tuned for human-mouse alignments.
Affiliation(s)
- Laura Elnitski
  - The Pennsylvania State University, University Park, Pennsylvania, USA
12
Proteasome-dependent autoregulation of Bruton tyrosine kinase (Btk) promoter via NF-κB. Blood 2008; 111:4617-26. [PMID: 18292289 DOI: 10.1182/blood-2007-10-121137] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Indexed: 11/20/2022]
Abstract
Bruton tyrosine kinase (Btk) is critical for B-cell development. Btk regulates a plethora of signaling proteins, among them nuclear factor-κB (NF-κB). Activation of NF-κB is a hallmark of B cells, and NF-κB signaling is severely compromised in Btk deficiency. We here present strong evidence indicating that NF-κB is required for efficient transcription of the Btk gene. First, we found that proteasome blockers and inhibitors of NF-κB signaling suppress Btk transcription and intracellular expression. Similar to Btk, proteasome inhibitors also reduced the expression of other members of this family of kinases, Itk, Bmx, and Tec. Second, 2 functional NF-κB-binding sites were found in the Btk promoter. Moreover, in live mice, by hydrodynamic transfection, we show that bortezomib (a blocker of proteasomes and NF-κB signaling), as well as NF-κB binding sequence-oligonucleotide decoys block Btk transcription. We also demonstrate that Btk induces NF-κB activity in mice. Collectively, we show that Btk uses a positive autoregulatory feedback mechanism to stimulate transcription from its own promoter via NF-κB.
13
Murmann AE, Mincheva A, Scheuermann MO, Gautier M, Yang F, Buitkamp J, Strissel PL, Strick R, Rowley JD, Lichter P. Comparative gene mapping in cattle, Indian muntjac, and Chinese muntjac by fluorescence in situ hybridization. Genetica 2008; 134:345-51. [PMID: 18283540 DOI: 10.1007/s10709-008-9242-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 09/30/2007] [Accepted: 01/26/2008] [Indexed: 02/04/2023]
Abstract
The Indian muntjac (Muntiacus muntjak vaginalis) has a karyotype of 2n = 6 in the female and 2n = 7 in the male. The karyotypic evolution of Indian muntjac via extensive tandem fusions and several centric fusions are well documented by molecular cytogenetic studies mainly utilizing chromosome paints. To achieve higher resolution mapping, a set of 42 different genomic clones coding for 37 genes and the nucleolar organizer region were used to examine homologies between the cattle (2n = 60), human (2n = 46), Indian muntjac (2n = 6/7) and Chinese muntjac (2n = 46) karyotypes. These genomic clones were mapped by fluorescence in situ hybridization (FISH). Localization of genes on all three pairs of M. m. vaginalis chromosomes and on the acrocentric chromosomes of M. reevesi allowed not only the analysis of the evolution of syntenic regions within the muntjac genus but also allowed a broader comparison of synteny with more distantly related species, such as cattle and human, to shed more light onto the evolving genome organization.
Affiliation(s)
- Andrea E Murmann
  - Department of Medicine, Section Hematology/Oncology, University of Chicago, Chicago, IL 60637, USA
14
Woolfe A, Elgar G. Organization of conserved elements near key developmental regulators in vertebrate genomes. Adv Genet 2008; 61:307-38. [PMID: 18282512 DOI: 10.1016/s0065-2660(07)00012-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 01/01/2023]
Abstract
Sequence conservation has traditionally been used as a means to target functional regions of complex genomes. In addition to its use in identifying coding regions of genes, the recent availability of whole genome data for a number of vertebrates has permitted high-resolution analyses of the noncoding "dark matter" of the genome. This has resulted in the identification of a large number of highly conserved sequence elements that appear to be preserved in all bony vertebrates. Further positional analysis of these conserved noncoding elements (CNEs) in the genome demonstrates that they cluster around genes involved in developmental regulation. This chapter describes the identification and characterization of these elements, with particular reference to their composition and organization.
Affiliation(s)
- Adam Woolfe
  - School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, United Kingdom
15
Experimental validation of novel genes predicted in the un-annotated regions of the Arabidopsis genome. BMC Genomics 2007; 8:18. [PMID: 17229318 PMCID: PMC1783852 DOI: 10.1186/1471-2164-8-18] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/15/2006] [Accepted: 01/17/2007] [Indexed: 11/10/2022]
Abstract
Background: Several lines of evidence support the existence of novel genes and other transcribed units which have not yet been annotated in the Arabidopsis genome. Two gene prediction programs which make use of comparative genomic analysis, Twinscan and EuGene, have recently been deployed on the Arabidopsis genome. The ability of these programs to make use of sequence data from other species has allowed both Twinscan and EuGene to predict over 1000 genes that are intergenic with respect to the most recent annotation release. A high throughput RACE pipeline was utilized in an attempt to verify the structure and expression of these novel genes. Results: 1,071 un-annotated loci were targeted by RACE, and full length sequence coverage was obtained for 35% of the targeted genes. We have verified the structure and expression of 378 genes that were not present within the most recent release of the Arabidopsis genome annotation. These 378 genes represent a structurally diverse set of transcripts and encode a functionally diverse set of proteins. Conclusion: We have investigated the accuracy of the Twinscan and EuGene gene prediction programs and found them to be reliable predictors of gene structure in Arabidopsis. Several hundred previously un-annotated genes were validated by this work. Based upon this information derived from these efforts it is likely that the Arabidopsis genome annotation continues to overlook several hundred protein coding genes.
16
Cheng JF, Priest JR, Pennacchio LA. Comparative genomics: a tool to functionally annotate human DNA. Methods Mol Biol 2007; 366:229-51. [PMID: 17568128 DOI: 10.1007/978-1-59745-030-0_13] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
Abstract
The availability of an increasing number of vertebrate genomes has enabled comparative methods to infer functional sequences based on evolutionary constraint. Although this has proved powerful for gene identification, significant progress has also been made in uncovering gene regulatory sequences such as distant acting transcriptional enhancers. These pursuits have led to the development of a variety of valuable databases and resources that should serve as a routine toolbox for biological discovery.
Affiliation(s)
- Jan-Fang Cheng
  - Genomics Division, Lawrence Berkeley National Laboratory, CA, USA
17
Müller F, Borycki AG. Sequence analyses to study the evolutionary history and cis-regulatory elements of Hedgehog genes. Methods Mol Biol 2007; 397:231-250. [PMID: 18025724 DOI: 10.1007/978-1-59745-516-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Sequence analysis and comparative genomics are powerful tools to gain knowledge on multiple aspects of gene and protein regulation and function. These have been widely used to understand the evolutionary history and the biochemistry of Hedgehog (Hh) proteins, and the molecular control of Hedgehog gene expression. Here, we report on some of the methods available to retrieve protein and genomic sequences. We describe how protein sequence comparison can produce information on the evolutionary history of Hh proteins. Moreover, we describe the use of genomic sequence analysis including phylogenetic footprinting and transcription factor-binding site search tools, techniques that allow for the characterization of cis-regulatory elements of developmental genes such as the Hedgehog genes.
18
Abstract
BOB.1/OBF.1 is a lymphocyte-restricted transcriptional coactivator. It binds together with the Oct1 and Oct2 transcription factors to DNA and enhances their transactivation potential. Mice deficient for the transcriptional coactivator BOB.1/OBF.1 show several defects in differentiation, function and signaling of B cells. In search of BOB.1/OBF.1 regulated genes we identified Btk—a cytoplasmic tyrosine kinase—as a direct target of BOB.1/OBF.1. Analyses of the human as well as murine Btk promoters revealed a non-consensus octamer site close to the start site of transcription. Here we show that Oct proteins together with BOB.1/OBF.1 are able to form ternary complexes on these sites in vitro and in vivo. This in turn leads to the induction of Btk promoter activity in synergism with the transcription factor PU.1. Btk, like BOB.1/OBF.1, plays a critical role in B cell development and B cell receptor signalling. Therefore the down-regulation of Btk expression in BOB.1/OBF.1-deficient B cells could be related to the functional and developmental defects observed in BOB.1/OBF.1-deficient mice.
Affiliation(s)
- Thomas Wirth
  - To whom correspondence should be addressed. Tel: 0049 731 502 3262; Fax: 0049 731 502 2892
19
Tam JLY, Triantaphyllopoulos K, Todd H, Raguz S, de Wit T, Morgan JE, Partridge TA, Makrinou E, Grosveld F, Antoniou M. The human desmin locus: gene organization and LCR-mediated transcriptional control. Genomics 2006; 87:733-46. [PMID: 16545539 DOI: 10.1016/j.ygeno.2006.01.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 08/23/2005] [Revised: 01/20/2006] [Accepted: 01/29/2006] [Indexed: 12/16/2022]
Abstract
Locus control regions (LCRs) are defined by their ability to confer reproducible physiological levels of transgene expression in mice and therefore thought to possess the ability to generate dominantly a transcriptionally active chromatin structure. We report the first characterization of a muscle-cell-specific LCR, which is linked to the human desmin gene (DES). The DES LCR consists of five regions of muscle-specific DNase I hypersensitivity (HS) localized between -9 and -18 kb 5' of DES and reproducibly drives full physiological levels of expression in all muscle cell types. The DES LCR DNase I HS regions are highly conserved between humans and other mammals and can potentially bind a broad range of muscle-specific and ubiquitous transcription factors. Bioinformatics and direct molecular analysis show that the DES locus consists of three muscle-specific (DES) or muscle preferentially expressed genes (APEG1 and SPEG, the human orthologue of murine striated-muscle-specific serine/threonine protein kinase, Speg). The DES LCR may therefore regulate expression of SPEG and APEG1 as well as DES.
Affiliation(s)
- Jennifer L Y Tam
- Nuclear Biology Group, Department of Medical and Molecular Genetics, King's College London School of Medicine, King's College London-Guy's Campus, 8th Floor Guy's Tower, Guy's Hospital, London SE1 9RT, UK
20
Choi D, Fang Y, Mathers WD. Condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome. Genomics 2006; 87:500-8. [PMID: 16431075 DOI: 10.1016/j.ygeno.2005.11.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 06/06/2005] [Accepted: 11/26/2005] [Indexed: 11/30/2022]
Abstract
Deciphering genetic regulatory codes remains a challenge. Here, we present an effective approach to identifying in vivo condition-specific coregulation with cis-regulatory motifs and modules in the mouse genome. A resampling-based algorithm was adopted to cluster our microarray data of a stress response, which generated 35 tight clusters with unique expression patterns containing 811 genes of 5652 genes significantly altered. Database searches identified many known motifs within the 3-kb regulatory regions of 40 genes from 3 clusters and modules with six to nine motifs that were commonly shared by 60-100% of these genes. The upstream regulatory region contained the highest frequency of these common motifs. CisModule program predictions were comparable with the results from database searches and found four potentially novel motifs. This result indicates that these motifs and modules could be responsible for gene coregulation of the stress response in the lacrimal gland.
Affiliation(s)
- Dongseok Choi
- Division of Biostatistics, Department of Public Health & Preventive Medicine, Oregon Health & Science University, 3375 SW Terwilliger Boulevard, Portland, OR 97239, USA
21
Stone EA, Cooper GM, Sidow A. Trade-offs in detecting evolutionarily constrained sequence by comparative genomics. Annu Rev Genomics Hum Genet 2005; 6:143-64. [PMID: 16124857 DOI: 10.1146/annurev.genom.6.080604.162146] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
As whole-genome sequencing efforts extend beyond more traditional model organisms to include a deep diversity of species, comparative genomic analyses will be further empowered to reveal insights into the human genome and its evolution. The discovery and annotation of functional genomic elements is a necessary step toward a detailed understanding of our biology, and sequence comparisons have proven to be an integral tool for that task. This review is structured to broadly reflect the statistical challenges in discriminating these functional elements from the bulk of the genome that has evolved neutrally. Specifically, we review the comparative genomics literature in terms of specificity, sensitivity, and phylogenetic scope, as well as the trade-offs that relate these factors in standard analyses. We consider the impact of an expanding diversity of orthologous sequences on our ability to resolve functional elements. This impact is assessed through both recent comparative analyses of deep alignments and mathematical modeling.
Affiliation(s)
- Eric A Stone
- Department of Statistics, Stanford University, Stanford, California 94305, USA
22
Grice EA, Rochelle ES, Green ED, Chakravarti A, McCallion AS. Evaluation of the RET regulatory landscape reveals the biological relevance of a HSCR-implicated enhancer. Hum Mol Genet 2005; 14:3837-45. [PMID: 16269442 DOI: 10.1093/hmg/ddi408] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.4] [Indexed: 02/03/2023] Open
Abstract
Evolutionary sequence conservation is now a relatively common approach for the prediction of functional DNA sequences. However, the fraction of conserved non-coding sequences with regulatory potential is still unknown. In this study, we focus on elucidating the regulatory landscape of RET, a crucial developmental gene within which we have recently identified a regulatory Hirschsprung disease (HSCR) susceptibility variant. We report a systematic examination of conserved non-coding sequences (n=45) identified in a 220 kb interval encompassing RET. We demonstrate that most of these conserved elements are capable of enhancer or suppressor activity in vitro, and the majority of the elements exert cell type-dependent control. We show that discrete sequences within regulatory elements can bind nuclear protein in a cell type-dependent manner that is consistent with their identified in vitro regulatory control. Finally, we focused our attention on the enhancer implicated in HSCR to demonstrate that this element drives reporter expression in cell populations of the excretory system and central nervous system (CNS) and peripheral nervous system (PNS), consistent with expression of the endogenous RET protein. Importantly, this sequence also modulates expression in the enteric nervous system consistent with its proposed role in HSCR.
Affiliation(s)
- Elizabeth A Grice
- McKusick-Nathans Institute of Genetic Medicine, Department of Comparative Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
23
Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJK, Cooke JE, Elgar G. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol 2005; 3:e7. [PMID: 15630479 PMCID: PMC526512 DOI: 10.1371/journal.pbio.0030007] [Citation(s) in RCA: 682] [Impact Index Per Article: 34.1] [Received: 07/30/2004] [Accepted: 10/21/2004] [Indexed: 02/06/2023] Open
Abstract
In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. 
They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.
Affiliation(s)
- Adam Woolfe
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Martin Goodson
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Debbie K Goode
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Phil Snell
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Gayle K McEwen
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Tanya Vavouri
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Sarah F Smith
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Phil North
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Heather Callaway
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Krys Kelly
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Klaudia Walter
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrookes Hospital, Cambridge, United Kingdom
- Irina Abnizova
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrookes Hospital, Cambridge, United Kingdom
- Walter Gilks
- Medical Research Council Biostatistics Unit, Institute of Public Health, Addenbrookes Hospital, Cambridge, United Kingdom
- Yvonne J. K. Edwards
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Julie E Cooke
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
- Greg Elgar
- Medical Research Council Rosalind Franklin Centre for Genomics Research, Hinxton, Cambridge, United Kingdom
24
Revilla-i-Domingo R, Minokawa T, Davidson EH. R11: a cis-regulatory node of the sea urchin embryo gene network that controls early expression of SpDelta in micromeres. Dev Biol 2004; 274:438-51. [PMID: 15385170 DOI: 10.1016/j.ydbio.2004.07.008] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Received: 06/24/2004] [Accepted: 07/09/2004] [Indexed: 11/19/2022]
Abstract
A gene regulatory network (GRN) controls the process by which the endomesoderm of the sea urchin embryo is specified. In this GRN, the program of gene expression unique to the skeletogenic micromere lineage is set in train by activation of the pmar1 gene. Through a double repression system, this gene is responsible for localization of expression of downstream regulatory and signaling genes to cells of this lineage. One of these genes, delta, encodes a Notch ligand, and its expression in the right place and time is crucial to the specification of the endomesoderm. Here we report a cis-regulatory element R11 that is responsible for localizing the expression of delta by means of its response to the pmar1 repression system. R11 was identified as an evolutionarily conserved genomic sequence located about 13 kb downstream of the last exon of the delta gene. We demonstrate here that this cis-regulatory element is able to drive the expression of a reporter gene in the same cells and at the same time that the endogenous delta gene is expressed, and that temporally, spatially, and quantitatively it responds to the pmar1 repression system just as predicted for the delta gene in the endomesoderm GRN. This work illustrates the application of cis-regulatory analysis to the validation of predictions of the GRN model. In addition, we introduce new methodological tools for quantitative measurement of the output of expression constructs that promise to be of general value for cis-regulatory analysis in sea urchin embryos.
25
Microarray and comparative genomics-based identification of genes and gene regulatory regions of the mouse immune system. BMC Genomics 2004; 5:82. [PMID: 15504237 PMCID: PMC534115 DOI: 10.1186/1471-2164-5-82] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Received: 03/30/2004] [Accepted: 10/25/2004] [Indexed: 12/30/2022] Open
Abstract
Background: In this study we have built and mined a gene expression database composed of 65 diverse mouse tissues for genes preferentially expressed in immune tissues and cell types. Using expression pattern criteria, we identified 360 genes with preferential expression in thymus, spleen, peripheral blood mononuclear cells, lymph nodes (unstimulated or stimulated), or in vitro activated T-cells.
Results: Gene clusters, formed based on similarity of expression-pattern across either all tissues or the immune tissues only, had highly significant associations both with immunological processes such as chemokine-mediated response, antigen processing, receptor-related signal transduction, and transcriptional regulation, and also with more general processes such as replication and cell cycle control. Within-cluster gene correlations implicated known associations of known genes, as well as immune process-related roles for poorly described genes. To characterize regulatory mechanisms and cis-elements of genes with similar patterns of expression, we used a new version of a comparative genomics-based cis-element analysis tool to identify clusters of cis-elements with compositional similarity among multiple genes. Several clusters contained genes that shared 5–6 cis-elements that included ETS and zinc-finger binding sites. cis-Elements AP2 EGRF ETSF MAZF SP1F ZF5F and AREB ETSF MZF1 PAX5 STAT were shared in a thymus-expressed set; AP4R E2FF EBOX ETSF MAZF SP1F ZF5F and CREB E2FF MAZF PCAT SP1F STAT cis-clusters occurred in activated T-cells; CEBP CREB NFKB SORY and GATA NKXH OCT1 RBIT occurred in stimulated lymph nodes.
Conclusion: This study demonstrates a series of analytic approaches that have allowed the implication of genes and regulatory elements that participate in the differentiation, maintenance, and function of the immune system. Polymorphism or mutation of these could adversely impact immune system functions.
26
Abstract
The genomes from three mammals (human, mouse, and rat), two worms, and several yeasts have been sequenced, and more genomes will be completed in the near future for comparison with those of the major model organisms. Scientists have used various methods to align and compare the sequenced genomes to address critical issues in genome function and evolution. This review covers some of the major new insights about gene content, gene regulation, and the fraction of mammalian genomes that are under purifying selection and presumed functional. We review the evolutionary processes that shape genomes, with particular attention to variation in rates within genomes and along different lineages. Internet resources for accessing and analyzing the treasure trove of sequence alignments and annotations are reviewed, and we discuss critical problems to address in new bioinformatic developments in comparative genomics.
Affiliation(s)
- Webb Miller
- The Center for Comparative Genomics and Bioinformatics, The Huck Institutes of Life Sciences, Department of Biology, Pennsylvania State University, University Park, Pennsylvania, USA.
27
Kellis M, Patterson N, Birren B, Berger B, Lander ES. Methods in comparative genomics: genome correspondence, gene identification and regulatory motif discovery. J Comput Biol 2004; 11:319-55. [PMID: 15285895 DOI: 10.1089/1066527041410319] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Indexed: 11/12/2022] Open
Abstract
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. (3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. 
The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs. Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.
Affiliation(s)
- Manolis Kellis
- Whitehead Institute Center for Genome Research, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
28
Karanam S, Moreno CS. CONFAC: automated application of comparative genomic promoter analysis to DNA microarray datasets. Nucleic Acids Res 2004; 32:W475-84. [PMID: 15215433 PMCID: PMC441491 DOI: 10.1093/nar/gkh353] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022] Open
Abstract
The advent of DNA microarray technology and the sequencing of multiple vertebrate genomes has provided a unique opportunity for the integration of comparative genomics with high-throughput gene expression analysis. Here we describe the conserved transcription factor binding site (CONFAC) software that enables the high-throughput identification of conserved transcription factor binding sites (TFBSs) in the regulatory regions of hundreds of genes at a time (http://morenolab.whitehead.emory.edu/cgi-bin/confac/login.pl). The CONFAC software compares non-coding regulatory sequences between human and mouse genomes to enable identification of conserved TFBSs that are significantly enriched in promoters of gene clusters from microarray analyses compared to sets of unchanging control genes using a Mann-Whitney U-test. Analysis of random gene sets demonstrated that using our approach, over 98% of TFBSs had false positive rates below 5%. As a proof-of-principle, we have validated the CONFAC software using gene sets from four separate microarray studies and identified TFBSs known to be functionally important for regulation of each of the four gene sets.
Affiliation(s)
- Suresh Karanam
- Program in Bioinformatics, School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
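The CONFAC abstract above describes comparing conserved TFBS occurrences in a gene cluster against unchanging control genes with a Mann-Whitney U-test. The following is only a minimal sketch of that kind of test, not the CONFAC implementation: the function name `mann_whitney_u`, the per-promoter site counts, and the use of a tie-corrected normal approximation without continuity correction are all assumptions made for illustration.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p-value from a normal approximation).

    Hypothetical helper for illustration; ties receive average ranks and the
    variance is tie-corrected. No continuity correction is applied.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the observations, remembering which group each came from.
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    n = n_a + n_b
    ranks = [0.0] * n
    tie_term = 0.0

    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0

    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # every observation is tied; no evidence of a shift

    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


# Made-up counts of one conserved TFBS per promoter (not data from the paper).
cluster_counts = [3, 4, 5]
control_counts = [0, 1, 2]
u_stat, p = mann_whitney_u(cluster_counts, control_counts)
```

For samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the sketch dependency-free.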
29
Tsuchiya KD, Greally JM, Yi Y, Noel KP, Truong JP, Disteche CM. Comparative sequence and X-inactivation analyses of a domain of escape in human Xp11.2 and the conserved segment in mouse. Genome Res 2004; 14:1275-84. [PMID: 15197169 PMCID: PMC442142 DOI: 10.1101/gr.2575904] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.2] [Indexed: 02/02/2023]
Abstract
We have performed X-inactivation and sequence analyses on 350 kb of sequence from human Xp11.2, a region shown previously to contain a cluster of genes that escape X inactivation, and we compared this region with the region of conserved synteny in mouse. We identified several new transcripts from this region in human and in mouse, which defined the full extent of the domain escaping X inactivation in both species. In human, escape from X inactivation involves an uninterrupted 235-kb domain of multiple genes. Despite highly conserved gene content and order between the two species, Smcx is the only mouse gene from the conserved segment that escapes inactivation. As repetitive sequences are believed to facilitate spreading of X inactivation along the chromosome, we compared the repetitive sequence composition of this region between the two species. We found that long terminal repeats (LTRs) were decreased in the human domain of escape, but not in the majority of the conserved mouse region adjacent to Smcx in which genes were subject to X inactivation, suggesting that these repeats might be excluded from escape domains to prevent spreading of silencing. Our findings indicate that genomic context, as well as gene-specific regulatory elements, interact to determine expression of a gene from the inactive X-chromosome.
Affiliation(s)
- Karen D Tsuchiya
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA.
30
Tufarelli C, Hardison R, Miller W, Hughes J, Clark K, Ventress N, Frischauf AM, Higgs DR. Comparative analysis of the alpha-like globin clusters in mouse, rat, and human chromosomes indicates a mechanism underlying breaks in conserved synteny. Genome Res 2004; 14:623-30. [PMID: 15060003 PMCID: PMC383306 DOI: 10.1101/gr.2143604] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 11/24/2022]
Abstract
We have sequenced and fully annotated a 65,871-bp region of mouse Chromosome 17 including the Hba-ps4 alpha-globin pseudogene. Comparative sequence analysis with the functional alpha-globin loci at human Chromosome 16p13.3 and mouse Chromosome 11 shows that this segment of mouse Chromosome 17 contains a group of three alpha-like pseudogenes (Hba-psm-Hba-ps4-Hba-q3), similar to the duplicated sets found at the functional mouse cluster on Chromosome 11. In addition, exons 7 to 12 of the mLuc7L gene are present just downstream from the pseudogene cluster, indicating that this clone contains the region in which human 16p13.3 switches in synteny between mouse Chromosomes 11 and 17. Comparison of the sequences around the alpha-like clusters on the two mouse chromosomes reveals the presence of conserved tandem repeats. We propose that these repetitive elements have played a role in the fragmentation of the mouse alpha cluster during evolution.
Affiliation(s)
- Cristina Tufarelli
- MRC Molecular Haematology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Headington, Oxford OX3 9DS, UK
31
Ovcharenko I, Loots GG, Hardison RC, Miller W, Stubbs L. zPicture: dynamic alignment and visualization tool for analyzing conservation profiles. Genome Res 2004; 14:472-7. [PMID: 14993211 PMCID: PMC353235 DOI: 10.1101/gr.2129504] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.7] [Indexed: 11/24/2022]
Abstract
Comparative sequence analysis has evolved as an essential technique for identifying functional coding and noncoding elements conserved throughout evolution. Here, we introduce zPicture, an interactive Web-based sequence alignment and visualization tool for dynamically generating conservation profiles and identifying evolutionarily conserved regions (ECRs). zPicture is highly flexible, because critical parameters can be modified interactively, allowing users to differentially predict ECRs in comparisons of sequences of different phylogenetic distances and evolutionary rates. We demonstrate the application of this module to identify a known regulatory element in the HOXD locus, in which functional ECRs are difficult to discern against the highly conserved genomic background. zPicture also facilitates transcription factor binding-site analysis via the rVista tool portal. We present an example of the HBB complex when zPicture/rVista combination specifically pinpoints to two ECRs containing GATA-1, NF-E2, and TAL1/E47 binding sites that were identified previously as transcriptional enhancers. In addition, zPicture is linked to the UCSC Genome Browser, allowing users to automatically extract sequences and gene annotations for any recorded locus. Finally, we describe how this tool can be efficiently applied to the analysis of nonvertebrate genomes, including those of microbial organisms.
Affiliation(s)
- Ivan Ovcharenko
- Energy, Environment, Biology and Institutional Computing, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
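The zPicture abstract above refers to conservation profiles and evolutionarily conserved regions (ECRs) whose calling parameters can be adjusted. As a rough illustration of the general idea only, the sketch below computes a sliding-window percent-identity profile over a gapped pairwise alignment and merges windows above a threshold. It is not zPicture's algorithm; the function names, the window size, and the identity threshold are hypothetical choices for this example.

```python
def percent_identity_profile(aligned_a, aligned_b, window=100):
    """Percent identity for every window start along a gapped pairwise alignment.

    Gap columns ('-') never count as matches. Returns an empty list when the
    alignment is shorter than the window.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")

    matches = [
        1 if x == y and x != "-" else 0
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    ]
    if len(matches) < window:
        return []

    # Running sum so each window costs O(1) to update.
    running = sum(matches[:window])
    profile = [100.0 * running / window]
    for start in range(1, len(matches) - window + 1):
        running += matches[start + window - 1] - matches[start - 1]
        profile.append(100.0 * running / window)
    return profile


def conserved_regions(profile, window, min_identity=70.0):
    """Merge overlapping or touching windows at or above min_identity.

    Returns half-open (start, end) intervals in alignment coordinates.
    """
    regions = []
    for start, identity in enumerate(profile):
        if identity < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


# Toy alignment (made up): the first eight columns match, the last four do not.
seq_a = "ACGTACGTAAAA"
seq_b = "ACGTACGTCCCC"
profile = percent_identity_profile(seq_a, seq_b, window=4)
ecrs = conserved_regions(profile, window=4, min_identity=100.0)
```

Lowering `min_identity` or shortening `window` widens the called regions, which mirrors in spirit the kind of parameter tuning the abstract describes for comparisons across different phylogenetic distances.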
32
Lee DU, Avni O, Chen L, Rao A. A Distal Enhancer in the Interferon-γ (IFN-γ) Locus Revealed by Genome Sequence Comparison. J Biol Chem 2004; 279:4802-10. [PMID: 14607827 DOI: 10.1074/jbc.m307904200] [Citation(s) in RCA: 119] [Impact Index Per Article: 5.7] [Indexed: 11/06/2022] Open
Abstract
Large-scale cross-species DNA sequence comparison has become a powerful tool to identify conserved cis-regulatory modules of genes. However, bioinformatic analysis alone cannot reveal how an evolutionarily conserved region regulates gene expression: whether it functions as an enhancer, silencer, or insulator; whether its function is cell-type restricted; and whether biologically relevant transcription factors bind to the element. Here we combine bioinformatics with wet-lab techniques to illustrate a general and systematic method of identifying functional conserved regulatory regions of genes. We applied this approach to the interferon-gamma (IFN-gamma) gene. Comparison of human and mouse IFN-gamma reveals a highly conserved non-coding sequence located approximately 5 kb 5' of the transcription start site. This region coincides with constitutive and inducible DNase I hypersensitivity sites present in IFN-gamma-producing Th1 cells but not in Th2 cells that do not produce IFN-gamma. Histone methylation at the 5' conserved non-coding sequences indicates a more accessible chromatin structure in Th1 cells compared with Th2 cells. This element binds two transcription factors known to be essential for IFN-gamma expression: nuclear factor of activated T cells, an inducible transcription factor, and T-box protein expressed in T cells, a cell lineage-restricted transcription factor. Together, these findings identify a highly conserved distal enhancer in the IFN-gamma cytokine locus and validate our approach as a successful method to detect cis-regulatory elements.
Affiliation(s)
- Dong U Lee
- Center for Blood Research and the Department of Pathology, Harvard Medical School, Boston, Massachusetts 02115, USA
33
Xie T, Rowen L, Aguado B, Ahearn ME, Madan A, Qin S, Campbell RD, Hood L. Analysis of the gene-dense major histocompatibility complex class III region and its comparison to mouse. Genome Res 2003; 13:2621-36. [PMID: 14656967 PMCID: PMC403804 DOI: 10.1101/gr.1736803] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.8] [Indexed: 11/25/2022]
Abstract
In mammals, the Major Histocompatibility Complex class I and II gene clusters are separated by an approximately 700-kb stretch of sequence called the MHC class III region, which has been associated with susceptibility to numerous diseases. To facilitate understanding of this medically important and architecturally interesting portion of the genome, we have sequenced and analyzed both the human and mouse class III regions. The cross-species comparison has facilitated the identification of 60 genes in human and 61 in mouse, including a potential RNA gene for which the introns are more conserved across species than the exons. Delineation of global organization, gene structure, alternative splice forms, protein similarities, and potential cis-regulatory elements leads to several conclusions: (1) The human MHC class III region is the most gene-dense region of the human genome: >14% of the sequence is coding, approximately 72% of the region is transcribed, and there is an average of 8.5 genes per 100 kb. (2) Gene sizes, number of exons, and intergenic distances are for the most part similar in both species, implying that interspersed repeats have had little impact in disrupting the tight organization of this densely packed set of genes. (3) The region contains a heterogeneous mixture of genes, only a few of which have a clearly defined and proven function. Although many of the genes are of ancient origin, some appear to exist only in mammals and fish, implying they might be specific to vertebrates. (4) Conserved noncoding sequences are found primarily in or near the 5'-UTR or the first intron of genes, and seldom in the intergenic regions. Many of these conserved blocks are likely to be cis-regulatory elements.
Affiliation(s)
- Tao Xie
- Institute for Systems Biology, Seattle, Washington 98103, USA
|
34
|
A combinatorial network of evolutionarily conserved myelin basic protein regulatory sequences confers distinct glial-specific phenotypes. J Neurosci 2003. [PMID: 14614079 DOI: 10.1523/jneurosci.23-32-10214.2003] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]
Abstract
Myelin basic protein (MBP) is required for normal myelin compaction and is implicated in both experimental and human demyelinating diseases. In this study, as an initial step in defining the regulatory network controlling MBP transcription, we located and characterized the function of evolutionarily conserved regulatory sequences. Long-range human-mouse sequence comparison revealed over 1 kb of conserved noncoding MBP 5' flanking sequence distributed into four widely spaced modules ranging from 0.1 to 0.4 kb. We demonstrate first that a controlled strategy of transgenesis provides an effective means to assign and compare qualitative and quantitative in vivo regulatory programs. Using this strategy, single-copy reporter constructs, designed to evaluate the regulatory significance of modular and intermodular sequences, were introduced by homologous recombination into the mouse hprt (hypoxanthine-guanine phosphoribosyltransferase) locus. The proximal modules M1 and M2 confer comparatively low-level oligodendrocyte expression primarily limited to early postnatal development, whereas the upstream M3 confers high-level oligodendrocyte expression extending throughout maturity. Furthermore, constructs devoid of M3 fail to target expression to newly myelinating oligodendrocytes in the mature CNS. Mutation of putative Nkx6.2/Gtx sites within M3, although not eliminating oligodendrocyte targeting, significantly decreases transgene expression levels. High-level and continuous expression is conferred to myelinating or remyelinating Schwann cells by M4. In addition, when isolated from surrounding MBP sequences, M3 confers transient expression to Schwann cells elaborating myelin. These observations define the in vivo regulatory roles played by conserved noncoding MBP sequences and lead to a combinatorial model in which different regulatory modules are engaged during primary myelination, myelin maintenance, and remyelination.
|
35
|
Matsuyama A, Shiraishi T, Trapasso F, Kuroki T, Alder H, Mori M, Huebner K, Croce CM. Fragile site orthologs FHIT/FRA3B and Fhit/Fra14A2: evolutionarily conserved but highly recombinogenic. Proc Natl Acad Sci U S A 2003; 100:14988-93. [PMID: 14630947 PMCID: PMC299872 DOI: 10.1073/pnas.2336256100] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.1] [Indexed: 01/29/2023]
Abstract
Common fragile sites are regions that show elevated susceptibility to DNA damage, leading to alterations that can contribute to cancer development. FRA3B, located at chromosome region 3p14.2, is the most frequently expressed human common fragile site, and allelic losses at FRA3B have been observed in many types of cancer. The FHIT gene, encompassing the FRA3B region, is a tumor-suppressor gene. To identify the features of FHIT/FRA3B that might contribute to fragility, sequences of the human FHIT and the flanking PTPRG gene were compared with those of murine Fhit and Ptprg. Human and mouse orthologous genes, FHIT and Fhit, are more highly conserved through evolution than PTPRG/Ptprg and yet contain more sequence elements that are exquisitely sensitive to genomic rearrangements, such as high-flexibility regions and long interspersed nuclear element 1s, suggesting that common fragile sites serve a function. The conserved AT-rich high-flexibility regions are the most characteristic of common fragile sites.
Affiliation(s)
- Ayumi Matsuyama
- Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th Street, Philadelphia, PA 19107, USA
|
36
|
Qiu P. Recent advances in computational promoter analysis in understanding the transcriptional regulatory network. Biochem Biophys Res Commun 2003; 309:495-501. [PMID: 12963016 DOI: 10.1016/j.bbrc.2003.08.052] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.5] [Indexed: 12/01/2022]
Abstract
The computational approach to the study of transcriptional regulation networks has become more attractive and feasible with the rapid accumulation of complete genome sequences and the advance of high-throughput expression profiling technology. In this review, current computational approaches for understanding the transcriptional regulatory network, including promoter prediction, transcription factor binding site identification, combinatorial regulatory elements prediction, and transcription factor target gene identification, are discussed. The role of comparative genomics in transcription regulatory region analysis is also reviewed.
Affiliation(s)
- Ping Qiu
- Bioinformatics Group and Discovery Technology Department at Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033, USA.
|
37
|
Qiu P. Computational approaches for deciphering the transcriptional regulatory network by promoter analysis. ACTA ACUST UNITED AC 2003. [DOI: 10.1016/s1478-5382(03)02341-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
|
38
|
Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 2003; 423:241-54. [PMID: 12748633 DOI: 10.1038/nature01644] [Citation(s) in RCA: 1305] [Impact Index Per Article: 59.3] [Received: 02/27/2003] [Accepted: 04/01/2003] [Indexed: 11/09/2022]
Abstract
Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.
Affiliation(s)
- Manolis Kellis
- Whitehead/MIT Center for Genome Research, Nine Cambridge Center, Cambridge, Massachusetts 02142, USA.
|
39
|
Qiu P, Qin L, Sorrentino RP, Greene JR, Wang L, Partridge NC. Comparative promoter analysis and its application in analysis of PTH-regulated gene expression. J Mol Biol 2003; 326:1327-36. [PMID: 12595247 DOI: 10.1016/s0022-2836(03)00053-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
Taking advantage of the "working draft" of the human genome and the MIT shotgun assembly of the mouse genome, we performed a comparative promoter analysis of human RefSeq mRNA (sequences from GenBank's RefSeq database). By combining this analysis with a transcription factor (TF) binding site analysis using a TRANSFAC position weight matrix (PWM) search, 86% of non-specific TF sites were removed. Using a set of genes that are regulated by parathyroid hormone (PTH), a statistical analysis was performed on the conserved TF binding sites among a set of eight human and mouse genes. From among the eight genes tested, we obtained a set of 31 TFs, suggesting possible roles for associated genes in PTH-mediated pathways. All three known PTH-responsive TFs (AP1, RUNX2, CREB) were correctly predicted by this analysis as well as two other potential TFs (VDR and CEBP Delta). Additionally, a model was made to describe the TF site characteristic module of PTH-regulated genes. This model was then used to search all human RefSeq gene promoters with established human-mouse ortholog relationships to identify other PTH-regulated genes. This comparative approach combined with statistical analysis proved to be sufficiently specific to decipher critical TFs involved in PTH-regulated pathways.
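The position weight matrix (PWM) search mentioned in the abstract above can be illustrated with a minimal sketch. This is not the TRANSFAC matrix search used by the authors: the toy count matrix, the pseudocount, the uniform background, and the score threshold are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (assumed uniform background)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical 3-bp motif whose consensus is TGA.
toy_counts = [{"T": 10}, {"G": 10}, {"A": 10}]
toy_pwm = pwm_from_counts(toy_counts)
toy_hits = scan("CCTGACC", toy_pwm, threshold=4.0)
```

In a comparative setting such as the one the abstract describes, a scan like this would be run on both the human and the mouse promoter, and only sites found at corresponding positions in both would be retained.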
Affiliation(s)
- Ping Qiu
- Bioinformatics Group and Discovery Technology Department, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033, USA.
|
40
|
Ovcharenko I, Loots GG. Comparative genomic tools for exploring the human genome. Cold Spring Harb Symp Quant Biol 2003; 68:283-91. [PMID: 15338628 DOI: 10.1101/sqb.2003.68.283] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 04/30/2023]
Affiliation(s)
- I Ovcharenko
- EEBI Computing Division, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
|
41
|
Abstract
Phylogenetic footprinting is an approach to finding functionally important sequences in the genome that relies on detecting their high degrees of conservation across different species. A new study shows how much it improves the prediction of gene-regulatory elements in the human genome.
Affiliation(s)
- Zhaolei Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520-8114, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520-8114, USA
|
42
|
Hardison RC, Roskin KM, Yang S, Diekhans M, Kent WJ, Weber R, Elnitski L, Li J, O'Connor M, Kolbe D, Schwartz S, Furey TS, Whelan S, Goldman N, Smit A, Miller W, Chiaromonte F, Haussler D. Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res 2003; 13:13-26. [PMID: 12529302 PMCID: PMC430971 DOI: 10.1101/gr.844103] [Citation(s) in RCA: 225] [Impact Index Per Article: 10.2] [Received: 09/26/2002] [Accepted: 11/14/2002] [Indexed: 11/24/2022]
Abstract
Six measures of evolutionary change in the human genome were studied, three derived from the aligned human and mouse genomes in conjunction with the Mouse Genome Sequencing Consortium, consisting of (1) nucleotide substitution per fourfold degenerate site in coding regions, (2) nucleotide substitution per site in relics of transposable elements active only before the human-mouse speciation, and (3) the nonaligning fraction of human DNA that is nonrepetitive or in ancestral repeats; and three derived from human genome data alone, consisting of (4) SNP density, (5) frequency of insertion of transposable elements, and (6) rate of recombination. Features 1 and 2 are measures of nucleotide substitutions at two classes of "neutral" sites, whereas 4 is a measure of recent mutations. Feature 3 is a measure dominated by deletions in mouse, whereas 5 represents insertions in human. It was found that all six vary significantly in megabase-sized regions genome-wide, and many vary together. This indicates that some regions of a genome change slowly by all processes that alter DNA, and others change faster. Regional variation in all processes is correlated with, but not completely accounted for, by GC content in human and the difference between GC content in human and mouse.
Affiliation(s)
- Ross C Hardison
- Department of Biochemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
|
43
|
Couronne O, Poliakov A, Bray N, Ishkhanov T, Ryaboy D, Rubin E, Pachter L, Dubchak I. Strategies and tools for whole-genome alignments. Genome Res 2003; 13:73-80. [PMID: 12529308 PMCID: PMC430965 DOI: 10.1101/gr.762503] [Citation(s) in RCA: 159] [Impact Index Per Article: 7.2] [Received: 09/04/2002] [Accepted: 11/06/2002] [Indexed: 11/25/2022]
Abstract
The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for assemblies of different quality. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. We obtained such coverage while preserving specificity. With a view towards the end user, we developed a suite of tools and Web sites for automatically aligning and subsequently browsing and working with whole-genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.
Affiliation(s)
- Olivier Couronne
- Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
|
44
|
Flicek P, Keibler E, Hu P, Korf I, Brent MR. Leveraging the mouse genome for gene prediction in human: from whole-genome shotgun reads to a global synteny map. Genome Res 2003; 13:46-54. [PMID: 12529305 PMCID: PMC430948 DOI: 10.1101/gr.830003] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.6] [Indexed: 11/24/2022]
Abstract
The availability of draft sequences for both the mouse and human genomes makes it possible, for the first time, to annotate whole mammalian genomes using comparative methods. TWINSCAN is a gene-prediction system that combines the methods of single-genome predictors like GENSCAN with information derived from genome comparison, thereby improving accuracy. Because TWINSCAN uses genomic sequence only, it is less biased toward highly and/or ubiquitously expressed genes than GENEWISE, GENOMESCAN, and other methods based on evidence derived from transcripts. We show that TWINSCAN improves gene prediction in human using intermediate products from various stages of the sequencing and analysis of the mouse genome, from low-redundancy, whole-genome shotgun reads to the draft assembly and the synteny map. TWINSCAN improves on the prior state of the art even when alignments from only 1X coverage of the mouse genome are available. Gene prediction accuracy improves steadily from 1X through 3X, more slowly from 3X to 4X, and relatively little thereafter. The assembly and the synteny map greatly speed the computations, however. Our human annotation using the mouse assembly is conservative, predicting only 25,622 genes, and appears to be one of the best de novo annotations of the human genome to date.
Affiliation(s)
- Paul Flicek
- Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
|
45
|
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES. Initial sequencing and comparative analysis of the mouse genome. Nature 2002; 420:520-62. [PMID: 12466850 DOI: 10.1038/nature01262] [Citation(s) in RCA: 4889] [Impact Index Per Article: 212.6] [Received: 09/18/2002] [Accepted: 10/31/2002] [Indexed: 12/18/2022]
Abstract
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
MESH Headings
- Animals
- Base Composition
- Chromosomes, Mammalian/genetics
- Conserved Sequence/genetics
- CpG Islands/genetics
- Evolution, Molecular
- Gene Expression Regulation
- Genes/genetics
- Genetic Variation/genetics
- Genome
- Genome, Human
- Genomics
- Humans
- Mice/classification
- Mice/genetics
- Mice, Knockout
- Mice, Transgenic
- Models, Animal
- Multigene Family/genetics
- Mutagenesis
- Neoplasms/genetics
- Physical Chromosome Mapping
- Proteome/genetics
- Pseudogenes/genetics
- Quantitative Trait Loci/genetics
- RNA, Untranslated/genetics
- Repetitive Sequences, Nucleic Acid/genetics
- Selection, Genetic
- Sequence Analysis, DNA
- Sex Chromosomes/genetics
- Species Specificity
- Synteny
|
46
|
Bush JO, Lan Y, Maltby KM, Jiang R. Isolation and developmental expression analysis of Tbx22, the mouse homolog of the human X-linked cleft palate gene. Dev Dyn 2002; 225:322-6. [PMID: 12412015 DOI: 10.1002/dvdy.10154] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022]
Abstract
Mutations in the TBX22 gene have been identified recently in patients with the X-linked cleft palate and ankyloglossia syndrome, suggesting that the TBX22 transcription factor plays an important role in palate development. However, because ankyloglossia has been reported in the majority of patients with TBX22 mutations, it has been speculated that the cleft palate phenotype is secondary to defective fetal tongue movement. To understand the role of TBX22 in disease pathogenesis and in normal development, it is necessary to carry out a detailed temporal and spatial gene expression analysis. We report here the isolation and developmental expression analysis of the mouse homolog Tbx22. The mouse Tbx22 gene encodes a putative protein of 517 amino acid residues, which shares 72% overall amino acid sequence identity with the human TBX22 protein. By using interspecific backcross analysis, we have localized the Tbx22 gene to mouse chromosome X, in a region syntenic to human chromosome Xq21, where the TBX22 gene resides, indicating that Tbx22 is the ortholog of human TBX22. Our in situ hybridization analysis shows that Tbx22 is expressed in a temporally and spatially highly restricted pattern during mouse palate and tongue development. Together with the mutant phenotypes in human patients, our data indicate a primary role for Tbx22 in both palate and tongue development.
Affiliation(s)
- Jeffrey O Bush
- Department of Biology, University of Rochester, Rochester, New York 14642, USA
|
47
|
Jegga AG, Sherwood SP, Carman JW, Pinski AT, Phillips JL, Pestian JP, Aronow BJ. Detection and visualization of compositionally similar cis-regulatory element clusters in orthologous and coordinately controlled genes. Genome Res 2002; 12:1408-17. [PMID: 12213778 PMCID: PMC186658 DOI: 10.1101/gr.255002] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.4] [Received: 03/07/2002] [Accepted: 07/18/2002] [Indexed: 02/02/2023]
Abstract
Evolutionarily conserved noncoding genomic sequences represent a potentially rich source for the discovery of gene regulatory regions. However, detecting and visualizing compositionally similar cis-element clusters in the context of conserved sequences is challenging. We have explored potential solutions and developed an algorithm and visualization method that combines the results of conserved sequence analyses (BLASTZ) with those of transcription factor binding site analyses (MatInspector) (http://trafac.chmcc.org). We define hits as the density of co-occurring cis-element transcription factor (TF)-binding sites measured within a 200-bp moving average window through phylogenetically conserved regions. The results are depicted as a Regulogram, in which the hit count is plotted as a function of position within each of the two genomic regions of the aligned orthologs. Within a high-scoring region, the relative arrangement of shared cis-elements within compositionally similar TF-binding site clusters is depicted in a Trafacgram. On the basis of analyses of several training data sets, the approach also allows for the detection of similarities in composition and relative arrangement of cis-element clusters within nonorthologous genes, promoters, and enhancers that exhibit coordinate regulatory properties. Known functional regulatory regions of nonorthologous and less-conserved orthologous genes frequently showed cis-element shuffling, demonstrating that compositional similarity can be more sensitive than sequence similarity. These results show that combining sequence similarity with cis-element compositional similarity provides a powerful aid for the identification of potential control regions.
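The windowed hit density described in the abstract above can be sketched as follows. This is an illustrative reading of the 200-bp window idea, not the authors' implementation: the function name, the input format (a plain list of site start positions shared between the two orthologs), and the step size are assumptions.

```python
from bisect import bisect_left


def window_hit_counts(site_positions, region_length, window=200, step=1):
    """Count sites whose position falls in each half-open window [start, start + window)."""
    positions = sorted(site_positions)
    counts = []
    last_start = max(region_length - window, 0)
    for start in range(0, last_start + 1, step):
        lo = bisect_left(positions, start)           # first site at or after the window start
        hi = bisect_left(positions, start + window)  # first site at or after the window end
        counts.append((start, hi - lo))
    return counts


# Hypothetical shared binding-site positions within a 500-bp conserved region.
example_sites = [10, 50, 150, 400, 420]
example_counts = window_hit_counts(example_sites, region_length=500, window=200, step=100)
```

Plotting the second element of each pair against the first gives a profile along the region in the spirit of the Regulogram the abstract mentions; a smaller step yields a smoother curve at the cost of more windows.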
Affiliation(s)
- Anil G Jegga
- Divisions of Pediatric Informatics, University of Cincinnati, Cincinnati, Ohio, 45229 USA
|
48
|
Gregory SG, Sekhon M, Schein J, Zhao S, Osoegawa K, Scott CE, Evans RS, Burridge PW, Cox TV, Fox CA, Hutton RD, Mullenger IR, Phillips KJ, Smith J, Stalker J, Threadgold GJ, Birney E, Wylie K, Chinwalla A, Wallis J, Hillier L, Carter J, Gaige T, Jaeger S, Kremitzki C, Layman D, Maas J, McGrane R, Mead K, Walker R, Jones S, Smith M, Asano J, Bosdet I, Chan S, Chittaranjan S, Chiu R, Fjell C, Fuhrmann D, Girn N, Gray C, Guin R, Hsiao L, Krzywinski M, Kutsche R, Lee SS, Mathewson C, McLeavy C, Messervier S, Ness S, Pandoh P, Prabhu AL, Saeedi P, Smailus D, Spence L, Stott J, Taylor S, Terpstra W, Tsai M, Vardy J, Wye N, Yang G, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Shvartsbeyn A, Gebregeorgis E, Krol M, Russell D, Overton L, Malek JA, Holmes M, Heaney M, Shetty J, Feldblyum T, Nierman WC, Catanese JJ, Hubbard T, Waterston RH, Rogers J, de Jong PJ, Fraser CM, Marra M, McPherson JD, Bentley DR. A physical map of the mouse genome. Nature 2002; 418:743-50. [PMID: 12181558 DOI: 10.1038/nature00957] [Citation(s) in RCA: 205] [Impact Index Per Article: 8.9] [Indexed: 11/08/2022]
Abstract
A physical map of a genome is an essential guide for navigation, allowing the location of any gene or other landmark in the chromosomal DNA. We have constructed a physical map of the mouse genome that contains 296 contigs of overlapping bacterial clones and 16,992 unique markers. The mouse contigs were aligned to the human genome sequence on the basis of 51,486 homology matches, thus enabling use of the conserved synteny (correspondence between chromosome blocks) of the two genomes to accelerate construction of the mouse map. The map provides a framework for assembly of whole-genome shotgun sequence data, and a tile path of clones for generation of the reference sequence. Definition of the human-mouse alignment at this level of resolution enables identification of a mouse clone that corresponds to almost any position in the human genome. The human sequence may be used to facilitate construction of other mammalian genome maps using the same strategy.
Affiliation(s)
- Simon G Gregory
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
|
49
|
Krummel KA, Denison SR, Calhoun E, Phillips LA, Smith DI. The common fragile site FRA16D and its associated gene WWOX are highly conserved in the mouse at Fra8E1. Genes Chromosomes Cancer 2002; 34:154-67. [PMID: 11979549 DOI: 10.1002/gcc.10047] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
Recently, several common fragile sites (CFSs) have been cloned and characterized, including the two most frequently observed in the human population, FRA3B and FRA16D. In addition to their high frequency of breakage, FRA3B and FRA16D colocalize with genes crossing large regions of breakage. At FRA3B, the fragile histidine triad (FHIT) gene spans more than 1 Mb, and at FRA16D, the WWOX gene spans more than 750 kb. It has also been shown that in Mus musculus, a CFS Fra14A2 and the mouse Fhit gene are conserved in the orthologous region of the genome. In this study, we positioned the ortholog to WWOX (Wox1) at chromosome band 8E1 in the mouse genome. To determine whether, like Fra14A2 and Fhit, Fra8E1 and Wox1 colocalized in the mouse, we prepared bacterial and yeast artificial chromosome probes, and we hybridized them to aphidicolin-treated mouse metaphase chromosomes. Our data demonstrate that Wox1 colocalizes with Fra8E1. Furthermore, the sequence from this region, including introns, is highly conserved over at least a 100-kb region. This evolutionary conservation suggests that the two most active CFSs share many features, and that CFSs and their associated genes may be necessary for cell survival.
Affiliation(s)
- Kurt A Krummel
- Division of Experimental Pathology, Department of Laboratory Medicine and Pathology, Mayo Clinic Cancer Center, Rochester, Minnesota 55905, USA
|
50
|
Müller F, Blader P, Strähle U. Search for enhancers: teleost models in comparative genomic and transgenic analysis of cis regulatory elements. Bioessays 2002; 24:564-72. [PMID: 12111739 DOI: 10.1002/bies.10096] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.2] [Indexed: 11/12/2022]
Abstract
Homology searches between DNA sequences of evolutionarily distant species (phylogenetic footprinting) offer a fast detection method for regulatory sequences. Because of the small size of their genomes, tetraodontid species such as the Japanese pufferfish and green spotted pufferfish have become attractive models for comparative genomics. A disadvantage of the tetraodontid species is, however, that they cannot be bred and manipulated routinely under laboratory conditions, so these species are less attractive for developmental and genetic analysis. In contrast, an increasing arsenal of transgene techniques with the developmental model species zebrafish and medaka is being used for functional analysis of cis-regulatory sequences. The main disadvantage of these species is their much larger genomes. While comparison between many loci proved the suitability of phylogenetic footprinting using fish and mammalian sequences, the fast rate of change in enhancer structure and gene duplication within teleosts may obscure detection of homologies. Here we discuss the contribution and potential of different teleost models for the detection and functional analysis of conserved cis-regulatory elements.
Affiliation(s)
- Ferenc Müller
- Institute of Toxicology and Genetics, Research Center Karlsruhe, Germany.
|