1
Reis LM, Sorokina EA, Dudakova L, Moravikova J, Skalicka P, Malinka F, Seese SE, Thompson S, Bardakjian T, Capasso J, Allen W, Glaser T, Levin AV, Schneider A, Khan A, Liskova P, Semina EV. Comprehensive phenotypic and functional analysis of dominant and recessive FOXE3 alleles in ocular developmental disorders. Hum Mol Genet 2021; 30:1591-1606. [PMID: 34046667 PMCID: PMC8369840 DOI: 10.1093/hmg/ddab142] [Received: 03/23/2021] [Revised: 05/18/2021] [Accepted: 05/19/2021]
Abstract
The forkhead transcription factor FOXE3 is critical for vertebrate eye development. Recessive and dominant variants cause human ocular disease, but the full range of phenotypes and the mechanisms of action of the two classes of variants are unknown. We identified FOXE3 variants in individuals with congenital eye malformations and carried out in vitro functional analysis on selected alleles. Sixteen new recessive and dominant families, including six novel variants, were identified. Analysis of new and previously reported genetic and clinical data demonstrated a broad phenotypic range with an overlap between recessive and dominant disease. Most families with recessive alleles, composed of truncating and forkhead-domain missense variants, had severe corneal opacity (90%; sclerocornea in 47%), aphakia (83%) and microphthalmia (80%), but some had milder features, including isolated cataract. The phenotype was most variable for recessive missense variants, suggesting that the functional consequences may be highly dependent on the type of amino acid substitution and its position. When assessed, aniridia or iris hypoplasia was noted in 89% and optic nerve anomalies in 60% of recessive cases, indicating that these defects are also common and may be underrecognized. In dominant pedigrees, which are caused by extension variants, most individuals had normal eye size (96%), cataracts (99%) and variable anterior segment anomalies, but some had microphthalmia, aphakia or sclerocornea, more typical of recessive disease. Functional studies identified variable effects on protein stability, DNA binding, nuclear localization and transcriptional activity for recessive FOXE3 variants, whereas dominant alleles showed severe impairment in all areas and dominant-negative characteristics.
Affiliation(s)
- Linda M Reis
- Department of Pediatrics and Children's Research Institute at the Medical College of Wisconsin and Children's Hospital of Wisconsin, Milwaukee, WI 53226, USA
- Elena A Sorokina
- Department of Pediatrics and Children's Research Institute at the Medical College of Wisconsin and Children's Hospital of Wisconsin, Milwaukee, WI 53226, USA
- Lubica Dudakova
- Department of Pediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
- Jana Moravikova
- Department of Pediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
- Pavlina Skalicka
- Department of Pediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic; Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
- Frantisek Malinka
- Department of Pediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic; Department of Computer Science, Czech Technical University in Prague, Prague, Czech Republic
- Sarah E Seese
- Department of Pediatrics and Children's Research Institute at the Medical College of Wisconsin and Children's Hospital of Wisconsin, Milwaukee, WI 53226, USA
- Samuel Thompson
- Department of Pediatrics and Children's Research Institute at the Medical College of Wisconsin and Children's Hospital of Wisconsin, Milwaukee, WI 53226, USA
- Tanya Bardakjian
- Department of Pediatrics, Albert Einstein Medical Center, Philadelphia, PA 19141, USA
- Jenina Capasso
- Pediatric Ophthalmology and Ocular Genetics, Flaum Eye Institute, Pediatric Genetics, Golisano Children's Hospital, University of Rochester, Rochester, NY 14534, USA
- William Allen
- Fullerton Genetics Center, Mission Hospitals, HCA, Asheville, NC 28803, USA
- Tom Glaser
- Cell Biology and Human Anatomy Department, UC-Davis School of Medicine, Davis, CA 95616, USA
- Alex V Levin
- Pediatric Ophthalmology and Ocular Genetics, Flaum Eye Institute, Pediatric Genetics, Golisano Children's Hospital, University of Rochester, Rochester, NY 14534, USA
- Adele Schneider
- Department of Pediatrics, Albert Einstein Medical Center, Philadelphia, PA 19141, USA
- Ayesha Khan
- Pediatric Ophthalmology & Strabismus Unit, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan; Consultant Pediatric Ophthalmologist, Al Jalila Children's Specialty Hospital, United Arab Emirates
- Petra Liskova
- Department of Pediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic; Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital, Prague, Czech Republic
- Elena V Semina
- Department of Pediatrics and Children's Research Institute at the Medical College of Wisconsin and Children's Hospital of Wisconsin, Milwaukee, WI 53226, USA; Departments of Ophthalmology and Cell Biology, Neurobiology and Anatomy at the Medical College of Wisconsin, Milwaukee, WI 53226, USA
2
Reineke AR, Bornberg-Bauer E, Gu J. Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes. Nucleic Acids Res 2011; 39:6029-43. [PMID: 21470961 PMCID: PMC3152334 DOI: 10.1093/nar/gkr179] [Received: 06/25/2010] [Revised: 02/22/2011] [Accepted: 03/15/2011]
Abstract
The discovery of regulatory motifs embedded in the upstream regions of plant genes is a particularly challenging bioinformatics task. Previous studies have shown that motifs in plants are short compared with those found in vertebrates. Furthermore, plant genomes have undergone several diversification mechanisms, such as genome duplication events, which affect the evolution of regulatory motifs. In this article, a systematic phylogenomic comparison of upstream regions is conducted to identify further features of plant regulatory genomes (the genomic components that regulate gene expression) and so enable future de novo discoveries. The findings highlight differences in upstream region properties between major plant groups and the effects of divergence times and duplication events. First, clear differences in upstream region evolution can be detected between monocots and dicots, suggesting that these groups should be treated separately when searching for novel regulatory motifs, particularly since universal motifs such as the TATA box are rare. Second, the decay rate of significantly aligned regions suggests that a divergence time of ~100 mya sets a limit for reliable conserved non-coding sequence (CNS) detection. The insights presented here provide a framework for identifying embedded motifs of functional relevance by clarifying the limits of bioinformatic CNS detection.
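The decay-rate argument can be illustrated with a minimal sketch that assumes the fraction of significantly aligned upstream regions falls off exponentially with divergence time. The exponential model, the function names and the detection floor are illustrative assumptions rather than the authors' procedure, and the numbers in the example are synthetic.

```python
import math
from typing import Sequence, Tuple

# Illustrative sketch; function names and the exponential model are assumptions.


def fit_exponential_decay(times: Sequence[float],
                          fractions: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(f) = log(a) - r * t.

    Assumes the aligned fraction f decays exponentially with divergence
    time t (in mya). Returns (a, r).
    """
    xs = list(times)
    ys = [math.log(f) for f in fractions]  # fractions must be > 0
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


def detection_limit(a: float, r: float, floor: float) -> float:
    """Divergence time at which the fitted fraction a * exp(-r * t) reaches `floor`."""
    return math.log(a / floor) / r


if __name__ == "__main__":
    # Synthetic, noise-free data generated from a = 0.5 and r = 0.03 per mya.
    ts = [0, 20, 40, 60, 80]
    fs = [0.5 * math.exp(-0.03 * t) for t in ts]
    a, r = fit_exponential_decay(ts, fs)
    print(round(a, 3), round(r, 3), round(detection_limit(a, r, 0.025), 1))
```

On real alignment data the fit would need to account for noise, and the floor would come from a significance threshold rather than a fixed constant.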
Affiliation(s)
- Jenny Gu
- Institute for Evolution and Biodiversity, University of Münster, Hüfferstrasse 1, 48149, Münster, Germany
3
Bernard V, Lecharny A, Brunaud V. Improved detection of motifs with preferential location in promoters. Genome 2010; 53:739-52. [PMID: 20924423 DOI: 10.1139/g10-042]
Abstract
Many transcription factor binding sites (TFBSs) involved in gene expression regulation are preferentially located relative to the transcription start site. This property is exploited in in silico prediction approaches, one of which involves studying the local overrepresentation of motifs using a sliding window to scan promoters with considerable accuracy. Nevertheless, the consequences of the choice of the sliding window size have never before been analysed. We propose an automatic adaptation of this size to each motif distribution profile. This approach allows a better characterization of the topological constraints of the motifs and the lists of genes containing them. Moreover, our approach allowed us to highlight a nonconstant frequency of occurrence of spurious motifs that could be counter-selected close to their functional area. Therefore, to improve the accuracy of in silico prediction of TFBSs and the sensitivity of the promoter cartography, we propose, in addition to automatic adaptation of window size, consideration of the nonconstant frequency of motifs in promoters.
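A minimal sketch of adapting the sliding-window size to a motif's positional profile. It assumes motif occurrences are given as positions relative to the TSS and uses a one-sided binomial test against a uniform positional null; the candidate sizes, the step and the function names are illustrative assumptions, not the authors' algorithm.

```python
import math
from typing import Sequence, Tuple

# Illustrative sketch; the uniform null and all names are assumptions.


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact one-sided binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_window(positions: Sequence[int], region: Tuple[int, int],
                sizes: Sequence[int], step: int = 10) -> Tuple[int, int, float]:
    """Scan motif positions (bp relative to the TSS) with windows of several sizes.

    Assumed null: occurrences fall uniformly in [region[0], region[1]).
    Returns (window_start, window_size, p_value) of the most overrepresented window.
    """
    lo, hi = region
    n = len(positions)
    best = (lo, sizes[0], 1.0)
    for size in sizes:
        p_in = size / (hi - lo)  # null probability that one occurrence lands in the window
        for start in range(lo, hi - size + 1, step):
            k = sum(start <= x < start + size for x in positions)
            p_value = binom_sf(k, n, p_in)
            if p_value < best[2]:
                best = (start, size, p_value)
    return best


if __name__ == "__main__":
    # Hypothetical occurrences clustered just upstream of the TSS, plus scattered ones.
    pos = [-40, -35, -30, -25, -20, -15, -10, -5, -900, -600, -300]
    print(best_window(pos, (-1000, 0), [50, 100, 200]))
```

Choosing the size with the smallest p-value over many windows is itself a multiple comparison, so the reported p-value would need correction (for example by permutation) before being used as a significance call.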
Affiliation(s)
- Virginie Bernard
- Unité de Recherche en Génomique Végétale (URGV), UMR INRA 1165 - CNRS 8114 - UEVE, 91057 Evry CEDEX, France
4
The effect of orthology and coregulation on detecting regulatory motifs. PLoS One 2010; 5:e8938. [PMID: 20140085 PMCID: PMC2815771 DOI: 10.1371/journal.pone.0008938] [Received: 07/31/2009] [Accepted: 01/05/2010]
Abstract
BACKGROUND Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows orthology evidence to be integrated with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well suited to using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. METHODOLOGY We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well PhyloGibbs and Phylogenetic sampler, as representatives of motif detection algorithms with an evolutionary model, performed compared with MEME, a more classical motif detection algorithm that treats orthologs independently. RESULTS AND CONCLUSIONS Under certain conditions, detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights into this relationship are useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.
5
Francke C, Kerkhoven R, Wels M, Siezen RJ. A generic approach to identify Transcription Factor-specific operator motifs; Inferences for LacI-family mediated regulation in Lactobacillus plantarum WCFS1. BMC Genomics 2008; 9:145. [PMID: 18371204 PMCID: PMC2329647 DOI: 10.1186/1471-2164-9-145] [Received: 12/21/2007] [Accepted: 03/27/2008]
Abstract
Background A key problem in the sequence-based reconstruction of regulatory networks in bacteria is the lack of specificity in operator predictions. The problem is especially prominent in the identification of transcription factor (TF) specific binding sites. More specifically, homologous TFs are abundant and, because they are structurally very similar, it is difficult to distinguish their related operators by automated means. This also holds for the LacI family, a well-studied family of TFs with many members that fulfill crucial roles in the control of carbohydrate catabolism in bacteria, including catabolite repression. To overcome the specificity problem, a comprehensive footprinting approach was formulated to identify TF-specific operator motifs and was applied to the LacI-family TFs in the model Gram-positive organism Lactobacillus plantarum WCFS1. The main premise behind the approach is that only orthologous sequences that share orthologous genomic context will share equivalent regulatory sites. Results When the approach was applied to the 12 LacI-family TFs of the model species, a specific operator motif was identified for each of them. With the TF-specific operator motifs, potential binding sites were found on the genome and putative minimal regulons could be defined. Moreover, specific inducers could in most cases be linked to the TFs through phylogeny, thereby unveiling the biological role of these regulons. The operator predictions indicated that the LacI-family TFs can be separated into two subfamilies with clearly distinct operator motifs. They also established that the operator related to the 'global' regulator CcpA is not inherently distinct from that of other LacI-family members, only more degenerate. Analysis of the chromosomal position of the identified putative binding sites confirmed that the LacI-family TFs are mostly auto-regulatory and relate mainly to carbohydrate uptake and catabolism.
Conclusion Our approach to identifying specific operator motifs for different members of a TF family is specific and, in essence, generic. The data indicate that, although the specific operator motifs can be used to identify minimal regulons, experimental knowledge of TF activity in particular is essential for determining complete regulons and for estimating the overlap between TF affinities.
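The premise that only orthologs sharing genomic context should share regulatory sites can be sketched as a filter applied before alignment or motif search. This sketch assumes the flanking genes have already been assigned to ortholog groups; the `Locus` record, its fields and the `min_shared` threshold are hypothetical, and this is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List, Set

# Illustrative sketch; Locus and its fields are hypothetical inputs.


@dataclass
class Locus:
    """Hypothetical record for one TF gene (or its ortholog) in one genome."""
    genome: str
    gene: str
    neighbor_groups: Set[str]  # ortholog-group IDs of the flanking genes
    upstream: str              # upstream region to be searched for operators


def shared_context(a: Locus, b: Locus) -> int:
    """Number of ortholog groups present in both genomic neighbourhoods."""
    return len(a.neighbor_groups & b.neighbor_groups)


def footprint_set(query: Locus, orthologs: List[Locus], min_shared: int = 1) -> List[str]:
    """Upstream sequences of the query plus those orthologs whose genomic context
    overlaps the query's; only this set would be aligned or passed to a motif finder."""
    kept = [o for o in orthologs if shared_context(query, o) >= min_shared]
    return [query.upstream] + [o.upstream for o in kept]


if __name__ == "__main__":
    q = Locus("Lp", "lacI1", {"OG1", "OG2"}, "AAA")
    o1 = Locus("X", "x1", {"OG1", "OG9"}, "CCC")
    o2 = Locus("Y", "y1", {"OG7"}, "GGG")
    print(footprint_set(q, [o1, o2]))
```

Raising `min_shared` trades sensitivity for specificity: fewer orthologs survive, but those that do are more likely to carry equivalent operators.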
Affiliation(s)
- Christof Francke
- TI Food and Nutrition, P.O. Box 557, 6700 AN Wageningen, The Netherlands
6
Sanges R, Kalmar E, Claudiani P, D'Amato M, Muller F, Stupka E. Shuffling of cis-regulatory elements is a pervasive feature of the vertebrate lineage. Genome Biol 2006; 7:R56. [PMID: 16859531 PMCID: PMC1779573 DOI: 10.1186/gb-2006-7-7-r56] [Received: 03/27/2006] [Revised: 04/05/2006] [Accepted: 06/27/2006]
Abstract
Alignment of orthologous vertebrate loci reveals that a significant proportion of conserved cis-regulatory elements have undergone shuffling during evolution. Background All vertebrates share a remarkable degree of similarity in their development as well as in the basic functions of their cells. Despite this, attempts at unearthing genome-wide regulatory elements conserved throughout the vertebrate lineage using BLAST-like approaches have thus far detected noncoding conservation in only a few hundred genes, mostly associated with regulation of transcription and development. Results We used a unique combination of tools to obtain regional global-local alignments of orthologous loci. This approach takes into account the shuffling of regulatory regions that is likely to occur over evolutionary distances greater than those separating mammalian genomes. It revealed an order of magnitude more vertebrate conserved elements than previously reported, in over 2,000 genes, including many genes associated with the membrane and extracellular regions. Our analysis revealed that 72% of the elements identified have undergone shuffling. We tested the ability of the identified elements to enhance transcription in zebrafish embryos and compared their activity with a set of control fragments. More than 80% of the elements tested enhanced transcription significantly, predominantly in a tissue-restricted manner corresponding to the expression domain of the neighboring gene. Conclusion Our work elucidates the importance of shuffling in the detection of cis-regulatory elements. It also shows how similarities across the vertebrate lineage, which go well beyond development, can be explained not only within the realm of coding genes but also in that of the sequences that ultimately govern their expression.
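One simple way to quantify shuffling, assumed here rather than taken from the paper, is to match conserved elements between two orthologous loci and count how many fall outside the longest collinear chain, that is, a longest strictly increasing subsequence of positions in the second locus when elements are ordered by position in the first. The function names are hypothetical and strand inversions are ignored.

```python
from bisect import bisect_left
from typing import List, Tuple

# Illustrative sketch; this shuffling measure and the names are assumptions.


def collinear_chain(pairs: List[Tuple[int, int]]) -> List[int]:
    """Indices into `pairs` (each = (pos_in_A, pos_in_B)) of a longest chain whose
    B positions strictly increase when elements are ordered by A position."""
    order = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
    ys = [pairs[i][1] for i in order]
    tails: List[int] = []     # smallest tail value of an increasing run of each length
    tail_idx: List[int] = []  # index in ys of that tail
    prev = [-1] * len(ys)     # back-pointers for reconstructing the chain
    for j, y in enumerate(ys):
        pos = bisect_left(tails, y)
        if pos == len(tails):
            tails.append(y)
            tail_idx.append(j)
        else:
            tails[pos] = y
            tail_idx[pos] = j
        prev[j] = tail_idx[pos - 1] if pos > 0 else -1
    chain = []
    j = tail_idx[-1] if tail_idx else -1
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    return chain[::-1]


def shuffled_fraction(pairs: List[Tuple[int, int]]) -> float:
    """Fraction of matched elements that lie outside the longest collinear chain."""
    if not pairs:
        return 0.0
    return 1 - len(collinear_chain(pairs)) / len(pairs)


if __name__ == "__main__":
    # Hypothetical matched element positions in two orthologous loci.
    print(shuffled_fraction([(10, 100), (20, 300), (30, 200), (40, 400)]))
```

Using a longest collinear chain as the "unshuffled" reference is a conservative choice: an element is only counted as shuffled if no order-preserving explanation can include it.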
Affiliation(s)
- Remo Sanges
- Telethon Institute of Genetics and Medicine, Via P. Castellino, 80131 Napoli, Italy
- Eva Kalmar
- Institute of Toxicology and Genetics, Forschungszentrum Karlsruhe, Postfach 3640, D-76021 Karlsruhe, Germany
- Pamela Claudiani
- Telethon Institute of Genetics and Medicine, Via P. Castellino, 80131 Napoli, Italy
- Maria D'Amato
- Telethon Institute of Genetics and Medicine, Via P. Castellino, 80131 Napoli, Italy
- Ferenc Muller
- Institute of Toxicology and Genetics, Forschungszentrum Karlsruhe, Postfach 3640, D-76021 Karlsruhe, Germany
- Elia Stupka
- Telethon Institute of Genetics and Medicine, Via P. Castellino, 80131 Napoli, Italy
7
Freeling M, Rapaka L, Lyons E, Pedersen B, Thomas BC. G-boxes, bigfoot genes, and environmental response: characterization of intragenomic conserved noncoding sequences in Arabidopsis. Plant Cell 2007; 19:1441-57. [PMID: 17496117 PMCID: PMC1913728 DOI: 10.1105/tpc.107.050419] [Received: 01/18/2007] [Revised: 03/10/2007] [Accepted: 04/19/2007]
Abstract
A tetraploidy left Arabidopsis thaliana with 6358 pairs of homoeologs that, when aligned, generated 14,944 intragenomic conserved noncoding sequences (CNSs). Our previous work assembled these phylogenetic footprints into a database. We show that known transcription factor (TF) binding motifs, including the G-box, are overrepresented in these CNSs. A total of 254 genes spanning long lengths of CNS-rich chromosomes (Bigfoot) dominate this database. Therefore, we made subdatabases: one containing Bigfoot genes and the other containing genes with three to five CNSs (Smallfoot). Bigfoot genes are generally TFs that respond to signals, with their modal CNS positioned 3.1 kb 5' from the ATG. Smallfoot genes encode components of signal transduction machinery, the cytoskeleton, or involve transcription. We queried each subdatabase with each possible 7-nucleotide sequence. Among hundreds of hits, most were purified from CNSs, and almost all of those significantly enriched in CNSs had no experimental history. The 7-mers in CNSs are not 5'- to 3'-oriented in Bigfoot genes but are often oriented in Smallfoot genes. CNSs with one G-box tend to have two G-boxes. CNSs were shared with the homoeolog only and with no other gene, suggesting that binding site turnover impedes detection. Bigfoot genes may function in adaptation to environmental change.
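The exhaustive 7-mer query can be sketched as counting every k-mer in CNS sequences and in a background set, then ranking by a pseudocount-smoothed log2 frequency ratio. The scoring scheme, thresholds and function names are assumptions; the paper's own statistics may differ. The `merge_strands` option treats a k-mer and its reverse complement as one word, which suits cases where orientation is not conserved (as reported for Bigfoot genes); setting it to False keeps orientation.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative sketch; scoring and names are assumptions, not the paper's method.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(seqs: Iterable[str], k: int, merge_strands: bool) -> Counter:
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _BASES:  # skip windows containing N or other symbols
                counts[canonical(kmer) if merge_strands else kmer] += 1
    return counts


def enriched_kmers(foreground: Iterable[str], background: Iterable[str], k: int = 7,
                   merge_strands: bool = True, pseudo: float = 1.0,
                   min_count: int = 2) -> List[Tuple[str, int, int, float]]:
    """Rank k-mers seen at least `min_count` times in the foreground by
    log2(foreground frequency / background frequency)."""
    fg = count_kmers(foreground, k, merge_strands)
    bg = count_kmers(background, k, merge_strands)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    rows = []
    for kmer, n in fg.items():
        if n < min_count:
            continue
        ratio = ((n + pseudo) / (fg_total + pseudo)) / ((bg[kmer] + pseudo) / (bg_total + pseudo))
        rows.append((kmer, n, bg[kmer], math.log2(ratio)))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

For example, `enriched_kmers(["AACACGTGAACACGTGAA"], ["ATATATATATATATATATAT"], k=6)` includes the G-box core `CACGTG` with two foreground hits and none in the background.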
Affiliation(s)
- Michael Freeling
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA.
8
Evolutionary hierarchies of conserved blocks in 5'-noncoding sequences of dicot rbcS genes. BMC Evol Biol 2007; 7:51. [PMID: 17407546 PMCID: PMC1852302 DOI: 10.1186/1471-2148-7-51] [Received: 10/02/2006] [Accepted: 04/02/2007]
Abstract
Background Evolutionary processes in gene regulatory regions are major determinants of organismal evolution, but exceptionally challenging to study. We explored the possibilities of evolutionary analysis of phylogenetic footprints in 5'-noncoding sequences (NCS) from 27 ribulose-1,5-bisphosphate carboxylase small subunit (rbcS) genes, from three dicot families (Brassicaceae, Fabaceae and Solanaceae). Results Sequences of up to 400 bp encompassing proximal promoter and 5'-untranslated regions were analyzed. We conducted phylogenetic footprinting by several alternative methods: generalized Lempel-Ziv complexity (CLZ), multiple alignments with DIALIGN and ALIGN-M, and the MOTIF SAMPLER Gibbs sampling algorithm. These tools collectively defined 36 conserved blocks of mean length 12.8 bp. On average, 12.5 blocks were found in each 5'-NCS. The blocks occurred in arrays whose relative order was absolutely conserved, confirming the existence of 'conserved modular arrays' in promoters. Identities of half of the blocks confirmed past rbcS research, including versions of the I-box, G-box, and GT-1 sites such as Box II. Over 90% of blocks overlapped DNase-protected regions in tomato 5'-NCS. Regions characterized by low CLZ in sliding-window analyses were also frequently associated with DNase-protection. Blocks could be assigned to evolutionary hierarchies based on taxonomic distribution and estimated age. Lineage divergence dates implied that 13 blocks found in all three plant families were of Cretaceous antiquity, while other family-specific blocks were much younger. Blocks were also dated by formation of multigene families, using genome and coding sequence information. Dendrograms of evolutionary relations of the 5'-NCS were produced by several methods, including: cluster analysis using pairwise CLZ values; evolutionary trees of DIALIGN sequence alignments; and cladistic analysis of conserved blocks. 
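The sliding-window complexity scan can be sketched with the classic Lempel-Ziv (1976) complexity, computed with the Kaspar-Schuster algorithm. This is a swapped-in stand-in for the generalized CLZ measure used in the paper, and the window and step values are arbitrary choices for illustration.

```python
from typing import List, Tuple

# Illustrative sketch; classic LZ76 complexity stands in for the paper's CLZ.


def lz76_complexity(s: str) -> int:
    """Lempel-Ziv (1976) complexity via the Kaspar-Schuster algorithm."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # the copy runs to the end of the string
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier start reproduces the block: close a component
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_profile(seq: str, window: int, step: int) -> List[Tuple[int, int]]:
    """(start, complexity) for each full window; low values flag repetitive stretches."""
    return [(start, lz76_complexity(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

Raw LZ76 counts grow with window length, so profiles are only comparable at a fixed window size; a normalized variant would be needed to compare windows of different lengths.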
Conclusion Dicot 5'-NCS contain conserved modular arrays of recurrent sequence blocks, which are coincident with functional elements. These blocks are amenable to evolutionary interpretation as hierarchies in which ancient, taxonomically widespread blocks can be distinguished from more recent, taxon-specific ones.
9
Vandepoele K, Casneuf T, Van de Peer Y. Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics. Genome Biol 2006; 7:R103. [PMID: 17090307 PMCID: PMC1794593 DOI: 10.1186/gb-2006-7-11-r103] [Received: 06/14/2006] [Revised: 09/15/2006] [Accepted: 11/07/2006]
Abstract
A strategy combining classical motif overrepresentation in co-regulated genes with comparative footprinting is applied to identify 80 transcription factor binding sites and 139 regulatory modules in Arabidopsis thaliana. Background Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation. Results Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other. Conclusion These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view.
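Combining motif overrepresentation with comparative footprinting can be sketched as follows: a gene counts as a motif carrier only if the motif occurs in both its promoter and its ortholog's promoter, and the co-regulated cluster is then tested for enrichment with a hypergeometric upper tail. Exact string matching, the dictionary inputs and the function names are simplifying assumptions, not the published detection strategy.

```python
import math
from typing import Dict, Set, Tuple

# Illustrative sketch; inputs and names are hypothetical.


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry a conserved motif."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def conserved_carriers(promoters: Dict[str, str], ortholog_promoters: Dict[str, str],
                       motif: str) -> Set[str]:
    """Genes whose promoter and ortholog promoter both contain `motif` (exact match)."""
    return {gene for gene, seq in promoters.items()
            if motif in seq and motif in ortholog_promoters.get(gene, "")}


def cluster_enrichment(promoters: Dict[str, str], ortholog_promoters: Dict[str, str],
                       cluster: Set[str], motif: str) -> Tuple[int, float]:
    """Return (conserved carriers in the cluster, hypergeometric p-value) for one motif."""
    carriers = conserved_carriers(promoters, ortholog_promoters, motif)
    N, K = len(promoters), len(carriers)
    n, k = len(cluster), len(carriers & cluster)
    return k, hypergeom_sf(k, N, K, n)
```

Requiring conservation before counting shrinks K, so a cluster only scores well if its members share motif instances that also survive in orthologous promoters.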
Affiliation(s)
- Klaas Vandepoele
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark, B-9052 Ghent, Belgium
- Tineke Casneuf
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark, B-9052 Ghent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark, B-9052 Ghent, Belgium