1. Asma H, Liu L, Halfon MS. SCRMshaw: Supervised cis-regulatory module prediction for insect genomes. PLoS One 2024; 19:e0311752. [PMID: 39637210 PMCID: PMC11620701 DOI: 10.1371/journal.pone.0311752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Accepted: 09/24/2024] [Indexed: 12/07/2024] Open
Abstract
As the number of sequenced insect genomes continues to grow, there is a pressing need for rapid and accurate annotation of their regulatory component. SCRMshaw is a computational tool designed to predict cis-regulatory modules ("enhancers") in the genomes of various insect species. A key advantage of SCRMshaw is its accessibility. It requires minimal resources: just a genome sequence and training data from known Drosophila regulatory sequences, which are readily available for download. Even users with modest computational skills can run SCRMshaw on a desktop computer for basic applications, although a high-performance computing cluster is recommended for optimal results. SCRMshaw can be tailored to specific needs: users can employ a single set of training data to predict enhancers associated with a particular gene expression pattern, or utilize multiple sets to provide a first-pass regulatory annotation for a newly sequenced genome. This protocol provides an extensive update to the previously published SCRMshaw protocol and aligns with the methods used in a recent annotation of over 30 insect regulatory genomes. It includes the most recent modifications to the SCRMshaw protocol and details an end-to-end pipeline that begins with a sequenced genome and ends with a fully annotated regulatory genome. Relevant scripts are available via GitHub, and a living protocol that will be updated as necessary is linked to this article at protocols.io.
Affiliation(s)
- Hasiba Asma
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Luna Liu
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY, United States of America
- Marc S. Halfon
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY, United States of America
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY, United States of America
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY, United States of America
2. Asma H, Tieke E, Deem KD, Rahmat J, Dong T, Huang X, Tomoyasu Y, Halfon MS. Regulatory genome annotation of 33 insect species. eLife 2024; 13:RP96738. [PMID: 39392676 PMCID: PMC11469670 DOI: 10.7554/elife.96738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024] Open
Abstract
Annotation of newly sequenced genomes frequently includes genes, but rarely covers important non-coding genomic features such as the cis-regulatory modules (e.g., enhancers and silencers) that regulate gene expression. Here, we begin to remedy this situation by developing a workflow for rapid initial annotation of insect regulatory sequences, and provide a searchable database resource with enhancer predictions for 33 genomes. Using our previously developed SCRMshaw computational enhancer prediction method, we predict over 2.8 million regulatory sequences along with the tissues where they are expected to be active, in a set of insect species ranging over 360 million years of evolution. Extensive analysis and validation of the data provide several lines of evidence suggesting that we achieve a high true-positive rate for enhancer prediction. One, we show that our predictions target specific loci, rather than random genomic locations. Two, we predict enhancers in orthologous loci across a diverged set of species to a significantly higher degree than random expectation would allow. Three, we demonstrate that our predictions are highly enriched for regions of accessible chromatin. Four, we achieve a validation rate in excess of 70% using in vivo reporter gene assays. As we continue to annotate both new tissues and new species, our regulatory annotation resource will provide a rich source of data for the research community and will have utility for both small-scale (single gene, single species) and large-scale (many genes, many species) studies of gene regulation. In particular, the ability to search for functionally related regulatory elements in orthologous loci should greatly facilitate studies of enhancer evolution even among distantly related species.
Affiliation(s)
- Hasiba Asma
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
- Ellen Tieke
  - Department of Biology, Miami University, Oxford, United States
- Kevin D Deem
  - Department of Biology, Miami University, Oxford, United States
- Jabale Rahmat
  - Department of Biology, Miami University, Oxford, United States
- Tiffany Dong
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Xinbo Huang
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
- Marc S Halfon
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, United States
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, United States
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, United States
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, United States
3. Wudarski J, Aliabadi S, Gulia-Nuss M. Arthropod promoters for genetic control of disease vectors. Trends Parasitol 2024; 40:619-632. [PMID: 38824066 PMCID: PMC11223965 DOI: 10.1016/j.pt.2024.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 06/03/2024]
Abstract
Vector-borne diseases (VBDs) impose devastating effects on human health and a heavy financial burden. Malaria, Lyme disease, and dengue fever are just a few examples of VBDs that cause severe illnesses. The current strategies to control VBDs consist mainly of environmental modification and chemical use, and to a small extent, genetic approaches. The genetic approaches, including transgenesis/genome modification and gene-drive technologies, provide the basis for developing new tools for VBD prevention by suppressing vector populations or reducing their capacity to transmit pathogens. Regulatory elements such as promoters are required for robust sex-, tissue-, and stage-specific transgene expression. As discussed in this review, information on the regulatory elements is available for mosquito vectors but is scant for other vectors.
Affiliation(s)
- Jakub Wudarski
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV, USA
- Simindokht Aliabadi
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV, USA
- Monika Gulia-Nuss
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV, USA
4. Schember I, Reid W, Sterling-Lentsch G, Halfon MS. Conserved and novel enhancers in the Aedes aegypti single-minded locus recapitulate embryonic ventral midline gene expression. PLoS Genet 2024; 20:e1010891. [PMID: 38683842 PMCID: PMC11081499 DOI: 10.1371/journal.pgen.1010891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 05/09/2024] [Accepted: 04/16/2024] [Indexed: 05/02/2024] Open
Abstract
Transcriptional cis-regulatory modules, e.g., enhancers, control the time and location of metazoan gene expression. While changes in enhancers can provide a powerful force for evolution, there is also significant deep conservation of enhancers for developmentally important genes, with function and sequence characteristics maintained over hundreds of millions of years of divergence. Not well understood, however, is how the overall regulatory composition of a locus evolves, with important outstanding questions such as how many enhancers are conserved vs. novel, and to what extent are the locations of conserved enhancers within a locus maintained? We begin here to address these questions with a comparison of the respective single-minded (sim) loci in the two dipteran species Drosophila melanogaster (fruit fly) and Aedes aegypti (mosquito). sim encodes a highly conserved transcription factor that mediates development of the arthropod embryonic ventral midline. We identify two enhancers in the A. aegypti sim locus and demonstrate that they function equivalently in both transgenic flies and transgenic mosquitoes. One A. aegypti enhancer is highly similar to known Drosophila counterparts in its activity, location, and autoregulatory capability. The other differs from any known Drosophila sim enhancers with a novel location, failure to autoregulate, and regulation of expression in a unique subset of midline cells. Our results suggest that the conserved pattern of sim expression in the two species is the result of both conserved and novel regulatory sequences. Further examination of this locus will help to illuminate how the overall regulatory landscape of a conserved developmental gene evolves.
Affiliation(s)
- Isabella Schember
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- William Reid
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Geyenna Sterling-Lentsch
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
- Marc S. Halfon
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, New York, United States of America
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, New York, United States of America
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, New York, United States of America
  - New York State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, New York, United States of America
5. Nowling RJ, Njoya K, Peters JG, Riehle MM. Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique. Front Cell Infect Microbiol 2023; 13:1182567. [PMID: 37600946 PMCID: PMC10433755 DOI: 10.3389/fcimb.2023.1182567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023] Open
Abstract
Introduction: Various sequencing based approaches are used to identify and characterize the activities of cis-regulatory elements in a genome-wide fashion. Some of these techniques rely on indirect markers such as histone modifications (ChIP-seq with histone antibodies) or chromatin accessibility (ATAC-seq, DNase-seq, FAIRE-seq), while other techniques use direct measures such as episomal assays measuring the enhancer properties of DNA sequences (STARR-seq) and direct measurement of the binding of transcription factors (ChIP-seq with transcription factor-specific antibodies). The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined by their sequence and secondary processes such as chromatin accessibility, DNA methylation, and bound histone markers.
Methods: Here, machine learning models are employed to evaluate the accuracy with which cis-regulatory elements identified by various commonly used sequencing techniques can be predicted by their underlying sequence alone to distinguish between cis-regulatory activity that is reflective of sequence content versus secondary processes.
Results and discussion: Models trained and evaluated on D. melanogaster sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences were tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while the DNase-seq and H3K4me1 ChIP-seq sequences are not. Taken together, these results indicate that the DNase-seq approach identifies a broad class of regulatory elements of which enhancers are a subset and the associated data are appropriate for training models for detecting regulatory activity from sequence alone, STARR-seq data are best for training enhancer-specific sequence models, and H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction.
Affiliation(s)
- Ronald J. Nowling
  - Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States
- Kimani Njoya
  - Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
- John G. Peters
  - Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States
- Michelle M. Riehle
  - Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
6. Bottino-Rojas V, James AA. Use of Insect Promoters in Genetic Engineering to Control Mosquito-Borne Diseases. Biomolecules 2022; 13:16. [PMID: 36671401 PMCID: PMC9855440 DOI: 10.3390/biom13010016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Revised: 12/16/2022] [Accepted: 12/18/2022] [Indexed: 12/24/2022] Open
Abstract
Mosquito transgenesis and gene-drive technologies provide the basis for developing promising new tools for vector-borne disease prevention by either suppressing wild mosquito populations or reducing their capacity to transmit pathogens. Many studies of the regulatory DNA and promoters of genes with robust sex-, tissue- and stage-specific expression profiles have supported the development of new tools and strategies that could bring mosquito-borne diseases under control. Although the list of regulatory elements available is significant, only a limited set of those can reliably drive spatial-temporal expression. Here, we review the advances in our ability to express beneficial and other genes in mosquitoes, and highlight the information needed for the development of new mosquito-control and anti-disease strategies.
Affiliation(s)
- Vanessa Bottino-Rojas
  - Department of Microbiology and Molecular Genetics, University of California, Irvine, CA 92697, USA
- Anthony A. James
  - Department of Microbiology and Molecular Genetics, University of California, Irvine, CA 92697, USA
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, CA 92697, USA
7. Giraldo-Calderón GI, Harb OS, Kelly SA, Rund SS, Roos DS, McDowell MA. VectorBase.org updates: bioinformatic resources for invertebrate vectors of human pathogens and related organisms. Curr Opin Insect Sci 2022; 50:100860. [PMID: 34864248 PMCID: PMC9133010 DOI: 10.1016/j.cois.2021.11.008] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/08/2021] [Accepted: 11/29/2021] [Indexed: 06/12/2023]
Abstract
VectorBase (VectorBase.org) is part of the VEuPathDB Bioinformatics Resource Center, providing free online access to multi-omics and population biology data, focusing on arthropod vectors and invertebrates of importance to human health. VectorBase includes genomics and functional genomics data from bed bugs, biting midges, body lice, kissing bugs, mites, mosquitoes, sand flies, ticks, tsetse flies, stable flies, house flies, fruit flies, and a snail intermediate host. Tools include the Search Strategy system and MapVEu, enabling users to interrogate and visualize diverse 'omics and population-level data using a graphical interface (no programming experience required). Users can also analyze their own private data, such as transcriptomic sequences, exploring their results in the context of other publicly available information in the database. Help Desk: help@vectorbase.org.
Affiliation(s)
- Gloria I Giraldo-Calderón
  - Department of Biological Sciences, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
  - Dept. Ciencias Biológicas & Dept. Ciencias Básicas Médicas, Universidad Icesi, Calle 18 No 122-135, Cali, Colombia
- Omar S Harb
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah A Kelly
  - Department of Life Sciences, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Samuel SC Rund
  - Department of Biological Sciences, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
- David S Roos
  - Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Mary Ann McDowell
  - Department of Biological Sciences, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA
8. Holm I, Nardini L, Pain A, Bischoff E, Anderson CE, Zongo S, Guelbeogo WM, Sagnon N, Gohl DM, Nowling RJ, Vernick KD, Riehle MM. Comprehensive Genomic Discovery of Non-Coding Transcriptional Enhancers in the African Malaria Vector Anopheles coluzzii. Front Genet 2022; 12:785934. [PMID: 35082832 PMCID: PMC8784733 DOI: 10.3389/fgene.2021.785934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Accepted: 12/10/2021] [Indexed: 11/24/2022] Open
Abstract
Almost all regulation of gene expression in eukaryotic genomes is mediated by the action of distant non-coding transcriptional enhancers upon proximal gene promoters. Enhancer locations cannot be accurately predicted bioinformatically because of the absence of a defined sequence code, and thus functional assays are required for their direct detection. Here we used a massively parallel reporter assay, Self-Transcribing Active Regulatory Region sequencing (STARR-seq), to generate the first comprehensive genome-wide map of enhancers in Anopheles coluzzii, a major African malaria vector in the Gambiae species complex. The screen was carried out by transfecting reporter libraries created from the genomic DNA of 60 wild A. coluzzii from Burkina Faso into A. coluzzii 4a3A cells, in order to functionally query enhancer activity of the natural population within the homologous cellular context. We report a catalog of 3,288 active genomic enhancers that were significant across three biological replicates, 74% of them located in intergenic and intronic regions. The STARR-seq enhancer screen is chromatin-free and thus detects inherent activity of a comprehensive catalog of enhancers that may be restricted in vivo to specific cell types or developmental stages. Testing of a validation panel of enhancer candidates using manual luciferase assays confirmed enhancer function in 26 of 28 (93%) of the candidates over a wide dynamic range of activity from two to at least 16-fold activity above baseline. The enhancers occupy only 0.7% of the genome, and display distinct composition features. The enhancer compartment is significantly enriched for 15 transcription factor binding site signatures, and displays divergence for specific dinucleotide repeats, as compared to matched non-enhancer genomic controls. The genome-wide catalog of A. coluzzii enhancers is publicly available in a simple searchable graphic format. This enhancer catalogue will be valuable in linking genetic and phenotypic variation, in identifying regulatory elements that could be employed in vector manipulation, and in better targeting of chromosome editing to minimize extraneous regulation influences on the introduced sequences.
Importance: Understanding the role of the non-coding regulatory genome in complex disease phenotypes is essential, but even in well-characterized model organisms, identification of regulatory regions within the vast non-coding genome remains a challenge. We used a large-scale assay to generate a genome-wide map of transcriptional enhancers. Such a catalogue for the important malaria vector, Anopheles coluzzii, will be an important research tool as the role of non-coding regulatory variation in differential susceptibility to malaria infection is explored and as a public resource for research on this important insect vector of disease.
Affiliation(s)
- Inge Holm
  - Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
- Luisa Nardini
  - Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
- Adrien Pain
  - Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
  - Institut Pasteur, Université de Paris, Hub de Bioinformatique et Biostatistique, Paris, France
- Emmanuel Bischoff
  - Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
- Cameron E Anderson
  - Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
- Soumanaba Zongo
  - Centre National de Recherche et de Formation sur le Paludisme (CNRFP), Ministry of Health, Ouagadougou, Burkina Faso
- Wamdaogo M Guelbeogo
  - Centre National de Recherche et de Formation sur le Paludisme (CNRFP), Ministry of Health, Ouagadougou, Burkina Faso
- N'Fale Sagnon
  - Centre National de Recherche et de Formation sur le Paludisme (CNRFP), Ministry of Health, Ouagadougou, Burkina Faso
- Daryl M Gohl
  - University of Minnesota Genomics Center, Minneapolis, MN, United States
  - Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN, United States
- Ronald J Nowling
  - Department of Electrical Engineering and Computer Science, Milwaukee School of Engineering (MSOE), Milwaukee, WI, United States
- Kenneth D Vernick
  - Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
- Michelle M Riehle
  - Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
9. Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021; 12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
Affiliation(s)
- Hasiba Asma
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Marc S. Halfon
  - Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
  - NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA