1
Garza AB, Garcia R, Solis LM, Halfon MS, Girgis HZ. EnhancerTracker: Comparing cell-type-specific enhancer activity of DNA sequence triplets via an ensemble of deep convolutional neural networks. bioRxiv 2023:2023.12.23.573198. [PMID: 38187673 PMCID: PMC10769370 DOI: 10.1101/2023.12.23.573198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Motivation: Transcriptional enhancers, unlike promoters, are unrestrained by distance or strand orientation with respect to their target genes, making their computational identification a challenge. Further, many cell types have too few confirmed enhancers to robustly train machine-learning-based models for enhancer prediction. Results: We present EnhancerTracker, a novel tool that leverages an ensemble of deep separable convolutional neural networks to identify cell-type-specific enhancers while requiring only two confirmed enhancers. EnhancerTracker is trained, validated, and tested on 52,789 putative enhancers obtained from the FANTOM5 Project and control sequences derived from the human genome. Unlike available tools, which accept one sequence at a time, our tool takes three sequences as input; the first two are enhancers active in the same cell type. EnhancerTracker outputs 1 if the third sequence is an enhancer active in the same cell type(s) where the first two enhancers are active. It outputs 0 otherwise. On a held-out set (15%), EnhancerTracker achieved an accuracy of 64%, a specificity of 93%, a recall of 35%, a precision of 84%, and an F1 score of 49%. Availability and implementation: https://github.com/BioinformaticsToolsmith/EnhancerTracker. Contact: hani.girgis@tamuk.edu.
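The reported metrics can be cross-checked against one another. The short sketch below recomputes the F1 score from the stated precision and recall, and the mean of recall and specificity. That mean equals plain accuracy only if the held-out set contains equal numbers of positive and negative triplets, which is an assumption made here and is not stated in the abstract. The function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(recall: float, specificity: float) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    return (recall + specificity) / 2


# Values quoted in the abstract, expressed as fractions.
reported = {"precision": 0.84, "recall": 0.35, "specificity": 0.93}

f1 = f1_score(reported["precision"], reported["recall"])
# Assumption: classes are balanced, so balanced accuracy equals accuracy.
bal_acc = balanced_accuracy(reported["recall"], reported["specificity"])

print(f"F1 = {f1:.2f}, balanced accuracy = {bal_acc:.2f}")
```

Under that assumption, both recomputed values round to the figures given in the abstract (49% and 64%).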
2
Rayhan M, Siddiquee MF, Shahriar A, Ahmed H, Mahmud AR, Alam MS, Uddin MR, Acharjee M, Shimu MSS, Shamsir MS, Emran TB. Structural characterization of a novel luciferase-like-monooxygenase from Pseudomonas meliae – an in-silico approach. [DOI: 10.1101/2023.03.27.534437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Background: Luciferase is a well-known oxidative enzyme that produces bioluminescence. Pseudomonas meliae is a plant pathogen that causes wood rot on nectarine and peach and possesses a luciferase-like monooxygenase. After activation, it produces bioluminescence, and the pathogen's bioluminescence is a visual indicator of diseased plants. Methods: The present study aims to model and characterize the luciferase-like monooxygenase protein in P. meliae for its similarity to well-established luciferase. The luciferase-like monooxygenase from P. meliae, which infects chinaberry plants, was modeled first and then compared with existing known luciferase. The similarities between the uncharacterized luciferase from P. meliae and the template from Geobacillus thermodenitrificans were also analyzed to find the novelty of P. meliae. Results: The results suggest that the absence of bioluminescence in P. meliae could be due to evolutionary mutations at positions 138 and 311. The active site remains identical except for two amino acids: P. meliae has Tyr138 instead of His138 and Leu311 instead of His311. Therefore, P. meliae has potential future applications, and mutating residues 138 and 311 could restore the luciferase light-emitting ability. Conclusions: This study will help further improve, activate, and repurpose the luciferase from P. meliae as a reporter for gene expression.
3
Girgis HZ, James BT, Luczak BB. Identity: rapid alignment-free prediction of sequence alignment identity scores using self-supervised general linear models. NAR Genom Bioinform 2021; 3:lqab001. [PMID: 33554117 PMCID: PMC7850047 DOI: 10.1093/nargab/lqab001] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/14/2019] [Revised: 12/07/2020] [Accepted: 01/08/2021] [Indexed: 11/12/2022]
Abstract
Pairwise global alignment is a fundamental step in sequence analysis. Optimal alignment algorithms take quadratic time, which makes them slow, especially on long sequences. In many applications that involve large sequence datasets, all that is needed is the identity score (the percentage of identical nucleotides in an optimal alignment, including gaps, of two sequences); there is no need to visualize how every two sequences are aligned. For these applications, we propose Identity, which produces global identity scores for a large number of pairs of DNA sequences using alignment-free methods and self-supervised general linear models. For the first time, the new tool can predict pairwise identity scores in linear time and space. On two large-scale sequence databases, Identity provided the best compromise between sensitivity and precision while being 2-80 times faster than BLAST, Mash, MUMmer4 and USEARCH. Identity was the best performing tool when searching for low-identity matches. While constructing phylogenetic trees from about 6000 transcripts, the tree due to the scores reported by Identity was the closest to the reference tree (in contrast to andi, FSWM and Mash). Identity is capable of producing pairwise identity scores of millions-of-nucleotides-long bacterial genomes; this task cannot be accomplished by any global-alignment-based tool. Availability: https://github.com/BioinformaticsToolsmith/Identity.
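To make the definition of an identity score concrete, the sketch below computes it from an alignment that is already given. This is not how Identity works (the tool predicts the score without aligning); it only illustrates the quantity being predicted. It assumes gaps are written as "-" and that alignment columns containing a gap count toward the alignment length, following the "including gaps" wording above. The function name is hypothetical.

```python
def identity_score(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same nucleotide.

    Both inputs are rows of one global alignment, so they must have
    the same length; '-' marks a gap. Gap columns count toward the
    denominator but never toward the matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 9 columns, 7 of them identical, 2 containing a gap.
print(identity_score("ACGT-ACGT", "ACGTTAC-T"))
```

Producing the alignment itself is the quadratic-time step that an alignment-free predictor avoids.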
Affiliation(s)
- Hani Z Girgis
  - Bioinformatics Toolsmith Laboratory, Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, 700 University Boulevard, Kingsville, TX 78363, USA
- Benjamin T James
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA
- Brian B Luczak
  - Department of Mathematics, Vanderbilt University, 1326 Stevenson Center Lane, Nashville, TN 3721, USA
4
Valencia JD, Girgis HZ. LtrDetector: A tool-suite for detecting long terminal repeat retrotransposons de-novo. BMC Genomics 2019; 20:450. [PMID: 31159720 PMCID: PMC6547461 DOI: 10.1186/s12864-019-5796-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/26/2018] [Accepted: 05/14/2019] [Indexed: 12/19/2022]
Abstract
BACKGROUND Long terminal repeat retrotransposons are the most abundant transposons in plants. They play important roles in alternative splicing, recombination, gene regulation, and defense mechanisms. Large-scale sequencing projects for plant genomes are currently underway. Software tools are important for annotating long terminal repeat retrotransposons in these newly available genomes. However, the available tools are not very sensitive to known elements and perform inconsistently on different genomes. Some are hard to install or obsolete. They may struggle to process large plant genomes. None can be executed in parallel out of the box and very few have features to support visual review of new elements. To overcome these limitations, we developed LtrDetector, which uses techniques inspired by signal-processing. RESULTS We compared LtrDetector to LTR_Finder and LTRharvest, the two most successful predecessor tools, on six plant genomes. For each organism, we constructed a ground truth data set based on queries from a consensus sequence database. According to this evaluation, LtrDetector was the most sensitive tool, achieving 16-23% improvement in sensitivity over LTRharvest and 21% improvement over LTR_Finder. All three tools had low false positive rates, with LtrDetector achieving 98.2% precision, in between its two competitors. Overall, LtrDetector provides the best compromise between high sensitivity and low false positive rate while requiring moderate time and utilizing memory available on personal computers. CONCLUSIONS LtrDetector uses a novel methodology revolving around k-mer distributions, which allows it to produce high-quality results using relatively lightweight procedures. It is easy to install and use. It is not species specific, performing well using its default parameters on genomes of varying size and repeat content. It is automatically configured for parallel execution and runs efficiently on an ordinary personal computer. 
It includes a k-mer scores visualization tool to facilitate manual review of the identified elements. These features make LtrDetector an attractive tool for future annotation projects involving long terminal repeat retrotransposons.
Affiliation(s)
- Joseph D Valencia
  - The Bioinformatics Toolsmith Laboratory, Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104, OK, USA
- Hani Z Girgis
  - The Bioinformatics Toolsmith Laboratory, Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104, OK, USA
5
Steuernagel L, Meckbach C, Heinrich F, Zeidler S, Schmitt AO, Gültas M. Computational identification of tissue-specific transcription factor cooperation in ten cattle tissues. PLoS One 2019; 14:e0216475. [PMID: 31095599 PMCID: PMC6522001 DOI: 10.1371/journal.pone.0216475] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/18/2018] [Accepted: 04/22/2019] [Indexed: 01/01/2023]
Abstract
Transcription factors (TFs) are a special class of DNA-binding proteins that orchestrate gene transcription by recruiting other TFs, co-activators or co-repressors. Their combinatorial interplay in higher organisms maintains homeostasis and governs cell identity by finely controlling and regulating tissue-specific gene expression. Despite the rich literature on the importance of cooperative TFs for deciphering the mechanisms of individual regulatory programs that control tissue specificity in several organisms such as human, mouse, or Drosophila melanogaster, to date, there is still need for a comprehensive study to detect specific TF cooperations in regulatory processes of cattle tissues. To address the needs of knowledge about specific combinatorial gene regulation in cattle tissues, we made use of three publicly available RNA-seq datasets and obtained tissue-specific gene (TSG) sets for ten tissues (heart, lung, liver, kidney, duodenum, muscle tissue, adipose tissue, colon, spleen and testis). By analyzing these TSG-sets, tissue-specific TF cooperations of each tissue have been identified. The results reveal that similar to the combinatorial regulatory events of model organisms, TFs change their partners depending on their biological functions in different tissues. Particularly with regard to preferential partner choice of the transcription factors STAT3 and NR2C2, this phenomenon has been highlighted with their five different specific cooperation partners in multiple tissues. The information about cooperative TFs could be promising: i) to understand the molecular mechanisms of regulating processes; and ii) to extend the existing knowledge on the importance of single TFs in cattle tissues.
Affiliation(s)
- Lukas Steuernagel
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Cornelia Meckbach
  - Institute of Medical Bioinformatics, Goldschmidtstraße 1, University Medical Center Göttingen, Georg-August-University, 37077 Göttingen, Germany
- Felix Heinrich
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Sebastian Zeidler
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Armin O. Schmitt
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075, Göttingen, Germany
- Mehmet Gültas
  - Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Albrecht-Thaer-Weg 3, Georg-August University, 37075, Göttingen, Germany
6
Girgis HZ, Velasco A, Reyes ZE. HebbPlot: an intelligent tool for learning and visualizing chromatin mark signatures. BMC Bioinformatics 2018; 19:310. [PMID: 30176808 PMCID: PMC6122555 DOI: 10.1186/s12859-018-2312-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/12/2017] [Accepted: 08/14/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND Histone modifications play important roles in gene regulation, heredity, imprinting, and many human diseases. The histone code is complex and consists of more than 100 marks. Therefore, biologists need computational tools to characterize general signatures representing the distributions of tens of chromatin marks around thousands of regions. RESULTS To this end, we developed a software tool, HebbPlot, which utilizes a Hebbian neural network in learning a general chromatin signature from regions with a common function. Hebbian networks can learn the associations between tens of marks and thousands of regions. HebbPlot presents a signature as a digital image, which can be easily interpreted. Moreover, signatures produced by HebbPlot can be compared quantitatively. We validated HebbPlot in six case studies. The results of these case studies are novel or validating results already reported in the literature, indicating the accuracy of HebbPlot. Our results indicate that promoters have a directional chromatin signature; several marks tend to stretch downstream or upstream. H3K4me3 and H3K79me2 have clear directional distributions around active promoters. In addition, the signatures of high- and low-CpG promoters are different; H3K4me3, H3K9ac, and H3K27ac are the most different marks. When we studied the signatures of enhancers active in eight tissues, we observed that these signatures are similar, but not identical. Further, we identified some histone modifications - H3K36me3, H3K79me1, H3K79me2, and H4K8ac - that are associated with coding regions of active genes. Other marks - H4K12ac, H3K14ac, H3K27me3, and H2AK5ac - were found to be weakly associated with coding regions of inactive genes. CONCLUSIONS This study resulted in a novel software tool, HebbPlot, for learning and visualizing the chromatin signature of a genetic element. 
Using HebbPlot, we produced a visual catalog of the signatures of multiple genetic elements in 57 cell types available through the Roadmap Epigenomics Project. Furthermore, we made progress toward a functional catalog consisting of 22 histone marks. In sum, HebbPlot is applicable to a wide array of studies, facilitating the deciphering of the histone code.
Affiliation(s)
- Hani Z. Girgis
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104-9700 OK USA
- Alfredo Velasco
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104-9700 OK USA
- Zachary E. Reyes
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104-9700 OK USA
7
Detection of cooperatively bound transcription factor pairs using ChIP-seq peak intensities and expectation maximization. PLoS One 2018; 13:e0199771. [PMID: 30016330 PMCID: PMC6049898 DOI: 10.1371/journal.pone.0199771] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/17/2018] [Accepted: 06/13/2018] [Indexed: 11/19/2022]
Abstract
Transcription factors (TFs) often work cooperatively, where the binding of one TF to DNA enhances the binding affinity of a second TF to a nearby location. Such cooperative binding is important for activating gene expression from promoters and enhancers in both prokaryotic and eukaryotic cells. Existing methods to detect cooperative binding of a TF pair rely on analyzing the sequence that is bound. We propose a method that uses, instead, only ChIP-seq peak intensities and an expectation maximization (CPI-EM) algorithm. We validate our method using ChIP-seq data from cells where one of a pair of TFs under consideration has been genetically knocked out. Our algorithm relies on our observation that cooperative TF-TF binding is correlated with weak binding of one of the TFs, which we demonstrate in a variety of cell types, including E. coli, S. cerevisiae and M. musculus cells. We show that this method performs significantly better than a predictor based only on the ChIP-seq peak distance of the TFs under consideration. This suggests that peak intensities contain information that can help detect the cooperative binding of a TF pair. CPI-EM also outperforms an existing sequence-based algorithm in detecting cooperative binding. The CPI-EM algorithm is available at https://github.com/vishakad/cpi-em.
8
Meckbach C, Wingender E, Gültas M. Removing Background Co-occurrences of Transcription Factor Binding Sites Greatly Improves the Prediction of Specific Transcription Factor Cooperations. Front Genet 2018; 9:189. [PMID: 29896218 PMCID: PMC5986914 DOI: 10.3389/fgene.2018.00189] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/16/2016] [Accepted: 05/08/2018] [Indexed: 12/17/2022]
Abstract
Today, it is well-known that in eukaryotic cells the complex interplay of transcription factors (TFs) bound to the DNA of promoters and enhancers is the basis for precise and specific control of transcription. Computational methods have been developed for the identification of potentially cooperating TFs through the co-occurrence of their binding sites (TFBSs). One challenge of these methods is the differentiation of TFBS pairs that are specific for a given sequence set from those that are ubiquitously appearing, rendering the results highly dependent on the choice of a proper background set. Here, we present an extension of our previous PC-TraFF approach that estimates the background co-occurrence of any TF pair by preserving the (oligo-) nucleotide composition and, thus, the core of TFBSs in the sequences of interest. Applying our approach to a simulated data set with implanted TFBS pairs, we could successfully identify them as sequence-set specific under a variety of conditions. When we analyzed the gene expression data sets of five breast cancer associated subtypes, the number of overlapping pairs could be dramatically reduced in comparison to our previous approach. As a result, we could identify potentially cooperating transcriptional regulators that are characteristic for each of the five breast cancer subtypes. This indicates that our approach is able to discriminate specific potential TF cooperations against ubiquitously occurring combinations. The results obtained with our method may help to understand the genetic programs governing specific biological processes such as the development of different tumor types.
Affiliation(s)
- Cornelia Meckbach
  - Institute of Bioinformatics, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
- Edgar Wingender
  - Institute of Bioinformatics, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
- Mehmet Gültas
  - Institute of Bioinformatics, University Medical Center Göttingen, Georg-August-University Göttingen, Göttingen, Germany
  - Department of Breeding Informatics, Georg-August University Göttingen, Göttingen, Germany
  - Center for Integrated Breeding Research (CiBreed), Georg-August University Göttingen, Göttingen, Germany
9
Kho SJ, Manickam S, Malek S, Mosleh M, Dhillon SK. Automated plant identification using artificial neural network and support vector machine. Frontiers in Life Science 2018. [DOI: 10.1080/21553769.2017.1412361] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 10/18/2022]
Affiliation(s)
- Soon Jye Kho
  - Faculty of Science, Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia
- Sugumaran Manickam
  - Faculty of Science, Rimba Ilmu Botanic Garden, Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia
- Sorayya Malek
  - Faculty of Science, Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia
- Mogeeb Mosleh
  - Faculty of Engineering & Information Technology, Software Engineering Department, Taiz University, Taiz, Yemen
- Sarinder Kaur Dhillon
  - Faculty of Science, Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, University of Malaya, Kuala Lumpur, Malaysia
10
de Leeuw CN, Korecki AJ, Berry GE, Hickmott JW, Lam SL, Lengyell TC, Bonaguro RJ, Borretta LJ, Chopra V, Chou AY, D'Souza CA, Kaspieva O, Laprise S, McInerny SC, Portales-Casamar E, Swanson-Newman MI, Wong K, Yang GS, Zhou M, Jones SJM, Holt RA, Asokan A, Goldowitz D, Wasserman WW, Simpson EM. rAAV-compatible MiniPromoters for restricted expression in the brain and eye. Mol Brain 2016; 9:52. [PMID: 27164903 PMCID: PMC4862195 DOI: 10.1186/s13041-016-0232-4] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 01/21/2016] [Accepted: 04/30/2016] [Indexed: 12/12/2022]
Abstract
BACKGROUND Small promoters that recapitulate endogenous gene expression patterns are important for basic, preclinical, and now clinical research. Recently, there has been a promising revival of gene therapy for diseases with unmet therapeutic needs. To date, most gene therapies have used viral-based ubiquitous promoters; however, promoters that restrict expression to target cells will minimize off-target side effects, broaden the palette of deliverable therapeutics, and thereby improve safety and efficacy. Here, we take steps towards filling the need for such promoters by developing a high-throughput pipeline that goes from genome-based bioinformatic design to rapid testing in vivo. METHODS For much of this work, therapeutically interesting Pleiades MiniPromoters (MiniPs; ~4 kb human DNA regulatory elements), previously tested in knock-in mice, were "cut down" to ~2.5 kb and tested in recombinant adeno-associated virus (rAAV), the virus of choice for gene therapy of the central nervous system. To evaluate our methods, we generated 29 experimental rAAV2/9 viruses carrying 19 different MiniPs, which were injected intravenously into neonatal mice to allow broad unbiased distribution, and characterized in neural tissues by X-gal immunohistochemistry for icre, or immunofluorescent detection of GFP. RESULTS The data showed that 16 of the 19 (84%) MiniPs recapitulated the expression pattern of their design source. This included expression of: Ple67 in brain raphe nuclei; Ple155 in Purkinje cells of the cerebellum, and retinal bipolar ON cells; Ple261 in endothelial cells of brain blood vessels; and Ple264 in retinal Müller glia. CONCLUSIONS Overall, the methodology and MiniPs presented here represent important advances for basic and preclinical research, and may enable a paradigm shift in gene therapy.
Affiliation(s)
- Charles N de Leeuw
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Andrea J Korecki
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Garrett E Berry
  - Gene Therapy Centre, University of North Carolina, Chapel Hill, NC, 27599, U.S.A
- Jack W Hickmott
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Siu Ling Lam
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Tess C Lengyell
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Russell J Bonaguro
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Lisa J Borretta
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Vikramjit Chopra
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Alice Y Chou
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Cletus A D'Souza
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Olga Kaspieva
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Stéphanie Laprise
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Simone C McInerny
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Elodie Portales-Casamar
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Magdalena I Swanson-Newman
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Kaelan Wong
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- George S Yang
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Michelle Zhou
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
- Steven J M Jones
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Robert A Holt
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 2A1, Canada
- Aravind Asokan
  - Gene Therapy Centre, University of North Carolina, Chapel Hill, NC, 27599, U.S.A
- Daniel Goldowitz
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Elizabeth M Simpson
  - Centre for Molecular Medicine and Therapeutics at the Child & Family Research Institute, University of British Columbia, 950 W 28 Ave, Vancouver, BC, V5Z 4H4, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, V6T 2A1, Canada
11
Abstract
Transcriptional control of gene expression requires interactions between the cis-regulatory elements (CREs) controlling gene promoters. We developed a sensitive computational method to identify CRE combinations with conserved spacing that does not require genome alignments. When applied to seven sensu stricto and sensu lato Saccharomyces species, 80% of the predicted interactions displayed some evidence of combinatorial transcriptional behavior in several existing datasets including: (1) chromatin immunoprecipitation data for colocalization of transcription factors, (2) gene expression data for coexpression of predicted regulatory targets, and (3) gene ontology databases for common pathway membership of predicted regulatory targets. We tested several predicted CRE interactions with chromatin immunoprecipitation experiments in a wild-type strain and strains in which a predicted cofactor was deleted. Our experiments confirmed that transcription factor (TF) occupancy at the promoters of the CRE combination target genes depends on the predicted cofactor while occupancy of other promoters is independent of the predicted cofactor. Our method has the additional advantage of identifying regulatory differences between species. By analyzing the S. cerevisiae and S. bayanus genomes, we identified differences in combinatorial cis-regulation between the species and showed that the predicted changes in gene regulation explain several of the species-specific differences seen in gene expression datasets. In some instances, the same CRE combinations appear to regulate genes involved in distinct biological processes in the two different species. The results of this research demonstrate that (1) combinatorial cis-regulation can be inferred by multi-genome analysis and (2) combinatorial cis-regulation can explain differences in gene expression between species.
12
Meckbach C, Tacke R, Hua X, Waack S, Wingender E, Gültas M. PC-TraFF: identification of potentially collaborating transcription factors using pointwise mutual information. BMC Bioinformatics 2015; 16:400. [PMID: 26627005 PMCID: PMC4667426 DOI: 10.1186/s12859-015-0827-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 03/24/2015] [Accepted: 11/17/2015] [Indexed: 01/06/2023]
Abstract
Background: Transcription factors (TFs) are important regulatory proteins that govern transcriptional regulation. Today, it is known that in higher organisms different TFs have to cooperate rather than act individually in order to control complex genetic programs. The identification of these interactions is an important challenge for understanding the molecular mechanisms of regulating biological processes. In this study, we present a new method based on pointwise mutual information, PC-TraFF, which considers the genome as a document, the sequences as sentences, and TF binding sites (TFBSs) as words to identify interacting TFs in a set of sequences.
Results: To demonstrate the effectiveness of PC-TraFF, we performed a genome-wide analysis and a breast cancer-associated sequence set analysis for protein coding and miRNA genes. Our results show that in any of these sequence sets, PC-TraFF is able to identify important interacting TF pairs, for most of which we found support in previously published experimental results. Further, we made a pairwise comparison between PC-TraFF and three conventional methods. The outcome of this comparison study strongly suggests that all these methods focus on different important aspects of interaction between TFs and thus the pairwise overlap between any of them is only marginal.
Conclusions: In this study, adopting an idea from the field of linguistics for the field of bioinformatics, we develop a new information theoretic method, PC-TraFF, for the identification of potentially collaborating transcription factors based on the idiosyncrasy of their binding site distributions on the genome. The results of our study show that PC-TraFF can successfully identify known interacting TF pairs and thus its currently biologically unconfirmed predictions could provide new hypotheses for further experimental validation. Additionally, the comparison of the results of PC-TraFF with the results of previous methods demonstrates that different methods with their specific scopes can perfectly supplement each other. Overall, our analyses indicate that PC-TraFF is a time-efficient method whose algorithm has tractable computational time and memory consumption. The PC-TraFF server is freely accessible at http://pctraff.bioinf.med.uni-goettingen.de/
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0827-2) contains supplementary material, which is available to authorized users.
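The pointwise mutual information idea behind this abstract can be illustrated with a minimal sketch. It is not the PC-TraFF implementation: it assumes each sequence has already been reduced to the set of TFs with at least one predicted binding site, and it scores plain sequence-level co-occurrence, ignoring binding-site positions and any corrections the published method may apply. The function name `pmi_scores` and the toy TF labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sequences):
    """Score TF pairs by pointwise mutual information of co-occurrence.

    sequences: list of sets, each holding the names of the TFs that have
    at least one binding site in that sequence (an assumed preprocessing).
    Returns a dict mapping (tf_a, tf_b), sorted alphabetically, to PMI in bits.
    """
    n = len(sequences)
    single = Counter()  # number of sequences containing each TF
    pair = Counter()    # number of sequences containing both TFs of a pair
    for tfs in sequences:
        names = sorted(set(tfs))
        single.update(names)
        pair.update(combinations(names, 2))

    scores = {}
    for (a, b), count in pair.items():
        p_ab = count / n
        p_a = single[a] / n
        p_b = single[b] / n
        # PMI > 0: the pair co-occurs more often than expected by chance.
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    toy = [{"A", "B"}, {"A", "B"}, {"C"}, {"C"}]
    for tf_pair, score in pmi_scores(toy).items():
        print(tf_pair, round(score, 3))
```

In the toy input, A and B each occur in half of the sequences and always together, so their PMI is log2(0.5 / 0.25) = 1 bit. Pairs that never co-occur are simply absent from the result rather than assigned negative infinity.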
Affiliation(s)
- Cornelia Meckbach
  - Institute of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, Göttingen, 37077, Germany.
- Rebecca Tacke
  - Institute of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, Göttingen, 37077, Germany.
- Xu Hua
  - Institute of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, Göttingen, 37077, Germany.
- Stephan Waack
  - Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, Göttingen, 37077, Germany.
- Edgar Wingender
  - Institute of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, Göttingen, 37077, Germany.
- Mehmet Gültas
  - Institute of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, Göttingen, 37077, Germany.
|
13
|
High resolution mapping of enhancer-promoter interactions. PLoS One 2015; 10:e0122420. [PMID: 25970635 PMCID: PMC4430501 DOI: 10.1371/journal.pone.0122420] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/12/2014] [Accepted: 02/20/2015] [Indexed: 01/19/2023]
Abstract
RNA Polymerase II ChIA-PET data has revealed enhancers that are active in a profiled cell type and the genes that the enhancers regulate through chromatin interactions. The most commonly used computational method for analyzing ChIA-PET data, the ChIA-PET Tool, discovers interaction anchors at a spatial resolution that is insufficient to accurately identify individual enhancers. We introduce Germ, a computational method that estimates the likelihood that any two narrowly defined genomic locations are jointly occupied by RNA Polymerase II. Germ takes a blind deconvolution approach to simultaneously estimate the likelihood of RNA Polymerase II occupation as well as a model of the arrangement of read alignments relative to locations occupied by RNA Polymerase II. Both types of information are utilized to estimate the likelihood that RNA Polymerase II jointly occupies any two genomic locations. We apply Germ to RNA Polymerase II ChIA-PET data from embryonic stem cells to identify the genomic locations that are jointly occupied along with transcription start sites. We show that these genomic locations align more closely with features of active enhancers measured by ChIP-Seq than the locations identified using the ChIA-PET Tool. We also apply Germ to RNA Polymerase II ChIA-PET data from motor neuron progenitors. Based on the Germ results, we observe that a combination of cell type specific and cell type independent regulatory interactions are utilized by cells to regulate gene expression.
|
14
|
Coltelli P, Barsanti L, Evangelista V, Frassanito AM, Passarelli V, Gualtieri P. Automatic and real time recognition of microalgae by means of pigment signature and shape. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2013; 15:1397-1410. [PMID: 23712130 DOI: 10.1039/c3em00160a] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 06/02/2023]
Abstract
Microalgae are unicellular photoautotrophic organisms that grow in any habitat such as fresh and salt water bodies, hot springs, ice, air, and in or on other organisms and substrates. Massive growth of microalgae may produce harmful effects on the marine and freshwater ecological environment and fishery resources. Therefore, rapid and accurate recognition and classification of microalgae is one of the most important issues in water resource management. In this paper, a new methodology for automatic and real time identification of microalgae by means of microscopy image analysis is presented. This methodology is based on segmentation, shape features extraction, and characteristic colour (i.e. pigment signature) determination. A classifier algorithm based on the minimum distance criterion was used for microalgae grouping according to the measured features. 96.6% accuracy from a set of 3423 images of 24 different microalgae representing the major algal phyla was achieved by this methodology.
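A minimum-distance classifier of the kind mentioned in this abstract can be sketched as follows. This is only an illustration of the general criterion, not the authors' pipeline: the two-value feature vectors (a shape descriptor and a colour descriptor), the class labels, and the choice of Euclidean distance to per-class mean vectors are all assumptions made for the example.

```python
import math


def centroids(training):
    """Compute the mean feature vector for each class.

    training: dict mapping a class label to a list of equal-length
    feature vectors (hypothetical shape and pigment measurements).
    """
    result = {}
    for label, vectors in training.items():
        dim = len(vectors[0])
        result[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return result


def classify(features, class_centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(
        class_centroids,
        key=lambda label: math.dist(features, class_centroids[label]),
    )


if __name__ == "__main__":
    # Toy data: [roundness, normalised hue]; values are invented.
    training = {
        "round_green": [[1.0, 0.2], [0.8, 0.4]],
        "elongated_brown": [[0.2, 0.8], [0.4, 1.0]],
    }
    cents = centroids(training)
    print(classify([0.85, 0.3], cents))
```

In practice the features would need to be put on comparable scales before measuring distance, since an unscaled feature with a large numeric range would dominate the criterion.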
Affiliation(s)
- Primo Coltelli
  - Istituto di Scienza e Tecnologia Informazione, CNR, Via Moruzzi 1, 56124 Pisa, Italy
|
15
|
Rajagopal N, Xie W, Li Y, Wagner U, Wang W, Stamatoyannopoulos J, Ernst J, Kellis M, Ren B. RFECS: a random-forest based algorithm for enhancer identification from chromatin state. PLoS Comput Biol 2013; 9:e1002968. [PMID: 23526891 PMCID: PMC3597546 DOI: 10.1371/journal.pcbi.1002968] [Citation(s) in RCA: 157] [Impact Index Per Article: 13.1] [Received: 08/16/2012] [Accepted: 01/20/2013] [Indexed: 01/08/2023]
Abstract
Transcriptional enhancers play critical roles in regulation of gene expression, but their identification in the eukaryotic genome has been challenging. Recently, it was shown that enhancers in the mammalian genome are associated with characteristic histone modification patterns, which have been increasingly exploited for enhancer identification. However, only a limited number of cell types or chromatin marks have previously been investigated for this purpose, leaving the question unanswered whether there exists an optimal set of histone modifications for enhancer prediction in different cell types. Here, we address this issue by exploring genome-wide profiles of 24 histone modifications in two distinct human cell types, embryonic stem cells and lung fibroblasts. We developed a Random-Forest based algorithm, RFECS (Random Forest based Enhancer identification from Chromatin States) to integrate histone modification profiles for identification of enhancers, and used it to identify enhancers in a number of cell-types. We show that RFECS not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify the most informative and robust set of three chromatin marks for enhancer prediction.
Enhancers are regions in the genome that can activate the expression of a gene irrespective of their location with respect to the gene. Identifying these elements is critical in understanding regulatory differences between different cell-types. Since enhancers lack characteristic sequence features and can be far away from the gene they regulate, their identification is not trivial. Experimentally determining the genome-wide binding sites of transcriptional co-activator p300 is one way of finding enhancers but it can only identify a subset of enhancers. A few years ago, it was observed that the binding sites of p300 are marked by distinctive, post-translational histone modifications. Several groups have exploited this discovery to predict genome-wide enhancers based on their similarity to the histone modification profiles of p300 binding sites. We here report a novel algorithm for this purpose and show that it has much greater accuracy than existing methods. Another unique feature of our algorithm is the ability to automatically deduce the most informative set of histone modifications required for enhancer prediction. We expect that this method will become increasingly useful with the expanding number of known histone modifications and rapid accumulation of epigenomic datasets for various cell types and species.
Affiliation(s)
- Nisha Rajagopal
  - Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
  - Bioinformatics and Systems Biology program, University of California at San Diego, La Jolla, California, United States of America
- Wei Xie
  - Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Yan Li
  - Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Uli Wagner
  - Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
- Wei Wang
  - Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California, United States of America
- John Stamatoyannopoulos
  - Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Jason Ernst
  - Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California, United States of America
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Bing Ren
  - Ludwig Institute for Cancer Research, University of California at San Diego, La Jolla, California, United States of America
  - Bioinformatics and Systems Biology program, University of California at San Diego, La Jolla, California, United States of America
  - Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, and Moores Cancer Center, University of California at San Diego, La Jolla, California, United States of America
|