1
Zhang F, Cao H, Si H, Zang J, Dong J, Xing J, Zhang K. FGCD: a database of fungal gene clusters related to secondary metabolism. Database (Oxford) 2024; 2024:baae011. [PMID: 38502608 PMCID: PMC11022746 DOI: 10.1093/database/baae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 01/09/2024] [Accepted: 02/16/2024] [Indexed: 03/21/2024]
Abstract
Fungal secondary metabolites are not necessary for growth, but they are important for fungal metabolism and ecology because they provide selective advantages for competition, survival and interactions with the environment. These various metabolites are widely used as medicinal precursors and insecticides. Secondary metabolism genes are commonly arranged in clusters along chromosomes, which allows for the coordinate control of complete pathways. In this study, we created the Fungal Gene Cluster Database to store, retrieve, and visualize secondary metabolite gene cluster information across fungal species. The database was created by combining RNA sequencing data, Basic Local Alignment Search Tool searches, a genome browser, enrichment analysis and the R Shiny web framework to visualize and query putative gene clusters. This database facilitates the rapid and thorough examination of significant gene clusters across fungal species by detecting, defining and graphically displaying the architecture, organization and expression patterns of secondary metabolite gene clusters. In general, this genomic resource makes use of the tremendous chemical variety of the products of these ecologically and biotechnologically significant gene clusters to further our understanding of fungal secondary metabolism. Database URL: https://www.hebaubioinformatics.cn/FungalGeneCluster/.
Affiliation(s)
- Fuyuan Zhang
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- College of Life Science, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Hongzhe Cao
- College of Life Science, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Helong Si
- College of Life Science, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Jinping Zang
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Jingao Dong
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Jihong Xing
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- College of Life Science, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Kang Zhang
- State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- College of Life Science, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, No. 289 Lingyusi Street, Baoding 071000, China
2
Junaid A, Singh B, Bhatia S. Evolutionary insights into 3D genome organization and epigenetic landscape of Vigna mungo. Life Sci Alliance 2024; 7:e202302074. [PMID: 37923361 PMCID: PMC10624639 DOI: 10.26508/lsa.202302074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 10/17/2023] [Accepted: 10/19/2023] [Indexed: 11/07/2023] Open
Abstract
Eukaryotic genomes show an intricate three-dimensional (3D) organization within the nucleus that regulates multiple biological processes including gene expression. In contrast to animals, understanding of 3D genome organization in plants remains at a nascent stage. Here, we investigate the evolution of 3D chromatin architecture in legumes. By using PacBio, Illumina, and Hi-C contact reads, we report a gap-free, chromosome-scale reference genome assembly of Vigna mungo, an important minor legume cultivated in Southeast Asia. We spatially resolved V. mungo chromosomes into a euchromatic, transcriptionally active A compartment and a heterochromatic, transcriptionally dormant B compartment. We report the presence of TAD-like regions throughout the diagonal of the Hi-C matrix that resembled transcriptional quiescent centers based on their genomic and epigenomic features. We observed more syntenic breakpoints, but also higher coverage of syntenic sequences and conserved blocks, in boundary regions than in the TAD-like domains. Our findings present evolutionary insights into spatial 3D genome organization and epigenetic patterns and their interaction within the V. mungo genome. This will aid future genomics and epigenomics research and breeding programs of V. mungo.
Affiliation(s)
- Alim Junaid
- National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, India
- Baljinder Singh
- National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, India
- Sabhyata Bhatia
- National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, India
3
GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads. Nat Commun 2023; 14:204. [PMID: 36639368 PMCID: PMC9839709 DOI: 10.1038/s41467-022-35670-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 12/16/2022] [Indexed: 01/15/2023] Open
Abstract
High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows for long-read platforms. Here we report on GALA (Gap-free long-read Assembly tool), a computational framework for chromosome-based sequencing data separation and de novo assembly implemented through a multi-layer graph that identifies discordances within preliminary assemblies and partitions the data into chromosome-scale scaffolding groups. The subsequent independent assembly of each scaffolding group generates a gap-free assembly likely free from the mis-assembly errors which usually hamper existing workflows. This flexible framework also allows us to integrate data from various technologies, such as Hi-C, genetic maps, and even motif analyses to generate gap-free chromosome-scale assemblies. As a proof of principle we de novo assemble the C. elegans genome using combined PacBio and Nanopore sequencing data and a rice cultivar genome using Nanopore sequencing data from publicly available datasets. We also demonstrate the proposed method's applicability with a gap-free assembly of the human genome using PacBio high-fidelity (HiFi) long reads. Thus, our method enables straightforward assembly of genomes with multiple data sources and overcomes barriers that at present restrict the application of de novo genome assembly technology.
4
Abstract
The study of chromosome evolution is undergoing a resurgence of interest owing to advances in DNA sequencing technology that facilitate the production of chromosome-scale whole-genome assemblies de novo. This review focuses on the history, methods, discoveries, and current challenges facing the field, with an emphasis on vertebrate genomes. A detailed examination of the literature on the biology of chromosome rearrangements is presented, specifically the relationship between chromosome rearrangements and phenotypic evolution, adaptation, and speciation. A critical review of the methods for identifying, characterizing, and visualizing chromosome rearrangements and computationally reconstructing ancestral karyotypes is presented. We conclude by looking to the future, identifying the enormous technical and scientific challenges presented by the accumulation of hundreds and eventually thousands of chromosome-scale assemblies.
Affiliation(s)
- Joana Damas
- The Genome Center, University of California, Davis, California 95616, USA
- Marco Corbo
- The Genome Center, University of California, Davis, California 95616, USA
- Harris A Lewin
- The Genome Center, University of California, Davis, California 95616, USA; Department of Evolution and Ecology, College of Biological Sciences, University of California, Davis, California 95616, USA
5
Jia B, Li X, Liu W, Lu C, Lu X, Ma L, Li YY, Wei C. GLAPD: Whole Genome Based LAMP Primer Design for a Set of Target Genomes. Front Microbiol 2019; 10:2860. [PMID: 31921040 PMCID: PMC6923652 DOI: 10.3389/fmicb.2019.02860] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/17/2019] [Accepted: 11/26/2019] [Indexed: 11/23/2022] Open
Abstract
Loop-mediated isothermal amplification (LAMP) technology has been applied in a wide range of fields such as detection of foodborne bacteria and clinical pathogens due to its simplicity and efficiency. However, existing LAMP primer designing systems require a conserved gene or a short genome region as input, and they cannot design group-specific primers. With the growing number of whole genomes available, it is possible to design better primers to target a set of genomes with high specificity based on whole genomes. We present here a whole Genome based LAMP primer designer (GLAPD), a new system to design LAMP primers for a set of target genomes using whole genomes. Candidate single primer regions are identified genome wide and then combined into LAMP primer sets. For a given set of target genomes, only primer sets that amplify these genomes, and no others, are output. To accelerate primer design, a GPU version is provided as well. The effectiveness of primers designed by GLAPD has been assessed for a wide range of foodborne bacteria. GLAPD can be accessed at http://cgm.sjtu.edu.cn/GLAPD/ or https://github.com/jiqingxiaoxi/GLAPD.git. A simple online version is also supplied to help users to learn and test GLAPD: http://cgm.sjtu.edu.cn/GLAPD/online/.
Affiliation(s)
- Ben Jia
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Xueling Li
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Wei Liu
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Changde Lu
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Xiaoting Lu
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Liangxiao Ma
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Yuan-Yuan Li
- Shanghai Center for Bioinformation Technology, Shanghai, China
- Chaochun Wei
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Shanghai Center for Bioinformation Technology, Shanghai, China
6
Arneson A, Ernst J. Systematic discovery of conservation states for single-nucleotide annotation of the human genome. Commun Biol 2019; 2:248. [PMID: 31286065 PMCID: PMC6606595 DOI: 10.1038/s42003-019-0488-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/06/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022] Open
Abstract
Comparative genomics sequence data is an important source of information for interpreting genomes. Genome-wide annotations based on this data have largely focused on univariate scores or binary elements of evolutionary constraint. Here we present a complementary whole genome annotation approach, ConsHMM, which applies a multivariate hidden Markov model to learn de novo 'conservation states' based on the combinatorial and spatial patterns of which species align to and match a reference genome in a multiple species DNA sequence alignment. We applied ConsHMM to a 100-way vertebrate sequence alignment to annotate the human genome at single nucleotide resolution into 100 conservation states. These states have distinct enrichments for other genomic information including gene annotations, chromatin states, repeat families, and bases prioritized by various variant prioritization scores. Constrained elements have distinct heritability partitioning enrichments depending on their conservation state assignment. ConsHMM conservation states are a resource for analyzing genomes and genetic variants.
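As an illustration of the kind of input the abstract describes ("which species align to and match a reference genome"), the sketch below encodes one alignment column into per-species observations. The three-value coding (0 = not aligned, 1 = aligned and matching, 2 = aligned but mismatching), the function name `encode_column`, and the example species are assumptions made for illustration; this is not the ConsHMM implementation.

```python
GAP = "-"  # assumed symbol for "no aligned base" in this sketch

def encode_column(ref_base, other_bases):
    """Encode one alignment column relative to the reference base.

    other_bases maps a species name to its aligned base, or to None / '-'
    when that species has no aligned base at this position.
    Returns a dict of species -> 0 (not aligned), 1 (match), 2 (mismatch).
    """
    codes = {}
    for species, base in other_bases.items():
        if base is None or base == GAP:
            codes[species] = 0
        elif base.upper() == ref_base.upper():
            codes[species] = 1
        else:
            codes[species] = 2
    return codes

# Hypothetical column: the human reference base is 'A'.
column = {"chimp": "A", "mouse": "G", "zebrafish": None}
print(encode_column("A", column))
```

A sequence of such per-position observation vectors is the sort of data a multivariate hidden Markov model could be trained on.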
Affiliation(s)
- Adriana Arneson
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, Los Angeles, CA 90095 USA
- Computer Science Department, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095 USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095 USA
7
Gärtner F, Höner zu Siederdissen C, Müller L, Stadler PF. Coordinate systems for supergenomes. Algorithms Mol Biol 2018; 13:15. [PMID: 30258487 PMCID: PMC6151955 DOI: 10.1186/s13015-018-0133-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/14/2017] [Accepted: 09/07/2018] [Indexed: 01/05/2023] Open
Abstract
Background: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence, the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize omics data simultaneously for multiple species are sorely lacking. A first and crucial step in this direction is the construction of a common coordinate system. Since genomes differ not only by rearrangements but also by large insertions, deletions, and duplications, the use of a single reference genome is insufficient, in particular when the number of species becomes large. Results: The computational problem then becomes to determine an order and orientations of optimal local alignments that are as co-linear as possible with all the genome sequences. We first review the most prominent approaches to model the problem formally and then proceed to show that it can be phrased as a particular variant of the Betweenness Problem. It is NP-hard in general. As exact solutions are beyond reach for the problem sizes of practical interest, we introduce a collection of heuristic simplifiers to resolve ordering conflicts. Conclusion: Benchmarks on real-life data ranging from bacterial to fly genomes demonstrate the feasibility of computing good common coordinate systems.
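For readers unfamiliar with the classical Betweenness Problem: given triples (a, b, c), the task is to find a total order in which b lies between a and c for every triple. The sketch below checks a candidate order and solves tiny instances by exhaustive search. It illustrates the textbook problem only, not the particular variant or the heuristics used in the paper; the function names are hypothetical.

```python
from itertools import permutations

def satisfies_betweenness(order, triples):
    """True if, for every (a, b, c), b lies between a and c in `order`."""
    position = {item: i for i, item in enumerate(order)}
    for a, b, c in triples:
        pa, pb, pc = position[a], position[b], position[c]
        if not (pa < pb < pc or pc < pb < pa):
            return False
    return True

def brute_force_order(items, triples):
    """Try every permutation; feasible only for very small inputs,
    which is why heuristics are needed at genome scale."""
    for perm in permutations(items):
        if satisfies_betweenness(perm, triples):
            return list(perm)
    return None

# Hypothetical alignment blocks w, x, y, z with two ordering constraints.
constraints = [("x", "y", "z"), ("y", "z", "w")]
print(brute_force_order(["w", "x", "y", "z"], constraints))
```

The exhaustive search grows factorially with the number of items, which makes the appeal of conflict-resolving heuristics concrete.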
8
Liu D, Hunt M, Tsai IJ. Inferring synteny between genome assemblies: a systematic evaluation. BMC Bioinformatics 2018; 19:26. [PMID: 29382321 PMCID: PMC5791376 DOI: 10.1186/s12859-018-2026-4] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 06/26/2017] [Accepted: 01/15/2018] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. It is unknown to what extent draft assemblies lead to errors in such analysis. RESULTS: We fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated by up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests raise questions about the robustness of previous synteny analysis when gold standard genome sequences remain limited. In addition, assembly scaffolding using a reference-guided approach with a closely related species may result in chimeric scaffolds with inflated assembly metrics if a true evolutionary relationship was overlooked. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous. CONCLUSIONS: Our results show that a minimum N50 of 1 Mb is required for robust downstream synteny analysis, which emphasizes the importance of gold standard genomes to the science community, and should be achieved given the current progress in sequencing technology.
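The N50 threshold in the conclusion can be checked directly from a list of contig or scaffold lengths. The following is a minimal sketch using the standard N50 definition; the function name `n50` and the example lengths are made up for illustration and do not come from the paper.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length

# Illustrative (made-up) scaffold lengths in base pairs.
scaffolds = [4_000_000, 2_500_000, 1_200_000, 800_000, 500_000]
value = n50(scaffolds)
print(value, value >= 1_000_000)  # N50 and whether it meets the 1 Mb threshold
```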
Affiliation(s)
- Dang Liu
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Martin Hunt
- Nuffield Department of Clinical Medicine, Experimental Medicine Division, John Radcliffe Hospital, University of Oxford, Oxford, OX1 1NF UK
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Isheng J Tsai
- Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
9
Lowdon RF, Jang HS, Wang T. Evolution of Epigenetic Regulation in Vertebrate Genomes. Trends Genet 2016; 32:269-283. [PMID: 27080453 PMCID: PMC4842087 DOI: 10.1016/j.tig.2016.03.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 11/03/2015] [Revised: 03/02/2016] [Accepted: 03/03/2016] [Indexed: 12/31/2022]
Abstract
Empirical models of sequence evolution have spurred progress in the field of evolutionary genetics for decades. We are now realizing the importance and complexity of the eukaryotic epigenome. While epigenome analysis has been applied to genomes from single-cell eukaryotes to human, comparative analyses are still relatively few and computational algorithms to quantify epigenome evolution remain scarce. Accordingly, a quantitative model of epigenome evolution remains to be established. We review here the comparative epigenomics literature and synthesize its overarching themes. We also suggest one mechanism, transcription factor binding site (TFBS) turnover, which relates sequence evolution to epigenetic conservation or divergence. Lastly, we propose a framework for how the field can move forward to build a coherent quantitative model of epigenome evolution.
Affiliation(s)
- Rebecca F Lowdon
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Hyo Sik Jang
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Ting Wang
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
10
Herrero J, Muffato M, Beal K, Fitzgerald S, Gordon L, Pignatelli M, Vilella AJ, Searle SMJ, Amode R, Brent S, Spooner W, Kulesha E, Yates A, Flicek P. Ensembl comparative genomics resources. Database (Oxford) 2016; 2016:bav096. [PMID: 26896847 PMCID: PMC4761110 DOI: 10.1093/database/bav096] [Citation(s) in RCA: 191] [Impact Index Per Article: 23.9] [Received: 05/02/2015] [Revised: 08/10/2015] [Accepted: 09/04/2015] [Indexed: 01/08/2023]
Abstract
Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.
Affiliation(s)
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Bill Lyons Informatics Centre, UCL Cancer Institute, University College London, London WC1E 6DD
- Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Kathryn Beal
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Stephen Fitzgerald
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Leo Gordon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Miguel Pignatelli
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Albert J. Vilella
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
- Simon Brent
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
- William Spooner
- Eagle Genomics Ltd., Babraham Research Campus, Cambridge, CB22 3AT, UK
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Eugene Kulesha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
- Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA
11
Taher L, Narlikar L, Ovcharenko I. Identification and computational analysis of gene regulatory elements. Cold Spring Harb Protoc 2015; 2015:pdb.top083642. [PMID: 25561628 PMCID: PMC5885252 DOI: 10.1101/pdb.top083642] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 06/04/2023]
Abstract
Over the last two decades, advances in experimental and computational technologies have greatly facilitated genomic research. Next-generation sequencing technologies have made de novo sequencing of large genomes affordable, and powerful computational approaches have enabled accurate annotations of genomic DNA sequences. Charting functional regions in genomes must account for not only the coding sequences, but also noncoding RNAs, repetitive elements, chromatin states, epigenetic modifications, and gene regulatory elements. A mix of comparative genomics, high-throughput biological experiments, and machine learning approaches has played a major role in this truly global effort. Here we describe some of these approaches and provide an account of our current understanding of the complex landscape of the human genome. We also present overviews of different publicly available, large-scale experimental data sets and computational tools, which we hope will prove beneficial for researchers working with large and complex genomes.
Affiliation(s)
- Leila Taher
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, 18051 Rostock, Germany
- Leelavati Narlikar
- Chemical Engineering and Process Development Division, National Chemical Laboratory, CSIR, Pune 411008, India
- Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
12
Shellman ER, Chen Y, Lin X, Burant CF, Schnell S. Metabolic network motifs can provide novel insights into evolution: The evolutionary origin of Eukaryotic organelles as a case study. Comput Biol Chem 2014; 53PB:242-250. [PMID: 25462333 PMCID: PMC4254655 DOI: 10.1016/j.compbiolchem.2014.09.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/01/2014] [Revised: 09/15/2014] [Accepted: 09/15/2014] [Indexed: 01/28/2023]
Abstract
Phylogenetic trees are typically constructed using genetic and genomic data, and provide robust evolutionary relationships of species from the genomic point of view. We present an application of network motif mining and analysis of metabolic pathways that when used in combination with phylogenetic trees can provide a more complete picture of evolution. By using distributions of three-node motifs as a proxy for metabolic similarity, we analyze the ancestral origin of Eukaryotic organelles from the metabolic point of view to illustrate the application of our motif mining and analysis network approach. Our analysis suggests that the hypothesis of an early proto-Eukaryote could be valid. It also suggests that a δ- or ϵ-Proteobacteria may have been the endosymbiotic partner that gave rise to modern mitochondria. Our evolutionary analysis needs to be extended by building metabolic network reconstructions of species from the phylum Crenarchaeota, which is considered to be a possible archaeal ancestor of the eukaryotic cell. In this paper, we also propose a methodology for constructing phylogenetic trees that incorporates metabolic network signatures to identify regions of genomically-estimated phylogenies that may be spurious. We find that results generated from our approach are consistent with a parallel phylogenetic analysis using the method of feature frequency profiles.
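The idea of using three-node motif distributions as a proxy for similarity can be sketched as follows. This is a deliberately simplified illustration that treats networks as undirected, so only two connected three-node patterns exist (an open path and a triangle); the function names, the L1 distance, and the toy networks are assumptions and are not taken from the paper, whose metabolic networks and motif classes may differ.

```python
from itertools import combinations

def three_node_motif_distribution(edges):
    """Fraction of connected three-node subgraphs that are open paths
    versus triangles, for an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {"path": 0, "triangle": 0}
    for a, b, c in combinations(sorted(adj), 3):
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if n_edges == 3:
            counts["triangle"] += 1
        elif n_edges == 2:  # two edges on three nodes always form a path
            counts["path"] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}

def motif_distance(dist_a, dist_b):
    """L1 distance between two motif distributions (an assumed metric)."""
    return sum(abs(dist_a[k] - dist_b[k]) for k in dist_a)

# Two toy networks (hypothetical, not real metabolic data).
net_a = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
net_b = [("A", "B"), ("B", "C"), ("C", "D")]
d_a = three_node_motif_distribution(net_a)
d_b = three_node_motif_distribution(net_b)
print(d_a, d_b, motif_distance(d_a, d_b))
```

Pairwise distances of this kind could then be compared against a sequence-based phylogeny to flag branches where the two views disagree.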
Affiliation(s)
- Erin R Shellman
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Yu Chen
- Department of Chemical Engineering, University of Michigan School of Engineering, Ann Arbor, MI, USA
- Xiaoxia Lin
- Department of Chemical Engineering, University of Michigan School of Engineering, Ann Arbor, MI, USA
- Charles F Burant
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA; Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, USA
- Santiago Schnell
- Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA; Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, USA
13
Abstract
Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
|
14
|
Yokoyama KD, Zhang Y, Ma J. Tracing the evolution of lineage-specific transcription factor binding sites in a birth-death framework. PLoS Comput Biol 2014; 10:e1003771. [PMID: 25144359 PMCID: PMC4140645 DOI: 10.1371/journal.pcbi.1003771] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 04/23/2014] [Accepted: 06/27/2014] [Indexed: 11/24/2022] Open
Abstract
Changes in cis-regulatory element composition that result in novel patterns of gene expression are thought to be a major contributor to the evolution of lineage-specific traits. Although transcription factor binding events show substantial variation across species, most computational approaches to studying regulatory elements focus primarily on highly conserved sites and rely heavily on multiple sequence alignments. However, approaches based on sequence conservation have limited ability to detect lineage-specific elements that could contribute to species-specific traits. In this paper, we describe a novel framework that uses a birth-death model to trace the evolution of lineage-specific binding sites without relying on detailed base-by-base cross-species alignments. Our model was applied to analyze the evolution of binding sites based on ChIP-seq data for six transcription factors (GATA1, SOX2, CTCF, MYC, MAX, ETS1) along the lineage leading to human after the human-mouse common ancestor. We estimate that a substantial fraction of binding sites (∼58–79% for each factor) in humans originated after the divergence from mouse. Over 15% of all binding sites are unique to hominids. Such elements are often enriched near genes associated with specific pathways, and harbor more common SNPs than older binding sites in the human genome. These results support the ability of our method to identify lineage-specific regulatory elements and help understand their roles in shaping variation in gene regulation across species.
Author Summary
Recent experimental studies showed that the evolution of transcription factor binding sites (TFBS) is highly dynamic, with sites differing a great deal even between closely related mammalian species. Despite the substantial experimental evidence for rapid divergence of regulatory protein-binding events across species, computational methods designed to analyze the evolution of regulatory elements have focused primarily on phylogenetic footprinting approaches, in which putative functional regulatory elements are identified according to strong sequence conservation. Cross-species comparisons of non-coding sequences are limited in their ability to capture the evolution of regulatory sequences, particularly in cases where the elements are selected for novelty or are species-specific. We have developed a novel framework to reconstruct the history of lineage-specific TFBS and showed that a large number of TFBS in human were born after the human-mouse divergence. These elements also have distinct biological implications compared to more ancient ones. This method can help understand the roles of lineage-specific TFBS in shaping gene regulation across different species.
Affiliation(s)
- Ken Daigoro Yokoyama
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Yang Zhang
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jian Ma
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
15
|
Poliakov A, Foong J, Brudno M, Dubchak I. GenomeVISTA--an integrated software package for whole-genome alignment and visualization. Bioinformatics 2014; 30:2654-5. [PMID: 24860159 DOI: 10.1093/bioinformatics/btu355] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
With the ubiquitous generation of complete genome assemblies for a variety of species, efficient tools for whole-genome alignment along with user-friendly visualization are critically important. Our VISTA family of tools for comparative genomics, based on algorithms for pairwise and multiple alignments of genomic sequences and whole-genome assemblies, has become one of the standard techniques for comparative analysis. Most of the VISTA programs have been implemented as Web-accessible servers and are extensively used by the biomedical community. In this manuscript, we introduce GenomeVISTA: a novel implementation that incorporates most features of the VISTA family (fast and accurate alignment, visualization capabilities, GUI and analytical tools) within a stand-alone software package. GenomeVISTA thus provides flexibility and security for users who need to conduct whole-genome comparisons on their own computers. AVAILABILITY AND IMPLEMENTATION Implemented in Perl, C/C++ and Java, the source code is freely available for download at the VISTA Web site: http://genome.lbl.gov/vista/.
Affiliation(s)
- Alexandre Poliakov
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
- Justin Foong
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
- Michael Brudno
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
- Inna Dubchak
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
|
16
|
Abstract
MicroRNAs (miRNAs) have been implicated in virtually every metazoan biological process, exerting a widespread impact on gene expression. MicroRNA repression is conferred by relatively short "seed match" sequences, although the degree of repression varies widely for individual target sites. The factors controlling whether, and to what extent, a target site is repressed are not fully understood. As an alternative to target prediction based on sequence alone, comparative genomics has emerged as an invaluable tool for identifying miRNA targets that are conserved by natural selection, and hence likely effective and important. Here we present a general method for quantifying conservation of miRNA seed match sites, separating it from background conservation, controlling for various biases, and predicting miRNA targets. This method is useful not only for generating predictions but also as a tool for empirically evaluating the importance of various target prediction criteria.
|
17
|
Thermally assisted quantum annealing of a 16-qubit problem. Nat Commun 2013; 4:1903. [DOI: 10.1038/ncomms2920] [Citation(s) in RCA: 160] [Impact Index Per Article: 14.5] [Received: 01/29/2013] [Accepted: 04/23/2013] [Indexed: 11/09/2022] Open
|
18
|
Zheng W, Zhao H. Studying the evolution of transcription factor binding events using multi-species ChIP-Seq data. Stat Appl Genet Mol Biol 2013; 12:1-15. [PMID: 23446869 DOI: 10.1515/sagmb-2012-0004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
Recent technology advances make it possible to collect whole-genome transcription factor binding (TFB) profiles from multiple species through the ChIP-Seq data. This provides rich information to understand TFB evolution. However, few rigorous statistical models are available to infer TFB evolution from these data. We have developed a phylogenetic tree based method to model the on/off rates of TFB events. There are two unique features of our method compared to existing models. First, we mask nucleotide substitutions and focus on INDEL disruption of TFB events, which are rarer evolution events and more appropriate for divergent species and non-coding regulatory regions. Second, we correct for ascertainment bias in ChIP-Seq data by maximizing likelihood conditional on the observed (incomplete) data. Simulations show that our method works well in model selection and parameter estimation when there are sufficient aligned TFB events. When this method is applied to a ChIP-Seq data set with five vertebrates, we find that the instantaneous transition rates to INDELs are higher in TFB regions than in homologous non-binding regions. This is driven by an excess of alignment columns showing binding in one species but gaps in all other species. When we compare the inferred transition rates between the conserved and non-conserved regions, as expected, the conserved regions are estimated to have lower transition rates. The R package TFBphylo that implements the described model can be downloaded from http://bioinformatics.med.yale.edu/.
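The "on/off rates" idea in this abstract can be illustrated with a generic two-state continuous-time Markov chain. The sketch below is not the TFBphylo model (which additionally handles INDEL disruption and ascertainment bias); it only shows the standard closed-form transition probabilities along a branch of length `t`. The function name and the parameters `gain_rate` and `loss_rate` are hypothetical names chosen for illustration.

```python
import math


def two_state_transition_probs(gain_rate, loss_rate, t):
    """Transition matrix P(t) for a two-state chain (0 = unbound, 1 = bound).

    gain_rate: instantaneous rate of 0 -> 1 (assumed name, not from the paper)
    loss_rate: instantaneous rate of 1 -> 0 (assumed name, not from the paper)
    """
    total = gain_rate + loss_rate
    decay = math.exp(-total * t)
    p_gain = gain_rate / total * (1.0 - decay)  # P(0 -> 1 after time t)
    p_loss = loss_rate / total * (1.0 - decay)  # P(1 -> 0 after time t)
    return [[1.0 - p_gain, p_gain],
            [p_loss, 1.0 - p_loss]]


# Example: binding is lost four times faster than it is gained.
P = two_state_transition_probs(gain_rate=0.5, loss_rate=2.0, t=0.3)
```

On a phylogenetic tree, a matrix like this would be evaluated once per branch and combined across branches in a likelihood calculation; for long branches each row approaches the stationary distribution `(loss_rate, gain_rate) / (gain_rate + loss_rate)`.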
Affiliation(s)
- Wei Zheng
- Yale University – Keck Biostatistics Resources, New Haven, CT 06511, USA
|
19
|
Bakış Y, Otu HH, Taşçı N, Meydan C, Bilgin N, Yüzbaşıoğlu S, Sezerman OU. Testing robustness of relative complexity measure method constructing robust phylogenetic trees for Galanthus L. using the relative complexity measure. BMC Bioinformatics 2013; 14:20. [PMID: 23323678 PMCID: PMC3564700 DOI: 10.1186/1471-2105-14-20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/02/2012] [Accepted: 12/27/2012] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Most phylogeny analysis methods based on molecular sequences use multiple alignment, where the quality of the alignment, which depends on the alignment parameters, determines the accuracy of the resulting trees. Different parameter combinations chosen for the multiple alignment may result in different phylogenies. A new non-alignment based approach, Relative Complexity Measure (RCM), has been introduced to tackle this problem and has been shown to work in fungi and mitochondrial DNA. RESULTS In this work, we present an application of the RCM method to reconstruct robust phylogenetic trees using sequence data for genus Galanthus obtained from different regions in Turkey. Phylogenies have been analyzed using nuclear and chloroplast DNA sequences. Results showed that the tree obtained from nuclear ribosomal RNA gene sequences was more robust, while the tree obtained from the chloroplast DNA showed a higher degree of variation. CONCLUSIONS Phylogenies generated by Relative Complexity Measure were found to be robust, and results of RCM were more reliable than those of the compared techniques. In particular, RCM appears to be a reasonable way to overcome MSA-based problems and a good alternative to MSA-based phylogenetic analysis. We believe our method will become a mainstream phylogeny construction method, especially for highly variable sequence families where the accuracy of the MSA heavily depends on the alignment parameters.
Affiliation(s)
- Yasin Bakış
- Department of Biology, Abant İzzet Baysal University, Bolu, 14280, Turkey
- Hasan H Otu
- Department of Medicine, BIDMC Genomics Center, Harvard Medical School, Boston, MA, 02115, USA
- İstanbul Bilgi University, Department of Genetics and Bioengineering, Eyüp, İstanbul, 34060, Turkey
- Nivart Taşçı
- Department of Molecular Biology and Genetics, Boğaziçi University, Bebek, İstanbul, 34342, Turkey
- Cem Meydan
- Biological Sciences and Bioengineering, Sabancı University, Tuzla, İstanbul, 34956, Turkey
- Neş’e Bilgin
- Department of Molecular Biology and Genetics, Boğaziçi University, Bebek, İstanbul, 34342, Turkey
- Sırrı Yüzbaşıoğlu
- Department of Botany, İstanbul University, Süleymaniye, İstanbul, 34460, Turkey
- O Uğur Sezerman
- Biological Sciences and Bioengineering, Sabancı University, Tuzla, İstanbul, 34956, Turkey
|
20
|
Young JM, Luche RM, Trask BJ. Rigorous and thorough bioinformatic analyses of olfactory receptor promoters confirm enrichment of O/E and homeodomain binding sites but reveal no new common motifs. BMC Genomics 2011; 12:561. [PMID: 22085861 PMCID: PMC3247239 DOI: 10.1186/1471-2164-12-561] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/22/2011] [Accepted: 11/15/2011] [Indexed: 12/02/2022] Open
Abstract
Background: Mammalian olfactory receptors (ORs) are subject to a remarkable but poorly understood regime of transcriptional regulation, whereby individual olfactory neurons each express only one allele of a single member of the large OR gene family. Results: We performed a rigorous search for enriched sequence motifs in the largest dataset of OR promoter regions analyzed to date. We combined measures of cross-species conservation with databases of known transcription factor binding sites and ab initio motif-finding algorithms. We found strong enrichment of binding sites for the O/E family of transcription factors and for homeodomain factors, both already known to be involved in the transcriptional control of ORs, but did not identify any novel enriched sequences. We also found that TATA-boxes are present in at least a subset of OR promoters. Conclusions: Our rigorous approach provides a template for the analysis of the regulation of large gene families and demonstrates some of the difficulties and pitfalls of such analyses. Although currently available bioinformatics methods cannot detect all transcriptional regulatory elements, our thorough analysis of OR promoters shows that in the case of this gene family, experimental approaches have probably already identified all the binding factors common to large fractions of OR promoters.
Affiliation(s)
- Janet M Young
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
|
21
|
Changes in exon-intron structure during vertebrate evolution affect the splicing pattern of exons. Genome Res 2011; 22:35-50. [PMID: 21974994 DOI: 10.1101/gr.119834.110] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Indexed: 11/25/2022]
Abstract
Exon-intron architecture is one of the major features directing the splicing machinery to the short exons that are located within long flanking introns. However, the evolutionary dynamics of exon-intron architecture and its impact on splicing are largely unknown. Using a comparative genomic approach, we analyzed 17 vertebrate genomes and reconstructed the ancestral motifs of both 3' and 5' splice sites, as well as the ancestral lengths of exons and introns. Our analyses suggest that vertebrate introns increased in length from the shortest ancestral introns to the longest primate introns. An evolutionary analysis of splice sites revealed that weak splice sites act as a restrictive force keeping introns short. In contrast, strong splice sites allow recognition of exons flanked by long introns. Reconstruction of the ancestral state suggests these phenomena were not prevalent in the vertebrate ancestor, but appeared during vertebrate evolution. By calculating evolutionary rate shifts in exons, we identified cis-acting regulatory sequences that became fixed during the transition from early vertebrates to mammals. Experimental validations performed on a selection of these hexamers confirmed their regulatory function. We additionally revealed many features of exons that can discriminate alternative from constitutive exons. These features were integrated into a machine-learning approach to predict whether an exon is alternative. Our algorithm obtains very high predictive power (AUC of 0.91), and using these predictions we have identified and successfully validated novel alternatively spliced exons. Overall, we provide novel insights regarding the evolutionary constraints acting upon exons and their recognition by the splicing machinery.
|
22
|
Benson CC, Zhou Q, Long X, Miano JM. Identifying functional single nucleotide polymorphisms in the human CArGome. Physiol Genomics 2011; 43:1038-48. [PMID: 21771879 DOI: 10.1152/physiolgenomics.00098.2011] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 01/22/2023] Open
Abstract
Regulatory SNPs (rSNPs) reside primarily within the nonprotein coding genome and are thought to disturb normal patterns of gene expression by altering DNA binding of transcription factors. Nevertheless, despite the explosive rise in SNP association studies, there is little information as to the function of rSNPs in human disease. Serum response factor (SRF) is a widely expressed DNA-binding transcription factor that has variable affinity to at least 1,216 permutations of a 10 bp transcription factor binding site (TFBS) known as the CArG box. We developed a robust in silico bioinformatics screening method to evaluate sequences around RefSeq genes for conserved CArG boxes. Utilizing a predetermined phastCons threshold score, we identified 8,252 strand-specific CArGs within an 8 kb window around the transcription start site of 5,213 genes, including all previously defined SRF target genes. We then interrogated this CArG dataset for the presence of previously annotated common polymorphisms. We found a total of 118 unique CArG boxes harboring a SNP within the 10 bp CArG sequence and 1,130 CArG boxes with SNPs located just outside the CArG element. Gel shift and luciferase reporter assays validated SRF binding and functional activity of several new CArG boxes. Importantly, SNPs within or just outside the CArG box often resulted in altered SRF binding and activity. Collectively, these findings demonstrate a powerful approach to computationally define rSNPs in the human CArGome and provide a foundation for similar analyses of other TFBS. Such information may find utility in genetic association studies of human disease where little insight is known regarding the functionality of rSNPs.
Affiliation(s)
- Craig C Benson
- University of Rochester Medical Center, Rochester, NY, USA
|
23
|
Chen CH, Liao BY, Chen FC. Exploring the selective constraint on the sizes of insertions and deletions in 5' untranslated regions in mammals. BMC Evol Biol 2011; 11:192. [PMID: 21726469 PMCID: PMC3146882 DOI: 10.1186/1471-2148-11-192] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/10/2011] [Accepted: 07/05/2011] [Indexed: 12/30/2022] Open
Abstract
Background: Small insertions and deletions ("indels" with size ≤ 100 bp) whose lengths are not multiples of three (non-3n) are strongly constrained and depleted in protein-coding sequences. Such a constraint has never been reported in noncoding genomic regions. In 5' untranslated regions (5'UTRs) in mammalian genomes, upstream start codons (uAUGs) and upstream open reading frames (uORFs) can regulate protein translation. The presence of non-3n indels in uORFs can potentially disrupt the functions of these regulatory elements. We thus hypothesize that natural selection disfavors non-3n indels in 5'UTRs when these regulatory elements are present. Results: We design the Indel Selection Index to measure the selective constraint on non-3n indels in 5'UTRs. The index controls for the genomic compositions of the analyzed 5'UTRs and measures the probability of non-3n indel depletion downstream of uAUGs. By comparing the experimentally supported transcripts of human-mouse orthologous genes, we demonstrate that non-3n indels downstream of two types of uAUGs (alternative translation initiation sites and the uAUGs of coding sequence-overlapping uORFs) are underrepresented. The results hold regardless of differences in alignment tool, gene structures between human and mouse, or the criteria for selecting the alternatively spliced isoforms used in the analysis. Conclusions: To our knowledge, this is the first study to demonstrate selective constraints on non-3n indels in 5'UTRs. Such constraints may be associated with the regulatory functions of uAUGs/uORFs in translational regulation or the generation of protein isoforms. Our study thus brings a new perspective to the evolution of 5'UTRs in mammals.
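The indel class studied here (size at most 100 bp, length not a multiple of three) can be expressed as a simple predicate. The sketch below assumes indels are given as reference/alternate allele strings; that input format and the function name are illustrative choices, not taken from the paper, and the Indel Selection Index itself is not reproduced because the abstract does not define it.

```python
def is_small_non3n_indel(ref, alt, max_size=100):
    """Return True if the ref/alt pair is an indel of 1..max_size bp whose
    length is not a multiple of three (a frame-disrupting "non-3n" indel)."""
    size = abs(len(ref) - len(alt))
    return 0 < size <= max_size and size % 3 != 0


# A 2-bp insertion shifts the reading frame; a 3-bp insertion does not.
print(is_small_non3n_indel("A", "ATG"))   # True
print(is_small_non3n_indel("A", "ATGC"))  # False
```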
Affiliation(s)
- Chun-Hsi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, 350 Taiwan
|
24
|
Abstract
Many interesting but practically intractable problems can be reduced to that of finding the ground state of a system of interacting spins; however, finding such a ground state remains computationally difficult. It is believed that the ground state of some naturally occurring spin systems can be effectively attained through a process called quantum annealing. If it could be harnessed, quantum annealing might improve on known methods for solving certain types of problem. However, physical investigation of quantum annealing has been largely confined to microscopic spins in condensed-matter systems. Here we use quantum annealing to find the ground state of an artificial Ising spin system comprising an array of eight superconducting flux quantum bits with programmable spin-spin couplings. We observe a clear signature of quantum annealing, distinguishable from classical thermal annealing through the temperature dependence of the time at which the system dynamics freezes. Our implementation can be configured in situ to realize a wide variety of different spin networks, each of which can be monitored as it moves towards a low-energy configuration. This programmable artificial spin network bridges the gap between the theoretical study of ideal isolated spin networks and the experimental investigation of bulk magnetic samples. Moreover, with an increased number of spins, such a system may provide a practical physical means to implement a quantum algorithm, possibly allowing more-effective approaches to solving certain classes of hard combinatorial optimization problems.
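For a system as small as the eight spins mentioned here, the ground state can be found classically by exhaustive search, which makes the problem statement concrete. The sketch below assumes the energy convention E(s) = Σ h_i s_i + Σ J_ij s_i s_j with s_i ∈ {−1, +1}; sign conventions vary between papers, and the chain couplings and bias field used in the example are made up for illustration rather than taken from the experiment.

```python
from itertools import product


def ising_energy(spins, h, J):
    """Energy of one spin configuration under the assumed convention
    E = sum_i h[i]*s_i + sum_(i,j) J[(i, j)]*s_i*s_j."""
    energy = sum(h[i] * s for i, s in enumerate(spins))
    energy += sum(c * spins[i] * spins[j] for (i, j), c in J.items())
    return energy


def ground_state(h, J):
    """Brute-force search over all 2**n configurations (feasible only for small n)."""
    n = len(h)
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, h, J))


# Hypothetical 8-spin ferromagnetic chain with a small bias on the first spin.
h = [-0.5] + [0.0] * 7
J = {(i, i + 1): -1.0 for i in range(7)}
best = ground_state(h, J)
print(best, ising_energy(best, h, J))  # (1, 1, 1, 1, 1, 1, 1, 1) -7.5
```

The search space doubles with every added spin, which is why exhaustive enumeration stops being practical quickly and why physical or heuristic annealing approaches are of interest for larger instances.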
|
25
|
Horvath JE, Sheedy CB, Merrett SL, Diallo AB, Swofford DL, NISC Comparative Sequencing Program, Green ED, Willard HF. Comparative analysis of the primate X-inactivation center region and reconstruction of the ancestral primate XIST locus. Genome Res 2011; 21:850-62. [PMID: 21518738 DOI: 10.1101/gr.111849.110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 01/17/2023]
Abstract
Here we provide a detailed comparative analysis across the candidate X-Inactivation Center (XIC) region and the XIST locus in the genomes of six primates and three mammalian outgroup species. Since lemurs and other strepsirrhine primates represent the sister lineage to all other primates, this analysis focuses on lemurs to reconstruct the ancestral primate sequences and to gain insight into the evolution of this region and the genes within it. This comparative evolutionary genomics approach reveals significant expansion in genomic size across the XIC region in higher primates, with minimal size alterations across the XIST locus itself. Reconstructed primate ancestral XIC sequences show that the most dramatic changes during the past 80 million years occurred between the ancestral primate and the lineage leading to Old World monkeys. In contrast, the XIST locus compared between human and the primate ancestor does not indicate any dramatic changes to exons or XIST-specific repeats; rather, evolution of this locus reflects small incremental changes in overall sequence identity and short repeat insertions. While this comparative analysis reinforces that the region around XIST has been subject to significant genomic change, even among primates, our data suggest that evolution of the XIST sequences themselves represents only small lineage-specific changes across the past 80 million years.
Affiliation(s)
- Julie E Horvath
- Duke Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708, USA.
|
26
|
Abstract
Motivation: The relative ease and low cost of current generation sequencing technologies has led to a dramatic increase in the number of sequenced genomes for species across the tree of life. This increasing volume of data requires tools that can quickly compare multiple whole-genome sequences, millions of base pairs in length, to aid in the study of populations, pan-genomes, and genome evolution. Results: We present a new multiple alignment tool for whole genomes named Mugsy. Mugsy is computationally efficient and can align 31 Streptococcus pneumoniae genomes in less than 2 hours producing alignments that compare favorably to other tools. Mugsy is also the fastest program evaluated for the multiple alignment of assembled human chromosome sequences from four individuals. Mugsy does not require a reference sequence, can align mixtures of assembled draft and completed genome data, and is robust in identifying a rich complement of genetic variation including duplications, rearrangements, and large-scale gain and loss of sequence. Availability: Mugsy is free, open-source software available from http://mugsy.sf.net. Contact: angiuoli@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Samuel V Angiuoli
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
|