1. Thompson JD, Ripp R, Mayer C, Poch O, Michel CJ. Potential role of the X circular code in the regulation of gene expression. Biosystems 2021; 203:104368. [PMID: 33567309 DOI: 10.1016/j.biosystems.2021.104368] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/09/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 02/06/2023]
Abstract
The X circular code is a set of 20 trinucleotides (codons) that has been identified in the protein-coding genes of most organisms (bacteria, archaea, eukaryotes, plasmids, viruses). It has been shown previously that the X circular code has the important mathematical property of being an error-correcting code. Thus, motifs of the X circular code, i.e. a series of codons belonging to X and called X motifs, allow identification and maintenance of the reading frame in genes. X motifs are significantly enriched in protein-coding genes, but have also been identified in many transfer RNA (tRNA) genes and in important functional regions of the ribosomal RNA (rRNA), notably in the peptidyl transferase center and the decoding center. Here, we investigate the potential role of X motifs as functional elements of protein-coding genes. First, we identify the codons of the X circular code which are frequent or rare in each domain of life (archaea, bacteria, eukaryota) and show that, for the amino acids with the highest codon bias, the preferred codon is often an X codon. We also observe a correlation between the 20 X codons and the optimal codons/dicodons that have been shown to influence translation efficiency. Then, we examine recently published experimental results concerning gene expression levels in diverse organisms. The approach used is the analysis of X motifs according to their density d_s(X), i.e. the number of X motifs per kilobase in a gene sequence s. Surprisingly, this simple parameter identifies several unexpected relations between the X circular code and gene expression. For example, the X motifs are significantly enriched in the minimal gene set belonging to the three domains of life, and in codon-optimized genes. Furthermore, the density of X motifs generally correlates with experimental measures of translation efficiency and mRNA stability. Taken together, these results lead us to propose that the X motifs may represent a genetic signal contributing to the maintenance of the correct reading frame and the optimization and regulation of gene expression.
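The density measure described in this abstract (X motifs per kilobase of a gene sequence) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that an X motif is a run of at least `min_codons` consecutive codons of the code read in a single frame; the abstract states neither a minimal motif length nor which frames are scanned, so `min_codons` and `frame` are hypothetical parameters. `TOY_CODONS` is a placeholder set, not the actual 20 codons of X.

```python
def x_motif_density(seq, x_codons, min_codons=3, frame=0):
    """Return the number of motifs per kilobase of `seq`.

    A motif is counted here as a maximal run of at least `min_codons`
    consecutive codons that all belong to `x_codons` (an assumption).
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Split the sequence into complete codons starting at `frame`.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    motifs = 0
    run = 0
    # The trailing None acts as a sentinel that closes the last run.
    for codon in codons + [None]:
        if codon in x_codons:
            run += 1
        else:
            if run >= min_codons:
                motifs += 1
            run = 0
    return 1000.0 * motifs / len(seq)


# Placeholder codon set for illustration; NOT the published X code.
TOY_CODONS = {"GAA", "GAC", "GTC"}

# One run of 4 codons (counted) and one run of 2 codons (too short).
example = "GAAGACGTCGAA" + "TTT" + "GAAGAC"
density = x_motif_density(example, TOY_CODONS)
print(f"{density:.2f} motifs per kb")
```

Normalizing by sequence length is what makes genes of different sizes comparable, which is presumably why a per-kilobase density is used rather than a raw motif count.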
Affiliation(s)
- Julie D Thompson
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Raymond Ripp
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Claudine Mayer
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France
  - Unité de Microbiologie Structurale, Institut Pasteur, CNRS, 75724, Paris Cedex 15, France
  - Université Paris Diderot, Sorbonne Paris Cité, 75724, Paris Cedex 15, France.
- Olivier Poch
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
- Christian J Michel
  - Department of Computer Science, ICube, CNRS, University of Strasbourg, Strasbourg, France.
2. Yang X, Yan J, Zhang Z, Lin T, Xin T, Wang B, Wang S, Zhao J, Zhang Z, Lucas WJ, Li G, Huang S. Regulation of plant architecture by a new histone acetyltransferase targeting gene bodies. Nature Plants 2020; 6:809-822. [PMID: 32665652 DOI: 10.1038/s41477-020-0715-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/08/2019] [Accepted: 05/29/2020] [Indexed: 05/10/2023]
Abstract
Axillary meristem development determines both plant architecture and crop yield; this critical process is regulated by the TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTORS (TCP) family of transcription factors. Although TCP proteins bind primarily to promoter regions, some also target gene bodies for expression activation. However, the underlying regulatory mechanism remains unknown. Here we show that TEN, a TCP from cucumber (Cucumis sativus L.), controls the identity and mobility of tendrils. Through its C terminus, TEN binds at intragenic enhancers of target genes; its N-terminal domain functions as a non-canonical histone acetyltransferase (HAT) to preferentially act on lysine 56 and 122 of the histone H3 globular domain. This HAT activity is responsible for chromatin loosening and host-gene activation. The N termini of all tested CYCLOIDEA and TEOSINTE BRANCHED 1-like TCP proteins contain an intrinsically disordered region; despite their sequence divergence, they have conserved HAT activity. This study identifies a non-canonical class of HATs and provides a mechanism by which modification at the H3 globular domain is integrated with the transcription process.
Affiliation(s)
- Xueyong Yang
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Jianbin Yan
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Zhen Zhang
  - College of Horticulture, Northwest A&F University, Yangling, China
  - State Key Laboratory of Crop Stress Adaptation and Improvement, School of Life Sciences, Henan University, Kaifeng, China
- Tao Lin
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Tongxu Xin
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Bowen Wang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Shenhao Wang
  - College of Horticulture, Northwest A&F University, Yangling, China
- Jicheng Zhao
  - University of Chinese Academy of Sciences, National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Zhonghua Zhang
  - Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- William J Lucas
  - Department of Plant Biology, College of Biological Sciences, University of California, Davis, CA, USA
- Guohong Li
  - University of Chinese Academy of Sciences, National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Sanwen Huang
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China.
3. Faber MS, Wrenbeck EE, Azouz LR, Steiner PJ, Whitehead TA. Impact of In Vivo Protein Folding Probability on Local Fitness Landscapes. Mol Biol Evol 2020; 36:2764-2777. [PMID: 31400199 DOI: 10.1093/molbev/msz184] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/14/2022]
Abstract
It is incompletely understood how biophysical properties like protein stability impact molecular evolution and epistasis. Epistasis is defined as specific when a mutation exclusively influences the phenotypic effect of another mutation, often at physically interacting residues. In contrast, nonspecific epistasis results when a mutation is influenced by a large number of nonlocal mutations. As most mutations are pleiotropic, the in vivo folding probability, governed by basal protein stability, is thought to determine activity-enhancing mutational tolerance, implying that nonspecific epistasis is dominant. However, evidence exists for both specific and nonspecific epistasis as the prevalent factor, with limited comprehensive data sets to support either claim. Here, we use deep mutational scanning to probe how in vivo enzyme folding probability impacts local fitness landscapes. We computationally designed two different variants of the amidase AmiE with statistically indistinguishable catalytic efficiencies but lower probabilities of folding in vivo compared with wild-type. Local fitness landscapes show slight alterations among variants, with essentially the same global distribution of fitness effects. However, specific epistasis was predominant for the subset of mutations exhibiting positive sign epistasis. These mutations mapped to spatially distinct locations on AmiE near the initial mutation or proximal to the active site. Intriguingly, the majority of specific epistatic mutations were codon dependent, with different synonymous codons resulting in fitness sign reversals. Together, these results offer a nuanced view of how protein folding probability impacts local fitness landscapes and suggest that transcriptional-translational effects are as important as stability in determining evolutionary outcomes.
Affiliation(s)
- Matthew S Faber
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI
- Emily E Wrenbeck
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI
- Laura R Azouz
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI
- Paul J Steiner
  - Department of Chemical and Biological Engineering, University of Colorado, Boulder, CO
- Timothy A Whitehead
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI
  - Department of Chemical and Biological Engineering, University of Colorado, Boulder, CO
  - E.E.W.: Ginkgo Bioworks; L.R.A.: McKetta Department of Chemical Engineering, University of Texas at Austin, Austin, TX
4. Mayorov A, Dal Peraro M, Abriata LA. Active Site-Induced Evolutionary Constraints Follow Fold Polarity Principles in Soluble Globular Enzymes. Mol Biol Evol 2020; 36:1728-1733. [PMID: 31004173 DOI: 10.1093/molbev/msz096] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022]
Abstract
A recent analysis of evolutionary rates in >500 globular soluble enzymes revealed pervasive conservation gradients toward catalytic residues. By looking at amino acid preference profiles rather than evolutionary rates in the same data set, we quantified the effects of active sites on site-specific constraints for physicochemical traits. We found that conservation gradients respond to constraints for polarity, hydrophobicity, flexibility, rigidity and structure in ways consistent with fold polarity principles; while sites far from active sites seem to experience no physicochemical constraint, rather being highly variable and favoring amino acids of low metabolic cost. Globally, our results highlight that amino acid variation contains finer information about protein structure than usually regarded in evolutionary models, and that this information is retrievable automatically with simple fits. We propose that analyses of the kind presented here incorporated into models of protein evolution should allow for better description of the physical chemistry that underlies molecular evolution.
Affiliation(s)
- Alexander Mayorov
  - Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Matteo Dal Peraro
  - Laboratory for Biomolecular Modeling, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Protein Production and Structure Core Facility, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
5. Xia JH, Wei GH. Enhancer Dysfunction in 3D Genome and Disease. Cells 2019; 8:1281. [PMID: 31635067 PMCID: PMC6830074 DOI: 10.3390/cells8101281] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/15/2019] [Revised: 10/10/2019] [Accepted: 10/14/2019] [Indexed: 12/13/2022]
Abstract
Spatiotemporal patterns of gene expression depend on enhancer elements and other factors during individual development and disease progression. The rapid progress of high-throughput techniques has led to well-defined enhancer chromatin properties. Various genome-wide methods have revealed a large number of enhancers and the discovery of three-dimensional (3D) genome architecture showing the distant interacting mechanisms of enhancers that loop to target gene promoters. Whole genome sequencing projects directed at cancer have led to the discovery of substantial enhancer dysfunction in misregulating gene expression and in tumor initiation and progression. Results from genome-wide association studies (GWAS) combined with functional genomics analyses have elucidated the functional impacts of many cancer risk-associated variants that are enriched within the enhancer regions of chromatin. Risk variants dysregulate the expression of enhancer variant-associated genes via 3D genomic interactions. Moreover, these enhancer variants often alter the chromatin binding affinity for cancer-relevant transcription factors, which in turn leads to aberrant expression of the genes associated with cancer susceptibility. In this review, we investigate the extent to which these genetic regulatory circuits affect cancer predisposition and how the recent development of genome-editing methods has enabled the determination of the impacts of genomic variation and alteration on cancer phenotype, which will eventually lead to better management plans and treatment responses to human cancer in the clinic.
Affiliation(s)
- Ji-Han Xia
  - Biocenter Oulu, Faculty of Biochemistry and Molecular Medicine, University of Oulu, 90014 Oulu, Finland.
- Gong-Hong Wei
  - Biocenter Oulu, Faculty of Biochemistry and Molecular Medicine, University of Oulu, 90014 Oulu, Finland.
6. Codon and Codon-Pair Usage Tables (CoCoPUTs): Facilitating Genetic Variation Analyses and Recombinant Gene Design. J Mol Biol 2019; 431:2434-2441. [PMID: 31029701 DOI: 10.1016/j.jmb.2019.04.021] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 02/01/2019] [Revised: 04/10/2019] [Accepted: 04/15/2019] [Indexed: 02/08/2023]
Abstract
Usage of sequential codon-pairs is non-random and unique to each species. Codon-pair bias is related to but clearly distinct from individual codon usage bias. Codon-pair bias is thought to affect translational fidelity and efficiency and is presumed to be under selective pressure. It was suggested that changes in codon-pair utilization may affect human disease more significantly than changes in single codons. Although recombinant gene technologies often take codon-pair usage bias into account, codon-pair usage data/tables are not readily available, thus potentially impeding research efforts. The present computational resource (https://hive.biochemistry.gwu.edu/review/codon2) systematically addresses this issue. Building on our recent HIVE-Codon Usage Tables, we constructed a new database to include genomic codon-pair and dinucleotide statistics of all organisms with sequenced genomes available in GenBank. We believe that the growing understanding of the importance of codon-pair usage will make this resource an invaluable tool to many researchers in academia and the pharmaceutical industry.
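To make the notion of codon-pair usage concrete, the following minimal Python sketch counts adjacent (sequential) codon pairs in a single in-frame coding sequence. It is an illustration only and not the method used to build the database; the function name `codon_pair_counts` and the example sequence are hypothetical.

```python
from collections import Counter


def codon_pair_counts(cds):
    """Count adjacent codon pairs in an in-frame coding sequence."""
    cds = cds.upper()
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Pair each codon with its immediate downstream neighbour.
    return Counter(zip(codons, codons[1:]))


pairs = codon_pair_counts("ATGGCCATGGCC")
for (first, second), n in sorted(pairs.items()):
    print(first, second, n)
```

Because the pair (ATG, GCC) and the pair (GCC, ATG) are counted separately, the tally is order-sensitive, which matters when pair bias is compared with single-codon bias.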
7. Barbieri M. What is code biology? Biosystems 2018; 164:1-10. [DOI: 10.1016/j.biosystems.2017.10.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 07/27/2017] [Revised: 10/04/2017] [Accepted: 10/05/2017] [Indexed: 01/29/2023]
8. Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics 2017; 18:391. [PMID: 28865429 PMCID: PMC5581930 DOI: 10.1186/s12859-017-1793-7] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Received: 06/19/2017] [Accepted: 08/15/2017] [Indexed: 01/24/2023]
Abstract
Background Due to the degeneracy of the genetic code, most amino acids can be encoded by multiple synonymous codons. Synonymous codons naturally occur with different frequencies in different organisms. The choice of codons may affect protein expression, structure, and function. Recombinant gene technologies commonly take advantage of the former effect by implementing a technique termed codon optimization, in which codons are replaced with synonymous ones in order to increase protein expression. This technique relies on the accurate knowledge of codon usage frequencies. Accurately quantifying codon usage bias for different organisms is useful not only for codon optimization, but also for evolutionary and translation studies: phylogenetic relations of organisms, and host-pathogen co-evolution relationships, may be explored through their codon usage similarities. Furthermore, codon usage has been shown to affect protein structure and function through interfering with translation kinetics, and cotranslational protein folding. Results Despite the obvious need for accurate codon usage tables, currently available resources are either limited in scope, encompassing only organisms from specific domains of life, or greatly outdated. Taking advantage of the exponential growth of GenBank and the creation of NCBI’s RefSeq database, we have developed a new database, the High-performance Integrated Virtual Environment-Codon Usage Tables (HIVE-CUTs), to present and analyse codon usage tables for every organism with publicly available sequencing data. Compared to existing databases, this new database is more comprehensive, addresses concerns that limited the accuracy of earlier databases, and provides several new functionalities, such as the ability to view and compare codon usage between individual organisms and across taxonomical clades, through graphical representation or through commonly used indices. 
In addition, it is being routinely updated to keep up with the continuous flow of new data in GenBank and RefSeq. Conclusion Given the impact of codon usage bias on recombinant gene technologies, this database will facilitate effective development and review of recombinant drug products and will be instrumental in a wide area of biological research. The database is available at hive.biochemistry.gwu.edu/review/codon.
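As a rough illustration of what a codon usage table contains, the sketch below computes codon frequencies per thousand codons for one coding sequence. This is a simplified assumption about the normalization; the database itself aggregates over all coding sequences of an organism, and the function name and example sequence are hypothetical.

```python
from collections import Counter


def codon_usage_per_thousand(cds):
    """Return codon frequencies per 1000 codons for one coding sequence."""
    cds = cds.upper()
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    counts = Counter(codons)
    total = len(codons)
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


usage = codon_usage_per_thousand("ATGGCCGCCTAA")
for codon, freq in sorted(usage.items()):
    print(codon, freq)
```

Codon optimization, as described in the abstract, would then replace a codon with a synonymous one that has a higher frequency in the table of the target organism.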
Affiliation(s)
- John Athey
  - Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Aikaterini Alexaki
  - Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Ekaterina Osipova
  - High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Alexandre Rostovtsev
  - High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Luis V Santana-Quintero
  - High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Upendra Katneni
  - Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Vahan Simonyan
  - High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Chava Kimchi-Sarfaty
  - Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA.
9. Soussi T, Taschner PEM, Samuels Y. Synonymous Somatic Variants in Human Cancer Are Not Infamous: A Plea for Full Disclosure in Databases and Publications. Hum Mutat 2017; 38:339-342. [PMID: 28026089 DOI: 10.1002/humu.23163] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 09/08/2016] [Revised: 11/28/2016] [Accepted: 12/11/2016] [Indexed: 12/12/2022]
Abstract
Single-nucleotide variants (SNVs) are the most frequent genetic changes found in human cancer. Most driver alterations are missense and nonsense variants localized in the coding region of cancer genes. Unbiased cancer genome sequencing shows that synonymous SNVs (sSNVs) can be found clustered in the coding regions of several cancer oncogenes or tumor suppressor genes suggesting purifying selection. sSNVs are currently underestimated, as they are usually discarded during analysis. Furthermore, several public databases do not display sSNVs, which can lead to analytical bias and the false assumption that this mutational event is uncommon. Recent progress in our understanding of the deleterious consequences of these sSNVs for RNA stability and protein translation shows that they can act as strong drivers of cancer, as demonstrated for several cancer genes such as TP53 or BCL2L12. It is therefore essential that sSNVs be properly reported and analyzed in order to provide an accurate picture of the genetic landscape of the cancer genome.
Affiliation(s)
- Thierry Soussi
  - Sorbonne Université, UPMC Univ Paris 06, Paris, F-75005, France
  - INSERM, U1138, Centre de Recherche des Cordeliers, Paris, France
  - Department of Oncology-Pathology, Karolinska Institutet, Cancer Center Karolinska (CCK) R8:04, Stockholm, SE-171 76, Sweden
- Peter E M Taschner
  - Generade Centre of Expertise Genomics and University of Applied Sciences Leiden, Leiden, 2333 CL, The Netherlands
- Yardena Samuels
  - Molecular Cell Biology Department, Weizmann Institute of Science, Rehovot, 76100, Israel
10. Pancsa R, Tompa P. Coding Regions of Intrinsic Disorder Accommodate Parallel Functions. Trends Biochem Sci 2016; 41:898-906. [DOI: 10.1016/j.tibs.2016.08.009] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 06/19/2016] [Revised: 08/16/2016] [Accepted: 08/19/2016] [Indexed: 02/01/2023]
11.
12. Gamble CE, Brule CE, Dean KM, Fields S, Grayhack EJ. Adjacent Codons Act in Concert to Modulate Translation Efficiency in Yeast. Cell 2016; 166:679-690. [PMID: 27374328 PMCID: PMC4967012 DOI: 10.1016/j.cell.2016.05.070] [Citation(s) in RCA: 141] [Impact Index Per Article: 17.6] [Received: 12/18/2015] [Revised: 04/14/2016] [Accepted: 05/19/2016] [Indexed: 12/18/2022]
Abstract
Translation elongation efficiency is largely thought of as the sum of decoding efficiencies for individual codons. Here, we find that adjacent codon pairs modulate translation efficiency. Deploying an approach in Saccharomyces cerevisiae that scored the expression of over 35,000 GFP variants in which three adjacent codons were randomized, we have identified 17 pairs of adjacent codons associated with reduced expression. For many pairs, codon order is obligatory for inhibition, implying a more complex interaction than a simple additive effect. Inhibition mediated by adjacent codons occurs during translation itself as GFP expression is restored by increased tRNA levels or by non-native tRNAs with exact-matching anticodons. Inhibition operates in endogenous genes, based on analysis of ribosome profiling data. Our findings suggest translation efficiency is modulated by an interplay between tRNAs at adjacent sites in the ribosome and that this concerted effect needs to be considered in predicting the functional consequences of codon choice.
Affiliation(s)
- Caitlin E Gamble
  - Departments of Genome Sciences and Medicine, University of Washington, Seattle, WA 98195, USA
  - Program in Molecular and Cellular Biology, University of Washington, Seattle, WA 98195, USA
- Christina E Brule
  - Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Kimberly M Dean
  - Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Stanley Fields
  - Departments of Genome Sciences and Medicine, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
- Elizabeth J Grayhack
  - Department of Biochemistry and Biophysics, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642, USA
  - Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA.
13. Abriata LA, Bovigny C, Dal Peraro M. Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server. BMC Bioinformatics 2016; 17:242. [PMID: 27315797 PMCID: PMC4912743 DOI: 10.1186/s12859-016-1124-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/01/2016] [Accepted: 06/07/2016] [Indexed: 11/21/2022]
Abstract
Background Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Results Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. Discussion We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. Conclusion We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design. 
Affiliation(s)
- Luciano A Abriata
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland.
- Christophe Bovigny
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland
  - Present address: Molecular Modeling Group, Swiss Institute of Bioinformatics, UNIL, Bâtiment Génopode, Lausanne, 1015, Switzerland
- Matteo Dal Peraro
  - Laboratory for Biomolecular Modeling, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, AAB014 Station 19, Lausanne, 1015, Switzerland
14. Yadav VK, Smith KS, Flinders C, Mumenthaler SM, De S. Significance of duon mutations in cancer genomes. Sci Rep 2016; 6:27437. [PMID: 27272679 PMCID: PMC4897603 DOI: 10.1038/srep27437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/15/2015] [Accepted: 05/17/2016] [Indexed: 11/15/2022]
Abstract
Functional mutations in coding regions not only affect the structure and function of the protein products, but may also modulate their expression in some cases. This class of mutations, recently dubbed “duon mutations” due to their dual roles, can potentially have major impacts on downstream pathways. However, their significance in diseases such as cancer remains unclear. In a survey covering 4606 samples from 19 cancer types, and integrating allelic expression, overall mRNA expression, regulatory motif perturbation, and chromatin signatures in one composite index called REDACT score, we identified potential duon mutations. Several such mutations are detected in known cancer genes in multiple cancer types. For instance, a potential duon mutation in TP53 is associated with increased expression of the mutant allelic gene copy, thereby possibly amplifying the functional effects on the downstream pathways. Another potential duon mutation in SF3B1 is associated with abnormal splicing and changes in angiogenesis and matrix degradation related pathways. Our findings emphasize the need to interrogate the mutations in coding regions beyond their obvious effects on protein structures.
Affiliation(s)
- Vinod Kumar Yadav
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA; The Jackson Laboratory, Farmington, CT 06032, USA
| | - Kyle S Smith
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA; Computational Biosciences Graduate Program, University of Colorado, Aurora, CO 80045, USA
| | - Colin Flinders
- Center for Applied Molecular Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Shannon M Mumenthaler
- Center for Applied Molecular Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Subhajyoti De
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA; University of Colorado Cancer Center, Aurora, CO 80045, USA; Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| |
|
15
|
Pang E, Wu X, Lin K. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences. Mol Genet Genomics 2016; 291:1127-36. [PMID: 26833483 PMCID: PMC4875946 DOI: 10.1007/s00438-016-1170-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/14/2015] [Accepted: 01/18/2016] [Indexed: 11/30/2022]
Abstract
Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, the parts or sites of proteins are generally subject to different selective constraints, particularly purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains and unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found that the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions, whereas the density of non-synonymous SNPs shows the opposite pattern. We also found signatures of purifying selection on both the domains and the unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
Affiliation(s)
- Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
| | - Xiaomei Wu
- College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 310036, China
| | - Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
| |
|
16
|
Martínez MA, Jordan-Paiz A, Franco S, Nevot M. Synonymous Virus Genome Recoding as a Tool to Impact Viral Fitness. Trends Microbiol 2016; 24:134-147. [DOI: 10.1016/j.tim.2015.11.002] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 09/19/2015] [Revised: 10/28/2015] [Accepted: 11/04/2015] [Indexed: 01/28/2023]
|
17
|
Mallik S, Das S, Kundu S. Predicting protein folding rate change upon point mutation using residue-level coevolutionary information. Proteins 2015; 84:3-8. [DOI: 10.1002/prot.24960] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/15/2015] [Revised: 11/11/2015] [Accepted: 11/11/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Saurav Mallik
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700009, India
- Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta, Kolkata 700009, India
| | - Smita Das
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700009, India
| | - Sudip Kundu
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700009, India
- Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta, Kolkata 700009, India
| |
|
18
|
Angione C, Lió P. Predictive analytics of environmental adaptability in multi-omic network models. Sci Rep 2015; 5:15147. [PMID: 26482106 PMCID: PMC4611489 DOI: 10.1038/srep15147] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/20/2015] [Accepted: 09/14/2015] [Indexed: 01/22/2023] Open
Abstract
Bacterial phenotypic traits and lifestyles in response to diverse environmental conditions depend on changes in the internal molecular environment. However, predicting bacterial adaptability is still difficult outside of laboratory-controlled conditions. Many molecular levels can contribute to the adaptation to a changing environment: pathway structure, codon usage, and metabolism. To measure adaptability to changing environmental conditions and over time, we develop a multi-omic model of Escherichia coli that accounts for metabolism, gene expression and codon usage at both transcription and translation levels. After the integration of multiple omics into the model, we propose a multiobjective optimization algorithm to find the allowable and optimal metabolic phenotypes through concurrent maximization or minimization of multiple metabolic markers. In the condition space, we propose Pareto hypervolume and spectral analysis as estimators of short-term multi-omic (transcriptomic and metabolic) evolution, thus enabling comparative analysis of metabolic conditions. We therefore compare, evaluate and cluster different experimental conditions, models and bacterial strains according to their metabolic response in a multidimensional objective space, rather than in the original space of microarray data. We finally validate our methods on a phenomics dataset of growth conditions. Our framework, named METRADE, is freely available as a MATLAB toolbox.
Affiliation(s)
| | - Pietro Lió
- Computer Laboratory - University of Cambridge, UK
| |
|
19
|
Ressayre A, Glémin S, Montalent P, Serre-Giardi L, Dillmann C, Joets J. Introns Structure Patterns of Variation in Nucleotide Composition in Arabidopsis thaliana and Rice Protein-Coding Genes. Genome Biol Evol 2015; 7:2913-28. [PMID: 26450849 PMCID: PMC4684703 DOI: 10.1093/gbe/evv189] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/19/2022] Open
Abstract
Plant genomes present a continuous range of variation in nucleotide composition (G + C content). In coding regions, G + C-poor species tend to have unimodal distributions of G + C content among genes within genomes and slight 5′–3′ gradients along genes. In contrast, G + C-rich species display bimodal distributions of G + C content among genes and steep 5′–3′ decreasing gradients along genes. The causes of these peculiar patterns are still poorly understood. Within two species (Arabidopsis thaliana and rice), each representative of one side of the continuum, we studied the consequences of intron presence on coding region and intron G + C content at different scales. By properly taking intron structure into account, we showed that, in both species, intron presence is associated with step changes in nucleotide, codon, and amino acid composition. This suggests that introns have a barrier effect structuring G + C content along genes and that previous continuous characterizations of the 5′–3′ gradients were artifactual. In external gene regions (located upstream of the first intron or downstream of the last intron), species-specific factors, such as GC-biased gene conversion, shape G + C content, whereas in internal gene regions (surrounded by introns), G + C content is likely constrained to remain within a range common to both species.
Affiliation(s)
- Adrienne Ressayre
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, INRA, Gif-sur-Yvette, France
| | - Sylvain Glémin
- Institut des Sciences de l'Evolution (ISEM), UMR 5554, Université de Montpellier, CNRS-IRD-EPHE, France; Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Sweden
| | - Pierre Montalent
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, INRA, Gif-sur-Yvette, France
| | - Laurana Serre-Giardi
- UMR 1345 IRHS Institut de Recherche en Horticulture et Semences, INRA, Centre de Recherche Angers-Nantes, Beaucousé, France
| | - Christine Dillmann
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, Université Paris-Sud, Gif-sur-Yvette, France
| | - Johann Joets
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, INRA, Gif-sur-Yvette, France
| |
|
20
|
Co-evolutionary constraints of globular proteins correlate with their folding rates. FEBS Lett 2015; 589:2179-85. [DOI: 10.1016/j.febslet.2015.06.032] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/18/2015] [Revised: 06/09/2015] [Accepted: 06/24/2015] [Indexed: 11/20/2022]
|
21
|
Smithers B, Oates ME, Gough J. Splice junctions are constrained by protein disorder. Nucleic Acids Res 2015; 43:4814-22. [PMID: 25934802 PMCID: PMC4446445 DOI: 10.1093/nar/gkv407] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/27/2015] [Accepted: 04/15/2015] [Indexed: 01/23/2023] Open
Abstract
We have discovered that positions of splice junctions in genes are constrained by the tolerance for disorder-promoting amino acids in the translated protein region. It is known that efficient splicing requires nucleotide bias at the splice junction; the preferred usage produces a distribution of amino acids that is disorder-promoting. We observe that efficiency of splicing, as seen in the amino-acid distribution, is not compromised to accommodate globular structure. Thus we infer that it is the positions of splice junctions in the gene that must be under constraint by the local protein environment. Examining exonic splicing enhancers (short DNA motifs) found near the splice junction in the gene reveals that these are more prevalent in exons that encode disordered protein regions than in exons encoding structured regions. Thus we also conclude that local protein features constrain efficient splicing more in structure than in disorder.
Affiliation(s)
- Ben Smithers
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
| | - Matt E Oates
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
| | - Julian Gough
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
| |
|
22
|
Morgunov AS, Babu MM. Optimizing membrane-protein biogenesis through nonoptimal-codon usage. Nat Struct Mol Biol 2015; 21:1023-5. [PMID: 25469841 DOI: 10.1038/nsmb.2926] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/21/2023]
Affiliation(s)
- Alexey S Morgunov
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
| | - M Madan Babu
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
| |
|
23
|
Abstract
There are two distinct types of DNA sequences in a genome, namely coding sequences and regulatory sequences. A recent study of the occupancy of transcription factors (TFs) in human cells suggested that protein-coding sequences also serve as the codes of TF occupancy, and proposed a "duon" hypothesis in which up to 15% of codons of human protein genes are constrained by the additional coding requirements that regulate gene expression. This hypothesis challenges our basic understanding of the human genome. We reanalyzed the data and found that the previous study was confounded by ascertainment bias related to base composition. Using an unbiased comparison in which G/C and A/T sites are considered separately, we reveal a similar level of conservation between TF-bound codons and TF-depleted codons, suggesting that TF occupancy imposes largely no extra purifying selection on the codons of human genes. Given the generally short binding motifs of TFs and the open chromatin structure during transcription, we argue that the occupancy of TFs on protein-coding sequences is mostly passive and evolutionarily neutral, with to-be-determined functions in the regulation of gene expression.
Affiliation(s)
- Ke Xing
- State Key Laboratory of Biocontrol, Cooperative Innovation Center for High Performance Computing, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Xionglei He
- State Key Laboratory of Biocontrol, Cooperative Innovation Center for High Performance Computing, College of Ecology and Evolution, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| |
|
24
|
Dios F, Barturen G, Lebrón R, Rueda A, Hackenberg M, Oliver JL. DNA clustering and genome complexity. Comput Biol Chem 2014; 53 Pt A:71-8. [PMID: 25182383 DOI: 10.1016/j.compbiolchem.2014.08.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 07/11/2014] [Indexed: 01/08/2023]
Abstract
Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the formed clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located the clusters in the human genome, quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although considerable variation occurs among different categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and in the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found here that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters. The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.
Affiliation(s)
- Francisco Dios
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - Guillermo Barturen
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - Ricardo Lebrón
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - Antonio Rueda
- Plataforma Andaluza de Genómica y Bioinformática (GBPA), Edificio INSUR, Calle Albert Einstein, 41092 Sevilla, Spain
| | - Michael Hackenberg
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - José L Oliver
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain.
| |
|
25
|
Abstract
Whole-genome and functional analyses suggest a wealth of secondary or auxiliary genetic information (AGI) within the redundancy component of the genetic code. Although there are multiple aspects of biased codon use, we focus on two types of auxiliary information: codon-specific translational pauses that can be used by particular proteins toward their unique folding, and biased codon patterns shared by groups of functionally related mRNAs with coordinate regulation. AGI is important to genetics in general and to human disease; here, we consider influences of its three major components: biased codon use itself, variations in the tRNAome, and anticodon modifications that distinguish synonymous decoding. AGI is plastic and can be used by different species to different extents, with tissue-specificity and in stress responses. Because AGI is species-specific, it is important to consider codon-sensitive experiments when using heterologous systems; for this we focus on the tRNA anticodon loop modification enzyme, CDKAL1, and its link to type 2 diabetes. Newly uncovered tRNAome variability among humans suggests roles in penetrance and as a genetic modifier and disease modifier. Development of experimental and bioinformatics methods is needed to uncover additional means of auxiliary genetic information.
Affiliation(s)
- Richard J. Maraia
- Intramural Research Program on Genomics of Differentiation, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - James R. Iben
- Intramural Research Program on Genomics of Differentiation, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20892, USA
| |
|
26
|
Li MJ, Yan B, Sham PC, Wang J. Exploring the function of genetic variants in the non-coding genomic regions: approaches for identifying human regulatory variants affecting gene expression. Brief Bioinform 2014; 16:393-412. [PMID: 24916300 DOI: 10.1093/bib/bbu018] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 01/22/2014] [Accepted: 04/23/2014] [Indexed: 12/13/2022] Open
Abstract
Understanding the genetic basis of human traits/diseases and the underlying mechanisms of how these traits/diseases are affected by genetic variations is critical for public health. Current genome-wide functional genomics data have uncovered a large number of functional elements in the noncoding regions of the human genome, providing new opportunities to study regulatory variants (RVs). RVs play important roles in transcription factor binding, chromatin states and epigenetic modifications. Here, we systematically review an array of methods currently used to map RVs as well as the computational approaches used to annotate and interpret their regulatory effects, with emphasis on regulatory single-nucleotide polymorphisms. We also briefly introduce experimental methods to validate these functional RVs.
|