1
Martinsen E, Jinnurine T, Subramani S, Rogne M. Advances in RNA therapeutics for modulation of 'undruggable' targets. Prog Mol Biol Transl Sci 2024; 204:249-294. [PMID: 38458740 DOI: 10.1016/bs.pmbts.2023.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2024]
Abstract
Over the past decades, drug discovery based on small pharmacological compounds, fragment-based therapeutics, and antibody therapy has significantly advanced treatment options for many human diseases. However, a major bottleneck has been that more than 70% of human proteins/genomic regions are 'undruggable' by these approaches. Many of these proteins are essential drug targets in complex multifactorial diseases such as cancer, immunological disorders, and neurological diseases. Therefore, alternative approaches are required to target these proteins or genomic regions in human cells. RNA therapeutics is a promising approach for many traditionally 'undruggable' targets, using methods such as antisense oligonucleotides, RNA interference, CRISPR/Cas-based genome editing, aptamers, and mRNA therapeutics. In this chapter, we emphasize recent advances that use these approaches against challenging drug targets, such as intranuclear proteins, intrinsically disordered proteins, untranslated genomic regions, and targets expressed in inaccessible tissues.
Affiliation(s)
- Saranya Subramani
- Pioneer Research AS, Oslo Science Park, Oslo, Norway; Department of Pharmacy, Section for Pharmacology and Pharmaceutical Biosciences, University of Oslo, Oslo, Norway
- Marie Rogne
- Pioneer Research AS, Oslo Science Park, Oslo, Norway; Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway.

2
Tran TTH, Tran HS, Le BTN, Van Nguyen S, Vu HA, Kim OTP. Novel single nucleotide polymorphisms of insulin-like growth factor-binding protein 7 (IGFBP7) gene significantly associated with growth traits in striped catfish (Pangasianodon hypophthalmus Sauvage, 1878). Mol Genet Genomics 2023; 298:883-893. [PMID: 37097322 DOI: 10.1007/s00438-023-02016-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 04/05/2023] [Indexed: 04/26/2023]
Abstract
Breeding programs to improve economically important growth traits in striped catfish (Pangasianodon hypophthalmus) require effective molecular markers. This study was conducted to identify single nucleotide polymorphisms (SNPs) in the insulin-like growth factor-binding protein 7 (IGFBP7) gene, which plays multiple roles in regulating growth, energy metabolism and development, and to analyze the association between these SNPs and growth traits in striped catfish in order to find SNPs with potential as markers for improving growth traits. First, fragments of the IGFBP7 gene from ten fast-growing and ten slow-growing fish were sequenced to discover SNPs. After filtering the detected SNPs, an intronic SNP (2060A > G) and two non-synonymous SNPs (344T > C and 4559C > A, causing Leu78Pro and Leu189Met in the protein, respectively) were further validated by individual genotyping of 70 fast-growing and 70 slow-growing fish using the single base extension method. Two SNPs (2060A > G and 4559C > A (p.Leu189Met)) were significantly associated with growth in P. hypophthalmus (p < 0.001) and are therefore candidate SNP markers for the growth traits of this fish. Moreover, linkage disequilibrium among the three filtered SNPs (344T > C, 2060A > G and 4559C > A) and the association of their haplotypes with growth traits were examined. These analyses revealed that the non-coding SNP locus 2060A > G had higher genetic diversity, with the G allele predominating over the A allele in the fast-growing fish. Furthermore, qPCR showed that IGFBP7 expression in fast-growing fish with genotype GG at locus 2060 was significantly higher than in slow-growing fish with genotype AA (p < 0.05). Our study provides insights into genetic variants of the IGFBP7 gene and a useful data source for developing molecular markers for growth traits in striped catfish breeding.
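The abstract reports linkage disequilibrium (LD) among the three filtered SNPs without naming the statistic. As a minimal sketch, assuming phased two-locus haplotype counts are available (the helper `ld_stats` and its argument layout are hypothetical, not the authors' pipeline), the standard pairwise measures D, D' and r² can be computed like this:

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> dict:
    """Pairwise LD between two biallelic loci from phased haplotype counts.

    Hypothetical helper: A/a are the alleles at locus 1, B/b at locus 2,
    and each argument is the number of haplotypes carrying that pair.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n  # frequency of allele B at locus 2
    d = p_AB - p_A * p_B  # coefficient of linkage disequilibrium
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        # At least one locus is monomorphic, so D' and r2 are undefined.
        return {"D": d, "D_prime": float("nan"), "r2": float("nan")}
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    # Signed D'; many tools report its absolute value.
    return {"D": d, "D_prime": d / d_max, "r2": d * d / denom}
```

For example, `ld_stats(50, 0, 0, 50)` describes two SNPs in complete LD (D' = 1, r² = 1), while equal counts in all four classes give D = 0. Real genotype data from diploid fish would first need haplotype phasing, which this sketch does not attempt.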
Affiliation(s)
- Trang Thi Huyen Tran
- Institute of Genome Research, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Str, Cau Giay, Hanoi, Vietnam
- Graduate University of Science and Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Str, Cau Giay, Hanoi, Vietnam
- Hoang Son Tran
- Institute of Genome Research, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Str, Cau Giay, Hanoi, Vietnam
- Binh Thi Nguyen Le
- Institute of Genome Research, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Str, Cau Giay, Hanoi, Vietnam
- Sang Van Nguyen
- Research Institute of Aquaculture, No.2, 116 Nguyen Dinh Chieu Str, District 1, Ho Chi Minh City, Vietnam
- Hai-Anh Vu
- Institute of Genome Research, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Str, Cau Giay, Hanoi, Vietnam
- Oanh Thi Phuong Kim
- Institute of Genome Research, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Str, Cau Giay, Hanoi, Vietnam.
- Graduate University of Science and Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Str, Cau Giay, Hanoi, Vietnam.

3
Durward-Akhurst SA, Schaefer RJ, Grantham B, Carey WK, Mickelson JR, McCue ME. Genetic Variation and the Distribution of Variant Types in the Horse. Front Genet 2021; 12:758366. [PMID: 34925451 PMCID: PMC8676274 DOI: 10.3389/fgene.2021.758366] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/16/2021] [Accepted: 11/10/2021] [Indexed: 11/13/2022]
Abstract
Genetic variation is a key contributor to health and disease. Understanding the link between an individual's genotype and the corresponding phenotype is a major goal of medical genetics. Whole genome sequencing (WGS) within and across populations enables highly efficient variant discovery and elucidation of the molecular nature of virtually all genetic variation. Here, we report the largest catalog of genetic variation for the horse, a species of importance as a model for human athletic and performance-related traits, using WGS of 534 horses. We show the extent of agreement between two commonly used variant callers. In data from ten target breeds that represent the major breed clusters of the domestic horse, we describe the distribution of variants and their allele frequencies across breeds, and identify variants unique to a single breed. We investigate variants with no homozygotes, which are candidate embryonic lethal variants, as well as variants present in all individuals, which likely represent genomic regions with errors or poor annotation, or sites where the reference genome carries the variant. Finally, we show regions of the genome that have higher or lower levels of genetic variation than the genome average. This catalog can be used to prioritize variants for important equine diseases and traits, and it provides key information about regions of the genome where the assembly and/or annotation need to be improved.
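The abstract singles out two genotype-count patterns for follow-up: variants with no homozygotes and variants present in every animal. A minimal sketch of that screen, assuming genotypes are already coded per individual as 0 (homozygous reference), 1 (heterozygous), 2 (homozygous alternate) or None (missing); the encoding and the function `classify_variant` are hypothetical illustrations, not the study's pipeline:

```python
from collections import Counter
from typing import Iterable, Optional


def classify_variant(genotypes: Iterable[Optional[int]]) -> str:
    """Classify one biallelic variant from per-individual genotype codes.

    Hypothetical encoding: 0 = homozygous reference, 1 = heterozygous,
    2 = homozygous alternate, None = missing call (ignored).
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        return "no_calls"
    counts = Counter(calls)
    carriers = counts[1] + counts[2]
    if carriers == len(calls):
        # Every genotyped individual carries the alternate allele: a hint of
        # assembly error, poor annotation, or a reference carrying the variant.
        return "present_in_all"
    if counts[1] > 0 and counts[2] == 0:
        # Alternate allele seen only in heterozygotes: a candidate
        # embryonic-lethal variant that needs follow-up.
        return "no_alt_homozygotes"
    return "other"
```

An absence of homozygotes is only suggestive: under Hardy-Weinberg proportions the expected number of alternate homozygotes is about N·q², so for an allele at frequency q = 0.01 in 534 horses fewer than one is expected even without any lethality.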
Affiliation(s)
- S. A. Durward-Akhurst
- Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, MN, United States
- R. J. Schaefer
- Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, MN, United States
- B. Grantham
- Interval Bio LLC, Mountain View, CA, United States
- W. K. Carey
- Interval Bio LLC, Mountain View, CA, United States
- J. R. Mickelson
- Department of Veterinary and Biomedical Sciences, University of Minnesota, Minneapolis, MN, United States
- M. E. McCue
- Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, MN, United States

4
Schweizer G, Wagner A. Both Binding Strength and Evolutionary Accessibility Affect the Population Frequency of Transcription Factor Binding Sequences in Arabidopsis thaliana. Genome Biol Evol 2021; 13:6459646. [PMID: 34894231 PMCID: PMC8712246 DOI: 10.1093/gbe/evab273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2021] [Indexed: 11/22/2022]
Abstract
Mutations in DNA sequences that bind transcription factors, and thus modulate gene expression, are a source of adaptive variation in gene expression. To understand how transcription factor binding sequences evolve in natural populations of the thale cress Arabidopsis thaliana, we integrated genomic polymorphism data for loci bound by transcription factors with in vitro data on binding affinity for these transcription factors. Specifically, we studied 19 different transcription factors and the allele frequencies of 8,333 genomic loci bound in vivo by these transcription factors in 1,135 A. thaliana accessions. We find that transcription factor binding sequences show very low genetic diversity, suggesting that they are subject to purifying selection. High-frequency alleles of such binding sequences tend to bind transcription factors strongly. Conversely, alleles that are absent from the population tend to bind them weakly. In addition, high-frequency alleles also tend to be the endpoints of many accessible evolutionary paths. We show that both high affinity and high evolutionary accessibility contribute to high allele frequency for at least some transcription factors. Although binding sequences with stronger affinity are more frequent, we did not find them to be associated with higher gene expression levels. Epistatic interactions among individual mutations that alter binding affinity are pervasive and can help explain variation in accessibility among binding sequences. In summary, combining in vitro binding affinity data with in vivo binding sequence data can help us understand the forces that affect the evolution of transcription factor binding sequences in natural populations.
Affiliation(s)
- Gabriel Schweizer
- Department of Evolutionary Biology and Environmental Studies, University of Zürich, Switzerland; Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland
- Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zürich, Switzerland; Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, Lausanne, Switzerland; Santa Fe Institute, Santa Fe, New Mexico, USA; Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, South Africa

5
Joshi M, Kapopoulou A, Laurent S. Impact of Genetic Variation in Gene Regulatory Sequences: A Population Genomics Perspective. Front Genet 2021; 12:660899. [PMID: 34276769 PMCID: PMC8282999 DOI: 10.3389/fgene.2021.660899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Accepted: 05/31/2021] [Indexed: 01/06/2023]
Abstract
The unprecedented rise of high-throughput sequencing and assay technologies has provided detailed insight into non-coding sequences and their potential role as gene expression regulators. These regulatory non-coding sequences are also referred to as cis-regulatory elements (CREs). Genetic variants occurring within CREs have been shown to be associated with altered gene expression and phenotypic changes. Such variants arise spontaneously and can become fixed in natural populations through selection and genetic drift, in some cases paving the way for speciation. Hence, the study of genetic variation at CREs has improved our overall understanding of the processes of local adaptation and evolution. Recent advances in high-throughput sequencing and better annotation of CREs have enabled evaluation of the impact of such variation on gene expression, phenotype and fitness. Here, we review recent research on the evolution of CREs, concentrating on studies that have investigated genetic variation in these regulatory sequences within the context of population genetics.
Affiliation(s)
- Manas Joshi
- Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Stefan Laurent
- Department of Comparative Development and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, Germany

6
Wang Y, Shi FY, Liang Y, Gao G. REVA as A Well-curated Database for Human Expression-modulating Variants. Genomics Proteomics Bioinformatics 2021; 19:590-601. [PMID: 34224878 PMCID: PMC9040024 DOI: 10.1016/j.gpb.2021.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2020] [Revised: 06/22/2021] [Accepted: 06/25/2021] [Indexed: 10/25/2022]
Abstract
More than 90% of disease- and trait-associated human variants are noncoding. By systematically screening multiple large-scale studies, we compiled REVA, a manually curated database of over 11.8 million experimentally tested noncoding variants with expression-modulating potential. We provide 2424 functional annotations that can be used to pinpoint the plausible regulatory mechanisms of these variants. We further benchmarked multiple state-of-the-art computational tools and found that their limited sensitivity remains a serious challenge for effective large-scale analysis. REVA provides high-quality, experimentally tested expression-modulating variants with extensive functional annotations, which will be useful to researchers studying noncoding variants. REVA is available at http://reva.gao-lab.org.
Affiliation(s)
- Yu Wang
- Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
- Fang-Yuan Shi
- Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
- Yu Liang
- Human Aging Research Institute, School of Life Sciences, Nanchang University, Nanchang 330031, China
- Ge Gao
- Biomedical Pioneering Innovation Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI) and State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China.

7
Koganebuchi K, Sato K, Fujii K, Kumabe T, Haneji K, Toma T, Ishida H, Joh K, Soejima H, Mano S, Ogawa M, Oota H. An analysis of the demographic history of the risk allele R4810K in RNF213 of moyamoya disease. Ann Hum Genet 2021; 85:166-177. [PMID: 34013582 PMCID: PMC8453937 DOI: 10.1111/ahg.12424] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/18/2021] [Revised: 04/07/2021] [Accepted: 04/08/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND Ring finger protein 213 (RNF213) is a susceptibility gene for moyamoya disease (MMD). A previous case-control study and a family analysis demonstrated a strong association of the East Asian-specific variant R4810K (rs112735431) with MMD. Our aim was to uncover the evolutionary history of R4810K in East Asian populations. METHODS The RNF213 locus of 24 MMD patients in Japan was sequenced using targeted-capture sequencing. Based on the sequence data, we conducted population genetic analysis and estimated the age of R4810K using coalescent simulation. RESULTS The diversity of the RNF213 gene was higher in Africans than in non-Africans, which can be explained by the bottleneck effect of the out-of-Africa migration. Coalescent simulation showed that the risk variant arose in East Asia 14,500 to 5,100 years ago and reached the Japanese archipelago afterward, probably during the period of a migration known from archaeological evidence. CONCLUSIONS Although clinical data show that symptoms vary, all sequences harboring the risk allele are almost identical, with few exceptions, suggesting that MMD phenotypes are unaffected by variants of this gene and are more likely affected by environmental factors.
Affiliation(s)
- Kae Koganebuchi
- Department of Biological Structure, Kitasato University Graduate School of Medical Sciences, Sagamihara, Kanagawa, Japan; Faculty of Medicine, Advanced Medical Research Center, University of the Ryukyus, Nishihara, Okinawa, Japan; Department of Biological Sciences, Graduate School of Science, University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Kimitoshi Sato
- Department of Neurosurgery, Kitasato University School of Medicine, Sagamihara, Kanagawa, Japan
- Kiyotaka Fujii
- Department of Neurosurgery, Kitasato University School of Medicine, Sagamihara, Kanagawa, Japan
- Toshihiro Kumabe
- Department of Neurosurgery, Kitasato University School of Medicine, Sagamihara, Kanagawa, Japan
- Kuniaki Haneji
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara, Okinawa, Japan
- Takashi Toma
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara, Okinawa, Japan
- Hajime Ishida
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara, Okinawa, Japan
- Keiichiro Joh
- Division of Molecular Genetics and Epigenetics, Faculty of Medicine, Department of Biomolecular Sciences, Saga University, Saga, Saga, Japan
- Hidenobu Soejima
- Division of Molecular Genetics and Epigenetics, Faculty of Medicine, Department of Biomolecular Sciences, Saga University, Saga, Saga, Japan
- Shuhei Mano
- Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
- Motoyuki Ogawa
- Department of Biological Structure, Kitasato University Graduate School of Medical Sciences, Sagamihara, Kanagawa, Japan; Department of Anatomy, Kitasato University School of Medicine, Sagamihara, Kanagawa, Japan
- Hiroki Oota
- Department of Biological Structure, Kitasato University Graduate School of Medical Sciences, Sagamihara, Kanagawa, Japan; Department of Anatomy, Kitasato University School of Medicine, Sagamihara, Kanagawa, Japan; Department of Biological Sciences, Graduate School of Science, University of Tokyo, Bunkyo-ku, Tokyo, Japan

8
Imprints of selection in peripheral and ecologically marginal central-eastern European Scots pine populations. Gene 2021; 779:145509. [PMID: 33600955 DOI: 10.1016/j.gene.2021.145509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2020] [Revised: 11/04/2020] [Accepted: 02/05/2021] [Indexed: 11/21/2022]
Abstract
Knowledge of the molecular mechanisms underlying the stress response in plants is essential to understand the evolutionary processes that result in the long-term persistence of populations. Populations inhabiting marginal ecological conditions at the periphery of the distribution range may have preserved imprints of the natural selection that shaped the functional genetic variation of the species. Our aim was to evaluate the extent of selection in the extremely fragmented, peripheral and isolated populations of Scots pine in central-eastern Europe. Autochthonous populations of the Carpathian Mts. and the Pannonian Basin were sampled, and drought stress-related candidate genes were re-sequenced. Neutrality tests and outlier detection approaches were applied to infer the effect and direction of selection. Populations retained high genetic diversity, preserving a large number of alleles and haplotypes, many of them population-specific. Neutrality tests and outlier detection highlighted nucleotide positions that are under divergent selection and may be involved in local adaptation. The detected genetic pattern confirms that natural selection has played an important role in shaping present-day genetic variation in marginal Scots pine populations, allowing for their long-term persistence. Selection detected at functional regions possibly acts to maintain diversity and counteract the effect of genetic erosion.
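The abstract does not name the neutrality tests that were applied. Tajima's D is one widely used example, contrasting mean pairwise diversity (π) with the number of segregating sites (S); negative values indicate an excess of rare variants and positive values an excess of intermediate-frequency variants. The sketch below is a minimal illustration under the assumption of aligned, equal-length sequences without missing data, and is not necessarily the statistic used in the study:

```python
import math
from itertools import combinations
from typing import List, Optional


def tajimas_d(seqs: List[str]) -> Optional[float]:
    """Tajima's D for aligned, equal-length sequences without missing data.

    Returns None when D is undefined (fewer than 4 sequences, where the
    variance constants vanish, or no segregating sites).
    """
    n = len(seqs)
    if n < 4:
        return None
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # S: number of polymorphic columns (a multi-allelic site counts once).
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return None
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Variance constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

With four sequences and a single variable site, a singleton (`["A", "A", "A", "T"]`) gives a negative D and a 2-vs-2 split (`["A", "A", "T", "T"]`) gives a positive D, matching the intuition above.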

9
Michelson DJ, Clark RD. Optimizing Genetic Diagnosis of Neurodevelopmental Disorders in the Clinical Setting. Clin Lab Med 2020; 40:231-256. [PMID: 32718497 DOI: 10.1016/j.cll.2020.05.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/15/2023]
Abstract
Progress in medical genetics has changed the practice of medicine in general and child neurology in particular. A genetic diagnosis has become critically important in determining optimal management of many neurodevelopmental disorders, making genetic testing a routine consideration of patient care in outpatient and inpatient settings. Today's child neurologists should be familiar with various genetic testing modalities and their appropriate use. Molecular genetic testing of children with unexplained developmental delays and/or congenital anomalies has a 20% to 30% chance of identifying a causative etiology. Newer methods have made genetic testing more widely available and sensitive but also more likely to produce ambiguous results.
Affiliation(s)
- David Joshua Michelson
- Division of Child Neurology, Department of Pediatrics, Loma Linda University School of Medicine, Coleman Pavilion Room A, 1175 Campus Street, Loma Linda, CA 92354, USA.
- Robin Dawn Clark
- Division of Medical Genetics, Department of Pediatrics, Loma Linda University School of Medicine, Coleman Pavilion Room A, 1175 Campus Street, Loma Linda, CA 92354, USA

10
Transcriptome Analysis of circRNA and mRNA in Theca Cells during Follicular Development in Chickens. Genes (Basel) 2020; 11:genes11050489. [PMID: 32365656 PMCID: PMC7290432 DOI: 10.3390/genes11050489] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/03/2020] [Revised: 04/19/2020] [Accepted: 04/27/2020] [Indexed: 12/11/2022]
Abstract
Development of ovarian follicles requires interactions between granulosa cells, theca cells, and oocytes. Multiple levels of transcriptional regulation are involved, but information about the role of noncoding RNAs, especially circular RNAs (circRNAs), is lacking. Here, we used RNA sequencing to profile circRNAs and mRNAs in theca cells from three types of follicles: small yellow follicles (SYF), the smallest hierarchical follicles (F6), and the largest hierarchical follicles (F1). Using bioinformatics analysis, we identified a total of 14,502 circRNAs in theca cells, 5622 of which were present at all stages of development. Differential expression analysis suggested that some genes display differential isoforms during follicular development. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed enrichment of both differentially expressed circRNAs and mRNAs in pathways associated with reproduction, including the TGF-β signaling pathway, oocyte meiosis, and vascular smooth muscle contraction. Our study provides the first description of circRNAs and mRNAs in theca cells during follicle development in chickens and adds to the growing body of knowledge about theca cells.

11
Salichos L, Meyerson W, Warrell J, Gerstein M. Estimating growth patterns and driver effects in tumor evolution from individual samples. Nat Commun 2020; 11:732. [PMID: 32024824 PMCID: PMC7002450 DOI: 10.1038/s41467-020-14407-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/10/2018] [Accepted: 11/26/2019] [Indexed: 01/01/2023]
Abstract
Tumors accumulate thousands of mutations, and sequencing them has given rise to methods for finding cancer drivers via mutational recurrence. However, these methods require large cohorts and underperform for drivers with low recurrence. Recently, ultra-deep sequencing has enabled accurate measurement of variant allele frequencies (VAFs) for mutations, allowing the determination of evolutionary trajectories. Here, based solely on the VAF spectrum of an individual sample, we report a method that identifies drivers and quantifies tumor growth. Drivers introduce perturbations into the spectrum, and our method measures these using the frequency of hitchhiking mutations preceding a driver. For validation, we use simulation models and 993 tumors from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium with previously identified drivers. We then apply our method to an ultra-deep sequenced acute myeloid leukemia (AML) tumor and identify known cancer genes and additional driver candidates. In summary, our framework presents opportunities for personalized driver diagnosis using sequencing data from a single individual.
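A variant allele frequency is the fraction of sequencing reads at a site that carry the variant allele, and the VAF spectrum of a sample is the distribution of these fractions across its mutations. The sketch below only illustrates how such a spectrum can be tabulated from hypothetical (variant reads, total depth) pairs; the helpers `vaf` and `vaf_spectrum` are illustrative names, and this is not the authors' driver-detection method, which additionally models the hitchhiking mutations that precede a driver:

```python
from typing import Iterable, List, Tuple


def vaf(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    return alt_reads / depth


def vaf_spectrum(read_counts: Iterable[Tuple[int, int]], n_bins: int = 10) -> List[int]:
    """Histogram of VAFs over [0, 1] using n_bins equal-width bins."""
    bins = [0] * n_bins
    for alt_reads, depth in read_counts:
        f = vaf(alt_reads, depth)
        # A VAF of exactly 1.0 would index past the end, so clamp it
        # into the last bin.
        idx = min(int(f * n_bins), n_bins - 1)
        bins[idx] += 1
    return bins
```

For example, `vaf_spectrum([(25, 100), (50, 100), (100, 100)])` places one mutation each in the 0.2-0.3, 0.5-0.6 and 0.9-1.0 bins. Deeper sequencing narrows the sampling noise around each VAF, which is why ultra-deep data make the shape of the spectrum more informative.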
Affiliation(s)
- Leonidas Salichos
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- William Meyerson
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Jonathan Warrell
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA.
- Department of Computer Science, Yale University, New Haven, CT, 06520, USA.
- Center for Biomedical Data Science, Yale University, New Haven, CT, 06520, USA.

12
Chen Z, Wen W, Beeghly-Fadiel A, Shu XO, Díez-Obrero V, Long J, Bao J, Wang J, Liu Q, Cai Q, Moreno V, Zheng W, Guo X. Identifying Putative Susceptibility Genes and Evaluating Their Associations with Somatic Mutations in Human Cancers. Am J Hum Genet 2019; 105:477-492. [PMID: 31402092 PMCID: PMC6731359 DOI: 10.1016/j.ajhg.2019.07.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 03/31/2019] [Accepted: 07/10/2019] [Indexed: 12/23/2022]
Abstract
Genome-wide association studies (GWASs) have identified hundreds of genetic risk variants for human cancers. However, the target genes for the majority of risk loci remain largely unexplored. It is also unclear whether genes associated with GWAS risk loci contribute to mutational signatures and tumor mutational burden (TMB) in cancer tissues. We systematically conducted cis-expression quantitative trait loci (cis-eQTL) analyses for 294 GWAS-identified variants for six major cancer types (colorectal, lung, ovarian, prostate, and pancreatic cancer, and melanoma) using transcriptome data from the Genotype-Tissue Expression (GTEx) Project, The Cancer Genome Atlas (TCGA), and other public data sources. Using integrative analysis strategies, we identified 270 candidate target genes, including 99 with previously unreported associations, for the six cancer types. Analysis of functional genomic data indicated that 180 genes (66.7% of 270) had evidence of cis-regulation by putative functional variants via proximal promoter or distal enhancer-promoter interactions. Combined with our previously reported associations for breast cancer risk, our results show that 24 genes are shared by at least two cancer types, including four genes shared by breast and ovarian cancer. By integrating mutation data from TCGA, we found that the expression levels of 33 and 66 putative susceptibility genes were associated with specific mutational signatures and with the TMB of cancer-driver genes, respectively, at a Bonferroni-corrected p < 0.05. Together, these findings provide further insight into how genetic risk variants might contribute to carcinogenesis through the regulation of susceptibility genes related to the biogenesis of somatic mutations.
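The associations above are reported at a Bonferroni-corrected p < 0.05. The correction itself is simple: with m tests, each raw p-value is compared against 0.05/m (equivalently, p·m is compared against 0.05). The number of tests performed in the study is not given here, so the helper `bonferroni_significant` and the p-values in the example are hypothetical:

```python
from typing import List, Sequence


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of tests that remain significant after Bonferroni correction.

    Each raw p-value is compared with alpha / m, where m is the number of
    tests; this is equivalent to requiring p * m < alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p < threshold]
```

With three hypothetical tests, `bonferroni_significant([0.001, 0.02, 0.0001])` uses a threshold of about 0.0167 and keeps only the first and third. The correction controls the family-wise error rate but becomes conservative as m grows, which matters when thousands of gene-signature pairs are tested.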
Affiliation(s)
- Zhishan Chen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
- Wanqing Wen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
- Alicia Beeghly-Fadiel
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
- Xiao-Ou Shu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
- Virginia Díez-Obrero
- Unit of Biomarkers and Susceptibility, Oncology Data Analytics Program, Catalan Institute of Oncology, Barcelona 08908, Spain; Colorectal Cancer Group, ONCOBELL Program, Bellvitge Biomedical Research Institute, Barcelona 08908, Spain; Consortium for Biomedical Research in Epidemiology and Public Health, Barcelona 08908, Spain; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona 08908, Spain
- Jirong Long
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
- Jiandong Bao
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA; College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Jing Wang
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Qi Liu
- Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Qiuyin Cai
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
- Victor Moreno
- Unit of Biomarkers and Susceptibility, Oncology Data Analytics Program, Catalan Institute of Oncology, Barcelona 08908, Spain; Colorectal Cancer Group, ONCOBELL Program, Bellvitge Biomedical Research Institute, Barcelona 08908, Spain; Consortium for Biomedical Research in Epidemiology and Public Health, Barcelona 08908, Spain; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona 08908, Spain
- Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN 37203, USA.
13
In silico identification of long non-coding RNA based simple sequence repeat markers and their application in diversity analysis in rice. Gene Reports 2019. [DOI: 10.1016/j.genrep.2019.100418] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
14
Wilcox JS, Kerschner A, Hollocher H. Indel-informed Bayesian analysis suggests cryptic population structure between Plasmodium knowlesi of humans and long-tailed macaques (Macaca fascicularis) in Malaysian Borneo. Infection, Genetics and Evolution 2019; 75:103994. [PMID: 31421245 DOI: 10.1016/j.meegid.2019.103994] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/10/2019] [Revised: 08/01/2019] [Accepted: 08/03/2019] [Indexed: 01/02/2023]
Abstract
Plasmodium knowlesi is an important causative agent of malaria in humans in Southeast Asia. Macaques are natural hosts for this parasite, but little is conclusively known about its patterns of transmission within and between these hosts. Here, we apply a comprehensive phylogenetic approach to test for patterns of cryptic population genetic structure between P. knowlesi isolated from humans and long-tailed macaques from the state of Sarawak in Malaysian Borneo. Our approach differs from previous investigations in its exhaustive use of archival 18S small subunit rRNA (18S) gene sequences from Plasmodium and Hepatocystis species, its inclusion of insertion and deletion information during phylogenetic inference, and its application of Bayesian phylogenetic inference to this problem. We report distinct clades of P. knowlesi that predominantly contained sequences from either human or macaque hosts for the paralogous A-type and S-type 18S gene loci. We report significant partitioning of sequence distances between host species across both types of loci, and confirm that sequences of the same locus type showed significantly biased assortment into different clades depending on their host species. Our results support the zoonotic potential of Plasmodium knowlesi but also suggest that humans may be preferentially infected with certain strains of this parasite. Broadly, such patterns could arise through preferential zoonotic transmission of some parasite lineages or a disposition of parasites to transmit within, rather than between, human and macaque hosts. Available data are insufficient to address these hypotheses. Our results suggest that the epidemiology of P. knowlesi may be more complicated than previously assumed, and highlight the need for renewed and more vigorous exploration of transmission patterns in the fifth human malarial parasite.
Affiliation(s)
- Justin J S Wilcox
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556-5688, USA.
- Abigail Kerschner
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556-5688, USA
- Hope Hollocher
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556-5688, USA
15
Liu EM, Martinez-Fundichely A, Diaz BJ, Aronson B, Cuykendall T, MacKay M, Dhingra P, Wong EWP, Chi P, Apostolou E, Sanjana NE, Khurana E. Identification of Cancer Drivers at CTCF Insulators in 1,962 Whole Genomes. Cell Syst 2019; 8:446-455.e8. [PMID: 31078526 PMCID: PMC6917527 DOI: 10.1016/j.cels.2019.04.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 11/22/2017] [Revised: 11/20/2018] [Accepted: 04/02/2019] [Indexed: 12/15/2022]
Abstract
Recent studies have shown that mutations at non-coding elements, such as promoters and enhancers, can act as cancer drivers. However, an important class of non-coding elements, namely CTCF insulators, has been overlooked in previous driver analyses. We used insulator annotations from CTCF and cohesin ChIA-PET and analyzed somatic mutations in 1,962 whole genomes from 21 cancer types. Using the heterogeneous patterns of transcription-factor-motif disruption, functional impact, and recurrence of mutations, we developed a computational method that revealed 21 insulators showing signals of positive selection. In particular, mutations in one insulator across multiple cancer types, including 16% of melanoma samples, are associated with TGFB1 up-regulation. Using CRISPR-Cas9, we find that alterations at two of the most frequently mutated regions in this insulator increase cell growth by 40%-50%, supporting the role of this boundary element as a cancer driver. Thus, our study reveals several CTCF insulators as putative cancer drivers.
Affiliation(s)
- Eric Minwei Liu
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Alexander Martinez-Fundichely
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Bianca Jay Diaz
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
- Boaz Aronson
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Tawny Cuykendall
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Matthew MacKay
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Priyanka Dhingra
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Elissa W P Wong
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Ping Chi
- Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Effie Apostolou
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Neville E Sanjana
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
- Ekta Khurana
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA; Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital, Weill Cornell Medicine, New York, NY 10065, USA.
16
Cultrera NGM, Sarri V, Lucentini L, Ceccarelli M, Alagna F, Mariotti R, Mousavi S, Ruiz CG, Baldoni L. High Levels of Variation Within Gene Sequences of Olea europaea L. Frontiers in Plant Science 2019; 9:1932. [PMID: 30671076 PMCID: PMC6331486 DOI: 10.3389/fpls.2018.01932] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/20/2018] [Accepted: 12/12/2018] [Indexed: 05/08/2023]
Abstract
Gene sequence variation in cultivated olive (Olea europaea L. subsp. europaea var. europaea), the most important oil tree crop of the Mediterranean basin, has been poorly evaluated up to now. A deep sequence analysis of fragments of four genes, OeACP1, OeACP2, OeLUS and OeSUT1, in 90 cultivars revealed a wide range of polymorphisms across all recognized allele forms, as well as unexpected allele frequencies and genotype combinations. High linkage values among most polymorphisms were recorded within each gene fragment. The great sequence variability corresponded to a low number of alleles and, surprisingly, to a small fraction of genotype combinations. The distribution, frequency, and combination of the different alleles at each locus are possibly due to natural and human pressures, such as selection, ancestrality, or fitness. Phylogenetic analyses of allele sequences showed distant and complex patterns of relationships among cultivated olives, intermixed with other related forms, highlighting an evolutionary connection between olive cultivars and the O. europaea subspecies cuspidata and cerasiformis. This study demonstrates how a detailed and complete sequence analysis of a few gene portions and a thorough genotyping of a representative set of cultivars can clarify important issues related to sequence polymorphisms, the phylogeny of alleles, and genotype combinations. The identification of regions representing blocks of recombination could reveal polymorphisms that represent putatively functional markers. Indeed, specific mutations found in the analyzed OeACP1 and OeACP2 fragments seem to be correlated with fruit weight.
Affiliation(s)
- Nicolò G. M. Cultrera
- Institute of Biosciences and Bioresources, National Research Council, Perugia, Italy
- Vania Sarri
- Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy
- Livia Lucentini
- Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy
- Marilena Ceccarelli
- Department of Chemistry, Biology and Biotechnology, University of Perugia, Perugia, Italy
- Fiammetta Alagna
- ENEA Italian National Agency for New Technologies Energy and Sustainable Economic Development, Trisaia Research Center, Rotondella, Italy
- Roberto Mariotti
- Institute of Biosciences and Bioresources, National Research Council, Perugia, Italy
- Soraya Mousavi
- Institute of Biosciences and Bioresources, National Research Council, Perugia, Italy
- Luciana Baldoni
- Institute of Biosciences and Bioresources, National Research Council, Perugia, Italy
17
Rigau M, Juan D, Valencia A, Rico D. Intronic CNVs and gene expression variation in human populations. PLoS Genet 2019; 15:e1007902. [PMID: 30677042 PMCID: PMC6345438 DOI: 10.1371/journal.pgen.1007902] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 05/15/2018] [Accepted: 12/17/2018] [Indexed: 11/19/2022]
Abstract
Introns can be extraordinarily large, and they account for the majority of the DNA sequence in human genes. However, little is known about their population patterns of structural variation and the functional implications of this variation. By combining the most extensive maps of copy number variants (CNVs) in human populations, we have found that intronic losses are the most frequent CNVs in protein-coding genes in humans, with 12,986 intronic deletions affecting 4,147 genes (including 1,154 essential genes and 1,638 disease-related genes). This intronic length variation results in dozens of genes showing extreme population variability in size, with 40 genes having 10 or more different sizes and up to 150 allelic sizes. Intronic losses are frequent in evolutionarily ancient genes that are highly conserved at the protein sequence level. This result contrasts with losses overlapping exons, which are observed less often than expected by chance and almost exclusively affect primate-specific genes. An integrated analysis of CNVs and RNA-seq data showed that intronic loss can be associated with significant differences in gene expression levels in the population (CNV-eQTLs). These intronic CNV-eQTL regions are enriched for intronic enhancers and can be associated with expression differences of other genes showing long-distance intron-promoter 3D interactions. Our data suggest that intronic structural variation of protein-coding genes makes an important contribution to the variability of gene expression and splicing in human populations.
Affiliation(s)
- Maria Rigau
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- David Juan
- Institut de Biologia Evolutiva, Consejo Superior de Investigaciones Científicas–Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona, Barcelona, Spain
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Daniel Rico
- Institute of Cellular Medicine, Newcastle University, Newcastle upon Tyne, United Kingdom
18
Wilcox JJS, Kerschner A, Hollocher H. WITHDRAWN: Indel-informed bayesian analysis suggests cryptic divisions between Plasmodium knowlesi of humans and long-tailed macaques (Macaca fascicularis) in Malaysian Borneo. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2018:S1567-1348(18)30557-4. [PMID: 30481580 DOI: 10.1016/j.meegid.2018.11.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2018] [Revised: 10/24/2018] [Accepted: 11/23/2018] [Indexed: 06/09/2023]
Abstract
This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at https://www.elsevier.com/about/our-business/policies/article-withdrawal.
Affiliation(s)
- Justin J S Wilcox
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556-5688, USA.
- Abigail Kerschner
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556-5688, USA
- Hope Hollocher
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556-5688, USA
19
Shen Y, Zhang J, Fu Z, Zhang B, Chen M, Ling X, Zou X. Gene microarray analysis of the circular RNAs expression profile in human gastric cancer. Oncol Lett 2018; 15:9965-9972. [PMID: 29928369 PMCID: PMC6004662 DOI: 10.3892/ol.2018.8590] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/13/2017] [Accepted: 04/03/2018] [Indexed: 12/12/2022]
Abstract
Human gastric cancer is a common malignant neoplasm of the digestive system and represents a threat to human health worldwide. The mechanisms underlying gastric cancer initiation and development are not yet fully understood. Circular RNAs (circRNAs) serve crucial roles in various physiological and pathological processes, particularly cancer. However, few studies have focused on the mechanisms involving circRNAs in gastric cancer. Therefore, the present study set out to identify differentially expressed circRNAs in gastric cancer. Three specimens of gastric cancer and normal gastric tissue were selected, and circRNA microarray analysis was performed to detect differentially expressed circRNAs. The changes in circRNAs were confirmed by reverse transcription-quantitative polymerase chain reaction analysis. A total of 347 upregulated and 603 downregulated circRNAs (fold change > 2.0) were identified in gastric cancer compared with normal gastric tissue. A total of 20 selected circRNAs were dysregulated in gastric cancer, which suggests a potential role for them in the disease. The present study identified circRNAs in the expression profile of human gastric cancer that are potentially involved in the molecular mechanisms underlying its development.
Affiliation(s)
- Yonghua Shen
- Department of Gastroenterology, The Affiliated Drum Tower Hospital of Nanjing University, Medical School, Nanjing, Jiangsu 210008, P.R. China
- Juanjuan Zhang
- Department of Reproduction, Affiliated Nanjing Maternal and Child Health Hospital, Nanjing Medical University, Nanjing, Jiangsu 210004, P.R. China
- Ziyi Fu
- Nanjing Maternal and Child Health Medical Institute, Affiliated Nanjing Maternal and Child Health Hospital, Nanjing Medical University, Nanjing, Jiangsu 210004, P.R. China
- Bin Zhang
- Department of Gastroenterology, The Affiliated Drum Tower Hospital of Nanjing University, Medical School, Nanjing, Jiangsu 210008, P.R. China
- Min Chen
- Department of Gastroenterology, The Affiliated Drum Tower Hospital of Nanjing University, Medical School, Nanjing, Jiangsu 210008, P.R. China
- Xiufeng Ling
- Department of Reproduction, Affiliated Nanjing Maternal and Child Health Hospital, Nanjing Medical University, Nanjing, Jiangsu 210004, P.R. China
- Correspondence to: Professor Xiufeng Ling, Department of Reproduction, Affiliated Nanjing Maternal and Child Health Hospital, Nanjing Medical University, 123 Mochou Road, Nanjing, Jiangsu 210004, P.R. China, E-mail:
- Xiaoping Zou
- Department of Gastroenterology, The Affiliated Drum Tower Hospital of Nanjing University, Medical School, Nanjing, Jiangsu 210008, P.R. China
- Professor Xiaoping Zou, Department of Gastroenterology, The Affiliated Drum Tower Hospital of Nanjing University, Medical School, 321 Zhongshan Road, Nanjing, Jiangsu 210008, P.R. China, E-mail:
20
Naidoo T, Sjödin P, Schlebusch C, Jakobsson M. Patterns of variation in cis-regulatory regions: examining evidence of purifying selection. BMC Genomics 2018; 19:95. [PMID: 29373957 PMCID: PMC5787233 DOI: 10.1186/s12864-017-4422-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/19/2017] [Accepted: 12/27/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND With only 2% of the human genome consisting of protein-coding genes, functionality across the rest of the genome has been the subject of much debate. This debate has gained further impetus in recent years due to a rapidly growing catalogue of genomic elements, based primarily on biochemical signatures (e.g., the ENCODE project). While the assessment of functionality is a complex task, the presence of selection acting on a genomic region is a strong indicator of importance. In this study, we apply population genetic methods to investigate signals overlaying several classes of regulatory elements. RESULTS We disentangle signals of purifying selection acting directly on regulatory elements from the confounding factors of demography and of purifying selection linked to, e.g., nearby protein-coding regions. We confirm the importance of regulatory regions proximal to coding sequence, while also finding differential levels of selection at distal regions. We note differences in purifying selection among transcription factor families. Signals of constraint at some genomic classes were also strongly dependent on their physical location relative to coding sequence. In addition, levels of selection efficacy across genomic classes differed between African and non-African populations. CONCLUSIONS To assign a valid signal of selection to a particular class of genomic sequence, we show that it is crucial to isolate the signal by accounting for the effects of demography and linked purifying selection. Our study highlights the intricate interplay of factors affecting signals of selection on functional elements.
Affiliation(s)
- Thijessen Naidoo
- Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Per Sjödin
- Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Carina Schlebusch
- Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Mattias Jakobsson
- Department of Organismal Biology, Uppsala University, Uppsala, Sweden; Science for Life Lab, Uppsala, Sweden
21
Liao SM, Zheng W, Zhu J, Lewis CA, Delgado O, Crowley MA, Buchanan NM, Jaffee BD, Dryja TP. Specific correlation between the major chromosome 10q26 haplotype conferring risk for age-related macular degeneration and the expression of HTRA1. Mol Vis 2017; 23:318-333. [PMID: 28659708 PMCID: PMC5479693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2016] [Accepted: 06/12/2017] [Indexed: 11/30/2022]
Abstract
PURPOSE A region within chromosome 10q26 has a set of single nucleotide polymorphisms (SNPs) that define a haplotype that confers high risk for age-related macular degeneration (AMD). We used a bioinformatics approach to search for genes in this region that may be responsible for risk for AMD by assessing levels of gene expression in individuals carrying different haplotypes and by searching for open chromatin regions in the retinal pigment epithelium (RPE) that might include one or more of the SNPs. METHODS We surveyed the PubMed and the 1000 Genomes databases to find all common (minor allele frequency > 0.01) SNPs in 10q26 strongly associated with AMD. We used the HaploReg and LDlink databases to find sets of SNPs with alleles in linkage disequilibrium and used the Genotype-Tissue Expression (GTEx) database to search for correlations between genotypes at individual SNPs and the relative level of expression of the genes. We also accessed the Encyclopedia of DNA Elements (ENCODE) to find segments of open chromatin in the region with the AMD-associated SNPs. Predicted transcription factor binding motifs were identified using the HOMER, PROMO, and RegulomeDB software programs. RESULTS There are 34 polymorphisms within a 30-kb region that are in strong linkage disequilibrium (r² > 0.8) with the reference SNP rs10490924 previously associated with risk for AMD. The expression of three genes in this region, PLEKHA1, ARMS2, and HTRA1, differs between people who have the low-AMD-risk haplotype and those with the high-AMD-risk haplotype. For PLEKHA1, 44 tissues have an expression pattern with the high-AMD-risk haplotype associated with low expression (rs10490924 effect size -0.43, p = 3.8 × 10⁻⁵ in ovary). With regard to ARMS2, the variation is most pronounced in testes: homozygotes with the high-AMD-risk haplotype express ARMS2 at lower levels than homozygotes with the low-AMD-risk haplotype; expression in heterozygotes falls in between (rs10490924 effect size -0.79, p = 7.5 × 10⁻²⁴). For HTRA1, the expression pattern is the opposite; the high-AMD-risk haplotype has higher levels of expression in 27 tissues (rs10490924 effect size 0.40, p = 1.5 × 10⁻⁷ in testes). None of the other 22 genes within one megabase of rs10490924, or any gene in the entire genome, has mRNA expression levels that correlate with the high-AMD-risk haplotype. More than 100 other SNPs in the 10q26 region affect the expression of PLEKHA1 and ARMS2 but not that of HTRA1; none of these SNPs affects the risk for AMD according to published genome-wide association studies (GWASs). Two of the AMD-risk SNPs (rs36212732 and rs36212733) affect transcription factor binding sites in proximity to a DNase I hypersensitive region (i.e., a region of open chromatin) in RPE cells. CONCLUSIONS SNPs in chromosome 10q26 that influence the expression of only PLEKHA1 or ARMS2 are not associated with risk for AMD, while most SNPs that influence the expression of HTRA1 are associated with risk for AMD. Two of the AMD-risk SNPs affect transcription factor binding sites that may control expression of one of the linked genes in the RPE. These findings suggest that the variation in the risk for AMD associated with chromosome 10q26 is likely due to variation in HTRA1 expression. Modulating HTRA1 activity might be a potential therapy for AMD.
Affiliation(s)
- Sha-Mei Liao
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Wei Zheng
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Jiang Zhu
- Scientific Data Analysis, NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Casey A. Lewis
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Omar Delgado
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Maura A. Crowley
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Natasha M. Buchanan
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Bruce D. Jaffee
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
- Thaddeus P. Dryja
- Department of Ophthalmology; NIBR Informatics, Novartis Institutes for Biomedical Research, Cambridge, MA
22
Seplyarskiy VB, Andrianova MA, Bazykin GA. APOBEC3A/B-induced mutagenesis is responsible for 20% of heritable mutations in the TpCpW context. Genome Res 2016; 27:175-184. [PMID: 27940951 PMCID: PMC5287224 DOI: 10.1101/gr.210336.116] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/24/2016] [Accepted: 12/01/2016] [Indexed: 12/18/2022]
Abstract
APOBEC3A/B cytidine deaminase is responsible for the majority of cancerous mutations in a large fraction of cancer samples. However, its role in heritable mutagenesis remains very poorly understood. Recent studies have demonstrated that, both in yeast and in human cancerous cells, most APOBEC3A/B-induced mutations occur on the lagging strand during replication and on the nontemplate strand of transcribed regions. Here, we use data on rare human polymorphisms, interspecies divergence, and de novo mutations to study germline mutagenesis and to analyze mutations at nucleotide contexts prone to attack by APOBEC3A/B. We show that such mutations occur preferentially on the lagging strand and on nontemplate strands of transcribed regions. Moreover, we demonstrate that APOBEC3A/B-like mutations tend to produce strand-coordinated clusters, which are also biased toward the lagging strand. Finally, we show that the mutation rate is increased 3' of C→G mutations to a greater extent than 3' of C→T mutations, suggesting pervasive trans-lesion bypass of the APOBEC3A/B-induced damage. Our study demonstrates that 20% of the C→T and C→G mutations in the TpCpW context (where W denotes A or T) that segregate as polymorphisms in the human population, or 1.4% of all heritable mutations, are attributable to APOBEC3A/B activity.
Affiliation(s)
- Vladimir B Seplyarskiy
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia
- Maria A Andrianova
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia; Lomonosov Moscow State University, Moscow 119234, Russia
- Georgii A Bazykin
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), Moscow 127994, Russia; Pirogov Russian National Research Medical University, Moscow 117997, Russia; Lomonosov Moscow State University, Moscow 119234, Russia; Skolkovo Institute of Science and Technology, Skolkovo 143026, Russia
23
The roles of RNA processing in translating genotype to phenotype. Nature Reviews Molecular Cell Biology 2016. [PMID: 27847391 DOI: 10.1038/nrm.2016.139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
A goal of human genetics studies is to determine the mechanisms by which genetic variation produces phenotypic differences that affect human health. Efforts in this respect have previously focused on genetic variants that affect mRNA levels by altering epigenetic and transcriptional regulation. Recent studies show that genetic variants that affect RNA processing are at least equally as common as, and are largely independent from, those variants that affect transcription. We highlight the impact of genetic variation on pre-mRNA splicing and polyadenylation, and on the stability, translation and structure of mRNAs as mechanisms that produce phenotypic traits. These results emphasize the importance of including RNA processing signals in analyses to identify functional variants.
24
Manning KS, Cooper TA. The roles of RNA processing in translating genotype to phenotype. Nat Rev Mol Cell Biol 2016; 18:102-114. [PMID: 27847391 DOI: 10.1038/nrm.2016.139] [Citation(s) in RCA: 139] [Impact Index Per Article: 17.4] [Indexed: 02/07/2023]
Abstract
A goal of human genetics studies is to determine the mechanisms by which genetic variation produces phenotypic differences that affect human health. Efforts in this respect have previously focused on genetic variants that affect mRNA levels by altering epigenetic and transcriptional regulation. Recent studies show that genetic variants that affect RNA processing are at least equally as common as, and are largely independent from, those variants that affect transcription. We highlight the impact of genetic variation on pre-mRNA splicing and polyadenylation, and on the stability, translation and structure of mRNAs as mechanisms that produce phenotypic traits. These results emphasize the importance of including RNA processing signals in analyses to identify functional variants.
Affiliation(s)
- Kassie S Manning
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA; Integrative Molecular and Biomedical Sciences Program, Baylor College of Medicine, Houston, Texas 77030, USA
- Thomas A Cooper
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas 77030, USA; Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA; Integrative Molecular and Biomedical Sciences Program, Baylor College of Medicine, Houston, Texas 77030, USA
25
Kaiser VB, Taylor MS, Semple CA. Mutational Biases Drive Elevated Rates of Substitution at Regulatory Sites across Cancer Types. PLoS Genet 2016; 12:e1006207. [PMID: 27490693 PMCID: PMC4973979 DOI: 10.1371/journal.pgen.1006207] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 01/20/2016] [Accepted: 06/29/2016] [Indexed: 02/07/2023]
Abstract
Disruption of gene regulation is known to play major roles in carcinogenesis and tumour progression. Here, we comprehensively characterize the mutational profiles of diverse transcription factor binding sites (TFBSs) across 1,574 completely sequenced cancer genomes encompassing 11 tumour types. We assess the relative rates and impact of the mutational burden at the binding sites of 81 transcription factors (TFs), by comparing the abundance and patterns of single base substitutions within putatively functional binding sites to control sites with matched sequence composition. There is a strong (1.43-fold) and significant excess of mutations at functional binding sites across TFs, and the mutations that accumulate in cancers are typically more disruptive than variants tolerated in extant human populations at the same sites. CTCF binding sites suffer an exceptionally high mutational load in cancer (3.31-fold excess) relative to control sites, and we demonstrate for the first time that this effect is seen in essentially all cancer types with sufficient data. The sub-set of CTCF sites involved in higher order chromatin structures has the highest mutational burden, suggesting a widespread breakdown of chromatin organization. However, we find no evidence for selection driving these distinctive patterns of mutation. The mutational load at CTCF-binding sites is substantially determined by replication timing and the mutational signature of the tumor in question, suggesting that selectively neutral processes underlie the unusual mutation patterns. Pervasive hyper-mutation within transcription factor binding sites rewires the regulatory landscape of the cancer genome, but it is dominated by mutational processes rather than selection. Regulatory regions of the genome are important players in cancer initiation and progression. 
Here, we study the patterns of mutations accumulating at short DNA segments bound by regulatory proteins (transcription factor binding sites) across many cancer types and in the human population. We find strikingly high rates of mutation at active regulatory sites across different cancers, relative to matched control sequences. This excess of mutations disrupts the binding sites of particular factors, such as CTCF, and is likely to be driven by selectively neutral processes, such as the replication timing of the genomic regions concerned. However, binding sites involved in regulatory chromatin structures suffer particularly high levels of mutation, suggesting the frequent disruption of such structures in cancers.
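The fold-excess figures quoted in this abstract (1.43-fold across TFs, 3.31-fold at CTCF sites) compare per-base mutation rates at functional binding sites with rates at composition-matched control sites. A minimal sketch of that ratio follows, assuming the control sites have already been matched upstream; `SiteSet` and `fold_excess` are hypothetical names and the counts are made up, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class SiteSet:
    """Aggregate counts for a set of binding sites (hypothetical container)."""
    mutations: int  # single-base substitutions observed inside the sites
    bases: int      # total base pairs covered by the sites

    @property
    def rate(self) -> float:
        # Per-base substitution rate across the whole site set.
        return self.mutations / self.bases


def fold_excess(functional: SiteSet, control: SiteSet) -> float:
    """Ratio of the per-base rate at functional sites to that at matched control sites.

    Raises ZeroDivisionError if the control set has no mutations or no bases.
    """
    return functional.rate / control.rate


# Illustrative counts only, not values from the study.
functional_sites = SiteSet(mutations=143, bases=100_000)
control_sites = SiteSet(mutations=100, bases=100_000)
print(round(fold_excess(functional_sites, control_sites), 2))  # 1.43
```

Calling an excess "significant" would additionally need a test against the control set (for example a binomial or permutation test), which this sketch leaves out.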
Affiliation(s)
- Vera B Kaiser
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Martin S Taylor
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
- Colin A Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom
26
Functional Implications of Human-Specific Changes in Great Ape microRNAs. PLoS One 2016; 11:e0154194. [PMID: 27105073 PMCID: PMC4841587 DOI: 10.1371/journal.pone.0154194] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/01/2016] [Accepted: 04/10/2016] [Indexed: 12/29/2022]
Abstract
microRNAs are crucial post-transcriptional regulators of gene expression involved in a wide range of biological processes. Although microRNAs are highly conserved among species, the functional implications of existing lineage-specific changes and their role in determining differences between humans and other great apes have not been specifically addressed. We analyzed the recent evolutionary history of 1,595 human microRNAs by looking at their intra- and inter-species variation in great apes using high-coverage sequenced genomes of 82 individuals including gorillas, orangutans, bonobos, chimpanzees and humans. We explored the strength of purifying selection among microRNA regions and found that the seed and mature regions are under similar and stronger constraint than the precursor region. We further constructed a comprehensive catalogue of microRNA species-specific nucleotide substitutions among great apes and, for the first time, investigated the biological relevance that human-specific changes in microRNAs may have had in great ape evolution. Expression and functional analyses of four microRNAs (miR-299-3p, miR-503-3p, miR-508-3p and miR-541-3p) revealed that lineage-specific nucleotide substitutions and changes in the length of these microRNAs alter their expression as well as the repertoires of target genes and regulatory networks. We suggest that the studied molecular changes could have modified crucial microRNA functions shaping phenotypes that, ultimately, became human-specific. Our work provides a frame to study the impact that regulatory changes may have in the recent evolution of our species.
27
Pang E, Wu X, Lin K. Different evolutionary patterns of SNPs between domains and unassigned regions in human protein-coding sequences. Mol Genet Genomics 2016; 291:1127-36. [PMID: 26833483 PMCID: PMC4875946 DOI: 10.1007/s00438-016-1170-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/14/2015] [Accepted: 01/18/2016] [Indexed: 11/30/2022]
Abstract
Protein evolution plays an important role in the evolution of each genome. Because proteins are functional molecules, their parts or sites are generally subject to different degrees of selective constraint, particularly purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
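The comparison in this abstract rests on splitting each coding sequence into Pfam-A domain intervals and the remaining unassigned intervals, then computing SNP density separately for synonymous and non-synonymous SNPs. A minimal sketch of the density step follows, assuming CDS-relative, half-open coordinates and SNPs already classified as synonymous or non-synonymous upstream; the function names and toy positions are hypothetical.

```python
def complement(domains: list[tuple[int, int]], length: int) -> list[tuple[int, int]]:
    """Half-open [start, end) regions of a coding sequence not covered by any domain."""
    gaps, prev = [], 0
    for start, end in sorted(domains):
        if start > prev:
            gaps.append((prev, start))
        prev = max(prev, end)  # tolerates overlapping domain intervals
    if prev < length:
        gaps.append((prev, length))
    return gaps


def snp_density_per_kb(positions: list[int], regions: list[tuple[int, int]]) -> float:
    """Number of SNPs per kilobase falling inside the given half-open regions."""
    total_bp = sum(end - start for start, end in regions)
    if total_bp == 0:
        return 0.0
    hits = sum(1 for pos in positions
               if any(start <= pos < end for start, end in regions))
    return 1000 * hits / total_bp


# Toy 1,000-bp CDS with one domain at [100, 400); positions are made up.
domains = [(100, 400)]
unassigned = complement(domains, 1000)  # [(0, 100), (400, 1000)]
synonymous = [150, 200, 250, 600]
print(snp_density_per_kb(synonymous, domains))  # 10.0
print(snp_density_per_kb(synonymous, unassigned))
```

Running the same two calls on the non-synonymous positions gives the second half of the comparison.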
Affiliation(s)
- Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
- Xiaomei Wu
- College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, 310036, China
- Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
28
Abstract
The sequencing of the human genome and technological advances in DNA sequencing have led to a revolution with respect to DNA sequencing and its potential to diagnose genetic disorders. However, requests for open access to genomic data must be balanced against the guiding principles of the Common Rule for human subject research. Unfortunately, the risks to patients involved in genomic studies are still evolving and as such may not be clear to learned and well-intentioned scientists. Central to this issue are the strategies that enable human participants in such studies to remain anonymous, or de-identified. The wealth of genomic data on the Internet in genomic data repositories and other databases has enabled de-identification to be broken and research subjects to be identified. Relying on de-identification for security neglects the fact that DNA itself is an identifying element. Therefore, it is questionable whether data security standards can ever truly protect the identity of a patient, under the current conditions or in the future. As Big Data methodologies advance, additional sources of data may enable the re-identification of patients enrolled in next-generation sequencing (NGS) studies. As such, it is time to re-evaluate the risks of sharing genomic data and establish new guidelines for good practices. In this commentary, I address the challenges facing federally funded investigators who need to strike a balance between compliance with federal (US) rules for human subjects and the recent requirement for open access/sharing of data from National Institutes of Health (NIH)-funded studies involving human subjects.
Affiliation(s)
- R Meller
- Translational Stroke Program, Neuroscience Institute, Morehouse School of Medicine, Atlanta, USA
29
Fu Y, Liu Z, Lou S, Bedford J, Mu XJ, Yip KY, Khurana E, Gerstein M. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol 2014; 15:480. [PMID: 25273974 PMCID: PMC4203974 DOI: 10.1186/s13059-014-0480-5] [Citation(s) in RCA: 226] [Impact Index Per Article: 25.1] [Received: 04/17/2014] [Indexed: 12/15/2022]
Abstract
Identification of noncoding drivers from thousands of somatic alterations in a typical tumor is a difficult and unsolved problem. We report a computational framework, FunSeq2, to annotate and prioritize these mutations. The framework combines an adjustable data context integrating large-scale genomics and cancer resources with a streamlined variant-prioritization pipeline. The pipeline has a weighted scoring system combining: inter- and intra-species conservation; loss- and gain-of-function events for transcription-factor binding; enhancer-gene linkages and network centrality; and per-element recurrence across samples. We further highlight putative drivers with information specific to a particular sample, such as differential expression. FunSeq2 is available from funseq2.gersteinlab.org.
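FunSeq2 combines several annotation features into one weighted score per variant. The sketch below shows only the general shape of such a scheme, a weighted sum over binary feature flags followed by ranking. The feature names mirror the categories listed in the abstract, but the weights are placeholders and this is not FunSeq2's actual scoring code or weighting method.

```python
# Placeholder weights, one per feature category named in the abstract.
# FunSeq2 derives its own weights; these values are illustrative only.
WEIGHTS: dict[str, float] = {
    "conserved": 1.0,           # inter-/intra-species conservation
    "motif_breaking": 0.8,      # loss of TF binding
    "motif_gaining": 0.5,       # gain of TF binding
    "enhancer_gene_link": 0.6,  # element linked to a target gene
    "network_hub": 0.7,         # target gene is central in a network
    "recurrent": 0.9,           # element mutated in multiple samples
}


def score_variant(features: set[str], weights: dict[str, float] = WEIGHTS) -> float:
    """Sum the weights of the features a variant carries."""
    unknown = features - set(weights)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return float(sum(weights[f] for f in features))


def prioritize(variants: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank variant IDs by descending score, breaking ties by ID."""
    scored = [(vid, score_variant(feats)) for vid, feats in variants.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranking = prioritize({
    "var1": {"conserved", "recurrent"},
    "var2": {"motif_gaining"},
    "var3": set(),
})
print([vid for vid, _ in ranking])  # ['var1', 'var2', 'var3']
```

Sample-specific evidence such as differential expression, mentioned in the abstract, could be added as further flags in the same dictionary.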
Affiliation(s)
- Yao Fu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
30
Lyon KF, Strong CL, Schooler SG, Young RJ, Roy N, Ozar B, Bachmeier M, Rajasekaran S, Schiller MR. Natural variability of minimotifs in 1092 people indicates that minimotifs are targets of evolution. Nucleic Acids Res 2015; 43:6399-412. [PMID: 26068475 PMCID: PMC4513861 DOI: 10.1093/nar/gkv580] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 04/10/2014] [Revised: 04/17/2015] [Accepted: 05/21/2015] [Indexed: 01/05/2023]
Abstract
Since the function of a short contiguous peptide minimotif can be introduced or eliminated by a single point mutation, these functional elements may be a source of human variation and a target of selection. We analyzed the variability of ∼300 000 minimotifs in 1092 human genomes from the 1000 Genomes Project. Most minimotifs have been purified by selection, with a 94% invariance, which supports important functional roles for minimotifs. Minimotifs are generally under negative selection, possessing high genomic evolutionary rate profiling (GERP) and sitewise likelihood-ratio (SLR) scores. Some are subject to neutral drift or positive selection, similar to coding regions. Most SNPs in minimotifs were common variants, but with minor allele frequencies generally <10%. This was supported by low substitution rates and few newly derived minimotifs. Several minimotif alleles showed different intercontinental and regional geographic distributions, strongly suggesting a role for minimotifs in adaptive evolution. We also note that 4% of PTM minimotif sites in histone tails were common variants, which has the potential to differentially affect DNA packaging among individuals. In conclusion, minimotifs are a source of functional genetic variation in the human population; thus, they are likely to be an important target of selection and evolution.
Affiliation(s)
- Kenneth F Lyon
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Christy L Strong
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Steve G Schooler
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Richard J Young
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Nervik Roy
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Brittany Ozar
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Mark Bachmeier
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
- Sanguthevar Rajasekaran
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155, USA
- Martin R Schiller
- Nevada Institute of Personalized Medicine and School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Parkway, Las Vegas, NV 89154-4004, USA
31
32
Natural Selection and Functional Potentials of Human Noncoding Elements Revealed by Analysis of Next Generation Sequencing Data. PLoS One 2015; 10:e0129023. [PMID: 26053627 PMCID: PMC4460046 DOI: 10.1371/journal.pone.0129023] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 07/04/2014] [Accepted: 05/04/2015] [Indexed: 11/19/2022]
Abstract
Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection that typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selection is more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed a larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motifs, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants, reported as risk alleles by genome-wide association studies, showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution.
33
Human genetic variation and its effect on miRNA biogenesis, activity and function. Biochem Soc Trans 2014; 42:1184-9. [PMID: 25110023 DOI: 10.1042/bst20140055] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Indexed: 11/17/2022]
Abstract
miRNAs are small non-coding regulators of gene expression that are estimated to regulate over 60% of all human genes. Each miRNA can target multiple mRNA targets and as such, miRNAs are responsible for some of the 'fine tuning' of gene expression and are implicated in regulation of all cellular processes. miRNAs bind to target genes by sequence complementarity, resulting in target degradation or translational blocking and usually a reduction in target gene expression. Like mRNA, miRNAs are transcribed from genomic DNA and are processed in several steps that are heavily reliant on correct secondary and tertiary structure. Secondary structure is determined by RNA sequence, which is in turn determined by the sequence of the genome. The human genome, however, like the genomes of most eukaryotes, is variable. Large numbers of SNPs (single nucleotide polymorphisms), small insertions and deletions (indels) and CNVs (copy number variants) have been described in our genome. Should this genetic variation occur in regions critical for the correct secondary structure or target binding, it may interfere with normal gene regulation and cause disease. In this review, we outline the consequences of genetic variation involving different aspects of miRNA biosynthesis, processing and regulation, with selected examples of instances where this has the potential to affect human disease.
34
Thomas D, Finan C, Newport MJ, Jones S. DNA entropy reveals a significant difference in complexity between housekeeping and tissue specific gene promoters. Comput Biol Chem 2015; 58:19-24. [PMID: 25988219 DOI: 10.1016/j.compbiolchem.2015.05.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/01/2014] [Revised: 05/01/2015] [Accepted: 05/01/2015] [Indexed: 10/23/2022]
Abstract
BACKGROUND The complexity of DNA can be quantified using estimates of entropy. Variation in DNA complexity is expected between the promoters of genes with different transcriptional mechanisms; namely housekeeping (HK) and tissue specific (TS). The former are transcribed constitutively to maintain general cellular functions, and the latter are transcribed in restricted tissues and cell types for specific molecular events. It is known that promoter features in the human genome are related to tissue specificity, but this has been difficult to quantify on a genomic scale. If entropy effectively quantifies DNA complexity, calculating the entropies of HK and TS gene promoters as profiles may reveal significant differences. RESULTS Entropy profiles were calculated for a total dataset of 12,003 human gene promoters and for 501 housekeeping (HK) and 587 tissue specific (TS) human gene promoters. The mean profiles show the TS promoters have a significantly lower entropy (p<2.2e-16) than HK gene promoters. The entropy distributions for the 3 datasets show that promoter entropies could be used to identify novel HK genes. CONCLUSION Functional features comprise DNA sequence patterns that are non-random and hence they have lower entropies. The lower entropy of TS gene promoters can be explained by a higher density of positive and negative regulatory elements, required for genes with complex spatial and temporal expression.
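The entropy profiles described in this abstract can be illustrated with plain Shannon entropy of k-mer frequencies, computed in sliding windows along a promoter. This is a minimal sketch under assumptions: the abstract does not state the estimator, k, window size or step, so the defaults below are hypothetical. Repetitive or strongly patterned sequence yields lower values, which is the property the authors rely on.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer composition of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    probs = (count / n for count in Counter(kmers).values())
    return sum(-p * math.log2(p) for p in probs)


def entropy_profile(seq: str, window: int = 50, step: int = 10, k: int = 1) -> list[float]:
    """Entropy of each full-length window, sliding along seq by `step` bases."""
    return [shannon_entropy(seq[i:i + window], k)
            for i in range(0, len(seq) - window + 1, step)]


print(shannon_entropy("ACGTACGT"))  # 2.0 (all four bases equally frequent)
print(shannon_entropy("AAAAAAAA"))  # 0.0 (a single repeated base)
print(len(entropy_profile("ACGT" * 25)))  # 6 windows for a 100-bp sequence
```

Averaging such profiles position by position across many aligned promoters would give a mean profile comparable in spirit to the HK versus TS comparison, though the study's exact procedure may differ.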
Affiliation(s)
- David Thomas
- Brighton and Sussex Medical School, University of Sussex, Brighton BN1 9PX, UK
- Chris Finan
- Brighton and Sussex Medical School, University of Sussex, Brighton BN1 9PX, UK
- Melanie J Newport
- Brighton and Sussex Medical School, University of Sussex, Brighton BN1 9PX, UK
- Susan Jones
- The James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
35
Wei Y, Zhang T, Wang YP, Schatten H, Sun QY. Polar bodies in assisted reproductive technology: current progress and future perspectives. Biol Reprod 2014; 92:19. [PMID: 25472922 DOI: 10.1095/biolreprod.114.125575] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/01/2022]
Abstract
During meiotic cell-cycle progression, unequal divisions take place, resulting in a large oocyte and two diminutive polar bodies. The first polar body contains a subset of bivalent chromosomes, whereas the second polar body contains a haploid set of chromatids. One unique feature of the female gamete is that the polar bodies can provide beneficial information about the genetic background of the oocyte without potentially destroying it. Therefore, polar body biopsies have been applied in preimplantation genetic diagnosis to detect chromosomal or genetic abnormalities that might be inherited by the offspring. Besides the traditional use in preimplantation diagnosis, recent findings suggest additional important roles for polar bodies in assisted reproductive technology. In this paper, we review the new roles of polar bodies in assisted reproductive technology, mainly focusing on single-cell sequencing of the polar body genome to deduce the genomic information of its sibling oocyte and on polar body transfer to prevent the transmission of mtDNA-associated diseases. We also discuss additional potential roles for polar bodies and related key questions in human reproductive health. We believe that further exploration of new roles for polar bodies will contribute to a better understanding of reproductive health and that polar body manipulation and diagnosis will allow production of a greater number of healthy babies.
Affiliation(s)
- Yanchang Wei
- State Key Laboratory of Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Teng Zhang
- State Key Laboratory of Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Ya-Peng Wang
- State Key Laboratory of Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Heide Schatten
- Department of Veterinary Pathobiology, University of Missouri, Columbia, Missouri
- Qing-Yuan Sun
- State Key Laboratory of Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
36
Siepel A, Arbiza L. Cis-regulatory elements and human evolution. Curr Opin Genet Dev 2014; 29:81-9. [PMID: 25218861 PMCID: PMC4258466 DOI: 10.1016/j.gde.2014.08.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 05/28/2014] [Revised: 08/17/2014] [Accepted: 08/23/2014] [Indexed: 11/20/2022]
Abstract
Modification of gene regulation has long been considered an important force in human evolution, particularly through changes to cis-regulatory elements (CREs) that function in transcriptional regulation. For decades, however, the study of cis-regulatory evolution was severely limited by the available data. New data sets describing the locations of CREs and genetic variation within and between species have now made it possible to study CRE evolution much more directly on a genome-wide scale. Here, we review recent research on the evolution of CREs in humans based on large-scale genomic data sets. We consider inferences based on primate divergence, human polymorphism, and combinations of divergence and polymorphism. We then consider 'new frontiers' in this field stemming from recent research on transcriptional regulation.
Affiliation(s)
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
- Leonardo Arbiza
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
37
Gineau L, Luisi P, Castelli EC, Milet J, Courtin D, Cagnin N, Patillon B, Laayouni H, Moreau P, Donadi EA, Garcia A, Sabbagh A. Balancing immunity and tolerance: genetic footprint of natural selection in the transcriptional regulatory region of HLA-G. Genes Immun 2014; 16:57-70. [DOI: 10.1038/gene.2014.63] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 06/02/2014] [Revised: 10/04/2014] [Accepted: 10/06/2014] [Indexed: 12/28/2022]
38
Approximation to the distribution of fitness effects across functional categories in human segregating polymorphisms. PLoS Genet 2014; 10:e1004697. [PMID: 25375159 PMCID: PMC4222666 DOI: 10.1371/journal.pgen.1004697] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 01/06/2014] [Accepted: 08/22/2014] [Indexed: 02/03/2023]
Abstract
Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral or not strongly selected, and we do not rely on fitting the DFE of all new nonsynonymous mutations to a single probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this and other conservation scores to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model on SNP data. Our method serves to approximate the deleterious DFE of mutations that are segregating, regardless of their genomic consequence. We can then compare the proportion of mutations that are negatively selected or neutral across various categories, including different types of regulatory sites. We observe that the distribution of intergenic polymorphisms is highly peaked at neutrality, while the distribution of nonsynonymous polymorphisms has a second peak at [Formula: see text]. Other types of polymorphisms have shapes that fall roughly in between these two. We find that transcriptional start sites, strong CTCF-enriched elements and enhancers are the regulatory categories with the largest proportion of deleterious polymorphisms.
39
Rawlings-Goss RA, Campbell MC, Tishkoff SA. Global population-specific variation in miRNA associated with cancer risk and clinical biomarkers. BMC Med Genomics 2014; 7:53. [PMID: 25169894 PMCID: PMC4159108 DOI: 10.1186/1755-8794-7-53] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 05/05/2014] [Accepted: 08/12/2014] [Indexed: 12/30/2022]
Abstract
Background MiRNA expression profiling is being actively investigated as a clinical biomarker and diagnostic tool to detect multiple cancer types and stages as well as other complex diseases. Initial investigations, however, have not comprehensively taken into account genetic variability affecting miRNA expression and/or function in populations of different ethnic backgrounds. Therefore, more complete surveys of miRNA genetic variability are needed to assess global patterns of miRNA variation within and between diverse human populations and their effect on clinically relevant miRNA genes. Methods Genetic variation in 1524 miRNA genes was examined using whole genome sequencing (60x coverage) in a panel of 69 unrelated individuals from 14 global populations, including European, Asian and African populations. Results We identified 33 previously undescribed miRNA variants, and 31 miRNA containing variants that are globally population-differentiated in frequency between African and non-African populations (PD-miRNA). The top 1% of PD-miRNA were significantly enriched for regulation of genes involved in glucose/insulin metabolism and cell division (p < 10^-7), most significantly the mitosis pathway, which is strongly linked to cancer onset. Overall, we identify 7 PD-miRNAs that are currently implicated as cancer biomarkers or diagnostics: hsa-mir-202, hsa-mir-423, hsa-mir-196a-2, hsa-mir-520h, hsa-mir-647, hsa-mir-943, and hsa-mir-1908. Notably, hsa-mir-202, a potential breast cancer biomarker, was found to show significantly high allele frequency differentiation at SNP rs12355840, which is known to affect miRNA expression levels in vivo and subsequently breast cancer mortality. Conclusion MiRNA expression profiles represent a promising new category of disease biomarkers. However, population specific genetic variation can affect the prevalence and baseline expression of these miRNAs in diverse populations.
Consequently, miRNA genetic and expression level variation among ethnic groups may be contributing in part to health disparities observed in multiple forms of cancer, specifically breast cancer, and will be an essential consideration when assessing the utility of miRNA biomarkers for the clinic.
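This abstract reports miRNA variants whose allele frequencies differ between African and non-African populations but does not name the statistic used. One common way to quantify such differentiation for a single biallelic SNP is Wright's F_ST computed from expected heterozygosities. The sketch below assumes two equally weighted populations with known allele frequencies; the function name is hypothetical and the frequencies are made up, not values for rs12355840.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Wright's F_ST for one biallelic SNP from allele frequencies in two populations.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S the mean within-population heterozygosity.
    Populations are weighted equally; a monomorphic site returns 0.0.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


print(fst_two_pops(0.0, 1.0))  # 1.0 (fixed difference between populations)
print(fst_two_pops(0.3, 0.3))  # 0.0 (identical frequencies)
print(round(fst_two_pops(0.1, 0.6), 3))
```

For samples of finite size, estimators that correct for sampling (for example Weir and Cockerham's) are usually preferred over this plug-in form.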

40
De Silva DR, Nichols R, Elgar G. Purifying selection in deeply conserved human enhancers is more consistent than in coding sequences. PLoS One 2014; 9:e103357. [PMID: 25062004 PMCID: PMC4111549 DOI: 10.1371/journal.pone.0103357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/01/2013] [Accepted: 07/01/2014] [Indexed: 12/30/2022]
Abstract
Comparison of polymorphism at synonymous and non-synonymous sites in protein-coding DNA can provide evidence for selective constraint. Non-coding DNA that forms part of the regulatory landscape presents more of a challenge, since there is no such clear-cut distinction between sites under stronger and weaker selective constraint. Here, we consider putative regulatory elements termed Conserved Non-coding Elements (CNEs), defined by their high level of sequence identity across all vertebrates. Some mutations in these regions have been implicated in developmental disorders; we analyse CNE polymorphism data to investigate whether such deleterious effects are widespread in humans. Single nucleotide variants from the HapMap and 1000 Genomes Projects were mapped across nearly 2000 CNEs. In the 1000 Genomes data we find a significant excess of rare derived alleles in CNEs relative to coding sequences; this pattern is absent in HapMap data, apparently obscured by ascertainment bias. The distribution of polymorphism within CNEs is not uniform; by exploiting deep vertebrate alignments, we could identify two categories of sites: stretches that are non-variant, and those that have at least one substitution. The conserved category has fewer polymorphic sites and a greater excess of rare derived alleles, which can be explained by a large proportion of sites under strong purifying selection within humans, higher than that for non-synonymous sites in most protein-coding regions and comparable to that at the strongly conserved trans-dev genes. Conversely, the more evolutionarily labile CNE sites have an allele frequency distribution not significantly different from that of non-synonymous sites. Future studies should exploit genome-wide re-sequencing to obtain better coverage in selected non-coding regions, given the likelihood that mutations in evolutionarily conserved enhancer sequences are deleterious. Discovery pipelines should validate non-coding variants to aid in identifying causal and risk-enhancing variants in complex disorders, in contrast to the current focus on exome sequencing.
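The comparison in this abstract rests on asking whether one class of sites carries a larger share of rare derived alleles than another. A minimal sketch, assuming per-site derived allele frequencies (DAFs) have already been computed against an ancestral allele, a hypothetical 5% rare-allele cutoff, and a one-sided Fisher's exact test as the comparison (the abstract does not name the test or cutoff used):

```python
from math import comb
from typing import Iterable, Tuple


def rare_common_counts(dafs: Iterable[float], cutoff: float = 0.05) -> Tuple[int, int]:
    """Split polymorphic sites into (rare, common) by derived allele frequency.

    Sites with DAF of 0 or 1 are not polymorphic in the sample and are skipped.
    The 5% cutoff is an illustrative assumption.
    """
    rare = common = 0
    for f in dafs:
        if 0.0 < f < cutoff:
            rare += 1
        elif cutoff <= f < 1.0:
            common += 1
    return rare, common


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Rows: category of interest vs. comparison; columns: rare vs. common.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    total = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    return sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1)) / total


if __name__ == "__main__":
    # Hypothetical DAFs for two site categories, for illustration only.
    cne_conserved = [0.01, 0.02, 0.01, 0.03, 0.40, 0.02]
    nonsynonymous = [0.01, 0.20, 0.35, 0.50, 0.04, 0.60]
    a, b = rare_common_counts(cne_conserved)
    c, d = rare_common_counts(nonsynonymous)
    print((a, b), (c, d), round(fisher_greater(a, b, c, d), 4))
```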
Affiliation(s)
- Dilrini R. De Silva
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Richard Nichols
- School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Greg Elgar
- Systems Biology, MRC National Institute for Medical Research, Mill Hill, London, United Kingdom

41
Abstract
Evolutionary conservation has been an accurate predictor of functional elements across the first decade of metazoan genomics. More recently, there has been a move to define functional elements instead from biochemical annotations. Evolutionary methods are, however, more comprehensive than biochemical approaches can be, and they can assess quantitatively, especially for subtle effects, how biologically important (that is, how injurious after mutation) different types of elements are. Evolutionary methods are thus critical for understanding the large fraction (up to 10%) of the human genome that does not encode proteins and yet might convey function. These methods can also capture the ephemeral nature of much noncoding functional sequence, with large numbers of functional elements having been gained and lost rapidly along each mammalian lineage. Here, we review how different strengths of purifying selection have acted on protein-coding and non-protein-coding loci and on transcription factor binding sites in mammalian and fruit fly genomes.
Affiliation(s)
- Wilfried Haerty
- MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom

42
Thomas LF, Sætrom P. Circular RNAs are depleted of polymorphisms at microRNA binding sites. Bioinformatics 2014; 30:2243-6. [PMID: 24764460 PMCID: PMC4207428 DOI: 10.1093/bioinformatics/btu257] [Citation(s) in RCA: 152] [Impact Index Per Article: 15.2] [Indexed: 02/07/2023]
Abstract
Motivation: Circular RNAs (circRNAs) are an abundant class of highly stable RNAs that can affect gene regulation by binding microRNAs (miRNAs) and preventing them from regulating their messenger RNA (mRNA) targets. Mammals have thousands of circRNAs with predicted miRNA binding sites, but only two circRNAs have been verified as actual miRNA sponges. As it is unclear whether these thousands of predicted miRNA binding sites are functional, we investigated whether miRNA seed sites within human circRNAs are under selective pressure. Results: Using SNP data from the 1000 Genomes Project, we found a significant decrease in SNP density at miRNA seed sites compared with flanking sequences and random sites. This decrease was similar to that of miRNA seed sites in 3' untranslated regions, suggesting that many of the predicted miRNA binding sites in circRNAs are functional and under selective pressure similar to that on miRNA binding sites in mRNAs. Contact: pal.satrom@ntnu.no. Supplementary information: Supplementary data are available at Bioinformatics online.
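The core measurement here is SNP density inside predicted seed sites versus nearby sequence. A minimal sketch, assuming SNP positions and seed-site intervals are already in one coordinate system (for example, genomic coordinates on a single chromosome), half-open intervals, and a hypothetical flank width; it does not reproduce the study's random-site control or its statistics.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in one coordinate system


def count_in(snps_sorted: Sequence[int], iv: Interval) -> int:
    """Number of SNP positions falling inside a half-open interval."""
    start, end = iv
    return bisect_left(snps_sorted, end) - bisect_left(snps_sorted, start)


def snps_per_kb(snps_sorted: Sequence[int], intervals: Sequence[Interval]) -> float:
    """Pooled SNP density (SNPs per 1,000 nt) over a set of intervals."""
    length = sum(end - start for start, end in intervals)
    hits = sum(count_in(snps_sorted, iv) for iv in intervals)
    return 1000.0 * hits / length if length else 0.0


def flanks(iv: Interval, width: int) -> List[Interval]:
    """Upstream and downstream windows of `width` nt around a site (clipped at 0)."""
    start, end = iv
    return [(max(0, start - width), start), (end, end + width)]


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    snps = sorted([5, 12, 15, 30, 41])
    seed_sites = [(10, 17)]
    flank_windows = [w for site in seed_sites for w in flanks(site, 10)]
    print(snps_per_kb(snps, seed_sites), snps_per_kb(snps, flank_windows))
```

In real data, flanking windows can overlap other seed sites or run past the end of a circRNA exon, so they would normally be trimmed before pooling.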
Affiliation(s)
- Laurent F Thomas
- Department of Cancer Research and Molecular Medicine and Department of Computer and Information Science, Norwegian University of Science and Technology, N-7489 Trondheim, Norway
- Pål Sætrom
- Department of Cancer Research and Molecular Medicine and Department of Computer and Information Science, Norwegian University of Science and Technology, N-7489 Trondheim, Norway

43

44
Worldwide genetic variation at the 3′ untranslated region of the HLA-G gene: balancing selection influencing genetic diversity. Genes Immun 2013; 15:95-106. [DOI: 10.1038/gene.2013.67] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 09/12/2013] [Revised: 10/31/2013] [Accepted: 11/12/2013] [Indexed: 11/08/2022]

45
Jovelin R, Cutter AD. Fine-scale signatures of molecular evolution reconcile models of indel-associated mutation. Genome Biol Evol 2013; 5:978-86. [PMID: 23558593 PMCID: PMC3673634 DOI: 10.1093/gbe/evt051] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
Abstract
Genomic structural alterations that vary within species, known as large copy number variants, represent an unanticipated and abundant source of genetic diversity that associates with variation in gene expression and susceptibility to disease. Even short insertions and deletions (indels) can exert important effects on genomes by locally increasing the mutation rate, with multiple mechanisms proposed to account for this pattern. To better understand how indels promote genome evolution, we demonstrate that the single nucleotide mutation rate is elevated in the vicinity of indels, with a resolution of tens of base pairs, for the two closely related nematode species Caenorhabditis remanei and C. sp. 23. In addition to indels being clustered with single nucleotide polymorphisms and fixed differences, we also show that transversion mutations are enriched in sequences that flank indels and that many indels associate with sequence repeats. These observations are compatible with a model that reconciles previously proposed mechanisms of indel-associated mutagenesis, implicating repeat sequences as a common driver of indel errors, which then recruit error-prone polymerases during DNA repair, resulting in a locally elevated single nucleotide mutation rate. The striking influence of indel variants on the molecular evolution of flanking sequences strengthens the emerging general view that mutations can induce further mutations.
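The key observation is that single-nucleotide variants are concentrated within tens of base pairs of indels. A minimal sketch of that kind of profile, assuming indels and SNPs are given as integer positions on the same sequence and using hypothetical 10 bp bins; the function names are illustrative, and the raw counts would still need to be normalized by the number of bases examined in each distance bin before they could be read as a mutation rate.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Sequence


def nearest_indel_distance(indels_sorted: Sequence[int], pos: int) -> int:
    """Distance in bp from `pos` to the closest indel position."""
    if not indels_sorted:
        raise ValueError("need at least one indel")
    i = bisect_left(indels_sorted, pos)
    candidates = []
    if i < len(indels_sorted):
        candidates.append(indels_sorted[i] - pos)  # nearest indel at or after pos
    if i > 0:
        candidates.append(pos - indels_sorted[i - 1])  # nearest indel before pos
    return min(candidates)


def distance_profile(indels: Sequence[int], snps: Sequence[int],
                     bin_bp: int = 10, max_bp: int = 100) -> List[int]:
    """Count SNPs by distance to the nearest indel, in bins of `bin_bp` up to `max_bp`."""
    indels_sorted = sorted(indels)
    counts = Counter()
    for s in snps:
        d = nearest_indel_distance(indels_sorted, s)
        if d < max_bp:
            counts[d // bin_bp] += 1
    return [counts[b] for b in range(max_bp // bin_bp)]


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    print(distance_profile([100, 500], [95, 103, 110, 150, 499, 700]))
```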
Affiliation(s)
- Richard Jovelin
- Department of Ecology and Evolutionary Biology, University of Toronto, Ontario, Canada

46
Distinct Patterns of Genetic Variations in Potential Functional Elements in Long Noncoding RNAs. Hum Mutat 2013; 35:192-201. [DOI: 10.1002/humu.22472] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 05/27/2013] [Accepted: 10/14/2013] [Indexed: 01/09/2023]

47
Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüş ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GRS, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C, Gerstein M. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 2013; 342:1235587. [PMID: 24092746 PMCID: PMC3947637 DOI: 10.1126/science.1235587] [Citation(s) in RCA: 270] [Impact Index Per Article: 24.5] [Indexed: 12/15/2022]
Abstract
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
Affiliation(s)
- Ekta Khurana
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yao Fu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Vincenza Colonna
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Institute of Genetics and Biophysics, National Research Council (CNR), 80131 Naples, Italy
- Xinmeng Jasmine Mu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Hyun Min Kang
- Center for Statistical Genetics, Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Tuuli Lappalainen
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Andrea Sboner
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
- Lucas Lochovsky
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Jieming Chen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA
- Arif Harmanci
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Alexej Abyzov
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Suganthi Balasubramanian
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Kathryn Beal
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dimple Chakravarty
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Daniel Challis
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Yuan Chen
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Declan Clarke
- Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Uday S. Evani
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert Fragoza
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Erik Garrison
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- Richard Gibbs
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Zeynep H. Gümüş
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Naoki Kitabayashi
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Yong Kong
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Keck Biotechnology Resource Laboratory, Yale University, New Haven, CT 06511, USA
- Kasper Lage
- Pediatric Surgical Research Laboratories, MassGeneral Hospital for Children, Massachusetts General Hospital, Boston, MA 02114, USA
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Harvard Medical School, Boston, MA 02115, USA
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Vaja Liluashvili
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
- Steven M. Lipkin
- Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
- Daniel G. MacArthur
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA
- Gabor Marth
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- Donna Muzny
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Tune H. Pers
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Division of Endocrinology and Center for Basic and Translational Obesity Research, Children’s Hospital, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Graham R. S. Ritchie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jeffrey A. Rosenfeld
- Department of Medicine, Rutgers New Jersey Medical School, Newark, NJ 07101, USA
- IST/High Performance and Research Computing, Rutgers University Newark, NJ 07101, USA
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
- Cristina Sisu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Xiaomu Wei
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
- Michael Wilson
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Child Study Center, Yale University, New Haven, CT 06520, USA
- Yali Xue
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Fuli Yu
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Emmanouil T. Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
- Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Mark A. Rubin
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Chris Tyler-Smith
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA

48
Cotney J, Leng J, Yin J, Reilly SK, DeMare LE, Emera D, Ayoub AE, Rakic P, Noonan JP. The evolution of lineage-specific regulatory activities in the human embryonic limb. Cell 2013; 154:185-96. [PMID: 23827682 PMCID: PMC3785101 DOI: 10.1016/j.cell.2013.05.056] [Citation(s) in RCA: 149] [Impact Index Per Article: 13.5] [Received: 01/19/2013] [Revised: 03/29/2013] [Accepted: 05/24/2013] [Indexed: 10/26/2022]
Abstract
The evolution of human anatomical features likely involved changes in gene regulation during development. However, the nature and extent of human-specific developmental regulatory functions remain unknown. We obtained a genome-wide view of cis-regulatory evolution in human embryonic tissues by comparing the histone modification H3K27ac, which provides a quantitative readout of promoter and enhancer activity, during human, rhesus, and mouse limb development. Based on increased H3K27ac, we find that 13% of promoters and 11% of enhancers have gained activity on the human lineage since the human-rhesus divergence. These gains largely arose by modification of ancestral regulatory activities in the limb or potential co-option from other tissues and are likely to have heterogeneous genetic causes. Most enhancers that exhibit gain of activity in humans originated in mammals. Gains at promoters and enhancers in the human limb are associated with increased gene expression, suggesting they include molecular drivers of human morphological evolution.
Affiliation(s)
- Justin Cotney
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Jing Leng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
- Jun Yin
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Steven K. Reilly
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Laura E. DeMare
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Deena Emera
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Albert E. Ayoub
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, CT 06520, USA
- Pasko Rakic
- Department of Neurobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, CT 06520, USA
- James P. Noonan
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA

49
Ramireddy E, Brenner WG, Pfeifer A, Heyl A, Schmülling T. In planta analysis of a cis-regulatory cytokinin response motif in Arabidopsis and identification of a novel enhancer sequence. PLANT & CELL PHYSIOLOGY 2013; 54:1079-92. [PMID: 23620480 DOI: 10.1093/pcp/pct060] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 05/25/2023]
Abstract
The phytohormone cytokinin plays a key role in regulating plant growth and development, and is involved in numerous physiological responses to environmental changes. The type-B response regulators, which regulate the transcription of cytokinin response genes, are a part of the cytokinin signaling system. Arabidopsis thaliana encodes 11 type-B response regulators (type-B ARRs), and some of them were shown to bind in vitro to the core cytokinin response motif (CRM) 5'-(A/G)GAT(T/C)-3' or, in the case of ARR1, to an extended motif (ECRM), 5'-AAGAT(T/C)TT-3'. Here we obtained in planta proof for the functionality of the latter motif. Promoter deletion analysis of the primary cytokinin response gene ARR6 showed that a combination of two extended motifs within the promoter is required to mediate the full transcriptional activation by ARR1 and other type-B ARRs. CRMs were found to be over-represented in the vicinity of ECRMs in the promoters of cytokinin-regulated genes, suggesting their functional relevance. Moreover, an evolutionarily conserved 27-bp-long T-rich region between -220 and -193 bp was identified and shown to be required for the full activation by type-B ARRs and the response to cytokinin. This novel enhancer is not bound by the DNA-binding domain of ARR1, indicating that additional proteins might be involved in mediating the transcriptional cytokinin response. Furthermore, genome-wide expression profiling identified genes, among them ARR16, whose induction by cytokinin depends on both ARR1 and other specific type-B ARRs. This, together with the ECRM/CRM sequence clustering, indicates cooperative action of different type-B ARRs in the activation of particular target genes.
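The two motifs named in this abstract are short degenerate sequences, so locating them in a promoter is a pattern search on both DNA strands. A minimal sketch, assuming uppercase ACGT input and expressing the motifs as regular expressions taken directly from the abstract; the helper names and the example fragment are hypothetical and are not the ARR6 promoter or the authors' analysis code.

```python
import re
from typing import List, Tuple

# Core cytokinin response motif 5'-(A/G)GAT(T/C)-3' and ARR1 extended motif
# 5'-AAGAT(T/C)TT-3', as given in the abstract. The lookahead allows overlaps.
CRM = re.compile(r"(?=([AG]GAT[CT]))")
ECRM = re.compile(r"(?=(AAGAT[CT]TT))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(pattern: "re.Pattern[str]", seq: str) -> List[Tuple[int, str]]:
    """Return (0-based forward-strand start, strand) for every hit on either strand."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(revcomp(seq)):
        # A hit at j..j+L on the reverse complement covers n-j-L..n-j on the forward strand.
        hits.append((n - m.start() - len(m.group(1)), "-"))
    return sorted(hits)


if __name__ == "__main__":
    promoter = "TTAAGATTTTCCAATCTGG"  # hypothetical fragment, for illustration only
    print("CRM:", find_motif(CRM, promoter))
    print("ECRM:", find_motif(ECRM, promoter))
```

Every ECRM contains a CRM starting one base downstream (AAGAT(T/C)TT includes AGAT(T/C)), so CRM hits would need to be de-duplicated against ECRM hits before testing whether CRMs are over-represented near ECRMs.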
Affiliation(s)
- Eswarayya Ramireddy
- Institute of Biology/Applied Genetics, Dahlem Centre of Plant Sciences (DCPS), Freie Universität Berlin, Germany

50
Arbiza L, Gronau I, Aksoy BA, Hubisz MJ, Gulko B, Keinan A, Siepel A. Genome-wide inference of natural selection on human transcription factor binding sites. Nat Genet 2013; 45:723-9. [PMID: 23749186 DOI: 10.1038/ng.2658] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Received: 01/18/2013] [Accepted: 05/08/2013] [Indexed: 11/09/2022]
Abstract
For decades, it has been hypothesized that gene regulation has had a central role in human evolution, yet much remains unknown about the genome-wide impact of regulatory mutations. Here we use whole-genome sequences and genome-wide chromatin immunoprecipitation and sequencing data to demonstrate that natural selection has profoundly influenced human transcription factor binding sites since the divergence of humans from chimpanzees 4-6 million years ago. Our analysis uses a new probabilistic method, called INSIGHT, for measuring the influence of selection on collections of short, interspersed noncoding elements. We find that, on average, transcription factor binding sites have experienced somewhat weaker selection than protein-coding genes. However, the binding sites of several transcription factors show clear evidence of adaptation. Several measures of selection are strongly correlated with predicted binding affinity. Overall, regulatory elements seem to contribute substantially to both adaptive substitutions and deleterious polymorphisms with key implications for human evolution and disease.
Affiliation(s)
- Leonardo Arbiza
- Department of Biological Statistics & Computational Biology, Cornell University, Ithaca, NY, USA