1
Garrido Marques A, Rubinacci S, Malaspinas AS, Delaneau O, Sousa da Mota B. Assessing the impact of post-mortem damage and contamination on imputation performance in ancient DNA. Sci Rep 2024; 14:6227. [PMID: 38486065 PMCID: PMC10940295 DOI: 10.1038/s41598-024-56584-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 03/08/2024] [Indexed: 03/18/2024]
Abstract
Low-coverage imputation is becoming increasingly common in ancient DNA (aDNA) studies. Imputation pipelines commonly used for present-day genomes have been shown to yield accurate results when applied to ancient genomes. However, post-mortem damage (PMD), in the form of C-to-T substitutions at read termini, and contamination with DNA from closely related species can potentially affect imputation performance in aDNA. In this study, we evaluated imputation performance (i) when using ATLAS, a genotype caller designed for aDNA, compared with bcftools, and (ii) when contamination is present. We evaluated imputation performance with principal component analyses and by calculating imputation error rates. With a particular focus on differently imputed sites, we found that using ATLAS prior to imputation substantially improved imputed genotypes for a very damaged ancient genome (42% PMD). Trimming the ends of the sequencing reads led to similar improvements in imputation accuracy. For the remaining genomes, ATLAS brought limited gains. Finally, to examine the effect of contamination on imputation, we added various amounts of reads from two present-day genomes to a previously downsampled high-coverage ancient genome. We observed that imputation accuracy decreased drastically for contamination rates above 5%. In conclusion, we recommend (i) accounting for PMD by either trimming sequencing reads or using a genotype caller such as ATLAS before imputing highly damaged genomes and (ii) only imputing genomes with contamination of up to 5%.
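Two ideas from the abstract can be made concrete: PMD C-to-T substitutions concentrate at read termini, and trimming read ends removes them. The sketch below is a minimal illustration under stated assumptions. The helpers `c_to_t_by_position` and `trim_read_ends` are hypothetical. Alignments are assumed gap-free and of equal length, and the trim length `n` is an arbitrary parameter rather than the value used in the study. It does not model ATLAS, bcftools or the imputation step, and it only profiles the 5' end of reads.

```python
def c_to_t_by_position(alignments, max_pos=10):
    """Fraction of reference C positions read as T, by distance from the read's 5' end.

    alignments: iterable of (reference, read) string pairs that are assumed to be
    already aligned, gap-free and of equal length (a simplifying assumption).
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in alignments:
        for i in range(min(max_pos, len(ref), len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    # Positions with no reference C get 0.0 instead of dividing by zero.
    return [ct / c if c else 0.0 for ct, c in zip(c_to_t, c_total)]


def trim_read_ends(seq, qual, n):
    """Drop n bases (and their quality characters) from both ends of a read."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if 2 * n >= len(seq):
        return "", ""  # the whole read would be trimmed away
    return seq[n:len(seq) - n], qual[n:len(qual) - n]


# Toy data: the damage-like C>T mismatches appear only at the first position.
profile = c_to_t_by_position([("CCAC", "TCAC"), ("CAAC", "TAAC")], max_pos=3)
print(profile)  # [1.0, 0.0, 0.0]
print(trim_read_ends("ACGTACGT", "IIIIIIII", 2))  # ('GTAC', 'IIII')
```

A real pipeline would work on BAM records and choose `n` from the observed damage profile. Here the profile simply shows why removing a few terminal bases discards most damage-derived mismatches.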
Affiliation(s)
- Simone Rubinacci
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anna-Sapfo Malaspinas
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
- Bárbara Sousa da Mota
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
2
Seo D, Koh B, Eom GE, Kim HW, Kim S. A dual gene-specific mutator system installs all transition mutations at similar frequencies in vivo. Nucleic Acids Res 2023; 51:e59. [PMID: 37070179 PMCID: PMC10250238 DOI: 10.1093/nar/gkad266] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/24/2022] [Accepted: 03/31/2023] [Indexed: 04/19/2023]
Abstract
Targeted in vivo hypermutation accelerates directed evolution of proteins through concurrent DNA diversification and selection. Although systems employing a fusion protein of a nucleobase deaminase and T7 RNA polymerase provide gene-specific targeting, their mutational spectra have been limited to exclusive or dominant C:G→T:A mutations. Here we describe eMutaT7transition, a new gene-specific hypermutation system that installs all transition mutations (C:G→T:A and A:T→G:C) at comparable frequencies. By using two mutator proteins in which two efficient deaminases, PmCDA1 and TadA-8e, are separately fused to T7 RNA polymerase, we obtained similar numbers of C:G→T:A and A:T→G:C substitutions at a sufficiently high frequency (∼6.7 substitutions in a 1.3-kb gene during 80 h of in vivo mutagenesis). Through eMutaT7transition-mediated evolution of TEM-1 for antibiotic resistance, we generated many mutations found in clinical isolates. Overall, with its high mutation frequency and wider mutational spectrum, eMutaT7transition is a potential first-line method for gene-specific in vivo hypermutation.
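The abstract reports mutations as base-pair classes (C:G→T:A and A:T→G:C) rather than single-strand changes. A C>T change on one strand implies G>A on the other, so both fall into one class. The minimal sketch below tallies substitutions between a reference and a mutant gene sequence in that way. It also converts the reported ∼6.7 substitutions in a 1.3-kb gene into a per-kb rate, which works out to about 5.2 substitutions per kb. The function names are hypothetical, and the sketch assumes the two sequences are already aligned without indels.

```python
from collections import Counter

# A change on one strand implies the complementary change on the other,
# so C>T and G>A are the same base-pair substitution (C:G→T:A), and so on.
TRANSITION_CLASS = {
    ("C", "T"): "C:G→T:A",
    ("G", "A"): "C:G→T:A",
    ("A", "G"): "A:T→G:C",
    ("T", "C"): "A:T→G:C",
}


def classify_substitutions(reference, mutant):
    """Count substitutions between two aligned, equal-length sequences by base-pair class."""
    if len(reference) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    counts = Counter()
    for ref_base, mut_base in zip(reference.upper(), mutant.upper()):
        if ref_base != mut_base:
            # Transversions and ambiguous bases are lumped together as "other".
            counts[TRANSITION_CLASS.get((ref_base, mut_base), "other")] += 1
    return counts


def substitutions_per_kb(n_substitutions, gene_length_bp):
    """Normalize a substitution count to a per-kilobase rate."""
    return n_substitutions / (gene_length_bp / 1000)


print(classify_substitutions("ACGTACGT", "GCATACGC"))
print(round(substitutions_per_kb(6.7, 1300), 2))  # 5.15
```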
Affiliation(s)
- Daeje Seo
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Bonghyun Koh
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Ga-eul Eom
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Hye Won Kim
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Seokhee Kim
- Department of Chemistry, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
3
Harvey WT, Ebert P, Ebler J, Audano PA, Munson KM, Hoekzema K, Porubsky D, Beck CR, Marschall T, Garimella K, Eichler EE. Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall. bioRxiv 2023:2023.05.04.539448. [PMID: 37205567 PMCID: PMC10187267 DOI: 10.1101/2023.05.04.539448] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 05/21/2023]
Abstract
Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage, with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance for designing cost-effective experimental strategies that do not compromise on discovering novel biology.
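Precision, recall and the F1 score used above follow standard definitions from counts of true positives (TP), false positives (FP) and false negatives (FN). F1 is the harmonic mean of precision and recall. The minimal sketch below computes them, plus a hypothetical `compare_callsets` helper that matches variant keys exactly. That exact match is a simplification: SV benchmarking tools typically allow position and size tolerance when matching calls, which this sketch does not attempt.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean of the two."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def compare_callsets(truth, calls):
    """Exact-match comparison of variant keys such as (chrom, pos, ref, alt)."""
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    return precision_recall_f1(tp, fp, fn)


p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.75 0.818
```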
Affiliation(s)
- William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Peter A. Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Christine R. Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT 06032, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Kiran Garimella
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
4
Jing Y, Bian L, Zhang X, Zhao B, Zheng R, Su S, Ye D, Zheng X, El-Kassaby YA, Shi J. Genetic diversity and structure of the 4th cycle breeding population of Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.). Front Plant Sci 2023; 14:1106615. [PMID: 36778690 PMCID: PMC9911867 DOI: 10.3389/fpls.2023.1106615] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/23/2022] [Accepted: 01/16/2023] [Indexed: 06/18/2023]
Abstract
Studying population genetic structure and diversity is crucial for the marker-assisted selection and breeding of coniferous tree species. In this study, using RAD-seq technology, we developed 343,644 high-quality single nucleotide polymorphism (SNP) markers to resolve the genetic diversity and population genetic structure of 233 selected Chinese fir individuals from the 4th cycle breeding program, representing different breeding generations and provenances. The genetic diversity of the 4th cycle breeding population was high, with a nucleotide diversity (Pi) of 0.003 and observed (Ho) and expected (He) heterozygosity of 0.215 and 0.233, respectively, indicating that the breeding population has a broad genetic base. Genetic differentiation between the different breeding generations and different provenances was low (Fst < 0.05), and population structure analysis divided the 233 individuals into four subgroups. Each subgroup contained intermixed branches and showed weak population structure, which might be related to breeding rather than provenance; individuals from the same source aggregated only within local branches. Our results provide a reference for further research on the marker-assisted selective breeding of Chinese fir and other coniferous trees.
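Observed heterozygosity (Ho) is the fraction of individuals that are heterozygous at a SNP. Expected heterozygosity (He) under Hardy-Weinberg proportions is 2p(1 − p) for allele frequency p. Both are then averaged over SNPs. The minimal sketch below computes them from a hypothetical genotype encoding (alternate-allele counts 0, 1 or 2, with `None` for missing calls). Packages used in practice may apply a small-sample correction to He or weight SNPs differently, so this is not necessarily how the reported 0.215 and 0.233 were obtained.

```python
def heterozygosity(genotypes):
    """Mean observed (Ho) and expected (He) heterozygosity across SNPs.

    genotypes: one list per SNP of alternate-allele counts per individual
    (0, 1 or 2), with None for a missing call. He uses 2p(1 - p) without a
    small-sample correction.
    """
    ho_values, he_values = [], []
    for snp in genotypes:
        called = [g for g in snp if g is not None]
        if not called:
            continue  # skip SNPs with no called genotypes
        p = sum(called) / (2 * len(called))  # alternate-allele frequency
        ho_values.append(sum(1 for g in called if g == 1) / len(called))
        he_values.append(2 * p * (1 - p))
    if not ho_values:
        raise ValueError("no SNP has any called genotype")
    return sum(ho_values) / len(ho_values), sum(he_values) / len(he_values)


print(heterozygosity([[0, 1, 1, 2], [0, 0, 0, 1, None]]))  # (0.375, 0.359375)
```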
Affiliation(s)
- Yonglian Jing
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Liming Bian
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Xuefeng Zhang
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Benwen Zhao
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
- Renhua Zheng
- Key Laboratory of Timber Forest Breeding and Cultivation for Mountainous Areas in Southern China, Fujian Academy of Forestry Science, Fuzhou, China
- Shunde Su
- Key Laboratory of Timber Forest Breeding and Cultivation for Mountainous Areas in Southern China, Fujian Academy of Forestry Science, Fuzhou, China
- Daiquan Ye
- Department of Tree Improvement, Yangkou State-owned Forest Farm, Nanping, China
- Xueyan Zheng
- Department of Tree Improvement, Yangkou State-owned Forest Farm, Nanping, China
- Yousry A. El-Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, Vancouver, BC, Canada
- Jisen Shi
- State Key Laboratory of Tree Genetics and Breeding, Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, Nanjing, China
5
Sun N, Yau SST. In-depth investigation of the point mutation pattern of HIV-1. Front Cell Infect Microbiol 2022; 12:1033481. [PMID: 36457853 PMCID: PMC9705751 DOI: 10.3389/fcimb.2022.1033481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/31/2022] [Accepted: 10/25/2022] [Indexed: 04/29/2024]
Abstract
Mutations may produce highly transmissible and damaging HIV variants, which increase genetic diversity and pose a challenge for vaccine development. Therefore, it is of great significance to understand how mutations drive the virulence of HIV. Based on 11,897 reliable HIV-1 genomes retrieved from the HIV Sequence Database, we analyze the 12 types of point mutation (A>C, A>G, A>T, C>A, C>G, C>T, G>A, G>C, G>T, T>A, T>C, T>G) from multiple statistical perspectives for the first time. The global, geographical-location, subtype and k-mer analyses show that A>G, G>A, C>T and T>C account for nearly 64% of all SNPs, which suggests that APOBEC editing and ADAR editing may play an important role in HIV-1 infectivity. Time analysis shows that most genomes with abnormal mutation numbers come from African countries. Finally, we use the natural vector method to examine changes in k-mer distribution patterns in the genome, and find that there is an important substitution pattern between nucleotides A and G, and that the 2-mer CG may have a significant impact on viral infectivity. This paper provides insight into single-nucleotide mutations of HIV-1 using the latest data in the HIV Sequence Database.
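The 12 directional point-mutation types listed above are the ordered pairs of distinct nucleotides. The "nearly 64%" figure is the share of the four transitions among them. The minimal sketch below tallies the 12 types between a reference and a query sequence and computes the transition fraction. It assumes the sequences are already aligned to equal length, skips gaps and ambiguous bases, and uses hypothetical function names. It does not reproduce the paper's filtering or its natural vector method.

```python
from collections import Counter
from itertools import permutations

# The 12 directional point-mutation types, e.g. "A>G" means reference A, query G.
MUTATION_TYPES = [f"{ref}>{alt}" for ref, alt in permutations("ACGT", 2)]
TRANSITIONS = {"A>G", "G>A", "C>T", "T>C"}


def count_point_mutations(reference, query):
    """Tally the 12 point-mutation types between two aligned, equal-length sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned and of equal length")
    counts = Counter({mutation: 0 for mutation in MUTATION_TYPES})
    for ref, alt in zip(reference.upper(), query.upper()):
        if ref != alt and ref in "ACGT" and alt in "ACGT":
            counts[f"{ref}>{alt}"] += 1
    return counts


def transition_fraction(counts):
    """Share of transitions (A>G, G>A, C>T, T>C) among all counted point mutations."""
    total = sum(counts.values())
    return sum(counts[m] for m in TRANSITIONS) / total if total else 0.0


counts = count_point_mutations("AACGT-A", "GACAT-C")
print(counts["A>G"], counts["G>A"], counts["A>C"])  # 1 1 1
print(round(transition_fraction(counts), 3))  # 0.667
```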
Affiliation(s)
- Nan Sun
- Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Stephen S.-T. Yau
- Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, China
6
Pilheden M, Ahlgren L, Hyrenius-Wittsten A, Gonzalez-Pena V, Sturesson H, Hansen Marquart HV, Lausen B, Castor A, Pronk CJ, Barbany G, Pokrovskaja Tamm K, Fogelstrand L, Lohi O, Norén-Nyström U, Asklin J, Chen Y, Song G, Walsh M, Ma J, Zhang J, Saal LH, Gawad C, Hagström-Andersson AK. Duplex Sequencing Uncovers Recurrent Low-frequency Cancer-associated Mutations in Infant and Childhood KMT2A-rearranged Acute Leukemia. Hemasphere 2022; 6:e785. [PMID: 36204688 PMCID: PMC9529062 DOI: 10.1097/hs9.0000000000000785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 08/30/2022] [Indexed: 11/26/2022]
Abstract
Infant acute lymphoblastic leukemia (ALL) with KMT2A-gene rearrangements (KMT2A-r) has few mutations and a poor prognosis. To uncover mutations that are below the detection limit of standard next-generation sequencing (NGS), a combination of targeted duplex sequencing and NGS was applied to 20 infants and 7 children with KMT2A-r ALL, as well as to 5 longitudinal and 6 paired relapse samples. Of the identified nonsynonymous mutations, 87 had previously been implicated in cancer. These mutations targeted genes recurrently altered in KMT2A-r leukemia, including KRAS, NRAS, FLT3, TP53, PIK3CA, PAX5, PIK3R1, and PTPN11, with infants having fewer such mutations. Of the identified cancer-associated mutations, 62% were below the resolution of standard NGS. Only 33 of the 87 mutations exceeded 2% cellular prevalence, and most of these (31/33) targeted PI3K/RAS genes, typically KRAS or NRAS. Five patients had only low-frequency PI3K/RAS mutations, without a higher-frequency signaling mutation. Further, drug-resistant clones with FLT3 D835H or NRAS G13D/G12S mutations that comprised only 0.06% to 0.34% of diagnostic cells expanded at relapse. Finally, in longitudinal samples, the relapse clone persisted as a minor subclone from diagnosis and through treatment before expanding during the last month of disease. Together, we demonstrate that infant and childhood KMT2A-r ALL harbor low-frequency cancer-associated mutations, implying a vast subclonal genetic landscape.
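Duplex sequencing reaches below the resolution of standard NGS by reading both strands of each original DNA molecule. A base counts only when the consensus of each strand agrees. An error introduced on a single strand, for example during PCR or sequencing, is therefore masked. The minimal sketch below shows that consensus logic. It assumes reads have already been grouped by molecular tag and strand and aligned gap-free to equal length, and that bottom-strand reads are already in top-strand orientation. The function names and the `min_fraction` threshold are hypothetical choices, not the parameters used in the study.

```python
from collections import Counter


def single_strand_consensus(reads, min_fraction=0.7):
    """Per-position majority base across reads from one strand of one tagged molecule.

    Reads are assumed to be aligned, gap-free and of equal length; positions
    where no base reaches min_fraction become 'N'.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must have equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top_reads, bottom_reads, min_fraction=0.7):
    """Keep a base only where both strand consensuses agree on a called base.

    bottom_reads are assumed to be already reverse-complemented into the
    top-strand orientation.
    """
    top = single_strand_consensus(top_reads, min_fraction)
    bottom = single_strand_consensus(bottom_reads, min_fraction)
    if len(top) != len(bottom):
        raise ValueError("strand consensuses must have equal length")
    return "".join(t if t == b and t != "N" else "N" for t, b in zip(top, bottom))


# The last position differs between strands, so it is masked as 'N'.
print(duplex_consensus(["ACGT", "ACGT", "ACGT"], ["ACGA", "ACGA"]))  # ACGN
```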
Affiliation(s)
- Mattias Pilheden
- Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Louise Ahlgren
- Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Axel Hyrenius-Wittsten
- Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Veronica Gonzalez-Pena
- Division of Pediatric Hematology/Oncology, Stanford University, School of Medicine, Stanford, CA, USA
- Helena Sturesson
- Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Birgitte Lausen
- Department of Paediatrics and Adolescent Medicine, Rigshospitalet, University of Copenhagen, Denmark
- Anders Castor
- Childhood Cancer Center, Skane University Hospital, Lund, Sweden
- Gisela Barbany
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Linda Fogelstrand
- Department of Clinical Chemistry, Sahlgrenska University Hospital, Gothenburg, Sweden
- Department of Laboratory Medicine, Institute of Biomedicine, University of Gothenburg, Sweden
- Olli Lohi
- Tampere Center for Child, Adolescent and Maternal Health Research and Tays Cancer Center, Tampere University and Tampere University Hospital, Tampere, Finland
- Guangchun Song
- Department of Pathology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Michael Walsh
- Department of Pathology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Jing Ma
- Department of Pathology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Jinghui Zhang
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN, USA
- Lao H. Saal
- SAGA Diagnostics, Lund, Sweden
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Charles Gawad
- Division of Pediatric Hematology/Oncology, Stanford University, School of Medicine, Stanford, CA, USA
- Anna K. Hagström-Andersson
- Division of Clinical Genetics, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Center for Translational Genomics, Lund University, Lund, Sweden
7
Chen Y, Li R, Sun J, Li C, Xiao H, Chen S. Genome-Wide Population Structure and Selection Signatures of Yunling Goat Based on RAD-seq. Animals (Basel) 2022; 12:ani12182401. [PMID: 36139261 PMCID: PMC9495202 DOI: 10.3390/ani12182401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 09/07/2022] [Accepted: 09/10/2022] [Indexed: 12/04/2022]
Abstract
Simple Summary
Goats are important domestic animals that provide meat, milk, fur, and other products for humans. The demand for these products has increased in recent years. Disease resistance differs among goat breeds, but the genetic basis of these differences is still unclear and needs further study. In this study, many genes and pathways related to immunity and diseases were identified as being under positive selection between Yunling and Nubian goats using RAD-seq technology. This study of the selection signatures of Yunling goats provides a scientific basis and technical support for breeding domestic goats for disease resistance, which has important social and economic significance.
Abstract
Animal diseases impose a huge burden on the countries where they are endemic. Conventional control strategies, such as vaccines and veterinary drugs, address diseases from a pharmaceutical perspective. An alternative approach is to use pre-existing genetic disease resistance or tolerance. The Yunling goat is an excellent local breed from Yunnan, southwestern China, characterized by strong disease resistance and remarkable adaptability. However, genetic information about the selection signatures of Yunling goats is limited. We reasoned that the genes underlying the observed difference in disease resistance might be identified by investigating selection signatures between two different goat breeds. Herein, we selected the Nubian goat as the reference group and performed population structure and selection signature analyses using RAD-seq technology. The two goat breeds were divided into two clusters, although gene flow between them was also detected. We used the Fst (F-statistics) and π (pi/θπ) methods to carry out selection signature analysis. Eight selected regions and 91 candidate genes were identified. Some of these genes, such as DOK2, TIMM17A, MAVS, and DOCK8, are related to disease and immunity, and others, such as SPEFI, CDC25B, and MIR103, are associated with reproduction. Four GO (Gene Ontology) terms (GO:0010591, GO:001601, GO:0038023, and GO:0017166) were associated with cell migration, signal transduction, and immune responses. The KEGG (Kyoto Encyclopedia of Genes and Genomes) signaling pathways were mainly associated with immune responses, inflammatory responses, and stress reactions. This study preliminarily reveals the genetic basis of the strong disease resistance and adaptability of Yunling goats and provides a theoretical basis for subsequent genetic breeding of goats for disease resistance.
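The two selection-signature statistics named above can be illustrated on toy data. Fst compares heterozygosity within populations (H_S) with heterozygosity of the pooled population (H_T), as F_ST = (H_T − H_S)/H_T. Nucleotide diversity π is the mean number of pairwise differences per site. The minimal sketch below uses Nei's two-population, per-SNP form with equal population weights and a simple π over aligned haplotypes. Both are simplifying assumptions. Genome scans such as the one described typically use weighted estimators (for example Weir and Cockerham's) averaged over sliding windows, which this sketch does not implement.

```python
from itertools import combinations


def fst_nei(p1, p2):
    """Per-SNP F_ST = (H_T - H_S) / H_T for two populations of equal weight.

    p1, p2: frequency of the same allele in population 1 and 2.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic in both populations: no differentiation to measure
    return (h_t - h_s) / h_t


def nucleotide_diversity(sequences):
    """pi: mean pairwise differences per site among aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / n_sites


print(round(fst_nei(0.9, 0.1), 2))  # 0.64
print(fst_nei(0.3, 0.3))  # 0.0
print(round(nucleotide_diversity(["AAAA", "AAAT", "AATT"]), 4))  # 0.3333
```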
Affiliation(s)
- Yuming Chen
- School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China
- School of Life Sciences, Yunnan University, Kunming 650500, China
- Rong Li
- School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China
- College of Life Science, Yunnan Normal University, Kunming 650500, China
- Jianshu Sun
- School of Life Sciences, Yunnan University, Kunming 650500, China
- Chunqing Li
- School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China
- Heng Xiao
- School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China
- Shanyuan Chen
- School of Ecology and Environmental Science, Yunnan University, Kunming 650500, China
- Corresponding author; Tel.: +86-18687122260
8
Nelakurti DD, Rossetti T, Husbands AY, Petreaca RC. Arginine Depletion in Human Cancers. Cancers (Basel) 2021; 13:cancers13246274. [PMID: 34944895 PMCID: PMC8699593 DOI: 10.3390/cancers13246274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Revised: 12/04/2021] [Accepted: 12/09/2021] [Indexed: 11/25/2022]
Abstract
Simple Summary
Thousands of cancer genomes are now publicly available, which has led to new insights into the underlying features of cancers. These include the identification of mutational signatures at both the nucleotide and amino acid levels. Here, we discuss C > T transitions as a key nucleotide-level mutational signature that leads to a dramatic overrepresentation of arginine substitutions in cancers. We propose that this underlying C > T mutational signature canalizes possible arginine substitution outcomes, favoring histidine, cysteine, glutamine, and tryptophan. This initial asymmetry is then acted on at the amino acid level by purifying selection. Thus, a model of “sequential selection” could explain the documented bias towards arginine substitutions in multiple cancers.
Abstract
Arginine is encoded by six different codons. Base pair changes in any of these codons can have a broad spectrum of effects, including substitutions to twelve different amino acids, eighteen synonymous changes, and two stop codons. Four amino acids (histidine, cysteine, glutamine, and tryptophan) account for over 75% of amino acid substitutions of arginine. This suggests that a mutational bias, or “purifying selection”, mechanism is at work. This bias appears to be driven by C > T and G > A transitions in four of the six arginine codons, a signature that is universal and independent of cancer tissue of origin or histology. Here, we review the available literature and reanalyze publicly available data from the Catalogue of Somatic Mutations in Cancer (COSMIC). Our analysis identifies several genes with an arginine substitution bias. These include known factors such as IDH1, as well as previously unreported genes, including four cancer driver genes (FGFR3, PPP6C, MAX, GNAQ). We propose that base pair substitution bias and amino acid physiology both play a role in purifying selection. This model may explain the documented arginine substitution bias in cancers.
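The codon-level counts in the abstract can be checked by enumeration. Each of the six arginine codons (CGT, CGC, CGA, CGG, AGA, AGG) has nine possible single-base changes, for 54 in total. Of these, 18 are synonymous, 2 create a stop codon (TGA), and the remainder lead to 12 distinct amino acids. The sketch below builds the standard genetic code and counts these outcomes. It then restricts to C>T or G>A changes (read on the coding strand) in the four CGN codons, which yields exactly the four amino acids named above. This is our illustration of the summary's codon argument, not the authors' COSMIC analysis, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
ARG_CODONS = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "R")


def single_base_changes(codon):
    """Yield (position, original base, new base, mutant codon) for all 9 point changes."""
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                yield pos, codon[pos], base, codon[:pos] + base + codon[pos + 1:]


def arginine_single_base_outcomes():
    """Count the encoded outcome of every single-base change in every arginine codon."""
    outcomes = Counter()
    for codon in ARG_CODONS:
        for _, _, _, mutant in single_base_changes(codon):
            outcomes[CODON_TABLE[mutant]] += 1
    return outcomes


def cgn_transition_outcomes():
    """Outcomes of C>T or G>A changes (read on the coding strand) in the four CGN codons."""
    outcomes = Counter()
    for codon in ARG_CODONS:
        if codon.startswith("CG"):
            for _, ref_base, alt_base, mutant in single_base_changes(codon):
                if (ref_base, alt_base) in {("C", "T"), ("G", "A")}:
                    outcomes[CODON_TABLE[mutant]] += 1
    return outcomes


all_changes = arginine_single_base_outcomes()
print(sum(all_changes.values()), all_changes["R"], all_changes["*"])  # 54 18 2
print(len([aa for aa in all_changes if aa not in "R*"]))  # 12
print(sorted(aa for aa in cgn_transition_outcomes() if aa not in "R*"))  # ['C', 'H', 'Q', 'W']
```

A G>A change on the coding strand corresponds to a C>T change on the template strand. That is why both directions are counted as the same C > T signature.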
Affiliation(s)
- Devi D. Nelakurti
- Biomedical Science Undergraduate Program, The Ohio State University Medical School, Columbus, OH 43210, USA
- Tiffany Rossetti
- Biology Undergraduate Program, The Ohio State University, Marion, OH 43302, USA
- Aman Y. Husbands
- Department of Molecular Genetics, The Ohio State University, Columbus, OH 43215, USA
- Corresponding author
- Ruben C. Petreaca
- Department of Molecular Genetics, The Ohio State University, Marion, OH 43302, USA
- Cancer Biology Program, The Ohio State University James Comprehensive Cancer Center, Columbus, OH 43210, USA
- Corresponding author
9
Li Y, Tan Z, Zhang Y, Zhang Z, Hu Q, Liang K, Jun Y, Ye Y, Li YC, Li C, Liao L, Xu J, Xing Z, Pan Y, Chatterjee SS, Nguyen TK, Hsiao H, Egranov SD, Putluri N, Coarfa C, Hawke DH, Gunaratne PH, Tsai KL, Han L, Hung MC, Calin GA, Namour F, Guéant JL, Muntau AC, Blau N, Sutton VR, Schiff M, Feillet F, Zhang S, Lin C, Yang L. A noncoding RNA modulator potentiates phenylalanine metabolism in mice. Science 2021; 373:662-673. [PMID: 34353949 PMCID: PMC9714245 DOI: 10.1126/science.aba4991] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/09/2019] [Revised: 08/31/2020] [Accepted: 06/25/2021] [Indexed: 12/13/2022]
Abstract
The functional role of long noncoding RNAs (lncRNAs) in inherited metabolic disorders, including phenylketonuria (PKU), is unknown. Here, we demonstrate that the mouse lncRNA Pair and human HULC associate with phenylalanine hydroxylase (PAH). Pair-knockout mice exhibited excessive blood phenylalanine (Phe), musty odor, hypopigmentation, growth retardation, and progressive neurological symptoms including seizures, faithfully modeling human PKU. HULC depletion led to reduced PAH enzymatic activity in hepatocytes differentiated from human induced pluripotent stem cells. Mechanistically, HULC modulated the enzymatic activity of PAH by facilitating PAH-substrate and PAH-cofactor interactions. To develop a therapeutic strategy for restoring liver lncRNAs, we designed GalNAc-tagged lncRNA mimics that exhibit liver enrichment. Treatment with GalNAc-HULC mimics reduced excessive Phe in Pair-/- and PahR408W/R408W mice and improved the Phe tolerance of these mice.
Affiliation(s)
- Yajuan Li
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Zhi Tan
- Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
| | - Yaohua Zhang
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Zhao Zhang
- Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston McGovern Medical School, Houston, TX 77030, USA
| | - Qingsong Hu
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Ke Liang
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Yao Jun
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Youqiong Ye
- Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston McGovern Medical School, Houston, TX 77030, USA
| | - Yi-Chuan Li
- Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston McGovern Medical School, Houston, TX 77030, USA
| | - Chunlai Li
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Lan Liao
- Genetically Engineered Mouse Core, Advanced Technology Cores, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jianming Xu
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhen Xing
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Yinghong Pan
- Department of Biochemistry and Biology, University of Houston, Houston, TX 77030, USA
| | - Sujash S Chatterjee
- Department of Biochemistry and Biology, University of Houston, Houston, TX 77030, USA
- Tina K Nguyen
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Heidi Hsiao
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Sergey D Egranov
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Nagireddy Putluri
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Cristian Coarfa
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- David H Hawke
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Preethi H Gunaratne
- Department of Biochemistry and Biology, University of Houston, Houston, TX 77030, USA
- Kuang-Lei Tsai
- Department of Biochemistry and Molecular Biology, The University of Texas Health Science Center at Houston McGovern Medical School, Houston, TX 77030, USA
- Leng Han
- Center for Epigenetics and Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, Houston, TX 77030, USA
- Mien-Chie Hung
- Graduate Institute of Biomedical Sciences, Research Center for Cancer Biology, and Center for Molecular Medicine, China Medical University, Taichung 404, Taiwan
- Department of Biotechnology, Asia University, Taichung 413, Taiwan
- George A Calin
- Department of Translational Molecular Pathology, Division of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Fares Namour
- Department of Molecular Medicine and Reference Center for Inborn Errors of Metabolism, University Hospital of Nancy, Nancy F-54000, France
- INSERM, U1256, NGERE - Nutrition, Genetics, and Environmental Risk Exposure, University of Lorraine, Nancy F-54000, France
- Jean-Louis Guéant
- Department of Molecular Medicine and Reference Center for Inborn Errors of Metabolism, University Hospital of Nancy, Nancy F-54000, France
- INSERM, U1256, NGERE - Nutrition, Genetics, and Environmental Risk Exposure, University of Lorraine, Nancy F-54000, France
- Ania C Muntau
- University Children's Hospital, University Medical Center Hamburg Eppendorf, Hamburg 20246, Germany
- Nenad Blau
- Division of Metabolism, University Children's Hospital Zurich, CH-8032 Zurich, Switzerland
- V Reid Sutton
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Manuel Schiff
- Necker Hospital, APHP, Reference Center for Inborn Error of Metabolism and Filière G2M, Pediatrics Department, University of Paris, Paris 75007, France
- Inserm UMR_S1163, Institut Imagine, Paris 75015, France
- François Feillet
- INSERM, U1256, NGERE - Nutrition, Genetics, and Environmental Risk Exposure, University of Lorraine, Nancy F-54000, France
- Pediatric Department Reference Center for Inborn Errors of Metabolism Children University Hospital Nancy, Nancy F-54000, France
- Shuxing Zhang
- Intelligent Molecular Discovery Laboratory, Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77054, USA
- The Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Chunru Lin
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- The Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Liuqing Yang
- Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- The Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Center for RNA Interference and Non-Coding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA

10
Abstract
A near-universal Standard Genetic Code (SGC) implies a single origin for present Earth life. To study this unique event, I compute paths to the SGC, comparing different plausible histories. Notably, SGC-like coding emerges from traditional evolutionary mechanisms, and a superior route can be identified. To objectively measure evolution, progress values from 0 (random coding) to 1 (SGC-like) are defined: these measure fractions of random-code-to-SGC distance. Progress types are spacing/distance/delta Polar Requirement, detecting space between identical assignments/mutational distance to the SGC/chemical order, respectively. The coding system is based on selected RNAs performing aminoacyl-RNA synthetase reactions. Acceptor RNAs exhibit SGC-like Crick wobble; alternatively, non-wobbling triplets uniquely encode 20 amino acids/start/stop. Triplets acquire 22 functions by stereochemistry, selection, coevolution, or at random. Assignments also propagate to an assigned triplet’s neighborhood via single mutations, but can also decay. A vast code universe makes futile evolutionary paths plentiful. Thus, SGC evolution is critically sensitive to disorder from random assignments. Evolution also inevitably slows near coding completion. The SGC likely avoided these difficulties, and two suitable paths are compared. In late wobble, a majority of non-wobble assignments are made before wobble is adopted. In continuous wobble, a uniquely advantageous early intermediate yields an ordered SGC. Revised coding evolution (limited randomness, late wobble, concentration on amino acid encoding, chemically conservative coevolution with a chemically ordered elite) produces varied full codes with excellent joint progress values. A population of only 600 independent coding tables includes SGC-like members; a Bayesian path toward more accurate SGC evolution is available.
Affiliation(s)
- Michael Yarus
- Department of Molecular, Cellular and Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309-0347, USA.

11
Zhang TT, Liu H, Gao QY, Yang T, Liu JN, Ma XF, Li ZH. Gene transfer and nucleotide sequence evolution by Gossypium cytoplasmic genomes indicates novel evolutionary characteristics. Plant Cell Rep 2020; 39:765-777. [PMID: 32215683 DOI: 10.1007/s00299-020-02529-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/04/2019] [Accepted: 03/04/2020] [Indexed: 06/10/2023]
Abstract
The DNA fragments transferred among cotton cytoplasmic genomes are highly differentiated. The wild D group cotton species have undergone much greater evolution compared with cultivated AD group. Cotton (Gossypium spp.) is one of the most economically important fiber crops worldwide. Gene transfer, nucleotide evolution, and the codon usage preferences in cytoplasmic genomes are important evolutionary characteristics of higher plants. In this study, we analyzed the nucleotide sequence evolution, codon usage, and transfer of cytoplasmic DNA fragments in Gossypium chloroplast (cp) and mitochondrial (mt) genomes, including the A genome group, wild D group, and cultivated AD group of cotton species. Our analyses indicated that the differences in the length of transferred cytoplasmic DNA fragments were not significant in mitochondrial and chloroplast sequences. Analysis of the transfer of tRNAs found that trnQ and nine other tRNA genes were commonly transferred between two different cytoplasmic genomes. The Codon Adaptation Index values showed that Gossypium cp genomes prefer A/T-ending codons. Codon preference selection was higher in the D group than the other two groups. Nucleotide sequence evolution analysis showed that intergenic spacer sequences were more variable than coding regions and nonsynonymous mutations were clearly more common in cp genomes than mt genomes. Evolutionary analysis showed that the substitution rate was much higher in cp genomes than mt genomes. Interestingly, the D group cotton species have undergone much faster evolution compared with cultivated AD groups, possibly due to the selection and domestication of diverse cotton species. Our results demonstrate that gene transfer and differential nucleotide sequence evolution have occurred frequently in cotton cytoplasmic genomes.
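The abstract reports Codon Adaptation Index (CAI) values without stating the formula. As a hedged illustration, the sketch below uses the common definition: a codon's relative adaptiveness w is its count divided by the count of the most-used synonymous codon in a reference gene set, and a gene's CAI is the geometric mean of w over its codons. The three codon families, the toy reference sequences, the 0.5 pseudo-count for unseen codons and the function names are assumptions for illustration, not the study's data or settings.

```python
import math
from collections import Counter

# Hypothetical subset of synonymous-codon families. A full analysis would use
# every amino acid with more than one codon and exclude Met, Trp and stops.
FAMILIES = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}
CODON_TO_FAMILY = {c: aa for aa, fam in FAMILIES.items() for c in fam}


def codons(seq):
    """Split a coding sequence into in-frame triplets, ignoring trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """w(codon) = count(codon) / count(most-used synonymous codon) in the reference set."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TO_FAMILY)
    w = {}
    for fam in FAMILIES.values():
        top = max(counts[c] for c in fam)
        for c in fam:
            # Assumed convention: unseen codons get a 0.5 pseudo-count so log(w) stays finite.
            w[c] = max(counts[c], 0.5) / max(top, 0.5)
    return w


def cai(seq, w):
    """Codon Adaptation Index: geometric mean of w over the gene's scored codons."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


reference = ["TTTAAAGGTTTTAAAGGT", "TTCAAGGGA"]  # toy "highly expressed" genes
w = relative_adaptiveness(reference)
print(cai("TTTAAAGGT", w), cai("TTCAAG", w))
```

Because this toy reference favors the A/T-ending codons TTT and AAA, a gene built from them scores 1.0 while one using the C/G-ending synonyms TTC and AAG scores 0.5. The A/T-ending preference reported for Gossypium cp genomes is an empirical result of the study, not something this sketch demonstrates.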
Affiliation(s)
- Ting-Ting Zhang
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), College of Life Sciences, Northwest University, Xi'an, 710069, China
- Heng Liu
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), College of Life Sciences, Northwest University, Xi'an, 710069, China
- Qi-Yuan Gao
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), College of Life Sciences, Northwest University, Xi'an, 710069, China
- Ting Yang
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), College of Life Sciences, Northwest University, Xi'an, 710069, China
- Jian-Ni Liu
- State Key Laboratory of Continental Dynamics, Department of Geology, Early Life Institute, Northwest University, Xi'an, 710069, China
- Xiong-Feng Ma
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China
- Zhong-Hu Li
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), College of Life Sciences, Northwest University, Xi'an, 710069, China
- State Key Laboratory of Continental Dynamics, Department of Geology, Early Life Institute, Northwest University, Xi'an, 710069, China
- State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang, 455000, China

12
Di Gioacchino A, Šulc P, Komarova AV, Greenbaum BD, Monasson R, Cocco S. The Heterogeneous Landscape and Early Evolution of Pathogen-Associated CpG Dinucleotides in SARS-CoV-2. SSRN 2020:3611280. [PMID: 32714120 PMCID: PMC7366803 DOI: 10.2139/ssrn.3611280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/26/2020] [Revised: 05/27/2020] [Indexed: 11/15/2022]
Abstract
SARS-CoV-2 infection can lead to acute respiratory syndrome in patients, which can be due in part to dysregulated immune signalling. We analyze here the occurrences of CpG dinucleotides, which are putative pathogen-associated molecular patterns, along the viral sequence. Carrying out a comparative analysis with other ssRNA viruses and within the Coronaviridae family, we find the CpG content of SARS-CoV-2, while low compared to other betacoronaviruses, widely fluctuates along its primary sequence. While the CpG relative abundance and its associated CpG force parameter are low for the spike protein (S) and comparable to circulating seasonal coronaviruses such as HKU1, they are much greater and comparable to SARS and MERS for the 3'-end of the viral genome. In particular, the nucleocapsid protein (N), whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the 3'UTRs of all subgenomic RNA, has high CpG content. We speculate this dual nature of CpG content can confer on SARS-CoV-2 a high ability to both enter the host and trigger pattern recognition receptors (PRRs) in different contexts. We then investigate the evolution of synonymous mutations since the outbreak of the COVID-19 pandemic. Using a new application of selective forces on dinucleotides to estimate context-driven mutational processes, we find that synonymous mutations seem driven both by the viral codon bias and by the high value of the CpG force in the N protein, leading to a loss in CpG content. Sequence motifs preceding these CpG-loss-associated loci match recently identified binding patterns of the Zinc Finger anti-viral Protein (ZAP).
Funding: This work was partially supported by the ANR19 Decrypted CE30-0021-01 grants. B.G. was supported by National Institutes of Health grants 7R01AI081848-04, 1R01CA240924-01, a Stand Up to Cancer - Lustgarten Foundation Convergence Dream Team Grant, and The Pershing Square Sohn Prize - Mark Foundation Fellow supported by funding from The Mark Foundation for Cancer Research.
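CpG relative abundance is commonly computed as the observed/expected ratio ρ(CG) = f(CG) / (f(C) · f(G)). The CpG force parameter is a different, model-based quantity and is not reproduced here. The minimal sketch below assumes mononucleotide frequencies taken over the whole input and overlapping dinucleotide counts, and adds a sliding window to show how the ratio can vary along a sequence. The function names and the window and step sizes are hypothetical, and the paper's exact normalization may differ.

```python
def cpg_relative_abundance(seq):
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G))."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short")
    f_c = seq.count("C") / n
    f_g = seq.count("G") / n
    if f_c == 0 or f_g == 0:
        return float("nan")  # ratio undefined without both C and G
    # Count overlapping CG dinucleotides among the n - 1 adjacent pairs.
    cg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return (cg / (n - 1)) / (f_c * f_g)


def windowed_cpg(seq, window, step):
    """Relative abundance in consecutive windows along the sequence."""
    return [cpg_relative_abundance(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(windowed_cpg("GGCCCGCG", window=4, step=4))
```

Values near 1 mean CpG occurs about as often as base composition predicts, and values well below 1 indicate CpG depletion. Real genome-scale windows would be hundreds of nucleotides long rather than the toy length used here.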
Affiliation(s)
- Andrea Di Gioacchino
- Laboratoire de Physique de l'Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Petr Šulc
- School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, 1001 South McAllister Avenue, Tempe, Arizona 85281, USA
- Anastassia V Komarova
- Molecular Genetics of RNA viruses, Department of Virology, Institut Pasteur, CNRS UMR-3569, 75015 Paris, France
- Benjamin D Greenbaum
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Rémi Monasson
- Laboratoire de Physique de l'Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Simona Cocco
- Laboratoire de Physique de l'Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, F-75005 Paris, France

13
Nguyen A, Maisnier-Patin S, Yamayoshi I, Kofoid E, Roth JR. Selective Inbreeding: Genetic Crosses Drive Apparent Adaptive Mutation in the Cairns-Foster System of Escherichia coli. Genetics 2020; 214:333-354. [PMID: 31810989 PMCID: PMC7017022 DOI: 10.1534/genetics.119.302754] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/23/2019] [Accepted: 12/02/2019] [Indexed: 01/09/2023]
Abstract
The Escherichia coli system of Cairns and Foster employs a lac frameshift mutation that reverts rarely (10⁻⁹/cell/division) during unrestricted growth. However, when 10⁸ cells are plated on lactose medium, the nongrowing lawn produces ∼50 Lac+ revertant colonies that accumulate linearly with time over 5 days. Revertants carry very few associated mutations. This behavior has been attributed to an evolved mechanism ("adaptive mutation" or "stress-induced mutagenesis") that responds to starvation by preferentially creating mutations that improve growth. We describe an alternative model, "selective inbreeding," in which natural selection acts during intercellular transfer of the plasmid that carries the mutant lac allele and the dinB gene for an error-prone polymerase. Revertant genome sequences show that the plasmid is more intensely mutagenized than the chromosome. Revertants vary widely in their number of plasmid and chromosomal mutations. Plasmid mutations are distributed evenly, but chromosomal mutations are focused near the replication origin. Rare, heavily mutagenized, revertants have acquired a plasmid tra mutation that eliminates conjugation ability. These findings support the new model, in which revertants are initiated by rare pre-existing cells (105) with many copies of the F'lac plasmid. These cells divide under selection, producing daughters that mate. Recombination between donor and recipient plasmids initiates rolling-circle plasmid over-replication, causing a mutagenic elevation of DinB level. A lac+ reversion event starts chromosome replication and mutagenesis by accumulated DinB. After reversion, plasmid transfer moves the revertant lac+ allele into an unmutagenized cell, and away from associated mutations. Thus, natural selection explains why mutagenesis appears stress-induced and directed.
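A back-of-the-envelope check of why the ∼50 revertants are surprising uses only the numbers in the abstract, plus one assumption that is generous for a nongrowing lawn: each of the 10⁸ plated cells divides once at the growth-phase reversion rate of 10⁻⁹ per cell per division.

```python
rate_per_division = 1e-9  # Lac+ reversion rate during unrestricted growth (from the abstract)
plated_cells = 1e8        # cells plated on lactose medium (from the abstract)
observed = 50             # approximate revertant colonies after 5 days (from the abstract)

# Assumption: each plated cell undergoes at most one division on the plate.
expected = rate_per_division * plated_cells
fold_excess = observed / expected
print(f"expected ~{expected:.1f} revertants; observed ~{observed} (~{fold_excess:.0f}-fold more)")
```

Roughly 0.1 revertants would be expected against ∼50 observed, a ∼500-fold excess. This is the gap that the "adaptive mutation" and "selective inbreeding" models each try to explain.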
Affiliation(s)
- Amanda Nguyen
- Department of Microbiology and Molecular Genetics, University of California, Davis, California 95616
- Sophie Maisnier-Patin
- Department of Microbiology and Molecular Genetics, University of California, Davis, California 95616
- Itsugo Yamayoshi
- Department of Microbiology and Molecular Genetics, University of California, Davis, California 95616
- Eric Kofoid
- Department of Microbiology and Molecular Genetics, University of California, Davis, California 95616
- John R Roth
- Department of Microbiology and Molecular Genetics, University of California, Davis, California 95616

14
Bauer J, Broom M, Alonso E. The stabilization of equilibria in evolutionary game dynamics through mutation: mutation limits in evolutionary games. Proc Math Phys Eng Sci 2019; 475:20190355. [PMID: 31824216 DOI: 10.1098/rspa.2019.0355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/05/2019] [Accepted: 10/01/2019] [Indexed: 11/12/2022]
Abstract
The multi-population replicator dynamics is a dynamic approach to coevolving populations and multi-player games and is related to Cross learning. In general, not every equilibrium is a Nash equilibrium of the underlying game, and the convergence is not guaranteed. In particular, no interior equilibrium can be asymptotically stable in the multi-population replicator dynamics, e.g. resulting in cyclic orbits around a single interior Nash equilibrium. We introduce a new notion of equilibria of replicator dynamics, called mutation limits, based on a naturally arising, simple form of mutation, which is invariant under the specific choice of mutation parameters. We prove the existence of mutation limits for a large class of games, and consider a particularly interesting subclass called attracting mutation limits. Attracting mutation limits are approximated in every (mutation-)perturbed replicator dynamics, hence they offer an approximate dynamic solution to the underlying game even if the original dynamic is not convergent. Thus, mutation stabilizes the system in certain cases and makes attracting mutation limits near attainable. Hence, attracting mutation limits are relevant as a dynamic solution concept of games. We observe that they have some similarity to Q-learning in multi-agent reinforcement learning. Attracting mutation limits do not exist in all games, however, raising the question of their characterization.
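To make the setting concrete, the sketch below integrates two-population replicator dynamics for matching pennies. This game has a single Nash equilibrium, which is interior. The sketch adds a uniform mutation term δ(1/n − xᵢ) to each population. This mutation form, the explicit Euler step, the game and the function names are illustrative assumptions; the paper's own mutation model and its precise definition of mutation limits may differ. With δ = 0 the continuous-time orbits of this game cycle around the equilibrium. With δ > 0 the trajectory settles at a rest point, here (1/2, 1/2).

```python
def replicator_mutator_step(x, y, A, B, delta, h):
    """One explicit-Euler step of two-population replicator dynamics with an
    assumed uniform mutation term delta * (1/n - x_i)."""
    n, m = len(x), len(y)
    # Payoff to each pure strategy: population 1 uses A[i][j] against mix y,
    # population 2 uses B[i][j] against mix x.
    fx = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    fy = [sum(B[i][j] * x[i] for i in range(n)) for j in range(m)]
    mean_x = sum(xi * fi for xi, fi in zip(x, fx))
    mean_y = sum(yj * fj for yj, fj in zip(y, fy))
    new_x = [xi + h * (xi * (fi - mean_x) + delta * (1 / n - xi)) for xi, fi in zip(x, fx)]
    new_y = [yj + h * (yj * (fj - mean_y) + delta * (1 / m - yj)) for yj, fj in zip(y, fy)]
    return new_x, new_y


def simulate(x, y, A, B, delta, h=0.01, steps=50_000):
    """Integrate from (x, y); each step preserves sum(x) = sum(y) = 1 up to rounding."""
    for _ in range(steps):
        x, y = replicator_mutator_step(x, y, A, B, delta, h)
    return x, y


# Matching pennies: population 1 wants to match, population 2 to mismatch.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
x, y = simulate([0.9, 0.1], [0.2, 0.8], A, B, delta=0.1)
print([round(v, 4) for v in x], [round(v, 4) for v in y])
```

Near the equilibrium, the linearized system has eigenvalues −δ ± i. Any δ > 0 therefore turns the neutral cycles into an inward spiral, which is the kind of stabilization by mutation that the abstract describes.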
Affiliation(s)
- Johann Bauer
- Department of Mathematics, City, University of London, London, UK
- Mark Broom
- Department of Mathematics, City, University of London, London, UK
- Eduardo Alonso
- Department of Computer Science, City, University of London, London, UK

15
Li J, Jew B, Zhan L, Hwang S, Coppola G, Freimer NB, Sul JH. ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest. PLoS Comput Biol 2019; 15:e1007556. [PMID: 31851693 PMCID: PMC6938691 DOI: 10.1371/journal.pcbi.1007556] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/29/2019] [Revised: 01/01/2020] [Accepted: 11/21/2019] [Indexed: 12/30/2022]
Abstract
Next-generation sequencing technology (NGS) enables the discovery of nearly all genetic variants present in a genome. A subset of these variants, however, may have poor sequencing quality due to limitations in NGS or variant callers. In genetic studies that analyze a large number of sequenced individuals, it is critical to detect and remove those variants with poor quality as they may cause spurious findings. In this paper, we present ForestQC, a statistical tool for performing quality control on variants identified from NGS data by combining a traditional filtering approach and a machine learning approach. Our software uses the information on sequencing quality, such as sequencing depth, genotyping quality, and GC content, to predict whether a particular variant is likely to be false-positive. To evaluate ForestQC, we applied it to two whole-genome sequencing datasets where one dataset consists of related individuals from families while the other consists of unrelated individuals. Results indicate that ForestQC outperforms widely used methods for performing quality control on variants such as VQSR of GATK by considerably improving the quality of variants to be included in the analysis. ForestQC is also very efficient, and hence can be applied to large sequencing datasets. We conclude that combining a machine learning algorithm trained with sequencing quality information and the filtering approach is a practical approach to perform quality control on genetic variants from sequencing data.
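The abstract says ForestQC combines a filtering approach with a machine-learning approach but does not spell out how the two interact. One plausible arrangement is assumed here rather than taken from the paper. Hard thresholds first label clearly good and clearly bad variants. A classifier is then trained on those labels and assigns the variants the thresholds left undetermined. To keep the example dependency-free, a nearest-centroid classifier stands in for the random forest. The thresholds, the feature set (mean depth, genotype quality, GC content), the example variants and all function names are hypothetical.

```python
from statistics import mean, pstdev

FEATURES = ("depth", "gq", "gc")


def hard_label(v):
    """Hypothetical thresholds: 'good', 'bad', or None (undetermined)."""
    if v["depth"] >= 20 and v["gq"] >= 40:
        return "good"
    if v["depth"] < 8 or v["gq"] < 15:
        return "bad"
    return None


def fit_centroids(labeled):
    """Per-class feature means plus per-feature scales (nearest-centroid
    classifier, used here as a simple stand-in for a random forest)."""
    if not labeled:
        raise ValueError("no variants passed the hard filters")
    scales = {f: pstdev([v[f] for v, _ in labeled]) or 1.0 for f in FEATURES}
    centroids = {}
    for cls in {lab for _, lab in labeled}:
        members = [v for v, lab in labeled if lab == cls]
        centroids[cls] = {f: mean(v[f] for v in members) for f in FEATURES}
    return centroids, scales


def predict(v, centroids, scales):
    """Assign the class whose centroid is closest in scaled feature space."""
    def dist(c):
        return sum(((v[f] - c[f]) / scales[f]) ** 2 for f in FEATURES)
    return min(centroids, key=lambda cls: dist(centroids[cls]))


def qc(variants):
    """Hard filters first; the classifier decides only the undetermined variants."""
    labels = [hard_label(v) for v in variants]
    labeled = [(v, lab) for v, lab in zip(variants, labels) if lab is not None]
    centroids, scales = fit_centroids(labeled)
    return [lab if lab is not None else predict(v, centroids, scales)
            for v, lab in zip(variants, labels)]


variants = [
    {"depth": 30, "gq": 60, "gc": 0.40},
    {"depth": 25, "gq": 50, "gc": 0.45},
    {"depth": 5, "gq": 10, "gc": 0.70},
    {"depth": 6, "gq": 12, "gc": 0.65},
    {"depth": 18, "gq": 35, "gc": 0.42},  # undetermined by the thresholds
    {"depth": 9, "gq": 16, "gc": 0.68},   # undetermined by the thresholds
]
print(qc(variants))
```

Here the two undetermined variants are assigned to whichever class they resemble in scaled feature space. A real random forest would learn nonlinear decision boundaries instead of comparing distances to two class means.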
Affiliation(s)
- Jiajin Li
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States of America
- Brandon Jew
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA, United States of America
- Lingyu Zhan
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, United States of America
- Sungoo Hwang
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America
- Giovanni Coppola
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America
- Nelson B. Freimer
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States of America
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America
- Jae Hoon Sul
- Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, United States of America

16
Zhou JL, Xu J, Jiao AG, Yang L, Chen J, Callac P, Liu Y, Wang SX. Patterns of PCR Amplification Artifacts of the Fungal Barcode Marker in a Hybrid Mushroom. Front Microbiol 2019; 10:2686. [PMID: 31803173 PMCID: PMC6877668 DOI: 10.3389/fmicb.2019.02686] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/22/2019] [Accepted: 11/05/2019] [Indexed: 11/16/2022]
Abstract
The polymerase chain reaction (PCR) is widely used in modern biology and medicine. However, PCR artifacts can complicate the interpretation of PCR-based results. The internal transcribed spacer (ITS) region of the ribosomal RNA gene cluster is the consensus fungal barcode marker and suspected PCR artifacts have been reported in many studies, especially for the analyses of environmental fungal samples. At present, the patterns of PCR artifacts in the whole fungal ITS region (ITS1+5.8S+ITS2) are not known. In this study, we analyzed the error rates of PCR at three template complexity levels using the divergent copies of ITS from the mushroom Agaricus subrufescens. Our results showed that PCR using the Phusion® High-Fidelity DNA Polymerase has a per nucleotide error rate of about 4 × 10⁻⁶ per replication. Among the detected mutations, transitions were much more frequent than transversions, insertions, and deletions. When divergent alleles were mixed as templates in the same reaction, a significant proportion (∼30%) of recombinant molecules were detected. The in vitro mixed-template results were comparable to those obtained from using the genomic DNA of the original mushroom specimen as template. Our results indicate that caution should be in place when interpreting ITS sequences from individual fungal specimens, especially those containing divergent ITS copies. Similar results could also happen to PCR-based analyses of other multicopy DNA fragments as well as single-copy DNA sequences with divergent alleles in diploid organisms.
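A per-nucleotide, per-replication error rate becomes easier to interpret when it is converted into the share of PCR products that carry a polymerase error. The sketch below uses the usual approximation that the mean number of errors per product is the error rate × amplicon length × number of template doublings, and treats errors as Poisson. The 4 × 10⁻⁶ rate comes from the abstract. The ∼700 bp amplicon length, the 30 doublings and the function names are illustrative assumptions, not values reported by the study.

```python
import math


def expected_errors_per_product(error_rate, length_bp, doublings):
    """Mean polymerase errors per final product (usual approximation:
    per-base error rate x amplicon length x template doublings)."""
    return error_rate * length_bp * doublings


def fraction_with_errors(error_rate, length_bp, doublings):
    """Fraction of products with >= 1 error, treating errors as Poisson."""
    return 1 - math.exp(-expected_errors_per_product(error_rate, length_bp, doublings))


# 4e-6 is the per-nucleotide rate from the abstract; 700 bp and 30 doublings
# are illustrative assumptions.
lam = expected_errors_per_product(4e-6, 700, 30)
frac = fraction_with_errors(4e-6, 700, 30)
print(f"mean errors per product: {lam:.3f}; products with >=1 error: {frac:.1%}")
```

Under these assumptions about 8% of amplicons would carry at least one point error. This estimate ignores PCR recombination, which the abstract reports separately (∼30% recombinant molecules with mixed templates).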
Affiliation(s)
- Jun-Liang Zhou
- Institute of Plant and Environment Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing Engineering Research Center for Edible Mushroom, Beijing, China
- International Exchange and Cooperation Department, Kunming University, Kunming, China
- Jianping Xu
- Department of Biology, McMaster University, Hamilton, ON, Canada
- Laboratory for Conservation and Utilization of Bio-Resources and Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming, China
- An-Guo Jiao
- Institute of Plant and Environment Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing Engineering Research Center for Edible Mushroom, Beijing, China
- Li Yang
- Institute of Plant and Environment Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing Engineering Research Center for Edible Mushroom, Beijing, China
- Jie Chen
- Instituto de Ecología, Veracruz, Mexico
- Yu Liu
- Institute of Plant and Environment Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing Engineering Research Center for Edible Mushroom, Beijing, China
- Shou-Xian Wang
- Institute of Plant and Environment Protection, Beijing Academy of Agriculture and Forestry Sciences, Beijing Engineering Research Center for Edible Mushroom, Beijing, China

17
Wichmann S, Ardern Z. Optimality in the standard genetic code is robust with respect to comparison code sets. Biosystems 2019; 185:104023. [DOI: 10.1016/j.biosystems.2019.104023] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/15/2019] [Revised: 08/22/2019] [Accepted: 08/24/2019] [Indexed: 01/22/2023]

18
Kanthaswamy S, Oldt RF, Said R, Grijalva J, Falak A, Jensen A, Vizor C, Houghton P, Bunlungsup S, Malaivijitnond S, Smith DG. Partial sequence analyses of exon 7 of the ABO locus of cynomolgus (Macaca fascicularis) and rhesus (M. mulatta) macaques: Indeterminate phenotypes show the presence of the O blood group. HLA 2019; 94:482-492. [PMID: 31448567 DOI: 10.1111/tan.13675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2019] [Revised: 07/26/2019] [Accepted: 08/22/2019] [Indexed: 11/27/2022]
Abstract
Compatibility tests to identify A, B, and O alleles are critical for establishing suitable donor-recipient matches among experimental animals. Using a qPCR-based SNP probe assay, we have identified A, B, AB, and indeterminate blood group phenotypes in cynomolgus and rhesus macaques. We have hypothesized, albeit without molecular confirmation, that the indeterminate phenotype represents homozygosity for the null O allele at the macaque ABO locus. The indeterminate phenotype represents the unsuccessful detection of either A or B alleles using primers targeting the A-specific and B-specific single nucleotide polymorphisms (SNPs) in a variable region of exon 7 of the ABO locus. These SNPs are associated with two functional sites, detected using two allele-specific probes in the qPCR assay where the codons leucine and methionine (at codon 266) and glycine and alanine (at codon 268) are required for the synthesis of the A and B transferases, respectively. While reference sequences for the A and B alleles exhibited no novel mutations in the functional exon, plasmid Sanger sequence analyses showed unique mutations within the diagnostic target sites in 10 macaques exhibiting the indeterminate phenotype. Eight of these indeterminate individuals exhibited SNPs at codon 268 that should prevent the synthesis of an A or B transferase. While the two other indeterminate samples had functional codons that were consistent with A or B alleles, mutations in either their probe- or primer-binding sites that altered their peptide sequences probably impeded their detection by our assay.
Affiliation(s)
- Sreetharan Kanthaswamy
- School of Mathematical and Natural Sciences, Arizona State University (ASU) at the West Campus, Glendale, Arizona
- California National Primate Research Center, University of California, Davis, California
- Robert F Oldt
- School of Mathematical and Natural Sciences, Arizona State University (ASU) at the West Campus, Glendale, Arizona
- Evolutionary Biology Graduate Program, School of Life Sciences, Arizona State University, Tempe, Arizona
- Ruweida Said
- School of Mathematical and Natural Sciences, Arizona State University (ASU) at the West Campus, Glendale, Arizona
- Jose Grijalva
- School of Mathematical and Natural Sciences, Arizona State University (ASU) at the West Campus, Glendale, Arizona
- Asiya Falak
- School of Mathematical and Natural Sciences, Arizona State University (ASU) at the West Campus, Glendale, Arizona
- Ashley Jensen
- School of Mathematical and Natural Sciences, Arizona State University (ASU) at the West Campus, Glendale, Arizona
- Choice Vizor
- School of Mathematical and Natural Sciences, Arizona State University (ASU) at the West Campus, Glendale, Arizona
- Srichan Bunlungsup
- National Primate Research Center of Thailand, Chulalongkorn University, Saraburi, Thailand
- Department of Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Suchinda Malaivijitnond
- National Primate Research Center of Thailand, Chulalongkorn University, Saraburi, Thailand
- Department of Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- David G Smith
- California National Primate Research Center, University of California, Davis, California
- Molecular Anthropology Laboratory, Department of Anthropology, University of California, Davis, California

19
Yang Z, Kim HJ, Le JT, McLendon C, Bradley KM, Kim MS, Hutter D, Hoshika S, Yaren O, Benner SA. Nucleoside analogs to manage sequence divergence in nucleic acid amplification and SNP detection. Nucleic Acids Res 2018; 46:5902-5910. [PMID: 29800323 PMCID: PMC6159519 DOI: 10.1093/nar/gky392] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/22/2018] [Accepted: 05/02/2018] [Indexed: 01/18/2023]
Abstract
Described here are the synthesis, enzymology and some applications of a purine nucleoside analog (H) designed to have two tautomeric forms, one complementary to thymidine (T), the other complementary to cytidine (C). The performance of H is compared by various metrics to performances of other 'biversal' analogs that similarly rely on tautomerism to complement both pyrimidines. These include (i) the thermodynamic stability of duplexes that pair these biversals with various standard nucleotides, (ii) the ability of the biversals to support polymerase chain reaction (PCR), (iii) the ability of primers containing biversals to equally amplify targets having polymorphisms in the primer binding site, and (iv) the ability of ligation-based assays to exploit the biversals to detect medically relevant single nucleotide polymorphisms (SNPs) in sequences flanked by medically irrelevant polymorphisms. One advantage of H over the widely used inosine 'universal base' and 'mixed sequence' probes is seen in ligation-based assays to detect SNPs. The need to detect medically relevant SNPs within ambiguous sequences is especially important when probing RNA viruses, which rapidly mutate to create drug resistance, but also suffer neutral drift, the second obstructing simple methods to detect the first. Thus, H is being developed to detect variants of viruses that are rapidly mutating.
Affiliation(s)
- Zunyi Yang
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Hyo-Joong Kim
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Jennifer T Le
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Chris McLendon
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Kevin M Bradley
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Myong-Sang Kim
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Daniel Hutter
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Shuichi Hoshika
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
- Ozlem Yaren
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Steven A Benner
- Foundation for Applied Molecular Evolution (FfAME), 13709 Progress Boulevard, Box 7, Alachua, FL 32615, USA
- Firebird Biomolecular Sciences LLC, 13709 Progress Blvd, Box 17, Alachua, FL 32615, USA
| |
Collapse
|
20
Pértille F, Da Silva VH, Johansson AM, Lindström T, Wright D, Coutinho LL, Jensen P, Guerrero-Bosagna C. Mutation dynamics of CpG dinucleotides during a recent event of vertebrate diversification. Epigenetics 2019; 14:685-707. [PMID: 31070073 PMCID: PMC6557589 DOI: 10.1080/15592294.2019.1609868] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/04/2022]
Abstract
DNA methylation in CpG dinucleotides is associated with high mutability and the disappearance of CpG sites during evolution. Although the high mutability of CpGs is thought to be relevant for vertebrate evolution, very little is known about the role of CpG-related mutations in the genomic diversification of vertebrates. Our study analysed genetic differences in chickens between Red Junglefowl (RJF; the closest living relative of the ancestor of domesticated chickens) and domesticated breeds, to identify genomic dynamics that occurred during domestication, focusing particularly on CpG-related mutations. Single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) between RJF and these domesticated breeds were assessed in a reduced fraction of their genome. Additionally, DNA methylation in the same fraction of the genome was measured in the sperm of RJF individuals to identify possible correlations with the mutations found between RJF and the domesticated breeds. Our study shows that, although the vast majority of the CpG-related mutations found relate to CNVs, CpGs are disproportionately associated with SNPs compared with CNVs, in which CpGs are substantially under-represented. Moreover, CpGs seem to be hotspots of mutations related to speciation. We suggest that, on the one hand, CpG-related mutations in CNV regions would promote genomic ‘flexibility’ in evolution, i.e., the ability of the genome to expand its functional possibilities; on the other hand, CpG-related mutations in SNPs would relate to genomic ‘specificity’ in evolution, thus representing mutations associated with phenotypic traits relevant for speciation.
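The abstract does not explain how SNPs were assigned to CpG sites. The minimal sketch below shows one common approach, assuming a reference sequence string and 0-based SNP positions. It flags a SNP as CpG-related when the reference base is the C or the G of a CpG dinucleotide. It separately labels the C-to-T and G-to-A transitions expected from deamination of methylated cytosine. The function names are hypothetical and are not taken from the study.

```python
def is_cpg_site(seq: str, pos: int) -> bool:
    """True if the base at 0-based `pos` is the C or the G of a CpG dinucleotide."""
    seq = seq.upper()
    if seq[pos] == "C" and pos + 1 < len(seq) and seq[pos + 1] == "G":
        return True
    if seq[pos] == "G" and pos > 0 and seq[pos - 1] == "C":
        return True
    return False


def classify_snp(seq: str, pos: int, alt: str) -> str:
    """Label a SNP relative to the reference `seq` (hypothetical categories)."""
    ref, alt = seq[pos].upper(), alt.upper()
    if not is_cpg_site(seq, pos):
        return "non-CpG"
    # C->T on the top strand, or G->A (the same event seen on the other strand),
    # is the signature expected from deamination of methylated cytosine.
    if (ref, alt) in {("C", "T"), ("G", "A")}:
        return "CpG transition"
    return "CpG other"


if __name__ == "__main__":
    ref_seq = "ATCGTA"  # hypothetical reference fragment
    for pos, alt in [(2, "T"), (3, "A"), (3, "C"), (0, "G")]:
        print(pos, alt, classify_snp(ref_seq, pos, alt))
```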
Affiliation(s)
- Fábio Pértille
- Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
- Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
- Vinicius H Da Silva
- Animal Breeding and Genomics Centre, Wageningen University & Research, Wageningen, The Netherlands
- Department of Animal Ecology (AnE), Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Anna M Johansson
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Tom Lindström
- Division of Theoretical Biology, IFM, Linköping University, Linköping, Sweden
- Dominic Wright
- Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
- Luiz L Coutinho
- Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
- Per Jensen
- Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
- Carlos Guerrero-Bosagna
- Avian Behavioral Genomics and Physiology Group, IFM Biology, Linköping University, Linköping, Sweden
21
Arabnejad M, Dawkins BA, Bush WS, White BC, Harkness AR, McKinney BA. Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS. BioData Min 2018; 11:23. [PMID: 30410580 PMCID: PMC6215626 DOI: 10.1186/s13040-018-0186-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/08/2018] [Accepted: 10/22/2018] [Indexed: 11/29/2022]
Abstract
BACKGROUND ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding. METHODS We introduce a new two-dimensional transition/transversion genotype encoding for ReliefF, and we implement three ReliefF attribute metrics: 1.) genotype mismatch (GM), which is the ReliefF standard, 2.) allele mismatch (AM), which accounts for heterozygous differences and has not been used previously in ReliefF, and 3.) the new transition/transversion metric. We incorporate these attribute metrics into the ReliefF nearest neighbor calculation with a Manhattan metric, and we introduce GRM as a new ReliefF nearest-neighbor metric to adjust for allele frequency heterogeneity. RESULTS We apply ReliefF with each metric to a GWAS of major depressive disorder and compare the detection of genes in pathways implicated in depression, including Axon Guidance, Neuronal System, and G Protein-Coupled Receptor Signaling. We also compare with detection by Random Forest and Lasso as well as random/null selection to assess pathway size bias. CONCLUSIONS Our results suggest that using more genetically motivated encodings, such as transition/transversion, and metrics that adjust for allele frequency heterogeneity, such as GRM, lead to ReliefF attribute scores with improved pathway enrichment.
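The GM and AM attribute metrics described above can be sketched inside a basic ReliefF loop. This illustrates the idea under stated assumptions and is not the authors' implementation. Genotypes are coded as minor-allele counts (0, 1, 2), the phenotype is binary, and every sample serves as a target. Neighbours are ranked by a Manhattan distance built from the chosen per-SNP metric. The transition/transversion metric and the GRM-based neighbour distance are omitted, because the abstract does not give their exact definitions.

```python
from typing import Callable, List, Sequence

# Genotypes are coded as the number of minor alleles (0, 1 or 2) -- an assumption.
DiffFn = Callable[[int, int], float]


def gm_diff(a: int, b: int) -> float:
    """Genotype mismatch (GM): 0 if the genotypes are identical, otherwise 1."""
    return 0.0 if a == b else 1.0


def am_diff(a: int, b: int) -> float:
    """Allele mismatch (AM): fraction of the two alleles that differ (0, 0.5 or 1)."""
    return abs(a - b) / 2.0


def relieff(X: Sequence[Sequence[int]], y: Sequence[int], k: int, diff: DiffFn) -> List[float]:
    """Basic ReliefF for a binary phenotype, using every sample as a target.

    Neighbours are ranked by the Manhattan distance built from `diff`, i.e. the
    sum of per-SNP differences between two samples.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p

    def distance(i: int, j: int) -> float:
        return sum(diff(X[i][a], X[j][a]) for a in range(p))

    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: distance(i, j))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for a in range(p):
            # Differences between near neighbours of the same class are penalised.
            for h in hits:
                weights[a] -= diff(X[i][a], X[h][a]) / (n * len(hits))
            # Differences between near neighbours of different classes are rewarded.
            for m in misses:
                weights[a] += diff(X[i][a], X[m][a]) / (n * len(misses))
    return weights


if __name__ == "__main__":
    # Toy data: SNP 0 separates the classes, SNP 1 is noise.
    X = [[0, 1], [0, 2], [0, 0], [2, 1], [2, 2], [2, 0]]
    y = [0, 0, 0, 1, 1, 1]
    print("GM", relieff(X, y, k=2, diff=gm_diff))
    print("AM", relieff(X, y, k=2, diff=am_diff))
```

On this toy dataset, both metrics give the informative SNP a positive weight and the noise SNP a lower one. AM differs from GM only in scoring a homozygote-versus-heterozygote difference as 0.5 rather than 1.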
Affiliation(s)
- M. Arabnejad
- Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA
- B. A. Dawkins
- Department of Mathematics, The University of Tulsa, Tulsa, OK 74104 USA
- W. S. Bush
- Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road, Cleveland, OH 44106 USA
- B. C. White
- Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA
- A. R. Harkness
- Department of Psychology, The University of Tulsa, Tulsa, OK 74104 USA
- B. A. McKinney
- Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA
- Department of Mathematics, The University of Tulsa, Tulsa, OK 74104 USA
22
Yamagata Y, Yoshimura A, Anai T, Watanabe S. Selection criteria for SNP loci to maximize robustness of high-resolution melting analysis for plant breeding. Breed Sci 2018; 68:488-498. [PMID: 30369824 PMCID: PMC6198901 DOI: 10.1270/jsbbs.18048] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/11/2018] [Accepted: 06/30/2018] [Indexed: 06/01/2023]
Abstract
DNA markers are useful for identifying genes and developing new genetic materials for breeding and genetic research. High-resolution melting (HRM) analysis can detect a single nucleotide polymorphism (SNP) in two polymerase chain reaction (PCR) fragments as a melting temperature (Tm) difference without additional experimental steps, such as gel electrophoresis. To design a method for developing reliable HRM markers that discriminate between homozygous alleles containing SNPs, we tested new evaluation indexes related to the thermodynamics of double-stranded DNA to find one that maximizes the difference in Tm values between PCR fragments. We found that differences in the change in Gibbs free energy (ΔG°) correlated with actual differences in Tm values. Optimization of the nearest neighboring nucleotide (NNN) of a SNP by nucleotide substitution in the primer and reducing the size of the PCR fragment both enlarged the actual differences in Tm. The genetic DNA markers we developed by NNN substitution, termed NNNs-HRM markers, could be precisely mapped within soybean chromosomes by linkage analysis. We developed a Perl script pipeline to enable the automatic design of a massive number of NNNs-HRM markers; these scripts are freely available and would be useful for practical breeding programs for other plant species.
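To make the ΔG° criterion concrete, the sketch below sums nearest-neighbour free energies for two allelic fragments and reports their difference. It assumes the SantaLucia (1998) unified parameters at 37 °C in 1 M NaCl and perfectly matched duplexes, with no symmetry or salt correction. The abstract does not state which parameter set or conditions the authors used, and the example fragments are hypothetical.

```python
# Assumed parameter set: SantaLucia (1998) unified nearest-neighbour dG at 37 C,
# 1 M NaCl (kcal/mol), keyed by the 5'->3' top-strand dinucleotide.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}
INIT_GC = 0.98  # initiation term for a terminal G-C pair
INIT_AT = 1.03  # initiation term for a terminal A-T pair


def duplex_dg37(seq: str) -> float:
    """dG of a perfectly matched duplex given by its top strand (5'->3')."""
    seq = seq.upper()
    dg = sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))
    for end in (seq[0], seq[-1]):
        dg += INIT_GC if end in "GC" else INIT_AT
    return dg


def allele_ddg(fragment_a: str, fragment_b: str) -> float:
    """Absolute ddG between two allelic fragments; larger values suggest easier HRM separation."""
    return abs(duplex_dg37(fragment_a) - duplex_dg37(fragment_b))


if __name__ == "__main__":
    a_allele = "ATGCAGTA"  # hypothetical fragment carrying the A allele
    g_allele = "ATGCGGTA"  # same fragment carrying the G allele
    print(round(duplex_dg37(a_allele), 2), round(duplex_dg37(g_allele), 2))
    print(round(allele_ddg(a_allele, g_allele), 2))
```

A SNP at position i changes only the two nearest-neighbour steps that contain it. Substituting the adjacent base (the NNN idea above) therefore changes exactly those steps, which can widen or narrow ΔΔG° between the alleles.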
Affiliation(s)
- Yoshiyuki Yamagata
- Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi, Fukuoka 819-0395, Japan
- Atsushi Yoshimura
- Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi, Fukuoka 819-0395, Japan
- Toyoaki Anai
- Faculty of Agriculture, Saga University, 1 Honjo-machi, Saga, Saga 840-8502, Japan
- Satoshi Watanabe
- Faculty of Agriculture, Saga University, 1 Honjo-machi, Saga, Saga 840-8502, Japan
23
Shaw JLA, Judy JD, Kumar A, Bertsch P, Wang MB, Kirby JK. Incorporating Transgenerational Epigenetic Inheritance into Ecological Risk Assessment Frameworks. Environ Sci Technol 2017; 51:9433-9445. [PMID: 28745897 DOI: 10.1021/acs.est.7b01094] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 05/20/2023]
Abstract
Chronic exposure to environmental contaminants can induce heritable "transgenerational" modifications to organisms, potentially affecting future ecosystem health and functionality. Incorporating transgenerational epigenetic heritability into risk assessment procedures has been previously suggested. However, a critical review of existing literature yielded numerous studies claiming transgenerational impacts, with little compelling evidence. Therefore, contaminant-induced epigenetic inheritance may be less common than is reported in the literature. We identified a need for multigeneration epigenetic studies that extend beyond what could be deemed "direct exposure" to F1 and F2 gametes and also include subsequent multiple nonexposed generations to adequately evaluate transgenerational recovery times. Also, increased experimental replication is required to account for the highly variable nature of epigenetic responses and apparent irreproducibility of current studies. Further, epigenetic end points need to be correlated with observable detrimental organism changes before a need for risk management can be properly determined. We suggest that epigenetic-based contaminant studies include concentrations lower than current "EC10-20" or "Lowest Observable Effect Concentrations" for the organism's most sensitive phenotypic end point, as higher concentrations are likely already regulated. Finally, we propose a regulatory framework and optimal experimental design that enables transgenerational epigenetic effects to be assessed and incorporated into conventional ecotoxicological testing.
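The recommendation to test below 'EC10-20' presupposes an ECx estimate. The minimal sketch below assumes that effects follow a two-parameter log-logistic (Hill) curve with a known EC50 and slope. It inverts that curve for an arbitrary x and builds a hypothetical dilution series below the resulting threshold. A real assessment would fit the curve to data first, and the parameter values shown are placeholders.

```python
from typing import List


def ec_x(ec50: float, hill_slope: float, x: float) -> float:
    """Concentration giving x % effect under an assumed two-parameter log-logistic model.

    Model: effect fraction = 1 / (1 + (EC50 / c) ** hill_slope). Setting the
    fraction to x / 100 and solving for c gives
    c = EC50 * (x / (100 - x)) ** (1 / hill_slope).
    """
    if not 0 < x < 100:
        raise ValueError("x must lie strictly between 0 and 100")
    return ec50 * (x / (100.0 - x)) ** (1.0 / hill_slope)


def sub_threshold_series(threshold: float, n: int, dilution: float = 2.0) -> List[float]:
    """Hypothetical design: n concentrations below `threshold`, each `dilution`-fold lower."""
    return [threshold / dilution ** i for i in range(1, n + 1)]


if __name__ == "__main__":
    ec10 = ec_x(ec50=5.0, hill_slope=1.5, x=10)  # placeholder EC50 and slope
    print(ec10, sub_threshold_series(ec10, n=4))
```

At x = 50 the function returns the EC50 itself, which is a quick sanity check on the inversion.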
Affiliation(s)
- Jennifer L A Shaw
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Land and Water, Environmental Contaminant Mitigation and Technologies Research Program, Waite Road, Urrbrae, Adelaide, Australia 5064
- Jonathan D Judy
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Land and Water, Environmental Contaminant Mitigation and Technologies Research Program, Waite Road, Urrbrae, Adelaide, Australia 5064
- University of Florida, Soil and Water Sciences Department, 1692 McCarthy Drive, Gainesville, Florida 32611, United States
- Anupama Kumar
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Land and Water, Environmental Contaminant Mitigation and Technologies Research Program, Waite Road, Urrbrae, Adelaide, Australia 5064
- Paul Bertsch
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Land and Water, Brisbane, Queensland, Australia 4001
- Ming-Bo Wang
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Agriculture and Food Unit, Black Mountain, Canberra, Australian Capital Territory, Australia 2601
- Jason K Kirby
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Land and Water, Environmental Contaminant Mitigation and Technologies Research Program, Waite Road, Urrbrae, Adelaide, Australia 5064
24
Tokizawa M, Kusunoki K, Koyama H, Kurotani A, Sakurai T, Suzuki Y, Sakamoto T, Kurata T, Yamamoto YY. Identification of Arabidopsis genic and non-genic promoters by paired-end sequencing of TSS tags. Plant J 2017; 90:587-605. [PMID: 28214361 DOI: 10.1111/tpj.13511] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/25/2016] [Revised: 02/02/2017] [Accepted: 02/06/2017] [Indexed: 06/06/2023]
Abstract
Information about transcription start sites (TSSs) provides baseline data for the analysis of promoter architecture. In this paper we used paired- and single-end deep sequencing to analyze Arabidopsis TSS tags from several libraries prepared from roots, shoots, flowers and etiolated seedlings. The clustering of approximately 33 million mapped TSS tags led to the identification of 324 461 promoters that covered 79.7% (21 672/27 206) of protein-coding genes in the Arabidopsis genome. In addition we identified intragenic, antisense and orphan promoters that were not associated with any gene models. Of these, intragenic promoters exhibited unique characteristics regarding dinucleotide sequences at TSSs and core promoter element composition, suggesting that these promoters use different mechanisms of transcriptional initiation. An analysis of base composition with regard to promoter position revealed a low GC content throughout the promoter region and several local strand biases that were evident for TATA-type promoters, but not for Coreless-type promoters. Most observed strand biases coincided with strand biases of single nucleotide polymorphism rate. Our analysis also revealed that transcription of a gene is supported by an average of 2.7 genic promoters, among which one specific promoter, designated as a top promoter, substantially determines the expression level of the gene.
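The abstract does not describe the clustering rule used to turn TSS tags into promoters. The hedged sketch below applies simple single-linkage clustering to the 5'-end positions of tags from one chromosome strand, using a hypothetical `max_gap` of 20 bp. It then picks the cluster with the most tags as a stand-in for a gene's 'top promoter', assuming the clusters passed in have already been assigned to that gene.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TssCluster:
    start: int      # leftmost TSS position in the cluster
    end: int        # rightmost TSS position in the cluster
    tag_count: int  # number of TSS tags supporting the cluster
    peak: int       # position with the most tags (ties -> leftmost)


def _summarise(sites: List[int], counts: Counter) -> TssCluster:
    total = sum(counts[s] for s in sites)
    peak = max(sites, key=lambda s: counts[s])
    return TssCluster(sites[0], sites[-1], total, peak)


def cluster_tss(positions: Iterable[int], max_gap: int = 20) -> List[TssCluster]:
    """Single-linkage clustering of tag 5' ends on one chromosome strand.

    Consecutive distinct positions at most `max_gap` bp apart (a hypothetical
    threshold) join the same cluster.
    """
    counts = Counter(positions)
    if not counts:
        return []
    sites = sorted(counts)
    clusters, current = [], [sites[0]]
    for pos in sites[1:]:
        if pos - current[-1] <= max_gap:
            current.append(pos)
        else:
            clusters.append(_summarise(current, counts))
            current = [pos]
    clusters.append(_summarise(current, counts))
    return clusters


def top_promoter(clusters: List[TssCluster]) -> TssCluster:
    """Cluster with the most tags; assumes all clusters belong to one gene."""
    return max(clusters, key=lambda c: c.tag_count)


if __name__ == "__main__":
    tags = [100, 100, 105, 300, 302, 302, 302]  # hypothetical tag positions
    clusters = cluster_tss(tags)
    print(clusters)
    print(top_promoter(clusters))
```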
Affiliation(s)
- Mutsutomo Tokizawa
- United Graduate School of Agriculture, Gifu University, Yanagido 1-1, Gifu City, Gifu, 501-1193, Japan
- Kazutaka Kusunoki
- United Graduate School of Agriculture, Gifu University, Yanagido 1-1, Gifu City, Gifu, 501-1193, Japan
- Hiroyuki Koyama
- United Graduate School of Agriculture, Gifu University, Yanagido 1-1, Gifu City, Gifu, 501-1193, Japan
- Faculty of Applied Biological Sciences, Gifu University, Yanagido 1-1, Gifu City, Gifu, 501-1193, Japan
- Atsushi Kurotani
- RIKEN Center for Sustainable Resource Science, Suehiro-cho 1-7-22, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Tetsuya Sakurai
- RIKEN Center for Sustainable Resource Science, Suehiro-cho 1-7-22, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Yutaka Suzuki
- Institute of Medical Science, University of Tokyo, Shiroganedai 4-6-1, Minato-ku, Tokyo, 108-8639, Japan
- Tomoaki Sakamoto
- Plant Global Education Project, Graduate School of Biological Sciences, Nara Institute of Science and Technology, Takayama-cho 8916-5, Ikoma, Nara, 630-0192, Japan
- Tetsuya Kurata
- Plant Global Education Project, Graduate School of Biological Sciences, Nara Institute of Science and Technology, Takayama-cho 8916-5, Ikoma, Nara, 630-0192, Japan
- Yoshiharu Y Yamamoto
- United Graduate School of Agriculture, Gifu University, Yanagido 1-1, Gifu City, Gifu, 501-1193, Japan
- Faculty of Applied Biological Sciences, Gifu University, Yanagido 1-1, Gifu City, Gifu, 501-1193, Japan
- RIKEN Center for Sustainable Resource Science, Suehiro-cho 1-7-22, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- JST ALCA, Tokyo, Japan
25
Guerrero-Bosagna C. Evolution with No Reason: A Neutral View on Epigenetic Changes, Genomic Variability, and Evolutionary Novelty. Bioscience 2017. [DOI: 10.1093/biosci/bix021] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/16/2022]
26
McCarrey JR, Lehle JD, Raju SS, Wang Y, Nilsson EE, Skinner MK. Tertiary Epimutations - A Novel Aspect of Epigenetic Transgenerational Inheritance Promoting Genome Instability. PLoS One 2016; 11:e0168038. [PMID: 27992467 PMCID: PMC5167269 DOI: 10.1371/journal.pone.0168038] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 07/22/2016] [Accepted: 11/23/2016] [Indexed: 11/29/2022]
Abstract
Exposure to environmental factors can induce the epigenetic transgenerational inheritance of disease. Alterations to the epigenome termed “epimutations” include “primary epimutations” which are epigenetic alterations in the absence of genetic change and “secondary epimutations” which form following an initial genetic change. To determine if secondary epimutations contribute to transgenerational transmission of disease following in utero exposure to the endocrine disruptor vinclozolin, we exposed pregnant female rats carrying the lacI mutation-reporter transgene to vinclozolin and assessed the frequency of mutations in kidney tissue and sperm recovered from F1 and F3 generation progeny. Our results confirm that vinclozolin induces primary epimutations rather than secondary epimutations, but also suggest that some primary epimutations can predispose a subsequent accelerated accumulation of genetic mutations in F3 generation descendants that have the potential to contribute to transgenerational phenotypes. We therefore propose the existence of “tertiary epimutations” which are initial primary epimutations that promote genome instability leading to an accelerated accumulation of genetic mutations.
Affiliation(s)
- John R. McCarrey
- Department of Biology, University of Texas at San Antonio, San Antonio, TX, United States of America
- Jake D. Lehle
- Department of Biology, University of Texas at San Antonio, San Antonio, TX, United States of America
- Seetha S. Raju
- Department of Biology, University of Texas at San Antonio, San Antonio, TX, United States of America
- Yufeng Wang
- Department of Biology, University of Texas at San Antonio, San Antonio, TX, United States of America
- Eric E. Nilsson
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, United States of America
- Michael K. Skinner
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, United States of America
27
Comparing genetic variants detected in the 1000 genomes project with SNPs determined by the International HapMap Consortium. J Genet 2016; 94:731-40. [PMID: 26690529 DOI: 10.1007/s12041-015-0588-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 02/06/2023]
Abstract
Single-nucleotide polymorphisms (SNPs) determined from SNP arrays by the International HapMap Consortium (HapMap) and the genetic variants detected in the 1000 Genomes Project (1KGP) can serve as two references for genome-wide association studies (GWAS). We conducted comparative analyses to provide a means for assessing concerns regarding SNP array-based GWAS findings, as well as for realistically bounding expectations for next-generation sequencing (NGS)-based GWAS. We calculated and compared base composition, transition-to-transversion ratio, minor allele frequency and heterozygous rate for SNPs from HapMap and 1KGP for the 622 individuals common to both. We analysed the genotype discordance between HapMap and 1KGP to assess the consistency of the SNPs from the two references. In 1KGP, 90.58% of the 36,817,799 SNPs detected were not measured in HapMap. More SNPs with minor allele frequencies below 0.01 were found in 1KGP than in HapMap. The two references have low discordance (generally below 0.02) in the genotypes of common SNPs, with most discordance arising from heterozygous SNPs. Our study demonstrated that SNP array-based GWAS findings were reliable and useful, although only a small portion of genetic variance was explained. NGS can detect not only common but also rare variants, supporting the expectation that NGS-based GWAS will be able to incorporate a much larger portion of genetic variance than SNP array-based GWAS.
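The per-SNP statistics compared above are straightforward to compute. The sketch assumes diploid genotypes coded as alternate-allele counts (0, 1, 2), SNPs given as (ref, alt) pairs, and call sets keyed by hypothetical (individual, SNP) identifiers. The abstract does not give the study's exact discordance definition, so the one used here (the fraction of shared genotype calls that disagree) is an assumption.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    """A<->G (purines) or C<->T (pyrimidines); any other change is a transversion."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ti_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Transition-to-transversion ratio over (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from diploid alternate-allele counts (0, 1, 2)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def heterozygous_rate(genotypes: Sequence[int]) -> float:
    """Fraction of individuals that are heterozygous (count == 1)."""
    return sum(1 for g in genotypes if g == 1) / len(genotypes)


def genotype_discordance(calls_a: Dict[Hashable, int], calls_b: Dict[Hashable, int]) -> float:
    """Assumed definition: fraction of genotype calls present in both sets that disagree."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return float("nan")
    return sum(1 for key in shared if calls_a[key] != calls_b[key]) / len(shared)


if __name__ == "__main__":
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]))
    hapmap = {("ID1", "rs1"): 0, ("ID1", "rs2"): 1, ("ID2", "rs1"): 2}  # hypothetical IDs
    kgp = {("ID1", "rs1"): 0, ("ID1", "rs2"): 2, ("ID3", "rs1"): 1}
    print(genotype_discordance(hapmap, kgp))
```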
28
Li X, Kong J, Meng X, Luo K, Luan S, Cao B, Liu N. Isolation and expression analysis of an MAPKK gene from Fenneropenaeus chinensis in response to white spot syndrome virus infection. Fish Shellfish Immunol 2016; 55:116-122. [PMID: 27164214 DOI: 10.1016/j.fsi.2016.05.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/04/2016] [Revised: 05/04/2016] [Accepted: 05/05/2016] [Indexed: 06/05/2023]
Abstract
Mitogen-activated protein kinase kinase (MAPKK) is an important gene involved in host-virus interactions. To better understand the role of MAPKK in the interaction between the Chinese shrimp Fenneropenaeus chinensis and white spot syndrome virus (WSSV), we cloned an MAPKK cDNA from F. chinensis (FcMAPKK) and investigated the effect of FcMAPKK on WSSV infection. The FcMAPKK gene contained a 1227 bp open reading frame (ORF) encoding a highly conserved protein with a serine/threonine protein kinase catalytic (S_TKc) domain. The deduced amino acid sequence of FcMAPKK shared 11.9% to 92.6% identity with MAPKKs from vertebrate, invertebrate, plant and fungal species. FcMAPKK was expressed in all examined tissues of unchallenged F. chinensis. Its expression was highest in the hepatopancreas, approximately 2.6-fold the level in the gill, and lowest in the muscle, approximately 0.3-fold the level in the hepatopancreas. FcMAPKK expression levels in the muscle, gill and hepatopancreas all changed after WSSV challenge, and expression was significantly (P < 0.01) up-regulated in the muscle at 48 h post infection. WSSV began to replicate quickly in untreated F. chinensis at 48 h post infection, whereas WSSV replication was significantly (P < 0.05) inhibited in F. chinensis treated with U0126, an inhibitor of MAPKK (MEK) activity. These results suggest that FcMAPKK might be involved in the WSSV infection process and that hijacking of FcMAPKK might be required for WSSV replication in F. chinensis.
Affiliation(s)
- Xupeng Li
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao, 266071, PR China
- Jie Kong
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao, 266071, PR China
- Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, 1 Wenhai Road, Qingdao, 266300, PR China
- Xianhong Meng
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao, 266071, PR China
- Kun Luo
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao, 266071, PR China
- Sheng Luan
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao, 266071, PR China
- Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, 1 Wenhai Road, Qingdao, 266300, PR China
- Baoxiang Cao
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao, 266071, PR China
- Ning Liu
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resources, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 106 Nanjing Road, Qingdao, 266071, PR China
29
Herraiz FJ, Blanca J, Ziarsolo P, Gramazio P, Plazas M, Anderson GJ, Prohens J, Vilanova S. The first de novo transcriptome of pepino (Solanum muricatum): assembly, comprehensive analysis and comparison with the closely related species S. caripense, potato and tomato. BMC Genomics 2016; 17:321. [PMID: 27142449 PMCID: PMC4855764 DOI: 10.1186/s12864-016-2656-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 12/23/2015] [Accepted: 04/25/2016] [Indexed: 11/20/2022]
Abstract
BACKGROUND Solanum sect. Basarthrum is phylogenetically very close to potatoes (Solanum sect. Petota) and tomatoes (Solanum sect. Lycopersicon), two groups of great economic importance for which Solanum sect. Basarthrum represents a tertiary gene pool for breeding. This section includes the important regional cultigen pepino (Solanum muricatum) and several wild species. Among the wild species, S. caripense is prominent due to its major involvement in the origin of pepino and its wide geographical distribution. Despite the value of the pepino as an emerging crop, and the potential for gene transfer from both the pepino and S. caripense to potatoes and tomatoes, there has been virtually no genomic study of these species. RESULTS Using Illumina HiSeq 2000, RNA-Seq was performed on a pool of three tissues (young leaves, flowers in pre-anthesis and mature fruits) from S. muricatum and S. caripense, generating almost 111,000,000 reads across the two species. A high-quality de novo transcriptome was assembled from S. muricatum clean reads, resulting in 75,832 unigenes with an average length of 704 bp. These unigenes were functionally annotated based on similarity to sequences in public databases. We used Blast2GO to conduct an exhaustive study of the gene ontology, including GO terms, EC numbers and KEGG pathways. Pepino unigenes were compared with both the potato and tomato genomes to estimate their relative positions and to infer gene prediction models. Candidate genes related to traits of interest in other Solanaceae were evaluated by presence or absence and compared with S. caripense transcripts. In addition, the phylogeny of pepino and five other members of the Solanaceae was studied using five genes. The comparison of S. caripense reads against S. muricatum assembled transcripts resulted in thousands of intra- and interspecific nucleotide-level variants. In addition, more than 1000 SSRs were identified in the pepino transcriptome.
CONCLUSIONS This study represents the first genomic resource for the pepino. We suggest that the data will be useful not only for improvement of the pepino, but also for potato and tomato breeding and gene transfer. The high quality of the transcriptome presented here also facilitates comparative studies in the genus Solanum. The accurate transcript annotation will help to identify the genes underlying particular traits of interest. The high number of markers (SSRs and nucleotide-level variants) obtained will be useful for breeding programs, as well as for studies of synteny, diversity, evolution and phylogeny.
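The minimal sketch below detects perfect SSRs with regular expressions, assuming di- to hexanucleotide motifs and illustrative minimum repeat counts. The abstract does not state which tool or thresholds were used. Motifs that are themselves repeats of a shorter unit are discarded, so an (AC)n tract is not reported a second time as (ACAC)n.

```python
import re
from typing import Dict, List, Tuple

# Illustrative minimum repeat counts per motif length (hypothetical thresholds).
MIN_REPEATS: Dict[int, int] = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False for motifs that are themselves repeats of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, reps in sorted(min_repeats.items()):
        # Group 2 captures a k-mer; \2{reps-1,} requires it to repeat back to back.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, reps - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GG" + "AC" * 7 + "TT" + "AGC" * 5 + "GG"  # hypothetical transcript fragment
    print(find_ssrs(demo))
```

Compound, imperfect and mononucleotide repeats are not handled by this sketch. Dedicated SSR finders typically support them.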
Affiliation(s)
- Francisco J. Herraiz
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia Spain
- José Blanca
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia Spain
- Pello Ziarsolo
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia Spain
- Pietro Gramazio
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia Spain
- Mariola Plazas
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia Spain
- Gregory J. Anderson
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06268-3043 USA
- Jaime Prohens
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia Spain
- Santiago Vilanova
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, Camino de Vera 14, 46022 Valencia Spain
30
Arenas M. Trends in substitution models of molecular evolution. Front Genet 2015; 6:319. [PMID: 26579193 PMCID: PMC4620419 DOI: 10.3389/fgene.2015.00319] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 07/27/2015] [Accepted: 10/09/2015] [Indexed: 11/13/2022]
Abstract
Substitution models of evolution describe the process of genetic variation through fixed mutations and constitute the basis of evolutionary analysis at the molecular level. Almost 40 years after the development of the first substitution models, highly sophisticated and data-specific substitution models continue to emerge with the aim of better mimicking real evolutionary processes. Here I describe current trends in substitution models of DNA, codon and amino acid sequence evolution, including the advantages and pitfalls of the most popular models. The perspective concludes that, despite the large number of currently available substitution models, further research is required for more realistic modeling, especially for DNA coding and amino acid data. Additionally, the development of more accurate complex models should be coupled with new implementations and improvements of methods and frameworks for substitution model selection and downstream evolutionary analysis.
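The sketch below gives a concrete instance of the simplest DNA substitution models this perspective covers. It computes Jukes-Cantor (JC69) and Kimura two-parameter (K80) distances between two aligned sequences, using the standard closed-form estimators. Skipping gaps and ambiguous bases, and assuming the inputs are already aligned, are choices made for this sketch only.

```python
import math
from typing import Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1: str, seq2: str) -> Tuple[float, float]:
    """Proportions of transitional (P) and transversional (Q) differences.

    Assumes the sequences are already aligned; gaps and ambiguity codes are skipped.
    """
    ti = tv = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1
        else:
            tv += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return ti / sites, tv / sites


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (1969): d = -3/4 ln(1 - 4p/3), p = proportion of differing sites."""
    return -0.75 * math.log(1 - 4 * p / 3)


def k80_distance(P: float, Q: float) -> float:
    """Kimura (1980): d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


if __name__ == "__main__":
    P, Q = count_differences("AAAAACCCCC", "AGAAACTCCG")  # hypothetical alignment
    print(P, Q, jc69_distance(P + Q), k80_distance(P, Q))
```

When transitions make up one third of all differences (P = p/3, Q = 2p/3), K80 reduces to JC69. This gives a quick check that the two functions agree.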
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
31
Plyler ZE, Hill AE, McAtee CW, Cui X, Moseley LA, Sorscher EJ. SNP Formation Bias in the Murine Genome Provides Evidence for Parallel Evolution. Genome Biol Evol 2015; 7:2506-19. [PMID: 26253317 PMCID: PMC4607513 DOI: 10.1093/gbe/evv150] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
In this study, we show novel DNA motifs that promote single nucleotide polymorphism (SNP) formation and are conserved among exons, introns, and intergenic DNA from mice (Sanger Mouse Genomes Project), human genes (1000 Genomes), and tumor-specific somatic mutations (data from TCGA). We further characterize SNPs likely to be very recent in origin (i.e., formed in otherwise congenic mice) and show enrichment for both synonymous and parallel DNA variants occurring under circumstances not attributable to purifying selection. The findings provide insight regarding SNP contextual bias and eukaryotic codon usage as strategies that favor long-term exonic stability. The study also furnishes new information concerning rates of murine genomic evolution and features of DNA mutagenesis (at the time of SNP formation) that should be viewed as "adaptive."
Affiliation(s)
- Aubrey E Hill
- Department of Computer and Information Sciences, University of Alabama at Birmingham
- Christopher W McAtee
- Gregory Fleming James Cystic Fibrosis Research Center, University of Alabama at Birmingham
- Xiangqin Cui
- Department of Biostatistics, University of Alabama at Birmingham
- Leah A Moseley
- Gregory Fleming James Cystic Fibrosis Research Center, University of Alabama at Birmingham
- Eric J Sorscher
- Department of Pediatrics, Emory University School of Medicine
32
Quashie PK, Oliviera M, Veres T, Osman N, Han YS, Hassounah S, Lie Y, Huang W, Mesplède T, Wainberg MA. Differential effects of the G118R, H51Y, and E138K resistance substitutions in different subtypes of HIV integrase. J Virol 2015. [PMID: 25552724 DOI: 10.1128/jvi.03353-14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 03/27/2023]
Abstract
Dolutegravir (DTG) is the latest antiretroviral (ARV) approved for the treatment of human immunodeficiency virus (HIV) infection. The G118R substitution, previously identified with MK-2048 and raltegravir, may represent the initial substitution in a dolutegravir resistance pathway. We have found that subtype C integrase proteins have a low enzymatic cost associated with the G118R substitution, mostly at the strand transfer step of integration, compared to either subtype B or recombinant CRF02_AG proteins. Subtype B and circulating recombinant form AG (CRF02_AG) clonal viruses encoding G118R-bearing integrases were severely restricted in their viral replication capacity, and G118R/E138K-bearing viruses had various levels of resistance to dolutegravir, raltegravir, and elvitegravir. In cell-free experiments, the impacts of the H51Y and E138K substitutions on resistance and enzyme efficiency, when present with G118R, were highly dependent on viral subtype. Sequence alignment and homology modeling showed that the subtype-specific effects of these mutations were likely due to differential amino acid residue networks in the different integrase proteins, caused by polymorphic residues, which significantly affect native protein activity, structure, or function and are important for drug-mediated inhibition of enzyme activity. This preemptive study will aid in the interpretation of resistance patterns in dolutegravir-treated patients.
IMPORTANCE: Recognized drug resistance mutations have never been reported for naive patients treated with dolutegravir. Additionally, in integrase inhibitor-experienced patients, only R263K and other previously known integrase resistance substitutions have been reported. Here we suggest that alternate resistance pathways may develop in non-B HIV-1 subtypes and explain how "minor" polymorphisms and substitutions in HIV integrase that are associated with these subtypes can influence resistance against dolutegravir. This work also highlights the importance of phenotyping versus genotyping when a strong inhibitor such as dolutegravir is being used. By characterizing the G118R substitution, this work also preemptively defines parameters for a potentially important pathway in some non-B HIV subtype viruses treated with dolutegravir and will aid in the inhibition of such a virus, if detected. The general inability of strand transfer-related substitutions to diminish 3' processing indicates the importance of the 3' processing step and highlights a therapeutic angle that needs to be better exploited.
Affiliation(s)
- Peter K Quashie
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Division of Experimental Medicine, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Maureen Oliviera
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Tamar Veres
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Nathan Osman
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Department of Microbiology and Immunology, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Ying-Shan Han
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Said Hassounah
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Division of Experimental Medicine, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Yolanda Lie
- Monogram Biosciences, South San Francisco, California, USA
- Wei Huang
- Monogram Biosciences, South San Francisco, California, USA
- Thibault Mesplède
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Mark A Wainberg
- McGill University AIDS Centre, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Division of Experimental Medicine, Faculty of Medicine, McGill University, Montreal, Quebec, Canada; Department of Microbiology and Immunology, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
33
Differential effects of the G118R, H51Y, and E138K resistance substitutions in different subtypes of HIV integrase. J Virol 2014; 89:3163-75. [PMID: 25552724 DOI: 10.1128/jvi.03353-14] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Indexed: 01/16/2023]
34
Hardy I, Brenner B, Quashie P, Thomas R, Petropoulos C, Huang W, Moisi D, Wainberg MA, Roger M. Evolution of a novel pathway leading to dolutegravir resistance in a patient harbouring N155H and multiclass drug resistance. J Antimicrob Chemother 2014; 70:405-11. [PMID: 25281399 DOI: 10.1093/jac/dku387] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/18/2022]
Abstract
OBJECTIVES: Dolutegravir has been recently approved for treatment-naive and -experienced HIV-infected subjects, including integrase inhibitor (INI)-experienced patients. Dolutegravir is a second-generation INI that can overcome many prior raltegravir and elvitegravir failures. Here, we report the evolution of resistance to dolutegravir in a highly treatment-experienced patient harbouring the major N155H mutation consequent to raltegravir treatment failure.
METHODS: Genotypic and phenotypic analyses were done on longitudinal samples to determine viral resistance to INIs. Integrase amino acid sequence interactions with raltegravir and dolutegravir were assessed by molecular modelling and docking simulations.
RESULTS: Five mutations (A49P, L68FL, T97A, E138K and L234V) were implicated in emergent dolutegravir resistance, with a concomitant severe compromise in viral replicative capacity. Molecular modelling and docking simulations revealed that dolutegravir binding to integrase was affected by these acquired dolutegravir mutations.
CONCLUSIONS: Our findings identify a novel mutational pathway involving integrase mutations A49P and L234V, leading to dolutegravir resistance in a patient with the N155H raltegravir mutation.
Affiliation(s)
- Isabelle Hardy
- Centre hospitalier de l'Université de Montréal (CHUM), Montréal, Québec, Canada
- Bluma Brenner
- McGill AIDS Center, Lady Davis Institute, Jewish General Hospital, McGill University, Montréal, Québec, Canada
- Peter Quashie
- McGill AIDS Center, Lady Davis Institute, Jewish General Hospital, McGill University, Montréal, Québec, Canada
- Réjean Thomas
- Clinique Médicale L'Actuel, Montréal, Québec, Canada
- Wei Huang
- Monogram Biosciences, South San Francisco, CA, USA
- Daniela Moisi
- McGill AIDS Center, Lady Davis Institute, Jewish General Hospital, McGill University, Montréal, Québec, Canada
- Mark A Wainberg
- McGill AIDS Center, Lady Davis Institute, Jewish General Hospital, McGill University, Montréal, Québec, Canada
- Michel Roger
- Centre hospitalier de l'Université de Montréal (CHUM), Montréal, Québec, Canada; Département de microbiologie, infectiologie et immunologie, Faculté de médecine, Université de Montréal, Montréal, Québec, Canada
35
Ram H, Kumar A, Thomas L, Singh VP. In silico Approach to Study Adaptive Divergence in Nucleotide Composition of the 16S rRNA Gene Among Bacteria Thriving Under Different Temperature Regimes. J Comput Biol 2014; 21:753-9. [DOI: 10.1089/cmb.2014.0116] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Affiliation(s)
- Hari Ram
- Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
- Alok Kumar
- Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
- Lebin Thomas
- Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
- Ved Pal Singh
- Applied Microbiology and Biotechnology Laboratory, Department of Botany, University of Delhi, Delhi, India
36
Genetic code evolution started with the incorporation of glycine, followed by other small hydrophilic amino acids. J Mol Evol 2014; 78:307-9. [PMID: 24916657 DOI: 10.1007/s00239-014-9627-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 04/25/2014] [Accepted: 05/23/2014] [Indexed: 10/25/2022]
Abstract
We propose that glycine was the first amino acid to be incorporated into the genetic code, followed by serine and aspartic and/or glutamic acid: small hydrophilic amino acids that all have codons in the bottom right-hand corner of the standard genetic code table. Because primordial ribosomal synthesis is presumed to have been rudimentary, this stage would have been characterized by the synthesis of short, water-soluble peptides, the first of which would have comprised polyglycine. Evolution of the code is proposed to have occurred by the duplication and mutation of tRNA sequences, which produced a radiation of codon assignment outwards from the bottom right-hand corner. As a result of this expansion, we propose a trend from small hydrophilic to hydrophobic amino acids, with selection for longer polypeptides requiring a hydrophobic core for folding and stability driving the incorporation of hydrophobic amino acids into the code.
37
Vergara IA, Tarailo-Graovac M, Frech C, Wang J, Qin Z, Zhang T, She R, Chu JSC, Wang K, Chen N. Genome-wide variations in a natural isolate of the nematode Caenorhabditis elegans. BMC Genomics 2014; 15:255. [PMID: 24694239 PMCID: PMC4023591 DOI: 10.1186/1471-2164-15-255] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 06/29/2013] [Accepted: 03/03/2014] [Indexed: 12/02/2022]
Abstract
Background: Increasing genetic and phenotypic differences found among natural isolates of C. elegans have encouraged researchers to explore the natural variation of this nematode species.
Results: Here we report on the identification of genomic differences between the reference strain N2 and the Hawaiian strain CB4856, one of the most genetically distant strains from N2. To identify both small- and large-scale genomic variations (GVs), we have sequenced the CB4856 genome using both Roche 454 (~400 bp single reads) and Illumina GA DNA sequencing methods (101 bp paired-end reads). Compared to previously described variants (available in WormBase), our effort uncovered twice as many single nucleotide variants (SNVs) and increased the number of small InDels almost 20-fold. Moreover, we identified and validated large insertions in the CB4856 strain, most of which range from 150 bp to 1.2 kb in length. Identified GVs had a widespread impact on protein-coding sequences, including 585 single-copy genes that have associated severe phenotypes of reduced viability in RNAi and genetic studies. Sixty of these genes are homologs of human genes associated with diseases. Furthermore, our work confirms previously identified GVs associated with differences in behavioural and biological traits between the N2 and CB4856 strains.
Conclusions: The identified GVs provide a rich resource for future studies that aim to explain the genetic basis for other trait differences between the N2 and CB4856 strains.
Affiliation(s)
- Nansheng Chen
- Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6, Canada.
38
Bhattacharjee MJ, Ghosh SK. Design of mini-barcode for catfishes for assessment of archival biodiversity. Mol Ecol Resour 2013; 14:469-77. [PMID: 24314114 DOI: 10.1111/1755-0998.12198] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/20/2013] [Revised: 10/26/2013] [Accepted: 10/28/2013] [Indexed: 11/28/2022]
Abstract
Recovering DNA barcode sequences from archived specimens is often challenging. However, short fragments of DNA may still be recoverable, which could help resolve many outstanding taxonomic conflicts. Here, we designed a mini-barcode for catfishes, a group comprising many species and many cryptic taxa. We analysed a data set of 3048 publicly available COI barcode sequences representing 547 worldwide catfish species and performed 152 628 interspecies comparisons. Interspecies distance showed a stronger positive correlation with transversions (0.78, P < 0.001) than with transitions (0.70, P < 0.001), suggesting that transversions are better diagnostics for species identification. In the aligned data set, two transversion-rich fragments (53 bp and 119 bp) were identified. The transition/transversion bias value was 1.04 in the 53-bp fragment, 1.23 in the 119-bp fragment and 1.50 in the full-length barcode. The interspecies distance with the full-length barcode was 0.212 ± 0.037, while that with the 53-bp and 119-bp fragments was 0.325 ± 0.039 and 0.218 ± 0.045, respectively. A survey of the 53-bp fragment indicated only 1144 possible barcodes, whereas the 119-bp fragment allowed >4 million. Thus, the 119-bp fragment is a viable mini-barcode for catfishes, which comprise >3000 extant species. An experiment with 82 archived catfishes showed successful recovery of this mini-barcode using the designed primer. The mini-barcode sequences showed species-specific similarity in the range of 98-100% with the global database. Therefore, surveying a transversion-rich fragment within the full-length barcode would be an ideal approach to mini-barcode design for biodiversity assessment.
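The transition/transversion distinction that this mini-barcode design relies on can be sketched in a few lines. The sketch below is a hypothetical illustration, not the authors' procedure: it sums pairwise transitions and transversions over an alignment and slides a fixed-width window to find the most transversion-rich fragment. It assumes equal-length aligned sequences; gaps and ambiguity codes are simply skipped, and the raw count it produces is not the model-based transition/transversion bias estimate reported in the abstract.

```python
from itertools import combinations

PURINES = frozenset("AG")
VALID = frozenset("ACGT")

def is_transition(a, b):
    """A<->G or C<->T: a change within purines or within pyrimidines."""
    return a != b and a in VALID and b in VALID and (a in PURINES) == (b in PURINES)

def is_transversion(a, b):
    """A change between a purine (A, G) and a pyrimidine (C, T)."""
    return a in VALID and b in VALID and (a in PURINES) != (b in PURINES)

def ts_tv_counts(alignment, start=0, end=None):
    """Sum transitions and transversions over all sequence pairs in columns [start, end)."""
    ts = tv = 0
    for s1, s2 in combinations(alignment, 2):
        for a, b in zip(s1[start:end], s2[start:end]):
            if is_transition(a, b):
                ts += 1
            elif is_transversion(a, b):
                tv += 1
    return ts, tv

def most_transversion_rich_window(alignment, width):
    """Return (start, transversions) for the window with the most pairwise transversions."""
    length = len(alignment[0])
    if not 0 < width <= length:
        raise ValueError("window width must be between 1 and the alignment length")
    best_start, best_tv = 0, -1
    for start in range(length - width + 1):
        _, tv = ts_tv_counts(alignment, start, start + width)
        if tv > best_tv:
            best_start, best_tv = start, tv
    return best_start, best_tv

if __name__ == "__main__":
    alignment = ["ACGTACGT", "ACGTACGA", "GCGTACGA"]  # toy data, not real COI sequences
    print(ts_tv_counts(alignment))                      # (2, 2)
    print(most_transversion_rich_window(alignment, 2))  # (6, 2)
```

Counting over all sequence pairs scales quadratically with the number of sequences; for thousands of barcodes one would precompute per-column base counts instead, but the pairwise form keeps the idea explicit.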
39
Clarke WE, Parkin IA, Gajardo HA, Gerhardt DJ, Higgins E, Sidebottom C, Sharpe AG, Snowdon RJ, Federico ML, Iniguez-Luy FL. Genomic DNA enrichment using sequence capture microarrays: a novel approach to discover sequence nucleotide polymorphisms (SNP) in Brassica napus L. PLoS One 2013; 8:e81992. [PMID: 24312619 PMCID: PMC3849492 DOI: 10.1371/journal.pone.0081992] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/20/2013] [Accepted: 10/20/2013] [Indexed: 12/24/2022]
Abstract
Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci [QTL] analysis) with traits of agronomical and nutritional importance. A 2.1-million-feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry, and, to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty-eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome, generating a novel data resource for research on and breeding of this crop species.
Affiliation(s)
- Wayne E. Clarke
- Saskatoon Research Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Isobel A. Parkin
- Saskatoon Research Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Humberto A. Gajardo
- Genomics and Bioinformatics Unit, Agriaquaculture Nutritional Genomic Center (CGNA), Temuco, La Araucanía, Chile
- Erin Higgins
- Saskatoon Research Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Christine Sidebottom
- Plant Biotechnology Institute, National Research Council Canada, Saskatoon, Saskatchewan, Canada
- Andrew G. Sharpe
- Plant Biotechnology Institute, National Research Council Canada, Saskatoon, Saskatchewan, Canada
- Rod J. Snowdon
- Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Maria L. Federico
- Genomics and Bioinformatics Unit, Agriaquaculture Nutritional Genomic Center (CGNA), Temuco, La Araucanía, Chile
- Federico L. Iniguez-Luy
- Genomics and Bioinformatics Unit, Agriaquaculture Nutritional Genomic Center (CGNA), Temuco, La Araucanía, Chile
40
A DNA-centric protein interaction map of ultraconserved elements reveals contribution of transcription factor binding hubs to conservation. Cell Rep 2013; 5:531-45. [PMID: 24139795 DOI: 10.1016/j.celrep.2013.09.022] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 05/24/2013] [Revised: 08/06/2013] [Accepted: 09/11/2013] [Indexed: 12/19/2022]
Abstract
Ultraconserved elements (UCEs) have been the subject of great interest because of their extreme sequence identity and their seemingly cryptic and largely uncharacterized functions. Although in vivo studies of UCE sequences have demonstrated regulatory activity, protein interactors at UCEs have not been systematically identified. Here, we combined high-throughput affinity purification, high-resolution mass spectrometry, and SILAC quantification to map intrinsic protein interactions for 193 UCE sequences. The interactome contains over 400 proteins, including transcription factors with known developmental roles. We demonstrate based on our data that UCEs consist of strongly conserved overlapping binding sites. We also generated a fine-resolution interactome of a UCE, confirming the hub-like nature of the element. The intrinsic interactions mapped here are reflected in open chromatin, as indicated by comparison with existing ChIP data. Our study argues for a strong contribution of protein-DNA interactions to UCE conservation and provides a basis for further functional characterization of UCEs.
41
Bahi-Buisson N, Souville I, Fourniol FJ, Toussaint A, Moores CA, Houdusse A, Lemaitre JY, Poirier K, Khalaf-Nazzal R, Hully M, Leger PL, Elie C, Boddaert N, Beldjord C, Chelly J, Francis F. New insights into genotype-phenotype correlations for the doublecortin-related lissencephaly spectrum. Brain 2013; 136:223-44. [PMID: 23365099 DOI: 10.1093/brain/aws323] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Indexed: 11/14/2022]
Abstract
X-linked isolated lissencephaly sequence and subcortical band heterotopia are allelic human disorders associated with mutations of doublecortin (DCX), giving both familial and sporadic forms. DCX encodes a microtubule-associated protein involved in neuronal migration during brain development. Structural data show that mutations can fall either in surface residues, likely to impair partner interactions, or in buried residues, likely to impair protein stability. Despite the progress in understanding the molecular basis of these disorders, the prognostic value of the location and impact of individual DCX mutations has largely remained unclear. To clarify this point, we investigated a cohort of 180 patients who were referred with the agyria-pachygyria subcortical band heterotopia spectrum. DCX mutations were identified in 136 individuals. Analysis of the parents' DNA revealed the de novo occurrence of DCX mutations in 76 cases [62 of 70 females screened (88.5%) and 14 of 60 males screened (23%)], whereas in the remaining cases, mutations were inherited from asymptomatic (n = 14) or symptomatic mothers (n = 11). This represents 100% of families screened. Female patients with DCX mutation demonstrated three degrees of clinical-radiological severity: a severe form with a thick band (n = 54), a milder form (n = 24) with either an anterior thin or an intermediate thickness band, and asymptomatic carrier females (n = 14) with normal magnetic resonance imaging results. A higher proportion of nonsense and frameshift mutations was identified in patients with de novo mutations. An analysis of predicted effects of missense mutations showed that those destabilizing the structure of the protein were often associated with more severe phenotypes. We identified several severe- and mild-effect mutations affecting surface residues and observed that the substituted amino acid is also critical in determining severity. 
Recurrent mutations representing 34.5% of all DCX mutations often lead to similar phenotypes, for example, either severe in sporadic subcortical band heterotopia owing to Arg186 mutations or milder in familial cases owing to Arg196 mutations. Taken as a whole, these observations demonstrate that DCX-related disorders are clinically heterogeneous, with severe sporadic and milder familial subcortical band heterotopia, each associated with specific DCX mutations. There is a clear influence of the individual mutated residue and the substituted amino acid in determining phenotype severity.
Affiliation(s)
- Nadia Bahi-Buisson
- Pediatric Neurology, Hôpital Necker Enfants Malades, Université Paris Descartes, APHP, 149 rue de Sèvres, 75015 Paris, France.
42
Ackermann AA, Panunzi LG, Cosentino RO, Sánchez DO, Agüero F. A genomic scale map of genetic diversity in Trypanosoma cruzi. BMC Genomics 2012; 13:736. [PMID: 23270511 PMCID: PMC3545726 DOI: 10.1186/1471-2164-13-736] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 03/16/2012] [Accepted: 12/12/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Trypanosoma cruzi, the causal agent of Chagas Disease, affects more than 16 million people in Latin America. The clinical outcome of the disease results from a complex interplay between environmental factors and the genetic background of both the human host and the parasite. However, knowledge of the genetic diversity of the parasite is currently limited to a number of highly studied loci. The availability of a number of genomes from different evolutionary lineages of T. cruzi provides an unprecedented opportunity to look at the genetic diversity of the parasite at a genomic scale.
RESULTS: Using a bioinformatic strategy, we have clustered T. cruzi sequence data available in the public domain and obtained multiple sequence alignments in which one or two alleles from the reference CL-Brener were included. These data cover 4 major evolutionary lineages (DTUs): TcI, TcII, TcIII, and the hybrid TcVI. Using this set of alignments we have identified 288,957 high-quality single nucleotide polymorphisms and 1,480 indels. In a reduced re-sequencing study we were able to validate ~97% of high-quality SNPs identified in 47 loci. Analysis of how these changes affect encoded protein products showed a 0.77 ratio of synonymous to non-synonymous changes in the T. cruzi genome. We observed 113 changes that introduce or remove a stop codon, some causing significant functional changes, and a number of tri-allelic and tetra-allelic SNPs that could be exploited in strain typing assays. Based on an analysis of the observed nucleotide diversity we show that the T. cruzi genome contains a core set of genes that are under apparent purifying selection. Interestingly, orthologs of known druggable targets show statistically significantly lower nucleotide diversity values.
CONCLUSIONS: This study provides the first look at the genetic diversity of T. cruzi at a genomic scale. The analysis covers an estimated ~60% of the genetic diversity present in the population, providing an essential resource for future studies on the development of new drugs and diagnostics for Chagas Disease. These data are available through the TcSNP database (http://snps.tcruzi.org).
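The synonymous versus non-synonymous classification described in this abstract can be illustrated for a single nucleotide change inside a codon. This is a minimal sketch that assumes the standard genetic code (NCBI translation table 1) and a known reading frame; the function names are hypothetical, stop gains and losses are tallied separately, and the raw count ratio it returns is not necessarily how the study computed its 0.77 figure.

```python
from collections import Counter

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def classify_snv(codon, position, alt):
    """Classify a single-nucleotide change at 0-based `position` within `codon`."""
    codon, alt = codon.upper(), alt.upper()
    if len(codon) != 3 or not 0 <= position < 3:
        raise ValueError("expected a 3-letter codon and a position in 0..2")
    mutant = codon[:position] + alt + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "nonsynonymous"

def synonymous_to_nonsynonymous(changes):
    """Raw count ratio of synonymous to non-synonymous (missense) changes.

    `changes` is an iterable of (codon, position, alt) tuples; stop gains and
    losses are excluded from both counts.
    """
    counts = Counter(classify_snv(c, p, a) for c, p, a in changes)
    if counts["nonsynonymous"] == 0:
        raise ValueError("no non-synonymous changes to divide by")
    return counts["synonymous"] / counts["nonsynonymous"]

if __name__ == "__main__":
    # CTT->CTC (Leu->Leu), GAT->GTT (Asp->Val), AAA->AAC (Lys->Asn)
    changes = [("CTT", 2, "C"), ("GAT", 1, "T"), ("AAA", 2, "C")]
    print(synonymous_to_nonsynonymous(changes))  # 0.5
    print(classify_snv("TGG", 2, "A"))           # stop_gained
```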
Affiliation(s)
- Alejandro A Ackermann
- Instituto de Investigaciones Biotecnológicas - Instituto Tecnológico de Chascomús (IIB-INTECH), Universidad Nacional de San Martín - Consejo de Investigaciones Científicas y Técnicas (UNSAM-CONICET), Sede San Martín, B 1650 HMP, San Martín, Buenos Aires, Argentina
43
The interplay of mutations and electronic properties in disease-related genes. Sci Rep 2012; 2:272. [PMID: 22355784 PMCID: PMC3280594 DOI: 10.1038/srep00272] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/10/2011] [Accepted: 01/16/2012] [Indexed: 11/13/2022]
Abstract
Electronic properties of DNA are believed to play a crucial role in many phenomena in living organisms, for example the location of DNA lesions by base excision repair (BER) glycosylases and the regulation of tumor-suppressor genes such as p53 by detection of oxidative damage. However, the reproducible measurement and modelling of charge migration through DNA molecules at the nanometer scale remains a challenging and controversial subject even after more than a decade of intense efforts. Here we show, by analysing 162 disease-related genes from a variety of medical databases with a total of almost 20,000 observed pathogenic mutations, a significant difference in the electronic properties of the population of observed mutations compared to the set of all possible mutations. Our results have implications for the role of the electronic properties of DNA in cellular processes, and hint at the possibility of prediction, early diagnosis and detection of mutation hotspots.
44
Ahmadi A, Behm A, Honnalli N, Li C, Weng L, Xie X. Hobbes: optimized gram-based methods for efficient read alignment. Nucleic Acids Res 2011; 40:e41. [PMID: 22199254 PMCID: PMC3315303 DOI: 10.1093/nar/gkr1246] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 01/22/2023] Open
Abstract
Recent advances in sequencing technology have enabled the rapid generation of billions of bases at relatively low cost. A crucial first step in many sequencing applications is to map those reads to a reference genome. However, when the reference genome is large, finding accurate mappings poses a significant computational challenge due to the sheer number of reads, and because many reads map to the reference sequence approximately but not exactly. We introduce Hobbes, a new gram-based program for aligning short reads, supporting Hamming and edit distance. Hobbes implements two novel techniques, which yield substantial performance improvements: an optimized gram-selection procedure for reads, and a cache-efficient filter for pruning candidate mappings. We systematically tested the performance of Hobbes on both real and simulated data with read lengths varying from 35 to 100 bp, and compared its performance with several state-of-the-art read-mapping programs, including Bowtie, BWA, mrsFast and RazerS. Hobbes is faster than all other read mapping programs we have tested while maintaining high mapping quality. Hobbes is about five times faster than Bowtie and about 2–10 times faster than BWA, depending on read length and error rate, when asked to find all mapping locations of a read in the human genome within a given Hamming or edit distance, respectively. Hobbes supports the SAM output format and is publicly available at http://hobbes.ics.uci.edu.
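The gram-based filtering idea can be illustrated with a short sketch. It relies on the pigeonhole principle: if a read is cut into k + 1 non-overlapping grams, any placement with at most k mismatches must contain at least one gram that matches the reference exactly, so only positions sharing a gram need to be verified. This is a generic q-gram filter written for illustration, assuming Hamming distance only; it does not reproduce Hobbes's optimized gram selection or its cache-efficient filter, and the function names are hypothetical.

```python
from collections import defaultdict


def build_gram_index(reference: str, q: int) -> dict[str, list[int]]:
    """Map every length-q substring of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, k: int) -> list[int]:
    """Return every reference offset where the read aligns with <= k mismatches.

    The read is cut into k + 1 non-overlapping grams (pigeonhole principle),
    so every true hit shares at least one exact gram with the reference.
    The index is rebuilt per call only to keep the sketch short.
    """
    q = len(read) // (k + 1)
    if q == 0:
        raise ValueError("read too short for the requested number of mismatches")
    index = build_gram_index(reference, q)
    candidates = set()
    for i in range(k + 1):
        gram = read[i * q:(i + 1) * q]
        for pos in index.get(gram, []):
            start = pos - i * q  # shift the gram hit back to the read start
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Verification step: keep only candidates within the Hamming bound.
    return sorted(s for s in candidates
                  if hamming(read, reference[s:s + len(read)]) <= k)


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"
    print(map_read("ACGTTAGA", ref, 1))  # [4]
```

In a real mapper the gram index would be built once for the whole genome, and the choice of which grams to probe is exactly where, according to the abstract, Hobbes concentrates its optimization.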
Affiliation(s)
- Athena Ahmadi
- Department of Computer Science, University of California, Irvine, CA 92697, USA
45
Rivera I, Mendes D, Afonso Â, Barroso M, Ramos R, Janeiro P, Oliveira A, Gaspar A, Tavares de Almeida I. Phenylalanine hydroxylase deficiency: molecular epidemiology and predictable BH4-responsiveness in South Portugal PKU patients. Mol Genet Metab 2011; 104 Suppl:S86-92. [PMID: 21871829 DOI: 10.1016/j.ymgme.2011.07.026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 06/02/2011] [Revised: 07/27/2011] [Accepted: 07/28/2011] [Indexed: 11/23/2022]
Abstract
Hyperphenylalaninemia (HPA, OMIM #261600), which includes phenylketonuria (PKU), is caused by mutations in the gene encoding phenylalanine hydroxylase (PAH); more than 600 different mutations have already been described. Genotype-phenotype correlation is a useful tool to predict the metabolic phenotype, to establish a better-tailored diet and, more recently, to assess potential responsiveness to BH(4) therapy, a current theme in the PKU field. The aim of this study was the molecular analysis of the PAH gene, the evaluation of genotype-phenotype relationships and the prediction of BH(4)-responsiveness in the HPA population living in South Portugal. We performed the molecular characterization of 83 HPA patients using genomic DNA extracted from peripheral blood samples or Guthrie cards. PAH mutations were scanned by PCR amplification of exons and related intronic boundaries, followed by direct sequence analysis. Intragenic polymorphisms were determined by PCR-RFLP analysis. The results allowed the full characterization of 67 patients. The mutational spectrum encompasses 34 distinct mutations, the most frequent being IVS10nt-11G>A (14.6%), V388M (10.8%), R261Q (8.2%) and R270K (7.6%), which account for 46% of all mutant alleles. Moreover, 12 different haplotypes were identified, and most mutations were associated with a single haplotype. Notably, more than half of the 34 mutations belong to the group of more than 70 mutations already identified in BH(4)-responsive patients, according to the BIOPKU database. Fifty-one different genotypic combinations were found, most of them in single patients and involving a BH(4)-responsive mutation. In conclusion, a significant proportion (30-35%) of South Portugal PKU patients may potentially benefit from BH(4) therapy, which, combined with a less strict diet or possibly, in special cases, as monotherapy, may help reduce nutritional deficiencies and minimize neurological and psychological dysfunctions.
Affiliation(s)
- Isabel Rivera
- Metabolism and Genetics Group, Faculty of Pharmacy, iMed.UL-Research Institute for Medicines and Pharmaceutical Sciences, University of Lisbon, Portugal.
46
Bozorgmehr JEH. An ancient frame-shifting event in the highly conserved KPNA gene family has undergone extensive compensation by natural selection in vertebrates. Biosystems 2011; 105:210-5. [PMID: 21550380 DOI: 10.1016/j.biosystems.2011.04.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/16/2010] [Revised: 03/02/2011] [Accepted: 04/20/2011] [Indexed: 10/18/2022]
Abstract
One of the prevailing arguments in evolutionary theory is that duplicated genes can acquire novel functionality. This is because only one of the paralogs needs to maintain the ancestral function, leaving room for natural experimentation owing to a respite from purifying selection. Although many duplicates can subsequently become disabled by nullifying mutations, a few may go on to diverge along a novel evolutionary trajectory. Here, evidence is provided that this scenario may not always hold. Rather, in the case of the highly conserved KPNA importin family, an initial relaxation in selection induced a frameshift that was later suppressed and heavily compensated for as part of a reparative and optimizing process. Despite the resulting divergence, there remains a distinct preservation of both sequence and functionality among the paralogs. This indicates that duplicates can be retained by selection for reasons related to their redundant functionality. It also shows that, even when positive selection is inferred in duplicate genes, it may be compensatory in nature rather than representing any biochemical innovation. More generally, this outcome may be a more common result of gene duplication than is currently believed.
47
Jorda J, Kajava AV. Protein homorepeats: sequences, structures, evolution, and functions. Adv Protein Chem Struct Biol 2011; 79:59-88. [PMID: 20621281 DOI: 10.1016/s1876-1623(10)79002-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/25/2022]
Abstract
The vast majority of protein sequences are aperiodic; they do not have any strong bias in amino acid composition, and they use a subtle mixture of all or most of the 20 amino acid residues to encode a wide variety of structures and functions. In this context, homorepeats, runs of a single amino acid residue, represent unusual, eye-catching motifs in proteins. Despite their sequence simplicity and relatively small size, homorepeat runs have a strong potential for molecular interactions due to the exceptionally high local concentration of a certain physico-chemical property. The appearance of such runs within proteins may give them new structural and functional features. An increasing number of studies demonstrate the abundance of these motifs in proteins, their important roles in biological processes, and their link to a number of hereditary and age-related diseases. In this chapter, we summarize data on the distribution of homorepeats in proteomes and on their structural properties, evolution, and functions.
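Because a homorepeat is simply a run of one residue, a minimal detector can be written with a back-referencing regular expression. The minimum run length used below (5) is an assumption for illustration; the chapter abstract does not specify a threshold, and definitions vary between studies.

```python
import re


def find_homorepeats(seq: str, min_run: int = 5) -> list[tuple[str, int, int]]:
    """Return (residue, start, length) for every run of one residue >= min_run.

    min_run = 5 is an illustrative default, not a value taken from the chapter.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    # ([A-Z]) captures a residue; \1{n,} requires it to repeat n more times.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_run - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence with a polyglutamine run and a shorter glutamate run.
    print(find_homorepeats("MSQQQQQQAPEEEEK"))     # [('Q', 2, 6)]
    print(find_homorepeats("MSQQQQQQAPEEEEK", 4))  # [('Q', 2, 6), ('E', 10, 4)]
```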
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
48
Bai X, Rivera-Vega L, Mamidala P, Bonello P, Herms DA, Mittapalli O. Transcriptomic signatures of ash (Fraxinus spp.) phloem. PLoS One 2011; 6:e16368. [PMID: 21283712 PMCID: PMC3025028 DOI: 10.1371/journal.pone.0016368] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 08/23/2010] [Accepted: 12/26/2010] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Ash (Fraxinus spp.) is a dominant tree species throughout urban and forested landscapes of North America (NA). The rapid invasion of NA by the emerald ash borer (Agrilus planipennis), a wood-boring beetle endemic to Eastern Asia, has resulted in the death of millions of ash trees and threatens billions more. Larvae feed primarily on phloem tissue, girdling and killing the tree. While NA ash species, including black (F. nigra), green (F. pennsylvanica) and white (F. americana) ash, are highly susceptible, the Asian species Manchurian ash (F. mandshurica) is resistant to A. planipennis, perhaps owing to their co-evolutionary history. Little is known about the molecular genetics of ash. Hence, we undertook a functional genomics approach to identify the repertoire of genes expressed in ash phloem. METHODOLOGY AND PRINCIPAL FINDINGS Using 454 pyrosequencing, we obtained 58,673 high-quality ash sequences from pooled phloem samples of green, white, black, blue and Manchurian ash. Intriguingly, 45% of the deduced proteins were not significantly similar to any sequences in the GenBank non-redundant database. KEGG analysis of the ash sequences revealed a high occurrence of defense-related genes. Expression analysis of early regulators potentially involved in plant defense (i.e., transcription factors, calcium-dependent protein kinases and a lipoxygenase 3) revealed higher mRNA levels in resistant ash than in susceptible ash species. Lastly, we predicted a total of 1,272 single nucleotide polymorphisms and 980 microsatellite loci, among which seven microsatellite loci showed polymorphism between different ash species. CONCLUSIONS AND SIGNIFICANCE The current transcriptomic data provide an invaluable resource for understanding the genetic make-up of ash phloem, the target tissue of A. planipennis. These data, along with future functional studies, could lead to the identification and characterization of defense genes involved in the resistance of ash to A. planipennis, and could support marker development for future ash breeding programs.
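The abstract reports 980 predicted microsatellite loci but does not state the search criteria. The sketch below shows one common way to flag perfect microsatellites (tandem repeats of a 1 to 6 bp motif) in assembled sequences; the per-motif minimum repeat counts in `MIN_REPEATS` are hypothetical values chosen for illustration, not the parameters used in the study.

```python
import re

# Hypothetical minimum repeat counts per motif length (not taken from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect microsatellites in seq.

    Motif lengths are scanned from shortest to longest, and any match that
    overlaps a region already claimed by a shorter motif is skipped, so a run
    such as (AT)n is not reported a second time as (ATAT)n.
    """
    seq = seq.upper()
    claimed = [False] * len(seq)
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            if any(claimed[m.start():m.end()]):
                continue
            hits.append((m.start(), m.group(1), len(m.group(0)) // k))
            for i in range(m.start(), m.end()):
                claimed[i] = True
    return sorted(hits)


if __name__ == "__main__":
    contig = "GC" + "AT" * 7 + "GGCC" + "CAG" * 5 + "T"
    print(find_ssrs(contig))  # [(2, 'AT', 7), (20, 'CAG', 5)]
```

Skipping overlaps keeps the logic simple, at the cost of dropping partially overlapping compound repeats instead of merging them; a dedicated SSR finder would handle those cases explicitly.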
Affiliation(s)
- Xiaodong Bai
- Department of Entomology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, Ohio, United States of America
- Loren Rivera-Vega
- Department of Entomology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, Ohio, United States of America
- Praveen Mamidala
- Department of Entomology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, Ohio, United States of America
- Pierluigi Bonello
- Department of Plant Pathology, The Ohio State University, Columbus, Ohio, United States of America
- Daniel A. Herms
- Department of Entomology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, Ohio, United States of America
- Omprakash Mittapalli
- Department of Entomology, The Ohio State University, Ohio Agricultural Research and Development Center, Wooster, Ohio, United States of America
49
José MV, Morgado ER, Govezensky T. Genetic hotels for the standard genetic code: evolutionary analysis based upon novel three-dimensional algebraic models. Bull Math Biol 2010; 73:1443-76. [PMID: 20725796 DOI: 10.1007/s11538-010-9571-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/05/2009] [Accepted: 07/02/2010] [Indexed: 11/30/2022]
Abstract
Herein, we rigorously develop novel 3-dimensional algebraic models called Genetic Hotels of the Standard Genetic Code (SGC). We start by considering the primeval RNA genetic code, which consists of the 16 codons of type RNY (purine-any base-pyrimidine). Using simple algebraic operations, we show how the RNA code could have evolved toward the current SGC via two different intermediate evolutionary stages called Extended RNA code type I and type II. By rotations or translations of the subset RNY, we arrive at the SGC via the former (type I) or via the latter (type II), respectively. Biologically, the Extended RNA code type I consists of all codons of the type RNY plus codons obtained by considering the RNA code in the second (NYR type) and third (YRN type) reading frames. The Extended RNA code type II comprises all codons of the type RNY plus codons that arise from transversions of the RNA code in the first (YNY type) and third (RNR type) nucleotide bases. Since the dimensions of remarkable subsets of the Genetic Hotels are not necessarily integers, we also introduce the concept of algebraic fractal dimension. A general decoding function, which maps each codon to its corresponding amino acid or to the stop signal, is also derived. The Phenotypic Hotel of amino acids is also illustrated. The proposed evolutionary paths are discussed in terms of existing theories of the evolution of the SGC. The adoption of 3-dimensional models of the Genetic and Phenotypic Hotels will facilitate the understanding of the biological properties of the SGC.
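The codon classes defined above can be enumerated directly from their R/Y/N patterns (R = purine, Y = pyrimidine, N = any base). The sketch below builds the 16 RNY codons and the two Extended RNA codes as the abstract describes them; it only checks set sizes and reading-frame relations, and does not implement the paper's 3-dimensional algebraic (Genetic Hotel) models or its decoding function.

```python
from itertools import product

# Base classes in the RNA alphabet: R = purine, Y = pyrimidine, N = any base.
BASE_CLASSES = {"R": "AG", "Y": "CU", "N": "ACGU"}


def codons(pattern: str) -> set[str]:
    """Expand a three-letter R/Y/N pattern into the set of concrete codons."""
    return {"".join(bases) for bases in product(*(BASE_CLASSES[p] for p in pattern))}


rny = codons("RNY")                                      # primeval RNA code
extended_type_i = rny | codons("NYR") | codons("YRN")    # second and third frames
extended_type_ii = rny | codons("YNY") | codons("RNR")   # transversions at base 1 or 3

if __name__ == "__main__":
    print(len(rny), len(extended_type_i), len(extended_type_ii))  # 16 48 48
    # Reading a run of RNY codons one base later yields an NYR codon,
    # and two bases later a YRN codon.
    message = "GACAUU"  # GAC and AUU are both RNY codons
    print(message[1:4] in codons("NYR"), message[2:5] in codons("YRN"))  # True True
```

Because every pair of patterns within each extended code places a purine against a pyrimidine at some position, the unions are disjoint, so each Extended RNA code contains 48 of the 64 codons. This count follows from the definitions rather than being a figure quoted in the abstract.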
Affiliation(s)
- Marco V José
- Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Mexico.
50
Clark MJ, Homer N, O'Connor BD, Chen Z, Eskin A, Lee H, Merriman B, Nelson SF. U87MG decoded: the genomic sequence of a cytogenetically aberrant human cancer cell line. PLoS Genet 2010; 6:e1000832. [PMID: 20126413 PMCID: PMC2813426 DOI: 10.1371/journal.pgen.1000832] [Citation(s) in RCA: 203] [Impact Index Per Article: 14.5] [Received: 11/04/2009] [Accepted: 12/28/2009] [Indexed: 01/23/2023] Open
Abstract
U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. To comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we generated more than 30× genomic sequence coverage using a novel 50-base mate-paired strategy with a 1.4 kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom-designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. To assess accuracy, the data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by the Illumina 1M Duo genotyping array: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. In this cancer cell line, protein-coding sequences were disrupted predominantly by small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations, revealing a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers.
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date. Glioblastoma has a particularly dismal prognosis, with a median survival time of less than fifteen months. Here, we describe the broad genome sequencing of U87MG, a commonly used and thus well-studied glioblastoma cell line. One of the major features of the U87MG genome is the large number of chromosomal abnormalities, which can be typical of cancer cell lines and primary cancers. The systematic, thorough, and accurate mutational analysis of the U87MG genome comprehensively identifies different classes of genetic mutations, including single-nucleotide variations (SNVs), insertions/deletions (indels), and translocations. We found 2,384,470 SNVs, 191,743 small indels, and 1,314 large structural variations. Known gene models were used to predict the effect of these mutations on protein-coding sequence. Mutational analysis revealed 512 homozygously mutated genes, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and up to 35 by interchromosomal translocations. The major mutational mechanisms in this brain cancer cell line are small indels and large structural variations. The genomic landscape of U87MG is revealed to be much more complex than previously thought based on lower-resolution techniques. This mutational analysis serves as a resource for past and future studies on U87MG, informing them with a thorough description of its mutational state.
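The reads in this study were two-base encoded ("color space") and aligned with BFAST. As a minimal sketch, assuming the usual 2-bit base assignment A=0, C=1, G=2, T=3, each color can be computed as the XOR of two adjacent base codes, so a read can only be decoded once its first base is known. This illustrates the encoding itself, not BFAST's alignment algorithm, and the function names are hypothetical.

```python
# Assumed 2-bit base codes; with this assignment the color of each
# dinucleotide is the XOR of the two base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq: str) -> list[int]:
    """Translate bases into colors, one color per adjacent base pair."""
    bits = [BASE_TO_BITS[b] for b in seq]
    return [a ^ b for a, b in zip(bits, bits[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Rebuild the base sequence from a known first base and its colors."""
    current = BASE_TO_BITS[first_base]
    bases = [first_base]
    for c in colors:
        current ^= c  # XOR with the color recovers the next base code
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


if __name__ == "__main__":
    print(encode("TACG"))          # [3, 1, 3]
    print(decode("T", [3, 1, 3]))  # TACG
    # A true SNP (TACG -> TAGG) changes two adjacent colors ...
    print(encode("TAGG"))          # [3, 2, 0]
    # ... whereas a single color-call error corrupts every downstream base.
    print(decode("T", [3, 2, 3]))  # TAGC
```

This asymmetry is one reason color-space reads are usually aligned in color space rather than decoded first: a genuine single-base variant appears as two adjacent color changes, while an isolated color change is more likely a sequencing error.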
Affiliation(s)
- Michael James Clark
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Nils Homer
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Brian D. O'Connor
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Zugen Chen
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Ascia Eskin
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Hane Lee
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Barry Merriman
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Stanley F. Nelson
- Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America