1
Bahena-Ceron R, Teixeira C, Ponce JRJ, Wolff P, Couzon F, François P, Klaholz BP, Vandenesch F, Romby P, Moreau K, Marzi S. RlmQ: a newly discovered rRNA modification enzyme bridging RNA modification and virulence traits in Staphylococcus aureus. RNA (New York, N.Y.) 2024; 30:200-212. [PMID: 38164596 PMCID: PMC10870370 DOI: 10.1261/rna.079850.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 11/29/2023] [Indexed: 01/03/2024]
Abstract
rRNA modifications play crucial roles in fine-tuning the delicate balance between translation speed and accuracy, yet the underlying mechanisms remain elusive. Comparative analyses of the rRNA modifications in taxonomically distant bacteria could help define their general, as well as species-specific, roles. In this study, we identified a new methyltransferase, RlmQ, in Staphylococcus aureus responsible for the Gram-positive specific m7G2601, which is not modified in Escherichia coli (G2574). We also demonstrate the absence of methylation on C1989, equivalent to E. coli C1962, which is methylated at position 5 by the Gram-negative specific RlmI methyltransferase, a paralog of RlmQ. Both modifications (S. aureus m7G2601 and E. coli m5C1962) are situated within the same tRNA accommodation corridor, hinting at a potential shared function in translation. Inactivation of S. aureus rlmQ causes the loss of methylation at G2601 and significantly impacts growth, cytotoxicity, and biofilm formation. These findings unravel the intricate connections between rRNA modifications, translation, and virulence in pathogenic Gram-positive bacteria.
Affiliation(s)
- Roberto Bahena-Ceron
  - Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
- Chloé Teixeira
  - CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
- Jose R Jaramillo Ponce
  - Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
- Philippe Wolff
  - Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
- Florence Couzon
  - CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
- Pauline François
  - CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
- Bruno P Klaholz
  - Centre for Integrative Biology, Department of Integrated Structural Biology, IGBMC, 67400 Illkirch, France
  - CNRS UMR 7104, 67400 Illkirch, France
  - Inserm U964, 67400 Illkirch, France
  - Université de Strasbourg, 67000 Strasbourg, France
- François Vandenesch
  - CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
  - Institut des agents infectieux, Hospices Civils de Lyon, 69004 Lyon, France
  - Centre National de Référence des Staphylocoques, Hospices Civils de Lyon, 69317 Lyon, France
- Pascale Romby
  - Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
- Karen Moreau
  - CIRI, Centre International de Recherche en Infectiologie, Université de Lyon, Inserm U1111, Université Claude Bernard Lyon 1, CNRS UMR5308, ENS de Lyon, 69007 Lyon, France
- Stefano Marzi
  - Université de Strasbourg, CNRS, Architecture et Réactivité de l'ARN, 67000 Strasbourg, France
2
Ahmad A, von Dohlen C, Ren Z. A chromosome-level genome assembly of the Rhus gall aphid Schlechtendalia chinensis provides insight into the endogenization of Parvovirus-like DNA sequences. BMC Genomics 2024; 25:16. [PMID: 38166596 PMCID: PMC10759679 DOI: 10.1186/s12864-023-09916-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/15/2023] [Indexed: 01/05/2024] Open
Abstract
The Rhus gall aphid, Schlechtendalia chinensis, feeds on its primary host plant Rhus chinensis to induce galls, which have economic importance in medicines and the food industry. Rhus gall aphids have a unique life cycle and are economically beneficial, but there is a large gap in genomic information about this group of aphids. Schlechtendalia chinensis induces tannin-rich galls on its host plant and is emerging as a model organism for both commercial applications and applied research in the context of gall production by insects. Here, we generated a high-quality chromosome-level assembly for the S. chinensis genome, enabling the comparison between S. chinensis and non-galling aphids. The final genome assembly is 344.59 Mb with 91.71% of the assembled sequences anchored into 13 chromosomes. We predicted 15,013 genes, of which 14,582 (97.13%) coding genes were annotated, and 99% of the predicted genes were anchored to the 13 chromosomes. This assembly reveals the endogenization of parvovirus-related DNA sequences (PRDs) in the S. chinensis genome, which could play a role in environmental adaptations. We characterized and classified the cytochrome P450s in the genome assembly, which are functionally crucial for sap-feeding insects and have roles in detoxification and insecticide resistance. This genome assembly also revealed whole-genome duplication events in S. chinensis, which can be considered in comparative evolutionary analysis. Our work represents a reference genome for gall-forming aphids that could be used for comparative genomic studies between galling and non-galling aphids and provides the first insight into the endogenization of PRDs in the genome of galling aphids. It also provides novel genetic information for future research on gall formation and insect-plant interactions.
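The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative and not taken from the paper.

```python
# Figures quoted in the abstract.
assembly_mb = 344.59       # total assembly size in Mb
anchored_pct = 91.71       # percent of assembled sequence anchored to chromosomes
predicted_genes = 15_013   # predicted genes
annotated_genes = 14_582   # annotated coding genes

# Amount of sequence placed on the 13 chromosomes.
anchored_mb = assembly_mb * anchored_pct / 100

# Share of predicted genes that received an annotation.
annotated_pct = 100 * annotated_genes / predicted_genes

print(f"anchored sequence: {anchored_mb:.2f} Mb")
print(f"annotated genes:   {annotated_pct:.2f}%")
```

The annotated share reproduces the 97.13% stated in the abstract, and the anchored fraction corresponds to roughly 316 Mb of the assembly.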
Affiliation(s)
- Aftab Ahmad
  - School of Life Science, Shanxi University, Taiyuan, Shanxi, China
- Carol von Dohlen
  - Department of Biology, Utah State University, Logan, Utah, United States of America
- Zhumei Ren
  - School of Life Science, Shanxi University, Taiyuan, Shanxi, China
3
Hodgins HP, Chen P, Lobb B, Wei X, Tremblay BJM, Mansfield MJ, Lee VCY, Lee PG, Coffin J, Duggan AT, Dolphin AE, Renaud G, Dong M, Doxey AC. Ancient Clostridium DNA and variants of tetanus neurotoxins associated with human archaeological remains. Nat Commun 2023; 14:5475. [PMID: 37673908 PMCID: PMC10482840 DOI: 10.1038/s41467-023-41174-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023] Open
Abstract
The analysis of microbial genomes from human archaeological samples offers a historic snapshot of ancient pathogens and provides insights into the origins of modern infectious diseases. Here, we analyze metagenomic datasets from 38 human archaeological samples and identify bacterial genomic sequences related to modern-day Clostridium tetani, which produces the tetanus neurotoxin (TeNT) and causes the disease tetanus. These genomic assemblies had varying levels of completeness, and a subset of them displayed hallmarks of ancient DNA damage. Phylogenetic analyses revealed known C. tetani clades as well as potentially new Clostridium lineages closely related to C. tetani. The genomic assemblies encode 13 TeNT variants with unique substitution profiles, including a subgroup of TeNT variants found exclusively in ancient samples from South America. We experimentally tested a TeNT variant selected from an ancient Chilean mummy sample and found that it induced tetanus muscle paralysis in mice, with potency comparable to modern TeNT. Thus, our ancient DNA analysis identifies DNA from neurotoxigenic C. tetani in archaeological human samples, and a novel variant of TeNT that can cause disease in mammals.
Affiliation(s)
- Harold P Hodgins
  - Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Pengsheng Chen
  - Department of Urology, Boston Children's Hospital, Boston, MA, USA
  - Department of Surgery and Department of Microbiology, Harvard Medical School, Boston, MA, USA
- Briallen Lobb
  - Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Xin Wei
  - Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Benjamin J M Tremblay
  - Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Michael J Mansfield
  - Genomics and Regulatory Systems Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan
- Victoria C Y Lee
  - Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
- Pyung-Gang Lee
  - Department of Urology, Boston Children's Hospital, Boston, MA, USA
  - Department of Surgery and Department of Microbiology, Harvard Medical School, Boston, MA, USA
- Jeffrey Coffin
  - Department of Anthropology, University of Waterloo, Waterloo, ON, Canada
- Ana T Duggan
  - McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, Hamilton, ON, Canada
- Alexis E Dolphin
  - Department of Anthropology, University of Waterloo, Waterloo, ON, Canada
- Gabriel Renaud
  - Department of Health Technology, Section of Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark
- Min Dong
  - Department of Urology, Boston Children's Hospital, Boston, MA, USA
  - Department of Surgery and Department of Microbiology, Harvard Medical School, Boston, MA, USA
- Andrew C Doxey
  - Department of Biology and the Waterloo Centre for Microbial Research, University of Waterloo, Waterloo, ON, Canada
4
Zhao H, Souilljee M, Pavlidis P, Alachiotis N. Genome-wide scans for selective sweeps using convolutional neural networks. Bioinformatics 2023; 39:i194-i203. [PMID: 37387128 DOI: 10.1093/bioinformatics/btad265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Recent methods for selective sweep detection cast the problem as a classification task and use summary statistics as features to capture region characteristics that are indicative of a selective sweep, thereby being sensitive to confounding factors. Furthermore, they are not designed to perform whole-genome scans or to estimate the extent of the genomic region that was affected by positive selection; both are required for identifying candidate genes and the time and strength of selection. RESULTS We present ASDEC (https://github.com/pephco/ASDEC), a neural-network-based framework that can scan whole genomes for selective sweeps. ASDEC achieves similar classification performance to other convolutional neural network-based classifiers that rely on summary statistics, but it is trained 10× faster and classifies genomic regions 5× faster by inferring region characteristics from the raw sequence data directly. Deploying ASDEC for genomic scans achieved up to 15.2× higher sensitivity, 19.4× higher success rates, and 4× higher detection accuracy than state-of-the-art methods. We used ASDEC to scan human chromosome 1 of the Yoruba population (1000Genomes project), identifying nine known candidate genes.
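The idea of turning a per-region classifier into a whole-genome scan that also reports the extent of the affected region can be sketched as follows. This is a minimal illustration under stated assumptions, not ASDEC's implementation: `score_window` is a hypothetical stand-in for any trained classifier that returns a sweep probability for a genomic interval, and the window size, step, and threshold are arbitrary.

```python
from typing import Callable, List, Tuple

def scan_genome(length: int, window: int, step: int,
                score_window: Callable[[int, int], float],
                threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Slide a fixed-size window along the sequence, score each window,
    and merge overlapping windows that pass the threshold into regions."""
    regions: List[Tuple[int, int]] = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = start + window
        if score_window(start, end) >= threshold:
            if regions and start <= regions[-1][1]:
                # Overlaps the previous flagged region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions

# Hypothetical classifier: high score for windows overlapping [4000, 6000).
def toy_classifier(start: int, end: int) -> float:
    return 0.9 if start < 6000 and end > 4000 else 0.1

regions = scan_genome(10_000, window=1000, step=500,
                      score_window=toy_classifier)
print(regions)
```

Merging adjacent positive windows is one simple way to estimate how far a candidate region extends; finer steps give sharper boundaries at the cost of more classifier calls.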
Affiliation(s)
- Hanqing Zhao
  - Faculty of EEMCS, University of Twente, Enschede, The Netherlands
- Pavlos Pavlidis
  - Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Greece
5
Kan Y, Jiang L, Tang J, Guo Y, Guo F. A systematic view of computational methods for identifying driver genes based on somatic mutation data. Brief Funct Genomics 2021; 20:333-343. [PMID: 34312663 DOI: 10.1093/bfgp/elab032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Revised: 06/16/2021] [Accepted: 06/22/2021] [Indexed: 11/13/2022] Open
Abstract
Abnormal changes in driver genes have serious consequences for human health and are a major focus of biomedical research. Accurately identifying driver genes among the enormous number of mutated genes promotes accurate diagnosis and treatment of cancer. Many methods for uncovering driver genes have been developed over the past decades. By analyzing previous works, we find that computational methods are more efficient than traditional biological experiments when distinguishing driver genes from massive data. In this study, we summarize eight common computational algorithms that use only somatic mutation data. We first group these methods into three categories according to the mutation features they apply. Then, we outline a general process for nominating candidate cancer driver genes. Finally, we evaluate three representative methods on 10 kinds of cancer derived from The Cancer Genome Atlas Program and five Chinese projects from the International Cancer Genome Consortium. In addition, we compare results of methods with various parameters. Evaluation is performed from four perspectives: CGC, OG/TSG, Q-value and quantile-quantile (Q-Q) plot. To sum up, we present algorithms using somatic mutation data in order to offer a systematic view of various mutation features and lay the foundation for methods based on the integration of mutation information and other types of data.
Affiliation(s)
- Yingxin Kan
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Limin Jiang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Jijun Tang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, U.S.
- Yan Guo
  - Comprehensive Cancer Center, Department of Internal Medicine, University of New Mexico, Albuquerque, U.S.
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha, China
6
Patlar B, Jayaswal V, Ranz JM, Civetta A. Nonadaptive molecular evolution of seminal fluid proteins in Drosophila. Evolution 2021; 75:2102-2113. [PMID: 34184267 PMCID: PMC8457112 DOI: 10.1111/evo.14297] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/25/2021] [Revised: 06/02/2021] [Accepted: 06/09/2021] [Indexed: 12/20/2022]
Abstract
Seminal fluid proteins (SFPs) are a group of reproductive proteins that are among the most evolutionarily divergent known. As SFPs can impact male and female fitness, these proteins have been proposed to evolve under postcopulatory sexual selection (PCSS). However, the fast change of SFPs can also result from nonadaptive evolution, and the extent to which selective constraints prevent rapid SFP evolution remains unknown. Using intra- and interspecific sequence information, along with genomics and functional data, we examine the molecular evolution of approximately 300 SFPs in Drosophila. We found that 50–57% of the SFP genes, depending on the population examined, are evolving under relaxed selection. Only 7–12% showed evidence of positive selection, with no evidence supporting other forms of PCSS, and 35–37% of the SFP genes were selectively constrained. Further, despite associations of positive selection with gene location on the X chromosome and protease activity, the analysis of additional genomic and functional features revealed their lack of influence on SFPs evolving under positive selection. Our results highlight a lack of sufficient evidence to claim that most SFPs are driven to evolve rapidly by PCSS while identifying genomic and functional attributes that influence different modes of SFP evolution.
Affiliation(s)
- Bahar Patlar
  - Department of Biology, University of Winnipeg, Winnipeg, MB, R3B 2E9, Canada
- Vivek Jayaswal
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- José M Ranz
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, 92697
- Alberto Civetta
  - Department of Biology, University of Winnipeg, Winnipeg, MB, R3B 2E9, Canada
7
Gupta MK, Vadde R. Divergent evolution and purifying selection of the Type 2 diabetes gene sequences in Drosophila: a phylogenomic study. Genetica 2020; 148:269-282. [PMID: 32804315 DOI: 10.1007/s10709-020-00101-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/30/2019] [Accepted: 08/12/2020] [Indexed: 11/24/2022]
Abstract
The recently developed phylogenomic approach provides a unique way to identify disease risk or protective alleles in any organism. While risk alleles evolve mostly under purifying selection, protective alleles evolve under either balancing or positive selection. Because information on this point is insufficient, the authors employed the phylogenomic approach to detect the nature of selection acting on type 2 diabetes (T2D) genes in the genus Drosophila using various models of the CODEML program of PAML. The results revealed that T2D gene sequences are evolving under purifying selection. However, a few sites in the membrane proteins encoded by CG8051, ZnT35C, and kar are evolving significantly under positive selection in specific scenarios, which might reflect positive or adaptive evolution in response to a changing niche, diet or other factors. In the near future, this information will be highly useful in the field of evolutionary medicine and the drug discovery process.
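Nested codon models of the kind fitted with CODEML are commonly compared with a likelihood ratio test, for example a site model that allows a class of positively selected sites against one that does not. The sketch below shows that test for a comparison with two degrees of freedom, where the chi-square tail probability has the closed form exp(-x/2). The abstract does not state which model pairs were compared, and the log-likelihood values here are hypothetical.

```python
import math

def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple:
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (test statistic, p-value). For a chi-square distribution with
    2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # clamp small negative noise
    return stat, math.exp(-stat / 2.0)

# Hypothetical log-likelihoods for a null and an alternative site model.
stat, p_value = lrt_df2(lnl_null=-1234.5, lnl_alt=-1229.0)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value favours the model with a positively selected site class; which individual sites drive the signal is then typically read from the per-site posterior probabilities that CODEML reports.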
Affiliation(s)
- Manoj Kumar Gupta
  - Department of Biotechnology & Bioinformatics, Yogi Vemana University, Kadapa, Andhra Pradesh, 516005, India
- Ramakrishna Vadde
  - Department of Biotechnology & Bioinformatics, Yogi Vemana University, Kadapa, Andhra Pradesh, 516005, India
8
Lai YP, Ioerger TR. Exploiting Homoplasy in Genome-Wide Association Studies to Enhance Identification of Antibiotic-Resistance Mutations in Bacterial Genomes. Evol Bioinform Online 2020; 16:1176934320944932. [PMID: 32782426 PMCID: PMC7385850 DOI: 10.1177/1176934320944932] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/03/2020] [Accepted: 06/30/2020] [Indexed: 12/23/2022] Open
Abstract
Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously by a mixture of resistance mutations (some more frequent than others) in a clinical population. This presents a challenge for Genome-Wide Association Studies (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step which identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes, to which association testing by a linear mixed model (LMM) is applied using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to focusing on clustered regions of mutations. This approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) Mycobacterium tuberculosis from Lima, Peru. We show that ECAT does a better job of detecting known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations), compared with (site- or gene-based) GEMMA, as representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces complexity in the model and increases statistical power.
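The preprocessing idea described above, finding genomic regions with an unusually dense cluster of independent evolutionary changes before running association tests, can be illustrated with a simple window scan. This is a sketch under assumptions, not ECAT's actual statistic: it treats the input as positions of independently inferred mutation events, uses non-overlapping windows, and scores each window with a Poisson tail probability against a uniform genome-wide rate. All names and values are hypothetical.

```python
import math
from typing import List, Tuple

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def hypervariable_windows(events: List[int], genome_length: int,
                          window: int, alpha: float) -> List[Tuple[int, int, int]]:
    """Return (start, end, event_count) for windows whose number of
    independent events is improbably high under a uniform background."""
    lam = len(events) * window / genome_length  # expected events per window
    hits = []
    for start in range(0, genome_length - window + 1, window):
        k = sum(start <= pos < start + window for pos in events)
        if k > 0 and poisson_sf(k, lam) < alpha:
            hits.append((start, start + window, k))
    return hits

# Hypothetical positions of independent mutation events on a 10 kb genome.
events = [120, 2500, 2510, 2530, 2580, 2600, 2710, 7400, 9100]
hits = hypervariable_windows(events, genome_length=10_000,
                             window=1000, alpha=0.001)
print(hits)
```

Only the flagged windows would then be passed on to association testing, which is how restricting the search space can reduce the number of tests and raise power.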
Affiliation(s)
- Yi-Pin Lai
  - Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
- Thomas R Ioerger
  - Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
9
Evolutionary Diversity in the Intracellular Microsporidian Parasite Nosema sp. Infecting Wild Silkworm Revealed by IGS Nucleotide Sequence Diversity. J Mol Evol 2020; 88:345-360. [PMID: 32166385 DOI: 10.1007/s00239-020-09936-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2019] [Accepted: 02/27/2020] [Indexed: 10/24/2022]
Abstract
The intracellular microsporidian Nosema mylitta infects the Indian wild silkworm Antheraea mylitta, causing pebrine disease. The genetic structure and phylogeny of N. mylitta are analysed using nucleotide variability in 5S ribosomal DNA and intergenic spacer (IGS) sequences from 20 isolates collected from the Southern, Northern and Central regions of Jharkhand State. Nucleotide diversity (π) and genetic differentiation (Gst) were highest in the Central isolates and lowest in the Northern isolates. Nucleotide deletions, transitions and transversions were observed among the isolates. Haplotyping showed nucleotide variability at 83 positions in IGS and 13 positions in 5S rDNA. Haplotype-based genetic differentiation was 0.96 to 0.97, whereas nucleotide sequence-based genetic differentiation was higher (Ks = 22.29) between Southern and Central isolates. Bottleneck analysis gave negative values for Tajima's D and other summary statistics, indicating loss of rare alleles and population explosion. From IGS, 17 ancestral sequences were inferred by the Network algorithm. A core of nine closely related nodes with ancient nucleotides and peripheral nodes with highly divergent nucleotides were derived. The most diverged peripheral haplotype was Bero (H11) from the Central region, whereas Deoghar (H3) of the Northern region diverged early. The phylogeny of N. mylitta grouped Southern and Northern isolates together, revealing a weak phylogenetic signal for these locations. A phylogeny of N. mylitta with Nosema sp. infecting other lepidopterans clustered N. mylitta isolates with N. antheraea and N. philosamiae of China, indicating genetic similarity, whereas other species were dissimilar, showing diversity irrespective of country of origin.
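Two of the summary statistics named above, nucleotide diversity (π) and Tajima's D, can be computed directly from an alignment. The sketch below follows the standard definitions (Tajima 1989) and assumes equal-length, gap-free sequences, so it does not handle the deletions reported among these isolates. The toy alignment is hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

def diversity_and_tajimas_d(seqs: List[str]) -> Tuple[float, int, float]:
    """Return (pi per site, number of segregating sites S, Tajima's D)."""
    n, length = len(seqs), len(seqs[0])
    # Segregating sites: alignment columns with more than one state.
    s = sum(1 for i in range(length) if len({q[i] for q in seqs}) > 1)
    # Mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    if s == 0:
        return 0.0, 0, float("nan")  # D is undefined without variation
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return k / length, s, d

# Hypothetical alignment in which every variant is a singleton.
pi, seg_sites, taj_d = diversity_and_tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"])
print(pi, seg_sites, taj_d)
```

In this toy case every variable site is carried by a single sequence, so the mean pairwise difference falls below the expectation from the number of segregating sites and D comes out negative.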
10
Ngassa Mbenda HG, Wang M, Guo J, Siddiqui FA, Hu Y, Yang Z, Kittichai V, Sattabongkot J, Cao Y, Jiang L, Cui L. Evolution of the Plasmodium vivax multidrug resistance 1 gene in the Greater Mekong Subregion during malaria elimination. Parasit Vectors 2020; 13:67. [PMID: 32051017 PMCID: PMC7017538 DOI: 10.1186/s13071-020-3934-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/28/2019] [Accepted: 02/03/2020] [Indexed: 11/10/2022] Open
Abstract
Background The malaria elimination plan of the Greater Mekong Subregion (GMS) is jeopardized by the increasing number of Plasmodium vivax infections and the emergence of parasite strains with reduced susceptibility to the frontline drug treatment chloroquine/primaquine. This study aimed to determine the evolution of the P. vivax multidrug resistance 1 (Pvmdr1) gene in P. vivax parasites isolated from the China–Myanmar border area during the major phase of elimination. Methods Clinical isolates were collected from 275 P. vivax patients in 2008, 2012–2013 and 2015 in the China–Myanmar border area and from 55 patients in central China. Comparison was made with parasites from three border regions of Thailand. Results Overall, genetic diversity of Pvmdr1 was relatively high in all border regions and over the seven years in the China–Myanmar border area, though slight temporal fluctuation was observed. Single nucleotide polymorphisms previously implicated in reduced chloroquine sensitivity were detected. In particular, M908L approached fixation in the China–Myanmar border area. The Y976F mutation sharply decreased from 18.5% in 2008 to 1.5% in 2012–2013 and disappeared in 2015, whereas F1076L steadily increased from 33.3% in 2008 to 77.8% in 2015. While neutrality tests suggested the action of purifying selection on the Pvmdr1 gene, several likelihood-based algorithms detected positive as well as purifying selection operating on specific amino acids including M908L, T958M and F1076L. Fixation and selection of the nonsynonymous mutations are differently distributed across the three border regions and central China. Comparison with the global P. vivax populations clearly indicated clustering of haplotypes according to geographic locations. It is noteworthy that the temperate-zone parasites from central China were completely separated from the parasites from other parts of the GMS. Conclusions This study showed that P. vivax populations in the China–Myanmar border area have experienced major changes in the Pvmdr1 residues proposed to be associated with chloroquine resistance, suggesting that drug selection may play an important role in the evolution of this gene in the parasite populations.
Affiliation(s)
- Huguette Gaelle Ngassa Mbenda
  - Division of Infectious Diseases and International Medicine, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Meilian Wang
  - Department of Immunology, College of Basic Medical Sciences, China Medical University, Shenyang, 110001, China
- Jian Guo
  - Department of Laboratory Medicine, Shanghai East Hospital, Tongji School of Medicine, Shanghai, China
- Faiza Amber Siddiqui
  - Division of Infectious Diseases and International Medicine, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Yue Hu
  - Department of Pathogen Biology and Immunology, Kunming Medical University, Kunming, Yunnan, China
- Zhaoqing Yang
  - Department of Pathogen Biology and Immunology, Kunming Medical University, Kunming, Yunnan, China
- Veerayuth Kittichai
  - Mahidol Vivax Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Jetsumon Sattabongkot
  - Mahidol Vivax Research Unit, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Yaming Cao
  - Department of Immunology, College of Basic Medical Sciences, China Medical University, Shenyang, 110001, China
- Lubin Jiang
  - Unit of Human Parasite Molecular and Cell Biology, Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, China
- Liwang Cui
  - Division of Infectious Diseases and International Medicine, Department of Internal Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
11
Liu K, Hao X, Wang Q, Hou J, Lai X, Dong Z, Shao C. Genome-wide identification and characterization of heat shock protein family 70 provides insight into its divergent functions on immune response and development of Paralichthys olivaceus. PeerJ 2019; 7:e7781. [PMID: 31737440 PMCID: PMC6855204 DOI: 10.7717/peerj.7781] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/15/2019] [Accepted: 08/28/2019] [Indexed: 01/16/2023] Open
Abstract
Flatfish undergo extreme morphological development and settle into a benthic lifestyle in the adult stage, and are likely to be more susceptible to environmental stress. Heat shock protein 70 (hsp70) family members are involved in embryonic development and stress response in metazoan animals. However, the evolutionary history and functions of hsp70 in flatfish are poorly understood. Here, we identified 15 hsp70 genes in the genome of the Japanese flounder (Paralichthys olivaceus), a flatfish endemic to the northwestern Pacific Ocean. Gene structure and motifs of the Japanese flounder hsp70 genes were conserved, and there were few structural variants compared to other fish species. We constructed a maximum likelihood tree to understand the evolutionary relationship of the hsp70 genes among surveyed fish. Selection pressure analysis suggested that four genes, hspa4l, hspa9, hspa13, and hyou1, showed signs of positive selection. We then extracted transcriptome data from Japanese flounder challenged with Edwardsiella tarda to induce stress, and found that hspa9, hspa12b, hspa4l, hspa13, and hyou1 were highly expressed, likely to protect cells from stress. Interestingly, expression patterns of hsp70 genes were divergent in different developmental stages of the Japanese flounder. We found that at least one hsp70 gene was always highly expressed at various stages of embryonic development of the Japanese flounder, thereby indicating that hsp70 genes were constitutively expressed in the Japanese flounder. Our findings provide basic and useful resources to better understand hsp70 genes in flatfish.
Collapse
Affiliation(s)
- Kaiqiang Liu
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resource, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, QingDao, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, QingDao, China.,Jiangsu Key Laboratory of Marine Bioresources and Environment, Jiangsu Key Laboratory of Marine Biotechnology, Huaihai Institute of Technology, Lianyungang, China
| | - Xiancai Hao
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resource, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, QingDao, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, QingDao, China
| | - Qian Wang
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resource, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, QingDao, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, QingDao, China
| | - Jilun Hou
- Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Beidaihe, China
| | - Xiaofang Lai
- Jiangsu Key Laboratory of Marine Bioresources and Environment, Jiangsu Key Laboratory of Marine Biotechnology, Huaihai Institute of Technology, Lianyungang, China
| | - Zhiguo Dong
- Jiangsu Key Laboratory of Marine Bioresources and Environment, Jiangsu Key Laboratory of Marine Biotechnology, Huaihai Institute of Technology, Lianyungang, China
| | - Changwei Shao
- Key Laboratory for Sustainable Utilization of Marine Fisheries Resource, Ministry of Agriculture, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, QingDao, China.,Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, QingDao, China
|
12
|
Rhee JK, Yoo J, Kim KR, Kim J, Lee YJ, Chul Cho B, Kim TM. Identification of Local Clusters of Mutation Hotspots in Cancer-Related Genes and Their Biological Relevance. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1656-1662. [PMID: 29993813 DOI: 10.1109/tcbb.2018.2813375] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Mutation hotspots are either solitary amino acid residues or stretches of amino acids that show elevated mutation frequency in cancer-related genes, but their prevalence and biological relevance are not completely understood. Here, we developed a Smith-Waterman algorithm-based mutation hotspot discovery method, MutClustSW, to identify mutation hotspots of either single or clustered amino acid residues. We identified 181 missense mutation hotspots from COSMIC and TCGA mutation databases. In addition to 77 single amino acid residue hotspots (42.5 percent) including well-known mutation hotspots such as IDH1 (p.R132) and BRAF (p.V600), we identified 104 mutation hotspots (57.5 percent) as clusters or stretches of multiple amino acids, and the hotspots on MUC2, EPPK1, KMT2C, and TP53 were larger than 50 amino acids. Twelve of 27 nonsense mutation hotspots (44.4 percent) were observed in four cancer-related genes, TP53, ARID1A, CDKN2A, and PTEN, suggesting that truncating mutations on some tumor suppressor genes are not randomly distributed as previously assumed. We also show that hotspot mutations have higher mutation allele frequency than non-hotspots, and the hotspot information can be used to prioritize the cancer drivers. Together, the proposed algorithm and the mutation hotspot information can serve as valuable resources in the selection of functional driver mutations and associated genes.
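The idea of scoring a stretch of residues locally, as in Smith-Waterman alignment, can be illustrated with a minimal sketch. This is not the MutClustSW implementation: the function name, the `background` penalty, and the scoring rule (count minus background) are assumptions chosen only to show how a running score that resets at zero picks out either a single residue or a contiguous cluster.

```python
def best_local_segment(counts, background=1.0):
    """Return (start, end, score) of the highest-scoring contiguous run.

    counts[i] is the number of mutations observed at residue i (0-based).
    Each residue contributes counts[i] - background (an assumed scoring
    rule), so sparsely mutated positions lower the running score.
    """
    best = (0, -1, 0.0)  # empty segment when nothing scores above zero
    running = 0.0
    start = 0
    for i, c in enumerate(counts):
        if running <= 0.0:
            # Local-alignment style reset: begin a new candidate segment here.
            running = 0.0
            start = i
        running += c - background
        if running > best[2]:
            best = (start, i, running)
    return best


if __name__ == "__main__":
    # Hypothetical per-residue mutation counts for a short protein region.
    print(best_local_segment([0, 0, 5, 4, 0, 6, 0, 0, 1]))
```

With these toy counts the reported segment spans residues 2 to 5, bridging the unmutated residue 4 because the flanking counts outweigh the penalty; a lone peak such as `[0, 9, 0]` yields a single-residue segment instead.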
|
13
|
Dong Y, Chen S, Cheng S, Zhou W, Ma Q, Chen Z, Fu CX, Liu X, Zhao YP, Soltis PS, Wong GKS, Soltis DE, Xiang QYJ. Natural selection and repeated patterns of molecular evolution following allopatric divergence. eLife 2019; 8:45199. [PMID: 31373555 PMCID: PMC6744222 DOI: 10.7554/elife.45199] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/15/2019] [Accepted: 08/01/2019] [Indexed: 11/13/2022] Open
Abstract
Although geographic isolation is a leading driver of speciation, the tempo and pattern of divergence at the genomic level remain unclear. We examine genome-wide divergence of putatively single-copy orthologous genes (POGs) in 20 allopatric species/variety pairs from diverse angiosperm clades, with 16 pairs reflecting the classic eastern Asia-eastern North America floristic disjunction. In each pair, >90% of POGs are under purifying selection, and <10% are under positive selection. A set of POGs are under strong positive selection, 14 of which are shared by 10-15 pairs, and one shared by all pairs; 15 POGs are annotated to biological processes responding to various stimuli. The relative abundance of POGs under different selective forces exhibits a repeated pattern among pairs despite an ~10 million-year difference in divergence time. Species divergence times are positively correlated with abundance of POGs under moderate purifying selection, but negatively correlated with abundance of POGs under strong purifying selection.
Affiliation(s)
- Yibo Dong
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, United States.,Plant Biology Division, Noble Research Institute, Ardmore, United States
| | - Shichao Chen
- Florida Museum of Natural History, University of Florida, Gainesville, United States.,Department of Biology, University of Florida, Gainesville, United States.,School of Life Sciences and Technology, Tongji University, Shanghai, China
| | | | - Wenbin Zhou
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, United States
| | - Qing Ma
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, United States
| | - Zhiduan Chen
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
| | - Cheng-Xin Fu
- Laboratory of Systematic & Evolutionary Botany and Biodiversity, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Xin Liu
- Beijing Genomics Institute, Shenzhen, China
| | - Yun-Peng Zhao
- Laboratory of Systematic & Evolutionary Botany and Biodiversity, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Pamela S Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, United States
| | - Gane Ka-Shu Wong
- Beijing Genomics Institute, Shenzhen, China.,Department of Biological Sciences, University of Alberta, Edmonton, Canada.,Department of Medicine, University of Alberta, Edmonton, Canada
| | - Douglas E Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, United States.,Department of Biology, University of Florida, Gainesville, United States
| | - Qiu-Yun Jenny Xiang
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, United States
|
14
|
Liu EM, Martinez-Fundichely A, Diaz BJ, Aronson B, Cuykendall T, MacKay M, Dhingra P, Wong EWP, Chi P, Apostolou E, Sanjana NE, Khurana E. Identification of Cancer Drivers at CTCF Insulators in 1,962 Whole Genomes. Cell Syst 2019; 8:446-455.e8. [PMID: 31078526 PMCID: PMC6917527 DOI: 10.1016/j.cels.2019.04.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 11/22/2017] [Revised: 11/20/2018] [Accepted: 04/02/2019] [Indexed: 12/15/2022]
Abstract
Recent studies have shown that mutations at non-coding elements, such as promoters and enhancers, can act as cancer drivers. However, an important class of non-coding elements, namely CTCF insulators, has been overlooked in the previous driver analyses. We used insulator annotations from CTCF and cohesin ChIA-PET and analyzed somatic mutations in 1,962 whole genomes from 21 cancer types. Using the heterogeneous patterns of transcription-factor-motif disruption, functional impact, and recurrence of mutations, we developed a computational method that revealed 21 insulators showing signals of positive selection. In particular, mutations in an insulator in multiple cancer types, including 16% of melanoma samples, are associated with TGFB1 up-regulation. Using CRISPR-Cas9, we find that alterations at two of the most frequently mutated regions in this insulator increase cell growth by 40%-50%, supporting the role of this boundary element as a cancer driver. Thus, our study reveals several CTCF insulators as putative cancer drivers.
Affiliation(s)
- Eric Minwei Liu
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Alexander Martinez-Fundichely
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Bianca Jay Diaz
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
| | - Boaz Aronson
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Tawny Cuykendall
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Matthew MacKay
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Priyanka Dhingra
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Elissa W P Wong
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Ping Chi
- Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Effie Apostolou
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Neville E Sanjana
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
| | - Ekta Khurana
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA; Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital, Weill Cornell Medicine, New York, NY 10065, USA.
|
15
|
Zhao ZM, Campbell MC, Li N, Lee DSW, Zhang Z, Townsend JP. Detection of Regional Variation in Selection Intensity within Protein-Coding Genes Using DNA Sequence Polymorphism and Divergence. Mol Biol Evol 2018; 34:3006-3022. [PMID: 28962009 DOI: 10.1093/molbev/msx213] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 02/07/2023] Open
Abstract
Numerous approaches have been developed to infer natural selection based on the comparison of polymorphism within species and divergence between species. These methods are especially powerful for the detection of uniform selection operating across a gene. However, empirical analyses have demonstrated that regions of protein-coding genes exhibiting clusters of amino acid substitutions are subject to different levels of selection relative to other regions of the same gene. To quantify this heterogeneity of selection within coding sequences, we developed Model Averaged Site Selection via Poisson Random Field (MASS-PRF). MASS-PRF identifies an ensemble of intragenic clustering models for polymorphic and divergent sites. This ensemble of models is used within the Poisson Random Field framework to estimate selection intensity on a site-by-site basis. Using simulations, we demonstrate that MASS-PRF has high power to detect clusters of amino acid variants in small genic regions, can reliably estimate the probability of a variant occurring at each nucleotide site in sequence data and is robust to historical demographic trends and recombination. We applied MASS-PRF to human gene polymorphism derived from the 1,000 Genomes Project and divergence data from the common chimpanzee. On the basis of this analysis, we discovered striking regional variation in selection intensity, indicative of positive or negative selection, in well-defined domains of genes that have previously been associated with neurological processing, immunity, and reproduction. We suggest that amino acid-altering substitutions within these regions likely are or have been selectively advantageous in the human lineage, playing important roles in protein function.
Affiliation(s)
- Zi-Ming Zhao
- Department of Biostatistics, Yale University, New Haven, CT
| | - Michael C Campbell
- Department of Biostatistics, Yale University, New Haven, CT.,Department of Biology, Howard University, Washington, DC
| | - Ning Li
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT
| | - Daniel S W Lee
- Department of Biostatistics, Yale University, New Haven, CT
| | - Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
| | - Jeffrey P Townsend
- Department of Biostatistics, Yale University, New Haven, CT.,Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT.,Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT
|
16
|
Human Genomic Loci Important in Common Infectious Diseases: Role of High-Throughput Sequencing and Genome-Wide Association Studies. CANADIAN JOURNAL OF INFECTIOUS DISEASES & MEDICAL MICROBIOLOGY 2018; 2018:1875217. [PMID: 29755620 PMCID: PMC5884297 DOI: 10.1155/2018/1875217] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/09/2017] [Accepted: 03/07/2018] [Indexed: 12/27/2022]
Abstract
HIV/AIDS, tuberculosis (TB), and malaria are 3 major global public health threats that undermine development in many resource-poor settings. Recently, the notion that positive selection during epidemics or longer periods of exposure to common infectious diseases may have had a major effect in modifying the constitution of the human genome is being interrogated at a large scale in many populations around the world. This positive selection from infectious diseases increases power to detect associations in genome-wide association studies (GWASs). High-throughput sequencing (HTS) has transformed the management of infectious diseases and continues to enable large-scale functional characterization of host resistance/susceptibility alleles and loci, a paradigm shift from single candidate gene studies. Application of genome sequencing technologies and genomics has enabled us to interrogate the host-pathogen interface for improving human health. Human populations are constantly locked in evolutionary arms races with pathogens; therefore, identification of common infectious disease-associated genomic variants/markers is important for therapeutic and vaccine development and for screening susceptible individuals in a population. This review describes a range of host-pathogen genomic loci that have been associated with disease susceptibility and resistance patterns in the era of HTS. We further highlight potential opportunities for these genetic markers.
|
17
|
Vijayan V, López-González S, Sánchez F, Ponz F, Pagán I. Virulence evolution of a sterilizing plant virus: Tuning multiplication and resource exploitation. Virus Evol 2017; 3:vex033. [PMID: 29250431 PMCID: PMC5724401 DOI: 10.1093/ve/vex033] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022] Open
Abstract
Virulence evolution may have far-reaching consequences for virus epidemiology and emergence, and virologists have devoted increasing effort to understanding the modulators of this process. However, little is still known about the mechanisms and determinants of virulence evolution in sterilizing viruses that, as they prevent host reproduction, may have devastating effects on host populations. Theory predicts that sterilizing parasites, including viruses, would evolve towards lower virulence and absolute host sterilization to optimize the exploitation of host resources and maximize fitness. However, this hypothesis has seldom been analyzed experimentally. We investigated the evolution of virulence of the sterilizing plant virus Turnip mosaic virus (TuMV) in its natural host Arabidopsis thaliana by serial passage experiments. After passaging, we quantified virus accumulation and infectivity, the effect of infection on plant growth and development, and virulence of the ancestral and passaged viral genotypes in A. thaliana. Results indicated that serial passaging increased the proportion of infected plants showing absolute sterility, reduced TuMV virulence, and increased virus multiplication and infectivity. Genomic comparison of the ancestral and passaged TuMV genotypes identified significant mutation clustering in the P1, P3, and 6K2 proteins, suggesting a role of these viral proteins in the observed phenotypic changes. Our results support theoretical predictions on the evolution of virulence of sterilizing parasites and contribute to a better understanding of the phenotypic and genetic changes associated with this process.
Affiliation(s)
- Viji Vijayan
- Centro de Biotecnología y Genómica de Plantas (UPM-INIA), Autopista M-40, km 38, Campus Montegancedo, Pozuelo de Alarcón, 28223 Madrid, Spain
| | - Silvia López-González
- Centro de Biotecnología y Genómica de Plantas (UPM-INIA), Autopista M-40, km 38, Campus Montegancedo, Pozuelo de Alarcón, 28223 Madrid, Spain
| | - Flora Sánchez
- Centro de Biotecnología y Genómica de Plantas (UPM-INIA), Autopista M-40, km 38, Campus Montegancedo, Pozuelo de Alarcón, 28223 Madrid, Spain
| | - Fernando Ponz
- Centro de Biotecnología y Genómica de Plantas (UPM-INIA), Autopista M-40, km 38, Campus Montegancedo, Pozuelo de Alarcón, 28223 Madrid, Spain
| | - Israel Pagán
- Centro de Biotecnología y Genómica de Plantas (UPM-INIA), Autopista M-40, km 38, Campus Montegancedo, Pozuelo de Alarcón, 28223 Madrid, Spain
|
18
|
Gershoni M, Hauser R, Yogev L, Lehavi O, Azem F, Yavetz H, Pietrokovski S, Kleiman SE. A familial study of azoospermic men identifies three novel causative mutations in three new human azoospermia genes. Genet Med 2017; 19:998-1006. [PMID: 28206990 DOI: 10.1038/gim.2016.225] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 09/19/2016] [Accepted: 12/15/2016] [Indexed: 02/03/2023] Open
Abstract
PURPOSE: Up to 1% of all men experience azoospermia, a condition of complete absence of sperm in the semen. The mechanisms and genes involved in spermatogenesis are mainly studied in model organisms, and their relevance to humans is unclear because human genetic studies are very scarce. Our objective was to uncover novel human mutations and genes causing azoospermia due to testicular meiotic maturation arrest. METHODS: Affected and unaffected siblings from three families were subjected to whole-exome or whole-genome sequencing, followed by comprehensive bioinformatics analyses to identify mutations suspected to cause azoospermia. These likely mutations were further screened in azoospermic and normozoospermic men and in men proven to be fertile, as well as in a reference database of local populations. RESULTS: We identified three novel likely causative mutations of azoospermia in three genes: MEIOB, TEX14, and DNAH6. These genes are associated with different meiotic processes: meiotic crossovers, daughter cell abscission, and possibly rapid prophase movements. CONCLUSION: The genes and pathways we identified are fundamental for delineating common causes of azoospermia originating in mutations affecting diverse meiotic processes and have great potential for accelerating approaches to diagnose, treat, and prevent infertility. Genet Med advance online publication 16 February 2017.
Affiliation(s)
- Moran Gershoni
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | - Ron Hauser
- Racine IVF Unit and Male Fertility Clinic and Sperm Bank, Lis Maternity Hospital, Tel Aviv Sourasky Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Leah Yogev
- Racine IVF Unit and Male Fertility Clinic and Sperm Bank, Lis Maternity Hospital, Tel Aviv Sourasky Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Ofer Lehavi
- Racine IVF Unit and Male Fertility Clinic and Sperm Bank, Lis Maternity Hospital, Tel Aviv Sourasky Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Foad Azem
- Racine IVF Unit and Male Fertility Clinic and Sperm Bank, Lis Maternity Hospital, Tel Aviv Sourasky Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Haim Yavetz
- Racine IVF Unit and Male Fertility Clinic and Sperm Bank, Lis Maternity Hospital, Tel Aviv Sourasky Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Shmuel Pietrokovski
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | - Sandra E Kleiman
- Racine IVF Unit and Male Fertility Clinic and Sperm Bank, Lis Maternity Hospital, Tel Aviv Sourasky Medical Center, affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
19
|
Astaxanthin biosynthetic pathway: Molecular phylogenies and evolutionary behaviour of Crt genes in eubacteria. Plant Gene 2016. [DOI: 10.1016/j.plgene.2016.09.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]
|
20
|
Tokheim C, Bhattacharya R, Niknafs N, Gygax DM, Kim R, Ryan M, Masica DL, Karchin R. Exome-Scale Discovery of Hotspot Mutation Regions in Human Cancer Using 3D Protein Structure. Cancer Res 2016; 76:3719-31. [PMID: 27197156 DOI: 10.1158/0008-5472.can-15-3190] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 11/30/2015] [Accepted: 04/01/2016] [Indexed: 12/12/2022]
Abstract
The impact of somatic missense mutation on cancer etiology and progression is often difficult to interpret. One common approach for assessing the contribution of missense mutations in carcinogenesis is to identify genes mutated with statistically nonrandom frequencies. Even given the large number of sequenced cancer samples currently available, this approach remains underpowered to detect drivers, particularly in less studied cancer types. Alternative statistical and bioinformatic approaches are needed. One approach to increase power is to focus on localized regions of increased missense mutation density or hotspot regions, rather than a whole gene or protein domain. Detecting missense mutation hotspot regions in three-dimensional (3D) protein structure may also be beneficial because linear sequence alone does not fully describe the biologically relevant organization of codons. Here, we present a novel and statistically rigorous algorithm for detecting missense mutation hotspot regions in 3D protein structures. We analyzed approximately 3 × 10⁵ mutations from The Cancer Genome Atlas (TCGA) and identified 216 tumor-type-specific hotspot regions. In addition to experimentally determined protein structures, we considered high-quality structural models, which increase genomic coverage from approximately 5,000 to more than 15,000 genes. We provide new evidence that 3D mutation analysis has unique advantages. It enables discovery of hotspot regions in many more genes than previously shown and increases sensitivity to hotspot regions in tumor suppressor genes (TSG). Although hotspot regions have long been known to exist in both TSGs and oncogenes, we provide the first report that they have different characteristic properties in the two types of driver genes. We show how cancer researchers can use our results to link 3D protein structure and the biologic functions of missense mutations in cancer, and to generate testable hypotheses about driver mechanisms.
Our results are included in a new interactive website for visualizing protein structures with TCGA mutations and associated hotspot regions. Users can submit new sequence data, facilitating the visualization of mutations in a biologically relevant context. Cancer Res; 76(13); 3719-31. ©2016 AACR.
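Why 3D structure can reveal hotspots that linear sequence misses can be shown with a small sketch. It is only an illustration of the neighborhood-density idea, not the authors' algorithm, and it omits any significance testing; the function name, the 10 Å `radius`, and the toy coordinates are all assumptions.

```python
import math


def neighborhood_mutation_counts(coords, counts, radius=10.0):
    """For each residue, sum the mutations on all residues within `radius`.

    coords[i] is an (x, y, z) position for residue i (for example a
    C-alpha atom) and counts[i] its mutation count. The radius is an
    assumed cutoff, not a value taken from the abstract above.
    """
    totals = []
    for ci in coords:
        total = 0
        for j, cj in enumerate(coords):
            # A residue is within its own neighborhood (distance 0).
            if math.dist(ci, cj) <= radius:
                total += counts[j]
        totals.append(total)
    return totals


if __name__ == "__main__":
    # Residues 0 and 1 are close in space; residue 2 is far away.
    coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (30.0, 0.0, 0.0)]
    print(neighborhood_mutation_counts(coords, [2, 1, 5]))
```

Residues that are distant in sequence but adjacent in the folded protein pool their counts here, which a purely linear window would not do. Note that `math.dist` requires Python 3.8 or later.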
Affiliation(s)
- Collin Tokheim
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Rohit Bhattacharya
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Noushin Niknafs
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | | | - Rick Kim
- In Silico Solutions, Fairfax, Virginia
| | | | - David L Masica
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland
| | - Rachel Karchin
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland. Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland.
|
21
|
Ryslik GA, Cheng Y, Modis Y, Zhao H. Leveraging protein quaternary structure to identify oncogenic driver mutations. BMC Bioinformatics 2016; 17:137. [PMID: 27001666 PMCID: PMC4802602 DOI: 10.1186/s12859-016-0963-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/14/2015] [Accepted: 02/18/2016] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: Identifying key "driver" mutations which are responsible for tumorigenesis is critical in the development of new oncology drugs. Due to multiple pharmacological successes in treating cancers that are caused by such driver mutations, a large body of methods have been developed to differentiate these mutations from the benign "passenger" mutations which occur in the tumor but do not further progress the disease. Under the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of algorithms that identify these clusters has become a critical area of research. RESULTS: We have developed a novel methodology, QuartPAC (Quaternary Protein Amino acid Clustering), that identifies non-random mutational clustering while utilizing the protein quaternary structure in 3D space. By integrating the spatial information in the Protein Data Bank (PDB) and the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), QuartPAC is able to identify clusters which are otherwise missed in a variety of proteins. The R package is available on Bioconductor at: http://bioconductor.jp/packages/3.1/bioc/html/QuartPAC.html. CONCLUSION: QuartPAC provides a unique tool to identify mutational clustering while accounting for the complete folded protein quaternary structure.
Affiliation(s)
- Gregory A. Ryslik
- Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
| | - Yuwei Cheng
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
| | - Yorgo Modis
- Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH UK
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
|
22
|
Meyer MJ, Lapcevic R, Romero AE, Yoon M, Das J, Beltrán JF, Mort M, Stenson PD, Cooper DN, Paccanaro A, Yu H. mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome. Hum Mutat 2016; 37:447-56. [PMID: 26841357 DOI: 10.1002/humu.22963] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 10/27/2015] [Accepted: 01/14/2016] [Indexed: 12/20/2022]
Abstract
A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from nonfunctional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D Web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D Web interface is freely available with all major browsers supported.
Affiliation(s)
- Michael J Meyer
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853.,Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853.,Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, 10065
| | - Ryan Lapcevic
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853.,Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Alfonso E Romero
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
| | - Mark Yoon
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853.,Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853.,Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Juan Felipe Beltrán
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853.,Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
| | - Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - Alberto Paccanaro
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
| | - Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
|
23
|
Liao PC, Wang KK, Tsai SS, Liu HJ, Huang BH, Chuang KP. Recurrent positive selection and heterogeneous codon usage bias events leading to coexistence of divergent pigeon circoviruses. J Gen Virol 2015; 96:2262-2273. [PMID: 25911731 DOI: 10.1099/vir.0.000163] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/15/2022] Open
Abstract
The capsid genes from 14 pigeon circovirus (PiCV) sequences, collected from Taiwan between 2009 and 2010, were sequenced and compared with 14 PiCV capsid gene sequences from GenBank. Based on pairwise comparison, PiCV strains from Taiwan shared 73.9-100% nucleotide identity and 72-100% amino acid identity with those of the 14 reported PiCV sequences. Phylogenetic analyses revealed that Taiwanese PiCV isolates can be grouped into two clades: clade 1 comprising isolates from Belgium, Australia, USA, Italy and China, and clade 2 showing close relation to isolates from Germany and France. Recurrent positive selection was detected in clade 1 PiCV lineages, which may contribute to the diversification of predominant PiCV sequences in Taiwan. Further observations suggest that synonymous codon usage variations between PiCV clade 1 and clade 2 may reflect adaptive divergence in the translation efficiency of capsid genes in infected hosts. Variation in the selective pressures acting on the evolutionary divergence and codon usage bias of both clades explains the regional coexistence of congeneric virus sequences, which are thereby protected from competitive exclusion within an island such as Taiwan. Our genotyping results also provide insight into the aetiological agents of the PiCV outbreak in Taiwan, and we present a comparative analysis of the central coding region of the PiCV genome. From the sequence comparison of 28 PiCVs that differ in geographical origin and columbid species, we identified conserved regions within the capsid gene that are likely to be suitable for primer selection and vaccine development.
Affiliation(s)
- Pei-Chun Liao
- Department of Life Science, National Taiwan Normal University, Taipei 11677, Taiwan, ROC
| | - Kung-Kai Wang
- Graduate Institute of Animal Vaccine Technology, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung 91201, Taiwan, ROC
| | - Shinn-Shyong Tsai
- Department of Veterinary Medicine, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung 91201, Taiwan, ROC
| | - Hung-Jen Liu
- Institute of Molecular Biology, National Chung Hsing University, 40227 Taichung, Taiwan, ROC
| | - Bing-Hong Huang
- Department of Life Science, National Taiwan Normal University, Taipei 11677, Taiwan, ROC
| | - Kuo-Pin Chuang
- Animal Biologics Pilot Production Center, National Pingtung University of Science and Technology, Pingtung 91201, Taiwan, ROC; Graduate Institute of Animal Vaccine Technology, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung 91201, Taiwan, ROC
|
24
|
A spatial simulation approach to account for protein structure when identifying non-random somatic mutations. BMC Bioinformatics 2014; 15:231. [PMID: 24990767 PMCID: PMC4227039 DOI: 10.1186/1471-2105-15-231] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 11/15/2013] [Accepted: 05/27/2014] [Indexed: 02/08/2023] Open
Abstract
Background: Current research suggests that a small set of “driver” mutations are responsible for tumorigenesis while a larger body of “passenger” mutations occur in the tumor but do not progress the disease. Due to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Based on the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of cluster identification algorithms has become critical. Results: We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and the spatial information in the Protein Data Bank (PDB), SpacePAC is able to identify novel mutation clusters in many proteins such as FGFR3 and CHRM2. In addition, SpacePAC is better able to localize the most significant mutational hotspots as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at: http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html. Conclusion: SpacePAC adds a valuable tool to the identification of mutational clusters while considering protein tertiary structure.
|
25
|
Borštnik B, Pumpernik D. The apparent enhancement of CpG transversions in primate lineage is a consequence of multiple replacements. J Bioinform Comput Biol 2014; 12:1450011. [PMID: 24969749 DOI: 10.1142/s0219720014500115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
We claim that the apparently enhanced CpG transversions in the form CpG to CpC/GpG or to ApG/CpT are caused by the hypermutable CpG to CpA/TpG transition. The nucleotide replacement counts obtained from the human/chimpanzee/gorilla/orangutan sequence alignments, which represent the replacements due to evolutionary species divergence, and the results of the 1000 Genomes Project, which provide the differences due to intraspecies diversification, were analyzed to estimate the ratio of CpG versus non-CpG transversion probabilities. The trinucleotide replacement counts were extracted from regions that are free of functional constraints. The CpG transversion probabilities based upon the genomic comparisons were found to be more than twice the non-CpG transversion probabilities. The diversity data emerging from 14 population groups were partitioned into five classes as a function of the parameter quantifying the spread of the polymorphic allele among the group of individuals. The results based upon the human polymorphism data exhibit a trend in which the ratio of CpG to non-CpG transversion probabilities exceeds unity by less and less as the derived allele frequency (DAF) of the SNPs decreases. A computer simulation of a simplified model indicates that the apparent enhancement of CpG transversions can have its source in the interference of entropic effects with the maximum likelihood methodologies.
Affiliation(s)
- Branko Borštnik
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
|
26
|
Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 2013; 29:2238-44. [PMID: 23884480 DOI: 10.1093/bioinformatics/btt395] [Citation(s) in RCA: 303] [Impact Index Per Article: 27.5] [Indexed: 12/30/2022]
Abstract
MOTIVATION: Gain-of-function mutations often cluster in specific protein regions, a signal that those mutations provide an adaptive advantage to cancer cells and consequently are positively selected during clonal evolution of tumours. We sought to determine the overall extent of this feature in cancer and the possibility to use this feature to identify drivers. RESULTS: We have developed OncodriveCLUST, a method to identify genes with a significant bias towards mutation clustering within the protein sequence. This method constructs the background model by assessing coding-silent mutations, which are assumed not to be under positive selection and thus may reflect the baseline tendency of somatic mutations to be clustered. OncodriveCLUST analysis of the Catalogue of Somatic Mutations in Cancer retrieved a list of genes enriched by the Cancer Gene Census, prioritizing those with dominant phenotypes but also highlighting some recessive cancer genes, which showed wider but still delimited mutation clusters. Assessment of datasets from The Cancer Genome Atlas demonstrated that OncodriveCLUST selected cancer genes that were nevertheless missed by methods based on frequency and functional impact criteria. This stressed the benefit of combining approaches based on complementary principles to identify driver mutations. We propose OncodriveCLUST as an effective tool for that purpose. AVAILABILITY: OncodriveCLUST has been implemented as a Python script and is freely available from http://bg.upf.edu/oncodriveclust. CONTACT: nuria.lopez@upf.edu or abel.gonzalez@upf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Tamborero
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona and Institució Catalana de Recerca i Estudis Avançats ICREA, Passeig Lluis Companys, 23, 08010 Barcelona, Spain
|
27
|
Utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 2013; 14:190. [PMID: 23758891 PMCID: PMC3691676 DOI: 10.1186/1471-2105-14-190] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 03/08/2013] [Accepted: 05/28/2013] [Indexed: 02/07/2023] Open
Abstract
Background: Human cancer is caused by the accumulation of somatic mutations in tumor suppressors and oncogenes within the genome. In the case of oncogenes, recent theory suggests that there are only a few key “driver” mutations responsible for tumorigenesis. As there have been significant pharmacological successes in developing drugs that treat cancers that carry these driver mutations, several methods that rely on mutational clustering have been developed to identify them. However, these methods consider proteins as a single strand without taking their spatial structures into account. We propose an extension to current methodology that incorporates protein tertiary structure in order to increase our power when identifying mutation clustering. Results: We have developed iPAC (identification of Protein Amino acid Clustering), an algorithm that identifies non-random somatic mutations in proteins while taking into account the three-dimensional protein structure. By using the tertiary information, we are able to detect both novel clusters in proteins that are known to exhibit mutation clustering as well as identify clusters in proteins without evidence of clustering based on existing methods. For example, by combining the data in the Protein Data Bank (PDB) and the Catalogue of Somatic Mutations in Cancer, our algorithm identifies new mutational clusters in well-known cancer proteins such as KRAS and PI3KC α. Further, by utilizing the tertiary structure, our algorithm also identifies clusters in EGFR, EIF2AK2, and other proteins that are not identified by current methodology. The R package is available at: http://www.bioconductor.org/packages/2.12/bioc/html/iPAC.html. Conclusion: Our algorithm extends the current methodology to identify oncogenic activating driver mutations by utilizing tertiary protein structure when identifying nonrandom somatic residue mutation clusters.
|
28
|
McFerrin LG, Stone EA. The non-random clustering of non-synonymous substitutions and its relationship to evolutionary rate. BMC Genomics 2011; 12:415. [PMID: 21846337 PMCID: PMC3176261 DOI: 10.1186/1471-2164-12-415] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/07/2011] [Accepted: 08/16/2011] [Indexed: 01/11/2023] Open
Abstract
Background: Protein sequences are subject to a mosaic of constraint. Changes to functional domains and buried residues, for example, are more apt to disrupt protein structure and function than are changes to residues participating in loops or exposed to solvent. Regions of constraint on the tertiary structure of a protein often result in loose segmentation of its primary structure into stretches of slowly- and rapidly-evolving amino acids. This clustering can be exploited, and existing methods have done so by relying on local sequence conservation as a signature of selection to help identify functionally important regions within proteins. We invert this paradigm by leveraging the regional nature of protein structure and function to both illuminate and make use of genome-wide patterns of local sequence conservation. Results: Our hypothesis is that the regional nature of structural and functional constraints will assert a positive autocorrelation on the evolutionary rates of neighboring sites, which, in a pairwise comparison of orthologous proteins, will manifest itself as the clustering of non-synonymous changes across the amino acid sequence. We introduce a dispersion ratio statistic to test this and related hypotheses. Using genome-wide interspecific comparisons of orthologous protein pairs, we reveal a strong log-linear relationship between the degree of clustering and the intensity of constraint. We further demonstrate how this relationship varies with the evolutionary distance between the species being compared. We provide some evidence that proteins with a history of positive selection deviate from genome-wide trends. Conclusions: We find a significant association between the evolutionary rate of a protein and the degree to which non-synonymous changes cluster along its primary sequence. We show that clustering is a non-redundant predictor of evolutionary rate, and we speculate that conflicting signals of clustering and constraint may be indicative of a historical period of relaxed selection.
Affiliation(s)
- Lisa G McFerrin
- Graduate program in Bioinformatics, North Carolina State University, Raleigh, NC 27695-7566, USA
|
29
|
Finch CE. Evolution in health and medicine Sackler colloquium: Evolution of the human lifespan and diseases of aging: roles of infection, inflammation, and nutrition. Proc Natl Acad Sci U S A 2010; 107 Suppl 1:1718-24. [PMID: 19966301 PMCID: PMC2868286 DOI: 10.1073/pnas.0909606106] [Citation(s) in RCA: 227] [Impact Index Per Article: 16.2] [Indexed: 02/07/2023] Open
Abstract
Humans have evolved much longer lifespans than the great apes, which rarely exceed 50 years. Since 1800, lifespans have doubled again, largely due to improvements in environment, food, and medicine that minimized mortality at earlier ages. Infections cause most mortality in wild chimpanzees and in traditional forager-farmers with limited access to modern medicine. Although we know little of the diseases of aging under premodern conditions, in captivity, chimpanzees present a lower incidence of cancer, ischemic heart disease, and neurodegeneration than current human populations. These major differences in pathology of aging are discussed in terms of genes that mediate infection, inflammation, and nutrition. Apolipoprotein E alleles are proposed as a prototype of pleiotropic genes, which influence immune responses, arterial and Alzheimer's disease, and brain development.
Affiliation(s)
- Caleb E. Finch
- Davis School of Gerontology and the University of Southern California, Los Angeles, CA 90089
|
30
|
Ye J, Pavlicek A, Lunney EA, Rejto PA, Teng CH. Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics 2010; 11:11. [PMID: 20053295 PMCID: PMC2822753 DOI: 10.1186/1471-2105-11-11] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 06/12/2009] [Accepted: 01/07/2010] [Indexed: 02/07/2023] Open
Abstract
Background: Human cancer is caused by the accumulation of tumor-specific mutations in oncogenes and tumor suppressors that confer a selective growth advantage to cells. As a consequence of genomic instability and high levels of proliferation, many passenger mutations that do not contribute to the cancer phenotype arise alongside mutations that drive oncogenesis. While several approaches have been developed to separate driver mutations from passengers, few approaches can specifically identify activating driver mutations in oncogenes, which are more amenable to pharmacological intervention. Results: We propose a new statistical method for detecting activating mutations in cancer by identifying nonrandom clusters of amino acid mutations in protein sequences. A probability model is derived using order statistics assuming that the location of amino acid mutations on a protein follows a uniform distribution. Our statistical measure is the difference between pairwise order statistics, which is equivalent to the size of an amino acid mutation cluster, and the probabilities are derived from exact and approximate distributions of the statistical measure. Using data in the Catalog of Somatic Mutations in Cancer (COSMIC) database, we have demonstrated that our method detects well-known clusters of activating mutations in KRAS, BRAF, PI3K, and β-catenin. The method can also identify new cancer targets as well as gain-of-function mutations in tumor suppressors. Conclusions: Our proposed method is useful to discover activating driver mutations in cancer by identifying nonrandom clusters of somatic amino acid mutations in protein sequences.
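As a rough illustration of the order-statistics idea in this abstract, the sketch below scores one candidate cluster under a continuous-uniform approximation. It relies on a standard result: for n independent uniform points on [0, 1], the gap between the i-th and k-th order statistics follows a Beta(k - i, n - (k - i) + 1) distribution, whose CDF can be written as a binomial tail sum. The function name `spacing_pvalue`, the `+ 1` width convention for residue positions, and the lack of any multiple-testing correction are assumptions made here for illustration; this is not the authors' implementation, which the abstract says uses both exact and approximate distributions.

```python
from math import comb


def spacing_pvalue(positions, i, k, protein_length):
    """Probability that n uniform mutations place the i-th and k-th
    ordered positions at least as close together as observed.

    positions      : residue indices of the mutations (any order)
    i, k           : 0-based ranks in the sorted positions, with i < k
    protein_length : number of residues in the protein
    Hypothetical helper; continuous-uniform approximation only.
    """
    xs = sorted(positions)
    n = len(xs)
    if not 0 <= i < k < n:
        raise ValueError("need 0 <= i < k < number of mutations")
    r = k - i  # number of gaps spanned by the candidate cluster
    # Assumed convention: cluster width counts both end residues.
    d = min(1.0, (xs[k] - xs[i] + 1) / protein_length)
    # P(Beta(r, n - r + 1) <= d) == P(Binomial(n, d) >= r)
    return sum(comb(n, j) * d**j * (1 - d) ** (n - j) for j in range(r, n + 1))


if __name__ == "__main__":
    # Five mutations, four of them packed into residues 12-15 of a
    # 200-residue protein; score the cluster spanning ranks 0..3.
    print(spacing_pvalue([12, 13, 13, 15, 150], 0, 3, 200))
```

A small value suggests the span is tighter than uniform placement would usually produce. In practice every pair (i, k) would be scanned, so some correction for multiple comparisons would be needed before calling a cluster significant.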
Affiliation(s)
- Jingjing Ye
- Global Pre-Clinical Statistics, Pfizer Global Research and Development, San Diego, CA 92121, USA.
|
31
|
Ortiz M, Guex N, Patin E, Martin O, Xenarios I, Ciuffi A, Quintana-Murci L, Telenti A. Evolutionary trajectories of primate genes involved in HIV pathogenesis. Mol Biol Evol 2009; 26:2865-75. [PMID: 19726537 DOI: 10.1093/molbev/msp197] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 12/19/2022] Open
Abstract
The current availability of five complete genomes of different primate species allows the analysis of genetic divergence over the last 40 million years of evolution. We hypothesized that the interspecies differences observed in susceptibility to HIV-1 would be influenced by the long-range selective pressures on host genes associated with HIV-1 pathogenesis. We established a list of human genes (n = 140) proposed to be involved in HIV-1 biology and pathogenesis and a control set of 100 random genes. We retrieved the orthologous genes from the genome of humans and of four nonhuman primates (Pan troglodytes, Pongo pygmaeus abeli, Macaca mulatta, and Callithrix jacchus) and analyzed the nucleotide substitution patterns of this data set using codon-based maximum likelihood procedures. In addition, we evaluated whether the candidate genes have been targets of recent positive selection in humans by analyzing HapMap Phase 2 single-nucleotide polymorphisms genotyped in a region centered on each candidate gene. A total of 1,064 sequences were used for the analyses. Similar median K(A)/K(S) values were estimated for the set of genes involved in HIV-1 pathogenesis and for control genes, 0.19 and 0.15, respectively. However, genes of the innate immunity had median values of 0.37 (P value = 0.0001, compared with control genes), and genes of intrinsic cellular defense had K(A)/K(S) values around or greater than 1.0 (P value = 0.0002). Detailed assessment allowed the identification of residues under positive selection in 13 proteins: AKT1, APOBEC3G, APOBEC3H, CD4, DEFB1, GML, IL4, IL8RA, L-SIGN/CLEC4M, PTPRC/CD45, Tetherin/BST2, TLR7, and TRIM5alpha. A number of those residues are relevant for HIV-1 biology. The set of 140 genes involved in HIV-1 pathogenesis did not show a significant enrichment in signals of recent positive selection in humans (intraspecies selection). However, we identified within or near these genes 24 polymorphisms showing strong signatures of recent positive selection. Interestingly, the DEFB1 gene presented signatures of both interspecies positive selection in primates and intraspecies recent positive selection in humans. The systematic assessment of long-acting selective pressures on primate genomes is a useful tool to extend our understanding of genetic variation influencing contemporary susceptibility to HIV-1.
Affiliation(s)
- Millán Ortiz
- Institute of Microbiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland
|
32
|
Vallender EJ. Bioinformatic approaches to identifying orthologs and assessing evolutionary relationships. Methods 2009; 49:50-5. [PMID: 19467333 PMCID: PMC2732758 DOI: 10.1016/j.ymeth.2009.05.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 02/17/2009] [Revised: 04/27/2009] [Accepted: 05/18/2009] [Indexed: 01/26/2023] Open
Abstract
Non-human primate genetic research defines itself through comparisons to humans; few other species require the implicit comparative genomics approaches. Because of this, errors in the identification of non-human primate orthologs can have profound effects. Gene prediction algorithms can and have produced false transcripts that have become incorporated into commonly used databases and genomics portals. These false transcripts can arise from deficiencies in the algorithms themselves as well as through gaps and other problems in the genome assembly. Putative genes generated can not only miss microexons, but improperly incorporate non-coding sequence resulting in pseudogenes or other transcripts without biological relevance. False transcripts then become identified as orthologs to established human genes and are too often taken as gospel by unwary researchers. Here, the processes through which these errors propagate are isolated and methods are described for identifying false orthologs in databases with several representative errors illustrated. Through these steps any researcher seeking to make use of non-human primate genetic information will have the tools at their disposal to ascertain where errors exist and to remedy them once encountered.
Affiliation(s)
- Eric J Vallender
- Division of Neurosciences, New England Primate Research Center, Harvard Medical School, Pine Hill Drive, Southborough Campus, Southborough, MA 01772, USA.
|
33
|
Zhang Z, Townsend JP. Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences. PLoS Comput Biol 2009; 5:e1000421. [PMID: 19557160 PMCID: PMC2695770 DOI: 10.1371/journal.pcbi.1000421] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/12/2009] [Accepted: 05/21/2009] [Indexed: 11/19/2022] Open
Abstract
A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or substituted sites within DNA or protein sequences. Progress has been stymied by a lack of suitable methods to detect clusters and to estimate the extent of clustering in discrete linear sequences, particularly when there is no a priori specification of cluster size or cluster count. Here we derive and demonstrate a maximum likelihood method of hierarchical clustering. Our method incorporates a tripartite divide-and-conquer strategy that models sequence heterogeneity, delineates clusters, and yields a profile of the level of clustering associated with each site. The clustering model may be evaluated via model selection using the Akaike Information Criterion, the corrected Akaike Information Criterion, and the Bayesian Information Criterion. Furthermore, model averaging using weighted model likelihoods may be applied to incorporate model uncertainty into the profile of heterogeneity across sites. We evaluated our method by examining its performance on a number of simulated datasets as well as on empirical polymorphism data from diverse natural alleles of the Drosophila alcohol dehydrogenase gene. Our method yielded greater power for the detection of clustered sites across a breadth of parameter ranges, and achieved better accuracy and precision of estimation of clusters, than did the existing empirical cumulative distribution function statistics.
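The model-selection and model-averaging steps named in this abstract can be sketched with the standard textbook definitions of the three criteria and of Akaike weights. The function names and the toy numbers below are hypothetical, and the sketch does not reproduce the authors' clustering model itself; it only shows how candidate models with known log-likelihoods might be scored and averaged.

```python
from math import exp, log


def aic(log_lik, k):
    """Akaike Information Criterion for k free parameters."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; requires n > k + 1 observations."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion for n observations."""
    return k * log(n) - 2 * log_lik


def akaike_weights(scores):
    """Convert criterion scores (lower is better) to weights summing to 1."""
    best = min(scores)
    rel = [exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(per_model_values, weights):
    """Weighted average of one per-site quantity across candidate models."""
    return sum(w * v for w, v in zip(weights, per_model_values))


if __name__ == "__main__":
    # Toy example: three candidate models fitted to n = 100 sites.
    log_liks, n_params, n_sites = [-60.0, -55.0, -54.5], [1, 3, 5], 100
    scores = [aicc(ll, k, n_sites) for ll, k in zip(log_liks, n_params)]
    weights = akaike_weights(scores)
    # Hypothetical per-model estimates of clustering at a single site.
    print(weights, model_average([0.10, 0.40, 0.45], weights))
```

Averaging with these weights lets a poorly supported model contribute little to the per-site profile without discarding it outright, which is the sense in which model uncertainty is carried into the result.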
Affiliation(s)
- Zhang Zhang
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
| | - Jeffrey P. Townsend
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- * E-mail:
|
34
|
Wang X, Gowik U, Tang H, Bowers JE, Westhoff P, Paterson AH. Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses. Genome Biol 2009; 10:R68. [PMID: 19549309 PMCID: PMC2718502 DOI: 10.1186/gb-2009-10-6-r68] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.7] [Received: 03/18/2009] [Revised: 05/27/2009] [Accepted: 06/23/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Sorghum is the first C4 plant and the second grass with a full genome sequence available. This makes it possible to perform a whole-genome-level exploration of C4 pathway evolution by comparing key photosynthetic enzyme genes in sorghum, maize (C4) and rice (C3), and to investigate a long-standing hypothesis that a reservoir of duplicated genes is a prerequisite for the evolution of C4 photosynthesis from a C3 progenitor. RESULTS: We show that both whole-genome and individual gene duplication have contributed to the evolution of C4 photosynthesis. The C4 gene isoforms show differential duplicability, with some C4 genes being recruited from whole genome duplication duplicates by multiple modes of functional innovation. The sorghum and maize carbonic anhydrase genes display a novel mode of new gene formation, with recursive tandem duplication and gene fusion accompanied by adaptive evolution to produce C4 genes with one to three functional units. Other C4 enzymes in sorghum and maize also show evidence of adaptive evolution, though differing in level and mode. Intriguingly, a phosphoenolpyruvate carboxylase gene in the C3 plant rice has also been evolving rapidly and shows evidence of adaptive evolution, although lacking key mutations that are characteristic of C4 metabolism. We also found evidence that both gene redundancy and alternative splicing may have sheltered the evolution of new function. CONCLUSIONS: Gene duplication followed by functional innovation is common to evolution of most but not all C4 genes. The apparently long time-lag between the availability of duplicates for recruitment into C4 and the appearance of C4 grasses, together with the heterogeneity of origins of C4 genes, suggests that there may have been a long transition process before the establishment of C4 photosynthesis.
Affiliation(s)
- Xiyin Wang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA
- College of Sciences, Hebei Polytechnic University, Tangshan, Hebei 063000, China
| | - Udo Gowik
- Institut für Entwicklungs- und Molekularbiologie der Pflanzen, Heinrich-Heine-Universität, Universitätsstrasse 1, D-40225 Düsseldorf, Germany
| | - Haibao Tang
- Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
| | - John E Bowers
- Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA
| | - Peter Westhoff
- Institut für Entwicklungs- und Molekularbiologie der Pflanzen, Heinrich-Heine-Universität, Universitätsstrasse 1, D-40225 Düsseldorf, Germany
| | - Andrew H Paterson
- Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
|
35
|
Abstract
Positive selection for protein function can lead to multiple mutations within a small stretch of DNA, i.e., to a cluster of mutations. Recently, Wagner proposed a method to detect such mutation clusters. His method, however, did not take into account that residues with high solvent accessibility are inherently more variable than residues with low solvent accessibility. Here, we propose a new algorithm to detect clustered evolution. Our algorithm controls for different substitution probabilities at buried and exposed sites in the tertiary protein structure, and uses random permutations to calculate accurate P values for inferred clusters. We apply the algorithm to genomes of bacteria, fly, and mammals, and find several clusters of mutations in functionally important regions of proteins. Surprisingly, clustered evolution is a relatively rare phenomenon. Only between 2% and 10% of the genes we analyze contain a statistically significant mutation cluster. We also find that not controlling for solvent accessibility leads to an excess of clusters in terminal and solvent-exposed regions of proteins. Our algorithm provides a novel method to identify functionally relevant divergence between groups of species. Moreover, it could also be useful to detect artifacts in automatically assembled genomes.
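The two ingredients this abstract describes, controlling for buried versus exposed sites and using random permutations to obtain P values, can be illustrated with a stratified permutation test. The sketch below is a simplified stand-in and not the authors' algorithm: the sliding-window statistic, the window size, the two-class accessibility labels, and all function names are assumptions introduced here.

```python
import random


def max_window_count(mutated, window):
    """Largest number of mutated sites in any run of `window` consecutive sites."""
    best = cur = sum(mutated[:window])
    for j in range(window, len(mutated)):
        cur += mutated[j] - mutated[j - window]
        best = max(best, cur)
    return best


def stratified_permutation_pvalue(mutated, exposed, window=10, n_perm=1000, seed=0):
    """Permutation P value for the densest window of mutations.

    mutated : list of 0/1 flags, one per site (1 = substituted)
    exposed : list of 0/1 flags, one per site (1 = solvent-exposed)
    Mutation flags are shuffled only among sites of the same accessibility
    class, so each class keeps its own observed substitution rate.
    """
    rng = random.Random(seed)
    observed = max_window_count(mutated, window)
    # Group site indices by accessibility class (buried vs exposed).
    strata = {}
    for idx, flag in enumerate(exposed):
        strata.setdefault(bool(flag), []).append(idx)
    hits = 0
    for _ in range(n_perm):
        permuted = [0] * len(mutated)
        for indices in strata.values():
            labels = [mutated[i] for i in indices]
            rng.shuffle(labels)
            for i, label in zip(indices, labels):
                permuted[i] = label
        if max_window_count(permuted, window) >= observed:
            hits += 1
    # Add-one estimate so the reported P value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # 40 sites: exposed ends, buried core, four adjacent buried substitutions.
    exposed = [1] * 10 + [0] * 20 + [1] * 10
    mutated = [0] * 15 + [1, 1, 1, 1] + [0] * 21
    print(stratified_permutation_pvalue(mutated, exposed, window=4, n_perm=500))
```

Shuffling within each class is what keeps an overall excess of substitutions at exposed sites from being mistaken for a cluster; an unstratified shuffle would test a different and weaker null hypothesis.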
Affiliation(s)
- Tong Zhou
- Center for Computational Biology and Bioinformatics, Section of Integrative Biology, University of Texas at Austin, Austin, Texas, United States of America
| | - Peter J. Enyeart
- Institute for Cell and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
| | - Claus O. Wilke
- Center for Computational Biology and Bioinformatics, Section of Integrative Biology, University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cell and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
- * E-mail:
|
36
|
|
37
|
Ortiz M, Kaessmann H, Zhang K, Bashirova A, Carrington M, Quintana-Murci L, Telenti A. The evolutionary history of the CD209 (DC-SIGN) family in humans and non-human primates. Genes Immun 2008; 9:483-92. [PMID: 18528403 DOI: 10.1038/gene.2008.40] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
The CD209 gene family that encodes C-type lectins in primates includes CD209 (DC-SIGN), CD209L (L-SIGN) and CD209L2. Understanding the evolution of these genes can help clarify the duplication events that generated this family and the process leading to the repeated neck region, and can identify protein domains under selective pressure. We compiled sequences from 14 primates representing 40 million years of evolution and from three non-primate mammal species. Phylogenetic analyses used Bayesian inference, and nucleotide substitutional patterns were assessed by codon-based maximum likelihood. Analyses suggest that CD209 genes emerged from a first duplication event in the common ancestor of anthropoids, yielding CD209L2 and an ancestral CD209 gene, which, in turn, duplicated in the common Old World primate ancestor, giving rise to CD209L and CD209. K(A)/K(S) values averaged over the entire tree were 0.43 (CD209), 0.52 (CD209L) and 0.35 (CD209L2), consistent with overall signatures of purifying selection. We also assessed the Toll-like receptor (TLR) gene family, which shares with CD209 genes a common profile of evolutionary constraint. The general feature of purifying selection of CD209 genes, despite an apparent redundancy (gene absence and gene loss), may reflect the need to faithfully recognize a multiplicity of pathogen motifs, commensals and a number of self-antigens.
Affiliation(s)
- M Ortiz
- Institute of Microbiology, University of Lausanne, Lausanne, Switzerland
|