1
|
Song J, Liu Y, Guo R, Pacheco A, Muñoz-Zavala C, Song W, Wang H, Cao S, Hu G, Zheng H, Dhliwayo T, San Vicente F, Prasanna BM, Wang C, Zhang X. Exploiting genomic tools for genetic dissection and improving the resistance to Fusarium stalk rot in tropical maize. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2024; 137:109. [PMID: 38649662 DOI: 10.1007/s00122-024-04597-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 03/07/2024] [Indexed: 04/25/2024]
Abstract
KEY MESSAGE: A stable genomic region conferring FSR resistance at ~250 Mb on chromosome 1 was identified by GWAS. Genomic prediction has the potential to improve FSR resistance.
Fusarium stalk rot (FSR) is a globally destructive disease of maize, and the efficiency of phenotypic selection for improving FSR resistance has been low. The genomic tools of genome-wide association study (GWAS) and genomic prediction (GP) provide an opportunity for genetic dissection and improvement of FSR resistance. In this study, GWAS and GP analyses were performed on 562 tropical maize inbred lines consisting of two populations. In total, 15 SNPs significantly associated with FSR resistance were identified across the two populations and the combinedPOP consisting of all 562 inbred lines, with P-values ranging from 1.99 × 10⁻⁷ to 8.27 × 10⁻¹³ and phenotypic variance explained (PVE) values ranging from 0.94 to 8.30%. The genetic effects of the 15 favorable alleles ranged from -4.29 to -14.21% of the FSR severity. One stable genomic region at ~250 Mb on chromosome 1 was detected across all populations, and the PVE values of the SNPs detected in this region ranged from 2.16 to 5.18%. Prediction accuracies of FSR severity estimated with the genome-wide SNPs were moderate, ranging from 0.29 to 0.51. By incorporating genotype-by-environment interaction, prediction accuracies improved to between 0.36 and 0.55 in different breeding scenarios. Selecting a subset of molecular markers by considering both genome coverage and the P-value threshold of the SNPs further improved the prediction accuracies. These findings extend the knowledge of exploiting genomic tools for genetic dissection and improving FSR resistance in tropical maize.
Affiliation(s)
- Junqiao Song
- Henan University of Science and Technology, Luoyang, 471000, Henan, China
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
- Anyang Academy of Agricultural Sciences, Anyang, 455000, Henan, China
| | - Yubo Liu
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
- CIMMYT-China Specialty Maize Research Center, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai, 200063, China
| | - Rui Guo
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
- Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, Shijiazhuang, 050035, Hebei, China
| | - Angela Pacheco
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
| | - Carlos Muñoz-Zavala
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
| | - Wei Song
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
- Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, Shijiazhuang, 050035, Hebei, China
| | - Hui Wang
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
- CIMMYT-China Specialty Maize Research Center, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai, 200063, China
| | - Shiliang Cao
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
- Institute of Maize Research, Heilongjiang Academy of Agricultural Sciences, Harbin, 150070, Heilongjiang, China
| | - Guanghui Hu
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
- Institute of Maize Research, Heilongjiang Academy of Agricultural Sciences, Harbin, 150070, Heilongjiang, China
| | - Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Crop Breeding and Cultivation Research Institute, Shanghai Academy of Agricultural Sciences, Shanghai, 200063, China
| | - Thanda Dhliwayo
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
| | - Felix San Vicente
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico
| | - Boddupalli M Prasanna
- International Maize and Wheat Improvement Center (CIMMYT), Village Market, P. O. Box 1041, Nairobi, 00621, Kenya
| | - Chunping Wang
- Henan University of Science and Technology, Luoyang, 471000, Henan, China.
| | - Xuecai Zhang
- International Maize and Wheat Improvement Center (CIMMYT), 56237, Texcoco, Mexico.
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (CAAS), CIMMYT-China Office, 12 Zhongguancun South Street, Beijing, 100081, China.
- Nanfan Research Institute, CAAS, Sanya, 572024, Hainan, China.
| |
|
2
|
King SB, Singh M. Primate protein-ligand interfaces exhibit significant conservation and unveil human-specific evolutionary drivers. PLoS Comput Biol 2023; 19:e1010966. [PMID: 36952575 PMCID: PMC10035887 DOI: 10.1371/journal.pcbi.1010966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 02/22/2023] [Indexed: 03/25/2023] Open
Abstract
Despite the vast phenotypic differences observed across primates, their protein products are largely similar to each other at the sequence level. We hypothesized that, since proteins accomplish all their functions via interactions with other molecules, alterations in the sites that participate in these interactions may be of critical importance. To uncover the extent to which these sites evolve across primates, we built a structurally-derived dataset of ~4,200 one-to-one orthologous sequence groups across 18 primate species, consisting of ~68,000 ligand-binding sites that interact with DNA, RNA, small molecules, ions, or peptides. Using this dataset, we identify functionally important patterns of conservation and variation within the amino acid residues that facilitate protein-ligand interactions across the primate phylogeny. We uncover that interaction sites are significantly more conserved than other sites, and that sites binding DNA and RNA further exhibit the lowest levels of variation. We also show that the subset of ligand-binding sites that do vary are enriched in components of gene regulatory pathways and uncover several instances of human-specific ligand-binding site changes within transcription factors. Altogether, our results suggest that ligand-binding sites have experienced selective pressure in primates and propose that variation in these sites may have an outsized effect on phenotypic variation in primates through pleiotropic effects on gene regulation.
Affiliation(s)
- Sean B. King
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, United States of America
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
| |
|
3
|
Zhang J. What Has Genomics Taught An Evolutionary Biologist? GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:1-12. [PMID: 36720382 PMCID: PMC10373158 DOI: 10.1016/j.gpb.2023.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 01/06/2023] [Accepted: 01/19/2023] [Indexed: 01/30/2023]
Abstract
Genomics, an interdisciplinary field of biology on the structure, function, and evolution of genomes, has revolutionized many subdisciplines of life sciences, including my field of evolutionary biology, by supplying huge amounts of data, bringing high-throughput technologies, and offering a new approach to biology. In this review, I describe what I have learned from genomics and highlight the fundamental knowledge and mechanistic insights gained. I focus on three broad topics that are central to evolutionary biology and beyond (variation, interaction, and selection) and use primarily my own research and study subjects as examples. In the next decade or two, I expect that the most important contributions of genomics to evolutionary biology will be to provide genome sequences of nearly all known species on Earth, facilitate high-throughput phenotyping of natural variants and systematically constructed mutants for mapping genotype-phenotype-fitness landscapes, and assist the determination of causality in evolutionary processes using experimental evolution.
Affiliation(s)
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA.
| |
|
4
|
Chen P, Michel AH, Zhang J. Transposon insertional mutagenesis of diverse yeast strains suggests coordinated gene essentiality polymorphisms. Nat Commun 2022; 13:1490. [PMID: 35314699 PMCID: PMC8938418 DOI: 10.1038/s41467-022-29228-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/01/2021] [Accepted: 03/01/2022] [Indexed: 12/18/2022] Open
Abstract
Due to epistasis, the same mutation can have drastically different phenotypic consequences in different individuals. This phenomenon is pertinent to precision medicine as well as antimicrobial drug development, but its general characteristics are largely unknown. We approach this question by genome-wide assessment of gene essentiality polymorphism in 16 Saccharomyces cerevisiae strains using transposon insertional mutagenesis. Essentiality polymorphism is observed for 9.8% of genes, most of which have had repeated essentiality switches in evolution. Genes exhibiting essentiality polymorphism lean toward having intermediate numbers of genetic and protein interactions. Gene essentiality changes tend to occur concordantly among components of the same protein complex or metabolic pathway and among a group of over 100 mitochondrial proteins, revealing molecular machines or functional modules as units of gene essentiality variation. Most essential genes tolerate transposon insertions consistently among strains in one or more coding segments, delineating nonessential regions within essential genes.
Affiliation(s)
- Piaopiao Chen
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Agnès H Michel
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA.
| |
|
5
|
Jia Y, Qin C, Traw MB, Chen X, He Y, Kai J, Yang S, Wang L, Hurst LD. In rice splice variants that restore the reading frame after frameshifting indel introduction are common, often induced by the indels and sometimes lead to organism-level rescue. PLoS Genet 2022; 18:e1010071. [PMID: 35180223 PMCID: PMC8893660 DOI: 10.1371/journal.pgen.1010071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Revised: 03/03/2022] [Accepted: 02/02/2022] [Indexed: 11/24/2022] Open
Abstract
The introduction of frameshifting non-3n indels enables the identification of gene-trait associations. However, it has been hypothesised that recovery of the original reading frame owing to usage of non-canonical splice forms could cause rescue. To date there is very little evidence for organism-level rescue by such a mechanism and it is unknown how commonly indels induce, or are otherwise associated with, frame-restoring splice forms. We perform CRISPR/Cas9 editing of randomly selected loci in rice to investigate these issues. We find that the majority of loci have a frame-restoring isoform. Importantly, three quarters of these isoforms are not seen in the absence of the indels, consistent with indels commonly inducing novel isoforms. This is supported by analysis in the context of NMD knockdowns. We consider in detail the two top rescue candidates, in wax deficient anther 1 (wda1) and brittle culm (bc10), finding that organismal-level rescue in both cases is strong but owing to different splice modification routes. More generally, however, as frame-restoring isoforms are low abundance and possibly too disruptive, we suggest that such rescue is the rare exception, not the rule. Nonetheless, assuming that indels commonly induce frame-restoring isoforms, these results emphasize the need to examine RNA level effects of non-3n indels and suggest that multiple non-3n indels in any given gene are advisable to probe a gene’s trait associations.
As protein coding genes are read in units of three (codons), insertions or deletions (indels) that are not a multiple of three long (non-3n) are expected to be especially harmful. Whether they are is important both for interpreting the results of non-3n indel experiments to probe a gene’s functional importance and for diagnostics. Particularly enigmatic are incidences where some non-3n changes in a gene compromise phenotypes while other seemingly comparable ones do not. One explanation for the latter is that a non-3n indel might be rescued via a frame-restoring splice form. Here we examine this hypothesis by inducing non-3n indels in many genes in rice and find that many non-3n indels are associated with a splice form that restores the reading frame. In the majority of these cases the indel appears to induce the potential rescuing splice form. We examine two top hit cases in detail and show functional rescue by splice modification. More generally, the frame-restoring forms are, however, low abundance and probably result in compromised proteins. We conclude then that splice mediated rescue is possible, but probably uncommon. Nonetheless it should not be overlooked in experimental design and interpretation.
Affiliation(s)
- Yanxiao Jia
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
| | - Chao Qin
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
| | - Milton Brian Traw
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
| | - Xiaonan Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
| | - Ying He
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
| | - Jing Kai
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
| | - Sihai Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- * E-mail: (SY); (LW); (LDH)
| | - Long Wang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- * E-mail: (SY); (LW); (LDH)
| | - Laurence D. Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- * E-mail: (SY); (LW); (LDH)
| |
|
6
|
Ghadie M, Xia Y. Mutation Edgotype Drives Fitness Effect in Human. FRONTIERS IN BIOINFORMATICS 2021; 1:690769. [PMID: 36303776 PMCID: PMC9581054 DOI: 10.3389/fbinf.2021.690769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/04/2021] [Accepted: 08/18/2021] [Indexed: 11/24/2022] Open
Abstract
Missense mutations are known to perturb protein-protein interaction networks (known as interactome networks) in different ways. However, it remains unknown how different interactome perturbation patterns (“edgotypes”) impact organismal fitness. Here, we estimate the fitness effect of missense mutations with different interactome perturbation patterns in human, by calculating the fractions of neutral and deleterious mutations that do not disrupt PPIs (“quasi-wild-type”), or disrupt PPIs either by disrupting the binding interface (“edgetic”) or by disrupting overall protein stability (“quasi-null”). We first map pathogenic mutations and common non-pathogenic mutations onto homology-based three-dimensional structural models of proteins and protein-protein interactions in human. Next, we perform structure-based calculations to classify each mutation as either quasi-wild-type, edgetic, or quasi-null. Using our predicted as well as experimentally determined interactome perturbation patterns, we estimate that >∼40% of quasi-wild-type mutations are effectively neutral and the remaining are mostly mildly deleterious, that >∼75% of edgetic mutations are only mildly deleterious, and that up to ∼75% of quasi-null mutations may be strongly detrimental. These estimates are the first such estimates of fitness effect for different network perturbation patterns in any interactome. Our results suggest that while mutations that do not disrupt the interactome tend to be effectively neutral, the majority of human PPIs are under strong purifying selection and the stability of most human proteins is essential to human life.
|
7
|
Qian Y, Zhang R, Jiang X, Wu G. The constraints between amino acids influence the unequal distribution of codons and protein sequence evolution. ROYAL SOCIETY OPEN SCIENCE 2021; 8:201852. [PMID: 34109035 PMCID: PMC8170185 DOI: 10.1098/rsos.201852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2020] [Accepted: 03/31/2021] [Indexed: 06/12/2023]
Abstract
Four nucleotides (A, U, C and G) combine freely into 64 codons, but these 64 codons are unequally assigned to 21 items (20 amino acids plus one stop). About 500 amino acids are known, but only 20 are used to make up proteins. However, the relationships between amino acids and codons, and among the 20 amino acids, have been unclear. In this paper, we studied the relationships among the 20 amino acids in 33 species and found three constraints among them: relatively stable mean carbon-to-hydrogen (C : H) ratios (0.50), similarity interactions between the constituent ratios of amino acids, and amino acid frequencies that accord with a Poisson distribution under certain conditions. We demonstrated that the unequal distribution of the 64 codons and the choice of amino acids in molecular evolution would be constrained so as to maintain stable C : H ratios. The constituent ratios and frequencies of the 20 amino acids in a species or a protein are two determinants of protein sequence evolution, so this finding shows that the constraints among the 20 amino acids play an important role in protein sequence evolution.
Affiliation(s)
- Yi Qian
- Department of General Surgery, Zhongda Hospital, Southeast University, 87 Ding Jiaqiao, Nanjing 210009, People's Republic of China
| | - Rui Zhang
- Medical School, Southeast University, 87 Ding Jiaqiao, Nanjing 210009, People's Republic of China
| | - Xinglu Jiang
- Medical School, Southeast University, 87 Ding Jiaqiao, Nanjing 210009, People's Republic of China
| | - Guoqiu Wu
- Center of Clinical Laboratory Medicine, Zhongda Hospital, Medical School of Southeast University, Southeast University, 87 Ding Jiaqiao, Nanjing 210009, People's Republic of China
- Jiangsu Provincial Key Laboratory of Critical Care Medicine, Southeast University, 87 Ding Jiaqiao, Nanjing 210009, People's Republic of China
| |
|
8
|
Swamy KBS, Schuyler SC, Leu JY. Protein Complexes Form a Basis for Complex Hybrid Incompatibility. Front Genet 2021; 12:609766. [PMID: 33633780 PMCID: PMC7900514 DOI: 10.3389/fgene.2021.609766] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/28/2020] [Accepted: 01/20/2021] [Indexed: 12/20/2022] Open
Abstract
Proteins are the workhorses of the cell and execute many of their functions by interacting with other proteins to form protein complexes. Multi-protein complexes are an admixture of subunits, change their interaction partners, and modulate their functions and cellular physiology in response to environmental changes. When two species mate, the hybrid offspring are usually inviable or sterile because large-scale differences in the genetic makeup of the two parents cause incompatible genetic interactions. Such reciprocal-sign epistasis between inter-specific alleles is not limited to incompatible interactions between just one gene pair and usually involves multiple genes. Many of these multi-locus incompatibilities show visible defects only in the presence of all the interactions, making them hard to characterize. Understanding the dynamics of protein-protein interactions (PPIs) leading to multi-protein complexes is better suited to characterizing multi-locus incompatibilities than the traditional approaches of genetics and molecular biology. Advances in omics technologies, which include genomics, transcriptomics, and proteomics, can help achieve this end. This is especially relevant when studying non-model organisms. Here, we discuss recent progress in the understanding of hybrid genetic incompatibility and in omics technologies, and how together they have helped in characterizing protein complexes and, in turn, multi-locus incompatibilities. We also review advances in bioinformatic techniques suitable for this purpose and propose directions for leveraging the knowledge gained from model organisms to identify genetic incompatibilities in non-model organisms.
Affiliation(s)
- Krishna B. S. Swamy
- Division of Biological and Life Sciences, School of Arts and Sciences, Ahmedabad University, Ahmedabad, India
| | - Scott C. Schuyler
- Department of Biomedical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Division of Head and Neck Surgery, Department of Otolaryngology, Chang Gung Memorial Hospital, Taoyuan, Taiwan
| | - Jun-Yi Leu
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
| |
|
9
|
Watson A, Habib M, Bapteste E. Phylosystemics: Merging Phylogenomics, Systems Biology, and Ecology to Study Evolution. Trends Microbiol 2020; 28:176-190. [DOI: 10.1016/j.tim.2019.10.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/19/2019] [Revised: 10/21/2019] [Accepted: 10/22/2019] [Indexed: 11/28/2022]
|
10
|
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022] Open
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Linda Cerofolini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Marco Fragai
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Claudio Luchinat
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Antonio Rosato
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| |
|
11
|
Estimating dispensable content in the human interactome. Nat Commun 2019; 10:3205. [PMID: 31324802 PMCID: PMC6642175 DOI: 10.1038/s41467-019-11180-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/03/2018] [Accepted: 06/21/2019] [Indexed: 11/21/2022] Open
Abstract
Protein-protein interaction (PPI) networks (interactome networks) have successfully advanced our knowledge of molecular function, disease and evolution. While much progress has been made in quantifying errors and biases in experimental PPI datasets, it remains unknown what fraction of the error-free PPIs in the cell are completely dispensable, i.e., effectively neutral upon disruption. Here, we estimate dispensable content in the human interactome by calculating the fractions of PPIs disrupted by neutral and non-neutral mutations. Starting with the human reference interactome determined by experiments, we construct a human structural interactome by building homology-based three-dimensional structural models for PPIs. Next, we map common mutations from healthy individuals as well as Mendelian disease-causing mutations onto the human structural interactome, and perform structure-based calculations of how these mutations perturb the interactome. Using our predicted as well as experimentally-determined interactome perturbation patterns by common and disease mutations, we estimate that <~20% of the human interactome is completely dispensable. The fraction of protein-protein interactions (PPIs) that can be disrupted without fitness effect is unknown. Here, the authors model how disease-causing mutations and common mutations carried by healthy people perturb the interactome, and estimate that <20% of human PPIs are completely dispensable.
|
12
|
Di Silvestre D, Bergamaschi A, Bellini E, Mauri P. Large Scale Proteomic Data and Network-Based Systems Biology Approaches to Explore the Plant World. Proteomes 2018; 6:proteomes6020027. [PMID: 29865292 PMCID: PMC6027444 DOI: 10.3390/proteomes6020027] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/24/2018] [Revised: 05/30/2018] [Accepted: 06/01/2018] [Indexed: 12/26/2022] Open
Abstract
The investigation of plant organisms by means of data-derived systems biology approaches based on network modeling is mainly characterized by genomic data, while the potential of proteomics is largely unexplored. This delay is mainly caused by the paucity of plant genomic/proteomic sequences and annotations, which are fundamental to mass-spectrometry (MS) data interpretation. However, Next Generation Sequencing (NGS) techniques are helping to fill this gap, and an increasing number of studies are focusing on plant proteome profiling and the identification of protein-protein interactions (PPIs). Interesting results were obtained by evaluating the topology of PPI networks in the context of organ-associated biological processes as well as plant-pathogen relationships. These examples foreshadow the benefits that these approaches may provide to plant research. Thus, in addition to providing an overview of the main omics technologies recently used on plant organisms, we will focus on studies that rely on the concepts of module, hub and shortest path, and how they can contribute to the plant discovery processes. In this scenario, we will also consider gene co-expression networks, and some examples of integration with metabolomic data and genome-wide association studies (GWAS) to select candidate genes will be mentioned.
Affiliation(s)
- Dario Di Silvestre
- Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy.
| | - Andrea Bergamaschi
- Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy.
| | - Edoardo Bellini
- Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy.
| | - PierLuigi Mauri
- Institute for Biomedical Technologies-National Research Council; F.lli Cervi 93, 20090 Segrate, Milan, Italy.
| |
|
13
|
Corona E, Wang L, Ko D, Patel CJ. Systematic detection of positive selection in the human-pathogen interactome and lasting effects on infectious disease susceptibility. PLoS One 2018; 13:e0196676. [PMID: 29799843 PMCID: PMC5969750 DOI: 10.1371/journal.pone.0196676] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/14/2018] [Accepted: 04/17/2018] [Indexed: 01/07/2023] Open
Abstract
Infectious disease has shaped the natural genetic diversity of humans throughout the world. A new approach to capture positive selection driven by pathogens would provide information regarding pathogen exposure in distinct human populations and the constantly evolving arms race between host and disease-causing agents. We created a human pathogen interaction database and used the integrated haplotype score (iHS) to detect recent positive selection in genes that interact with proteins from 26 different pathogens. We used the Human Genome Diversity Panel to identify specific populations harboring pathogen-interacting genes that have undergone positive selection. We found that human genes that interact with 9 pathogen species show evidence of recent positive selection. These pathogens are Yersinia pestis, human immunodeficiency virus (HIV) 1, Zaire ebolavirus, Francisella tularensis, dengue virus, human respiratory syncytial virus, measles virus, Rubella virus, and Bacillus anthracis. For HIV-1, GWAS demonstrate that some naturally selected variants in the host-pathogen protein interaction networks continue to have functional consequences for susceptibility to these pathogens. We show that selected human genes were enriched for HIV susceptibility variants (identified through GWAS), providing further support for the hypothesis that ancient humans were exposed to lentivirus pandemics. Human genes in the Italian, Miao, and Biaka Pygmy populations that interact with Y. pestis show significant signs of selection. These results reveal some of the genetic footprints created by pathogens in the human genome that may have left lasting marks on susceptibility to infectious disease.
Affiliation(s)
- Erik Corona
  - Department of Biomedical Informatics, RTI International, Durham, NC, United States of America
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States of America
- Liuyang Wang
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, United States of America
- Dennis Ko
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC, United States of America
  - Department of Medicine, Duke University Medical Center, Durham, NC, United States of America
- Chirag J. Patel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States of America
|
14
|
Ghadie MA, Coulombe-Huntington J, Xia Y. Interactome evolution: insights from genome-wide analyses of protein-protein interactions. Curr Opin Struct Biol 2017; 50:42-48. [PMID: 29112911 DOI: 10.1016/j.sbi.2017.10.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/05/2017] [Revised: 10/05/2017] [Accepted: 10/12/2017] [Indexed: 12/12/2022]
Abstract
We highlight new evolutionary insights enabled by recent genome-wide studies on protein-protein interaction (PPI) networks ('interactomes'). While most PPIs are mediated by a single sequence region promoting or inhibiting interactions, many PPIs are mediated by multiple sequence regions acting cooperatively. Most PPIs perform important functions maintained by negative selection: we estimate that less than ∼10% of the human interactome is effectively neutral upon perturbation (i.e. 'junk' PPIs), and the rest are deleterious upon perturbation; interfacial sites evolve more slowly than other sites; many conserved PPIs show signatures of co-evolution at the interface; PPIs evolve more slowly than protein sequence. At the same time, many PPIs undergo rewiring during evolution for lineage-specific adaptation. Finally, chaperone-protein and host-pathogen interactomes are governed by distinct evolutionary principles.
Affiliation(s)
- Mohamed A Ghadie
  - Department of Bioengineering, McGill University, Montreal, Quebec H3C 0C3, Canada
- Jasmin Coulombe-Huntington
  - Institute for Research in Immunology and Cancer, University of Montreal, Montreal, Quebec H3C 3J7, Canada
- Yu Xia
  - Department of Bioengineering, McGill University, Montreal, Quebec H3C 0C3, Canada
|
15
|
Genesis of the vertebrate FoxP subfamily member genes occurred during two ancestral whole genome duplication events. Gene 2016; 588:156-62. [DOI: 10.1016/j.gene.2016.05.019] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/29/2015] [Revised: 05/02/2016] [Accepted: 05/12/2016] [Indexed: 12/20/2022]
|
16
|
Gallagher JP, Grover CE, Hu G, Wendel JF. Insights into the Ecology and Evolution of Polyploid Plants through Network Analysis. Mol Ecol 2016; 25:2644-60. [PMID: 27027619 DOI: 10.1111/mec.13626] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 11/24/2015] [Revised: 03/09/2016] [Accepted: 03/22/2016] [Indexed: 12/18/2022]
Abstract
Polyploidy is a widespread phenomenon throughout eukaryotes, with important ecological and evolutionary consequences. Although genes operate as components of complex pathways and networks, polyploid changes in genes and gene expression have typically been evaluated as either individual genes or as a part of broad-scale analyses. Network analysis has been fruitful in associating genomic and other 'omic'-based changes with phenotype for many systems. In polyploid species, network analysis has the potential not only to facilitate a better understanding of the complex 'omic' underpinnings of phenotypic and ecological traits common to polyploidy, but also to provide novel insight into the interaction among duplicated genes and genomes. This adds perspective to the global patterns of expression (and other 'omic') change that accompany polyploidy and to the patterns of recruitment and/or loss of genes following polyploidization. While network analysis in polyploid species faces challenges common to other analyses of duplicated genomes, present technologies combined with thoughtful experimental design provide a powerful system to explore polyploid evolution. Here, we demonstrate the utility and potential of network analysis to questions pertaining to polyploidy with an example involving evolution of the transgressively superior cotton fibres found in polyploid Gossypium hirsutum. By combining network analysis with prior knowledge, we provide further insights into the role of profilins in fibre domestication and exemplify the potential for network analysis in polyploid species.
Affiliation(s)
- Joseph P Gallagher
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Corrinne E Grover
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Guanjing Hu
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
- Jonathan F Wendel
  - Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
|
17
|
Proost S, Mutwil M. Tools of the trade: studying molecular networks in plants. Curr Opin Plant Biol 2016; 30:143-150. [PMID: 26990519 DOI: 10.1016/j.pbi.2016.02.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/04/2015] [Revised: 02/23/2016] [Accepted: 02/29/2016] [Indexed: 06/05/2023]
Abstract
Driven by recent technological improvements, genes can be now studied in a larger biological context. Genes and their protein products rarely operate as a single entity and large-scale mapping by protein-protein interactions can unveil the molecular complexes that form in the cell to carry out various functions. Expression analysis under multiple conditions, supplemented with protein-DNA binding data can highlight when genes are active and how they are regulated. Representing these data in networks and finding strongly connected sub-graphs has proven to be a powerful tool to predict the function of unknown genes. As such networks are gradually becoming available for various plant species, it becomes possible to study how networks evolve. This review summarizes currently available network data and related tools for plants. Furthermore we aim to provide an outlook of future analyses that can be done in plants based on work done in other fields.
Affiliation(s)
- Sebastian Proost
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
- Marek Mutwil
  - Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
|
18
|
Ames RM, Talavera D, Williams SG, Robertson DL, Lovell SC. Binding interface change and cryptic variation in the evolution of protein-protein interactions. BMC Evol Biol 2016; 16:40. [PMID: 26892785 PMCID: PMC4758157 DOI: 10.1186/s12862-016-0608-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/25/2015] [Accepted: 02/02/2016] [Indexed: 12/03/2022] Open
Abstract
Background: Physical interactions between proteins are essential for almost all biological functions and systems. To understand the evolution of function it is therefore important to understand the evolution of molecular interactions. Of key importance is the evolution of binding specificity, the set of interactions made by a protein, since change in specificity can lead to “rewiring” of interaction networks. Unfortunately, the interfaces through which proteins interact are complex, typically containing many amino-acid residues that collectively must contribute to binding specificity as well as binding affinity, structural integrity of the interface and solubility in the unbound state. Results: In order to study the relationship between interface composition and binding specificity, we make use of paralogous pairs of yeast proteins. Immediately after duplication these paralogues will have identical sequences and protein products that make an identical set of interactions. As the sequences diverge, we can correlate amino-acid change in the interface with any change in the specificity of binding. We show that change in interface regions correlates only weakly with change in specificity, and many variants in interfaces are functionally equivalent. We show that many of the residue replacements within interfaces are silent with respect to their contribution to binding specificity. Conclusions: We conclude that such functionally-equivalent change has the potential to contribute to evolutionary plasticity in interfaces by creating cryptic variation, which in turn may provide the raw material for functional innovation and coevolution. Electronic supplementary material: The online version of this article (doi:10.1186/s12862-016-0608-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ryan M Ames
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, M13 9PT, UK. Current address: Wellcome Trust Centre for Biomedical Modelling and Analysis, University of Exeter, RILD Level 3, Exeter, EX2 5DW, UK.
- David Talavera
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, M13 9PT, UK. Current address: Institute of Cardiovascular Sciences, Faculty of Medical and Human Sciences, University of Manchester, Oxford Road, Manchester, M13 9PT, UK.
- Simon G Williams
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, M13 9PT, UK. Current address: Institute of Cardiovascular Sciences, Faculty of Medical and Human Sciences, University of Manchester, Oxford Road, Manchester, M13 9PT, UK.
- David L Robertson
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, M13 9PT, UK.
- Simon C Lovell
  - Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester, M13 9PT, UK.
|
19
|
Scienski K, Fay JC, Conant GC. Patterns of Gene Conversion in Duplicated Yeast Histones Suggest Strong Selection on a Coadapted Macromolecular Complex. Genome Biol Evol 2015; 7:3249-58. [PMID: 26560339 PMCID: PMC4700949 DOI: 10.1093/gbe/evv216] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/10/2023] Open
Abstract
We find evidence for interlocus gene conversion in five duplicated histone genes from six yeast species. The sequences of these duplicated genes, surviving from the ancient genome duplication, show phylogenetic patterns inconsistent with the well-resolved orthology relationships inferred from a likelihood model of gene loss after the genome duplication. Instead, these paralogous genes are more closely related to each other than any is to its nearest ortholog. In addition to simulations supporting gene conversion, we also present evidence for elevated rates of radical amino acid substitutions along the branches implicated in the conversion events. As these patterns are similar to those seen in ribosomal proteins that have undergone gene conversion, we speculate that in cases where duplicated genes code for proteins that are a part of tightly interacting complexes, selection may favor the fixation of gene conversion events in order to maintain high protein identities between duplicated copies.
Affiliation(s)
- Kathy Scienski
  - Division of Animal Sciences, University of Missouri, Columbia. Present address: Genetics Graduate Program, Texas A&M University, College Station, TX
- Justin C Fay
  - Department of Genetics, Washington University; Center for Genome Sciences and Systems Biology, Washington University
- Gavin C Conant
  - Division of Animal Sciences, University of Missouri, Columbia; Informatics Institute, University of Missouri, Columbia
|
20
|
The barber's pole worm CAP protein superfamily--A basis for fundamental discovery and biotechnology advances. Biotechnol Adv 2015; 33:1744-54. [PMID: 26239368 DOI: 10.1016/j.biotechadv.2015.07.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/26/2015] [Revised: 07/02/2015] [Accepted: 07/11/2015] [Indexed: 01/22/2023]
Abstract
Parasitic worm proteins that belong to the cysteine-rich secretory proteins, antigen 5 and pathogenesis-related 1 (CAP) superfamily are proposed to play key roles in the infection process and the modulation of immune responses in host animals. However, there is limited information on these proteins for most socio-economically important worms. Here, we review the CAP protein superfamily of Haemonchus contortus (barber's pole worm), a highly significant parasitic roundworm (order Strongylida) of small ruminants. To do this, we mined genome and transcriptomic datasets, predicted and curated full-length amino acid sequences (n=45), undertook systematic phylogenetic analyses of these data and investigated transcription throughout the life cycle of H. contortus. We inferred functions for selected Caenorhabditis elegans orthologs (including vap-1, vap-2, scl-5 and lon-1) based on genetic networking and by integrating data and published information, and were able to infer that a subset of orthologs and their interaction partners play pivotal roles in growth and development via the insulin-like and/or the TGF-beta signalling pathways. The identification of the important and conserved growth regulator LON-1 led us to appraise the three-dimensional structure of this CAP protein by comparative modelling. This model revealed the presence of different topological moieties on the canonical fold of the CAP domain, which coincide with an overall charge separation as indicated by the electrostatic surface potential map. These observations suggest the existence of separate sites for effector binding and receptor interactions, and thus support the proposal that these worm molecules act in similar ways as venoms act as ligands for chemokine receptors or G protein-coupled receptor effectors. In conclusion, this review should guide future molecular studies of these molecules, and could support the development of novel interventions against haemonchosis.
|
21
|
Goncearenco A, Shaytan AK, Shoemaker BA, Panchenko AR. Structural Perspectives on the Evolutionary Expansion of Unique Protein-Protein Binding Sites. Biophys J 2015. [PMID: 26213149 DOI: 10.1016/j.bpj.2015.06.056] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022] Open
Abstract
Structures of protein complexes provide atomistic insights into protein interactions. Human proteins represent a quarter of all structures in the Protein Data Bank; however, available protein complexes cover less than 10% of the human proteome. Although it is theoretically possible to infer interactions in human proteins based on structures of homologous protein complexes, it is still unclear to what extent protein interactions and binding sites are conserved, and whether protein complexes from remotely related species can be used to infer interactions and binding sites. We considered biological units of protein complexes and clustered protein-protein binding sites into similarity groups based on their structure and sequence, which allowed us to identify unique binding sites. We showed that the growth rate of the number of unique binding sites in the Protein Data Bank was much slower than the growth rate of the number of structural complexes. Next, we investigated the evolutionary roots of unique binding sites and identified the major phyletic branches with the largest expansion in the number of novel binding sites. We found that many binding sites could be traced to the universal common ancestor of all cellular organisms, whereas relatively few binding sites emerged at the major evolutionary branching points. We analyzed the physicochemical properties of unique binding sites and found that the most ancient sites were the largest in size, involved many salt bridges, and were the most compact and least planar. In contrast, binding sites that appeared more recently in the evolution of eukaryotes were characterized by a larger fraction of polar and aromatic residues, and were less compact and more planar, possibly due to their more transient nature and roles in signaling processes.
Affiliation(s)
- Alexander Goncearenco
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
- Alexey K Shaytan
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
- Benjamin A Shoemaker
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
- Anna R Panchenko
  - Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
|
22
|
Mulder NJ, Akinola RO, Mazandu GK, Rapanoel H. Using biological networks to improve our understanding of infectious diseases. Comput Struct Biotechnol J 2014; 11:1-10. [PMID: 25379138 PMCID: PMC4212278 DOI: 10.1016/j.csbj.2014.08.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/17/2022] Open
Abstract
Infectious diseases are the leading cause of death, particularly in developing countries. Although many drugs are available for treating the most common infectious diseases, in many cases the mechanism of action of these drugs or even their targets in the pathogen remain unknown. In addition, the key factors or processes in pathogens that facilitate infection and disease progression are often not well understood. Since proteins do not work in isolation, understanding biological systems requires a better understanding of the interconnectivity between proteins in different pathways and processes, which includes both physical and other functional interactions. Such biological networks can be generated within organisms or between organisms sharing a common environment using experimental data and computational predictions. Though different data sources provide different levels of accuracy, confidence in interactions can be measured using interaction scores. Connections between interacting proteins in biological networks can be represented as graphs and edges, and thus studied using existing algorithms and tools from graph theory. There are many different applications of biological networks, and here we discuss three such applications, specifically applied to the infectious disease tuberculosis, with its causative agent Mycobacterium tuberculosis and host, Homo sapiens. The applications include the use of the networks for function prediction, comparison of networks for evolutionary studies, and the generation and use of host–pathogen interaction networks.
Affiliation(s)
- Nicola J Mulder
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Richard O Akinola
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Gaston K Mazandu
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Holifidy Rapanoel
  - Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
|
23
|
Conant GC. Comparative genomics as a time machine: how relative gene dosage and metabolic requirements shaped the time-dependent resolution of yeast polyploidy. Mol Biol Evol 2014; 31:3184-93. [PMID: 25158798 DOI: 10.1093/molbev/msu250] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 01/31/2023] Open
Abstract
Using a phylogenetic model of evolution after genome duplication (i.e., polyploidy) and 12 yeast genomes with a shared genome duplication, I show that the loss of duplicate genes after that duplication occurred in three phases. First, losses that occurred immediately after the event were biased toward genes functioning in DNA repair and organellar functions. Then, the main group of duplicate losses appear to have been shaped by a requirement to maintain balance in protein levels: There is a strong statistical association between the number of protein interactions a gene's product is involved in and its propensity to have remained in duplicate. Moreover, when duplicated genes with interactions were lost, it was more common than expected for both members of an interaction pair to have been lost on the same branch of the phylogeny. Finally, in the third phase of the resolution process, overretention of duplicated enzymes carrying high flux and of duplicated genes involved in transcriptional regulation became dominant. I speculate that initial retention of such genes by a requirement to maintain gene dosage set the stage for the later functional changes that then maintained these duplicates for long periods.
Affiliation(s)
- Gavin C Conant
  - Informatics Institute, University of Missouri, Columbia; Division of Animal Sciences, University of Missouri, Columbia
|
24
|
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
|
25
|
Abstract
Gene duplication is widely believed to facilitate adaptation, but unambiguous evidence for this hypothesis has been found in only a small number of cases. Although gene duplication may increase the fitness of the involved organisms by doubling gene dosage or neofunctionalization, it may also result in a simple division of ancestral functions into daughter genes, which need not promote adaptation. Hence, the general validity of the adaptation by gene duplication hypothesis remains uncertain. Indeed, a genome-scale experiment found similar fitness effects of deleting pairs of duplicate genes and deleting individual singleton genes from the yeast genome, leading to the conclusion that duplication rarely results in adaptation. Here we contend that the above comparison is unfair because of a known duplication bias among genes with different fitness contributions. To rectify this problem, we compare homologous genes from the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. We discover that simultaneously deleting a duplicate gene pair in S. cerevisiae reduces fitness significantly more than deleting their singleton counterpart in S. pombe, revealing post-duplication adaptation. The duplicates-singleton difference in fitness effect is not attributable to a potential increase in gene dose after duplication, suggesting that the adaptation is owing to neofunctionalization, which we find to be explicable by acquisitions of binary protein-protein interactions rather than gene expression changes. These results provide genomic evidence for the role of gene duplication in organismal adaptation and are important for understanding the genetic mechanisms of evolutionary innovation.
Affiliation(s)
- Wenfeng Qian
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA; Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
|
26
|
Zarin T, Moses AM. Insights into molecular evolution from yeast genomics. Yeast 2014; 31:233-41. [PMID: 24760744 DOI: 10.1002/yea.3018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/01/2014] [Revised: 04/09/2014] [Accepted: 04/10/2014] [Indexed: 12/13/2022] Open
Abstract
Enabled by comparative genomics, yeasts have increasingly developed into a powerful model system for molecular evolution. Here we survey several areas in which yeast studies have made important contributions, including regulatory evolution, gene duplication and divergence, evolution of gene order and evolution of complexity. In each area we highlight key studies and findings based on techniques ranging from statistical analysis of large datasets to direct laboratory measurements of fitness. Future work will combine traditional evolutionary genetics analysis and experimental evolution with tools from systems biology to yield mechanistic insight into complex phenotypes.
Affiliation(s)
- Taraneh Zarin
  - Department of Cell and Systems Biology, University of Toronto, ON, Canada
|
27
|
Abrusán G. Integration of new genes into cellular networks, and their structural maturation. Genetics 2013; 195:1407-17. [PMID: 24056411 PMCID: PMC3832282 DOI: 10.1534/genetics.113.152256] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 04/15/2013] [Accepted: 08/27/2013] [Indexed: 12/21/2022] Open
Abstract
It has been recently discovered that new genes can originate de novo from noncoding DNA, and several biological traits including expression or sequence composition form a continuum from noncoding sequences to conserved genes. In this article, using yeast genes I test whether the integration of new genes into cellular networks and their structural maturation shows such a continuum by analyzing their changes with gene age. I show that 1) The number of regulatory, protein-protein, and genetic interactions increases continuously with gene age, although with very different rates. New regulatory interactions emerge rapidly within a few million years, while the number of protein-protein and genetic interactions increases slowly, with a rate of 2-2.25 × 10^-8/year and 4.8 × 10^-8/year, respectively. 2) Gene essentiality evolves relatively quickly: the youngest essential genes appear in proto-genes ∼14 MY old. 3) In contrast to interactions, the secondary structure of proteins and their robustness to mutations indicate that new genes face a bottleneck in their evolution: proto-genes are characterized by high β-strand content, high aggregation propensity, and low robustness against mutations, while conserved genes are characterized by lower strand content and higher stability, most likely due to the higher probability of gene loss among young genes and accumulation of neutral mutations.
Affiliation(s)
- György Abrusán
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged H-6701, Hungary
|
28
|
Zhu Y, Lin Z, Nakhleh L. Evolution after whole-genome duplication: a network perspective. G3 (Bethesda) 2013; 3:2049-57. [PMID: 24048644 PMCID: PMC3815064 DOI: 10.1534/g3.113.008458] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 09/07/2013] [Accepted: 09/10/2013] [Indexed: 01/31/2023]
Abstract
Gene duplication plays an important role in the evolution of genomes and interactomes. Elucidating how evolution after gene duplication interplays at the sequence and network level is of great interest. In this work, we analyze a data set of gene pairs that arose through whole-genome duplication (WGD) in yeast. All these pairs have the same duplication time, making them ideal for evolutionary investigation. We investigated the interplay between evolution after WGD at the sequence and network levels and correlated these two levels of divergence with gene expression and fitness data. We find that molecular interactions involving WGD genes evolve at rates that are three orders of magnitude slower than the rates of evolution of the corresponding sequences. Furthermore, we find that divergence of WGD pairs correlates strongly with gene expression and fitness data. Because of the role of gene duplication in determining redundancy in biological systems and particularly at the network level, we investigated the role of interaction networks in elucidating the evolutionary fate of duplicated genes. We find that gene neighborhoods in interaction networks provide a mechanism for inferring these fates, and we developed an algorithm for achieving this task. Further epistasis analysis of WGD pairs categorized by their inferred evolutionary fates demonstrated the utility of these techniques. Finally, we find that WGD pairs and other pairs of paralogous genes of small-scale duplication origin share similar properties, giving good support for generalizing our results from WGD pairs to evolution after gene duplication in general.
Affiliation(s)
- Yun Zhu
  - Department of Computer Science, Rice University, Houston, Texas 77005
- Zhenguo Lin
  - Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas 77005
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, Texas 77005
  - Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas 77005
|
29
|
Daly TK, Sutherland-Smith AJ, Penny D. Beyond BLASTing: tertiary and quaternary structure analysis helps identify major vault proteins. Genome Biol Evol 2013; 5:217-32. [PMID: 23275487 PMCID: PMC3595041 DOI: 10.1093/gbe/evs135] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022] Open
Abstract
We examine the advantages of going beyond sequence similarity and use both protein three-dimensional (3D) structure prediction and then quaternary structure (docking) of inferred 3D structures to help evaluate whether comparable sequences can fold into homologous structures with sufficient lateral associations for quaternary structure formation. Our test case is the major vault protein (MVP) that oligomerizes in multiple copies to form barrel-like vault particles and is relatively widespread among eukaryotes. We used the iterative threading assembly refinement server (I-TASSER) to predict whether putative MVP sequences identified by BLASTp and PSI Basic Local Alignment Search Tool are structurally similar to the experimentally determined rodent MVP tertiary structures. Then two identical predicted quaternary structures from I-TASSER are analyzed by RosettaDock to test whether a pair-wise association occurs, and hence whether the oligomeric vault complex is likely to form for a given MVP sequence. Positive controls for the method are the experimentally determined rat (Rattus norvegicus) vault X-ray crystal structure and the purple sea urchin (Strongylocentrotus purpuratus) MVP sequence that forms experimentally observed vaults. These and two kinetoplast MVP structural homologs were predicted with high confidence value, and RosettaDock predicted that these MVP sequences would dock laterally and therefore could form oligomeric vaults. As the negative control, I-TASSER did not predict an MVP-like structure from a randomized rat MVP sequence, even when constrained to the rat MVP crystal structure (PDB:2ZUO), thus further validating the method. The protocol identified six putative homologous MVP sequences in the heterolobosean Naegleria gruberi within the excavate kingdom. Two of these sequences are predicted to be structurally similar to rat MVP, despite being in excess of 300 residues shorter. The method can be used generally to help test predictions of homology via structural analysis.
Affiliation(s)
- Toni K Daly
  - Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand.
|
30
|
Reinke AW, Baek J, Ashenberg O, Keating AE. Networks of bZIP protein-protein interactions diversified over a billion years of evolution. Science 2013; 340:730-4. [PMID: 23661758 DOI: 10.1126/science.1233465] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 02/06/2023]
Abstract
Differences in biomolecular sequence and function underlie dramatic ranges of appearance and behavior among species. We studied the basic region-leucine zipper (bZIP) transcription factors and quantified bZIP dimerization networks for five metazoan and two single-cell species, measuring interactions in vitro for 2891 protein pairs. Metazoans have a higher proportion of heteromeric bZIP interactions and more network complexity than the single-cell species. The metazoan bZIP interactomes have broadly similar structures, but there has been extensive rewiring of connections compared to the last common ancestor, and each species network is highly distinct. Many metazoan bZIP orthologs and paralogs have strikingly different interaction specificities, and some differences arise from minor sequence changes. Our data show that a shifting landscape of biochemical functions related to signaling and gene expression contributes to species diversity.
Affiliation(s)
- Aaron W Reinke
  - Massachusetts Institute of Technology, Department of Biology, Cambridge, MA 02139, USA
|
31
|
Jin Y, Turaev D, Weinmaier T, Rattei T, Makse HA. The evolutionary dynamics of protein-protein interaction networks inferred from the reconstruction of ancient networks. PLoS One 2013; 8:e58134. [PMID: 23526967 PMCID: PMC3603955 DOI: 10.1371/journal.pone.0058134] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 10/11/2012] [Accepted: 01/30/2013] [Indexed: 11/18/2022]
Abstract
Cellular functions are based on the complex interplay of proteins; therefore, the structure and dynamics of these protein-protein interaction (PPI) networks are the key to the functional understanding of cells. In recent years, large-scale PPI networks of several model organisms have been investigated. A number of theoretical models have been developed to explain both the network formation and the current structure. Favored are models based on duplication and divergence of genes, as they most closely represent the biological foundation of network evolution. However, studies are often based on simulated instead of empirical data, or they cover only single organisms. Methodological improvements now allow the analysis of PPI networks of multiple organisms simultaneously as well as the direct modeling of ancestral networks. This provides the opportunity to challenge existing assumptions on network evolution. We utilized present-day PPI networks from integrated datasets of seven model organisms and developed a theoretical and bioinformatic framework for studying the evolutionary dynamics of PPI networks. A novel filtering approach using percolation analysis was developed to remove low confidence interactions based on topological constraints. We then reconstructed the ancient PPI networks of different ancestors, for which the ancestral proteomes, as well as the ancestral interactions, were inferred. Ancestral proteins were reconstructed using orthologous groups on different evolutionary levels. A stochastic approach, using the duplication-divergence model, was developed for estimating the probabilities of ancient interactions from today's PPI networks. The growth rates for nodes, edges, sizes and modularities of the networks indicate multiplicative growth and are consistent with the results from independent static analysis. Our results support the duplication-divergence model of evolution and indicate fractality and multiplicative growth as general properties of the PPI network structure and dynamics.
Affiliation(s)
- Yuliang Jin
  - Levich Institute and Physics Department, City College of New York, New York, New York, United States of America
- Dmitrij Turaev
  - Department of Computational Systems Biology, University of Vienna, Vienna, Austria
- Thomas Weinmaier
  - Department of Computational Systems Biology, University of Vienna, Vienna, Austria
- Thomas Rattei
  - Department of Computational Systems Biology, University of Vienna, Vienna, Austria
- Hernán A. Makse
  - Levich Institute and Physics Department, City College of New York, New York, New York, United States of America
|
32
|
Watching the grin fade: tracing the effects of polyploidy on different evolutionary time scales. Semin Cell Dev Biol 2013; 24:320-31. [PMID: 23466286 DOI: 10.1016/j.semcdb.2013.02.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 11/07/2012] [Revised: 02/05/2013] [Accepted: 02/07/2013] [Indexed: 12/13/2022]
Abstract
Polyploidy, or whole-genome duplication (WGD), is a recurrent mutation both in cell lineages and over evolutionary time. By globally changing the relationship between gene copy number and other cellular entities, it can induce dramatic changes at the cellular and phenotypic level. Perhaps surprisingly, then, the insights that these events can bring to understanding other cellular features are not as well appreciated as they could be. In this review, we draw on examples of polyploidy from animals, plants and yeast to explore how investigations of polyploid cells have improved our understanding of the cell cycle, biological network complexity, metabolic phenotypes and tumor biology. We argue that the study of polyploidy across organisms, cell types, and time scales serves not only as a window into basic cell biology, but also as a basis for a predictive biology with applications ranging from crop improvement to treating cancer.
|
33
|
Abrusán G, Szilágyi A, Zhang Y, Papp B. Turning gold into 'junk': transposable elements utilize central proteins of cellular networks. Nucleic Acids Res 2013; 41:3190-200. [PMID: 23341038 PMCID: PMC3597677 DOI: 10.1093/nar/gkt011] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 01/19/2023]
Abstract
The numerous discovered cases of domesticated transposable element (TE) proteins led to the recognition that TEs are a significant source of evolutionary innovation. However, much less is known about the reverse process, whether and to what degree the evolution of TEs is influenced by the genome of their hosts. We addressed this issue by searching for cases of incorporation of host genes into the sequence of TEs and examined the systems-level properties of these genes using the Saccharomyces cerevisiae and Drosophila melanogaster genomes. We identified 51 cases where the evolutionary scenario was the incorporation of a host gene fragment into a TE consensus sequence, and we show that both the yeast and fly homologues of the incorporated protein sequences have central positions in the cellular networks. An analysis of selective pressure (Ka/Ks ratio) detected significant selection in 37% of the cases. Recent research on retrovirus-host interactions shows that virus proteins preferentially target hubs of the host interaction networks enabling them to take over the host cell using only a few proteins. We propose that TEs face a similar evolutionary pressure to evolve proteins with high interacting capacities and take some of the necessary protein domains directly from their hosts.
Affiliation(s)
- György Abrusán
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Center of the Hungarian Academy of Sciences, Temesváry krt. 62. Szeged H-6701, Hungary.
|
34
|
Pérez-Bercoff Å, Hudson CM, Conant GC. A conserved mammalian protein interaction network. PLoS One 2013; 8:e52581. [PMID: 23320073 PMCID: PMC3539715 DOI: 10.1371/journal.pone.0052581] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/13/2012] [Accepted: 11/20/2012] [Indexed: 11/19/2022]
Abstract
Physical interactions between proteins mediate a variety of biological functions, including signal transduction, physical structuring of the cell and regulation. While extensive catalogs of such interactions are known from model organisms, their evolutionary histories are difficult to study given the lack of interaction data from phylogenetic outgroups. Using phylogenomic approaches, we infer an upper bound on the time of origin for a large set of human protein-protein interactions, showing that most such interactions appear relatively ancient, dating no later than the radiation of placental mammals. By analyzing paired alignments of orthologous and putatively interacting protein-coding genes from eight mammals, we find evidence for weak but significant co-evolution, as measured by relative selective constraint, between pairs of genes with interacting proteins. However, we find no strong evidence for shared instances of directional selection within an interacting pair. Finally, we use a network approach to show that the distribution of selective constraint across the protein interaction network is non-random, with a clear tendency for interacting proteins to share similar selective constraints. Collectively, the results suggest that, on the whole, protein interactions in mammals are under selective constraint, presumably due to their functional roles.
Affiliation(s)
- Åsa Pérez-Bercoff
  - Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
- Corey M. Hudson
  - Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
- Gavin C. Conant
  - Informatics Institute, University of Missouri, Columbia, Missouri, United States of America
  - Division of Animal Sciences, University of Missouri, Columbia, Missouri, United States of America
|
35
|
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013; 41:D808-15. [PMID: 23203871 PMCID: PMC3531103 DOI: 10.1093/nar/gks1094] [Citation(s) in RCA: 3220] [Impact Index Per Article: 292.7] [Received: 09/15/2012] [Revised: 10/15/2012] [Accepted: 10/18/2012] [Indexed: 12/12/2022]
Abstract
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
Affiliation(s)
- Andrea Franceschini, Damian Szklarczyk, Sune Frankild, Michael Kuhn, Milan Simonovic, Alexander Roth, Jianyi Lin, Pablo Minguez, Peer Bork, Christian von Mering, Lars J. Jensen
  - Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Switzerland, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Denmark, Biotechnology Center, Technical University Dresden, Germany, Department of Computer Science, University of Milan, Italy, European Molecular Biology Laboratory, Heidelberg and Max-Delbrück-Centre for Molecular Medicine, Berlin, Germany
|
36
|
Leducq JB, Charron G, Diss G, Gagnon-Arsenault I, Dubé AK, Landry CR. Evidence for the robustness of protein complexes to inter-species hybridization. PLoS Genet 2012; 8:e1003161. [PMID: 23300466 PMCID: PMC3531474 DOI: 10.1371/journal.pgen.1003161] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 07/07/2012] [Accepted: 10/26/2012] [Indexed: 01/11/2023]
Abstract
Despite the tremendous efforts devoted to the identification of genetic incompatibilities underlying hybrid sterility and inviability, little is known about the effect of inter-species hybridization at the protein interactome level. Here, we develop a screening platform for the comparison of protein-protein interactions (PPIs) among closely related species and their hybrids. We examine in vivo the architecture of protein complexes in two yeast species (Saccharomyces cerevisiae and Saccharomyces kudriavzevii) that diverged 5-20 million years ago and in their F1 hybrids. We focus on 24 proteins of two large complexes: the RNA polymerase II and the nuclear pore complex (NPC), which show contrasting patterns of molecular evolution. We found that, with the exception of one PPI in the NPC sub-complex, PPIs were highly conserved between species, regardless of protein divergence. Unexpectedly, we found that the architecture of the complexes in F1 hybrids could not be distinguished from that of the parental species. Our results suggest that the conservation of PPIs in hybrids likely results from the slow evolution taking place on the very few protein residues involved in the interaction or that protein complexes are inherently robust and may accommodate protein divergence up to the level that is observed among closely related species.
Affiliation(s)
- Jean-Baptiste Leducq
  - Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Pavillon Charles-Eugène-Marchand, Université Laval, Québec City, Canada
|
37
|
Makino T, McLysaght A. Positionally biased gene loss after whole genome duplication: evidence from human, yeast, and plant. Genome Res 2012; 22:2427-35. [PMID: 22835904 PMCID: PMC3514672 DOI: 10.1101/gr.131953.111] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 09/14/2011] [Accepted: 07/20/2012] [Indexed: 01/23/2023]
Abstract
Whole genome duplication (WGD) has made a significant contribution to many eukaryotic genomes including yeast, plants, and vertebrates. Following WGD, some ohnologs (WGD paralogs) remain in the genome arranged in blocks of conserved gene order and content (paralogons). However, the most common outcome is loss of one of the ohnolog pair. It is unclear what factors, if any, govern gene loss from paralogons. Recent studies have reported physical clustering (genetic linkage) of functionally linked (interacting) genes in the human genome and propose a biological significance for the clustering of interacting genes such as coexpression or preservation of epistatic interactions. Here we conduct a novel test of a hypothesis that functionally linked genes in the same paralogon are preferentially retained in cis after WGD. We compare the number of protein-protein interactions (PPIs) between linked singletons within a paralogon (defined as cis-PPIs) with that of PPIs between singletons across paralogon pairs (defined as trans-PPIs). We find that paralogons in which the number of cis-PPIs is greater than that of trans-PPIs are significantly enriched in human and yeast. The trend is similar in plants, but it is difficult to assess statistical significance due to multiple, overlapping WGD events. Interestingly, human singletons participating in cis-PPIs tend to be classified into "response to stimulus." We uncover strong evidence of biased gene loss after WGD, which further supports the hypothesis of biologically significant gene clusters in eukaryotic genomes. These observations give us new insight for understanding the evolution of genome structure and of protein interaction networks.
Affiliation(s)
- Takashi Makino
  - Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
  - Department of Ecology and Evolutionary Biology, Graduate School of Life Sciences, Tohoku University, Sendai 980-8578, Japan
- Aoife McLysaght
  - Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
|
38
|
The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data. PLoS Comput Biol 2012; 8:e1002784. [PMID: 23209392 PMCID: PMC3510086 DOI: 10.1371/journal.pcbi.1002784] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Received: 05/14/2012] [Accepted: 10/02/2012] [Indexed: 11/19/2022]
Abstract
The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.
|
39
|
Lewis ACF, Jones NS, Porter MA, Deane CM. What evidence is there for the homology of protein-protein interactions? PLoS Comput Biol 2012; 8:e1002645. [PMID: 23028270 PMCID: PMC3447968 DOI: 10.1371/journal.pcbi.1002645] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 10/10/2011] [Accepted: 06/21/2012] [Indexed: 12/17/2022]
Abstract
The notion that sequence homology implies functional similarity underlies much of computational biology. In the case of protein-protein interactions, an interaction can be inferred between two proteins on the basis that sequence-similar proteins have been observed to interact. The use of transferred interactions is common, but the legitimacy of such inferred interactions is not clear. Here we investigate transferred interactions and whether data incompleteness explains the lack of evidence found for them. Using definitions of homology associated with functional annotation transfer, we estimate that conservation rates of interactions are low even after taking interactome incompleteness into account. For example, at a blastp E-value threshold of 10^-70, we estimate the conservation rate to be about 11% between S. cerevisiae and H. sapiens. Our method also produces estimates of interactome sizes (which are similar to those previously proposed). Using our estimates of interaction conservation we estimate the rate at which protein-protein interactions are lost across species. To our knowledge, this is the first such study based on large-scale data. Previous work has suggested that interactions transferred within species are more reliable than interactions transferred across species. By controlling for factors that are specific to within-species interaction prediction, we propose that the transfer of interactions within species might be less reliable than transfers between species. Protein-protein interactions appear to be very rarely conserved unless very high sequence similarity is observed. Consequently, inferred interactions should be used with care.
Affiliation(s)
- Anna C. F. Lewis
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
  - Systems Biology Doctoral Training Centre, University of Oxford, Oxford, United Kingdom
- Nick S. Jones
  - Department of Mathematics, Imperial College, London, United Kingdom
  - Department of Physics, University of Oxford, Oxford, United Kingdom
  - CABDyN Complexity Centre, University of Oxford, Oxford, United Kingdom
  - Oxford Centre for Integrative Systems Biology, University of Oxford, Oxford, United Kingdom
- Mason A. Porter
  - CABDyN Complexity Centre, University of Oxford, Oxford, United Kingdom
  - Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Charlotte M. Deane
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
  - Oxford Centre for Integrative Systems Biology, University of Oxford, Oxford, United Kingdom
|
40
|
Alternative splicing interference by xenobiotics. Toxicology 2012; 296:1-12. [DOI: 10.1016/j.tox.2012.01.014] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 12/02/2011] [Revised: 01/21/2012] [Accepted: 01/23/2012] [Indexed: 12/21/2022]
|
41
|
Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci U S A 2012; 109:E831-40. [PMID: 22416125 DOI: 10.1073/pnas.1117408109] [Citation(s) in RCA: 129] [Impact Index Per Article: 10.8] [Indexed: 11/18/2022]
Abstract
The tempo and mode of protein evolution have been central questions in biology. Genomic data have shown a strong influence of the expression level of a protein on its rate of sequence evolution (E-R anticorrelation), which is currently explained by the protein misfolding avoidance hypothesis. Here, we show that this hypothesis does not fully explain the E-R anticorrelation, especially for protein surface residues. We propose that natural selection against protein-protein misinteraction, which wastes functional molecules and is potentially toxic, constrains the evolution of surface residues. Because highly expressed proteins are under stronger pressures to avoid misinteraction, surface residues are expected to show an E-R anticorrelation. Our molecular-level evolutionary simulation and yeast genomic analysis confirm multiple predictions of the hypothesis. These findings show a pluralistic origin of the E-R anticorrelation and reveal the role of protein misinteraction, an inherent property of complex cellular systems, in constraining protein evolution.
|
42
|
Cantacessi C, Hofmann A, Young ND, Broder U, Hall RS, Loukas A, Gasser RB. Insights into SCP/TAPS proteins of liver flukes based on large-scale bioinformatic analyses of sequence datasets. PLoS One 2012; 7:e31164. [PMID: 22384000 PMCID: PMC3284463 DOI: 10.1371/journal.pone.0031164] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 11/08/2011] [Accepted: 01/03/2012] [Indexed: 02/04/2023]
Abstract
BACKGROUND: SCP/TAPS proteins of parasitic helminths have been proposed to play key roles in fundamental biological processes linked to the invasion of and establishment in their mammalian host animals, such as the transition from free-living to parasitic stages and the modulation of host immune responses. Despite the evidence that SCP/TAPS proteins of parasitic nematodes are involved in host-parasite interactions, there is a paucity of information on this protein family for parasitic trematodes of socio-economic importance. METHODOLOGY/PRINCIPAL FINDINGS: We conducted the first large-scale study of SCP/TAPS proteins of a range of parasitic trematodes of both human and veterinary importance (including the liver flukes Clonorchis sinensis, Opisthorchis viverrini, Fasciola hepatica and F. gigantica as well as the blood flukes Schistosoma mansoni, S. japonicum and S. haematobium). We mined all current transcriptomic and/or genomic sequence datasets from public databases, predicted secondary structures of full-length protein sequences, undertook systematic phylogenetic analyses and investigated the differential transcription of SCP/TAPS genes in O. viverrini and F. hepatica, with an emphasis on those that are up-regulated in the developmental stages infecting the mammalian host. CONCLUSIONS: This work, which sheds new light on SCP/TAPS proteins, guides future structural and functional explorations of key SCP/TAPS molecules associated with diseases caused by flatworms. Future fundamental investigations of these molecules in parasites and the integration of structural and functional data could lead to new approaches for the control of parasitic diseases.
Affiliation(s)
- Cinzia Cantacessi: Department of Veterinary Science, The University of Melbourne, Parkville, Victoria, Australia
- Andreas Hofmann: Eskitis Institute for Cell and Molecular Therapies, Griffith University, Brisbane, Queensland, Australia
- Neil D. Young: Department of Veterinary Science, The University of Melbourne, Parkville, Victoria, Australia
- Ursula Broder: Eskitis Institute for Cell and Molecular Therapies, Griffith University, Brisbane, Queensland, Australia
- Ross S. Hall: Department of Veterinary Science, The University of Melbourne, Parkville, Victoria, Australia
- Alex Loukas: Queensland Tropical Health Alliance, James Cook University, Smithfield, Queensland, Australia
- Robin B. Gasser: Department of Veterinary Science, The University of Melbourne, Parkville, Victoria, Australia
|
43
|
Jiang L, Sørensen P, Thomsen B, Edwards SM, Skarman A, Røntved CM, Lund MS, Workman CT. Gene prioritization for livestock diseases by data integration. Physiol Genomics 2012; 44:305-17. [PMID: 22234994 DOI: 10.1152/physiolgenomics.00047.2011] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/22/2022]
Abstract
Identifying causal genes that underlie complex traits such as susceptibility to disease is a primary aim of genetic and biomedical studies. Genetic mapping of quantitative trait loci (QTL) and gene expression profiling based on high-throughput technologies are common first approaches toward identifying associations between genes and traits; however, it is often difficult to assess whether the biological function of a putative candidate gene is consistent with a particular phenotype. Here, we have implemented a network-based disease gene prioritization approach for ranking genes associated with quantitative traits and diseases in livestock species. The approach uses ortholog mapping and integrates information on disease or trait phenotypes, gene-associated phenotypes, and protein-protein interactions. It was used for ranking all known genes present in the cattle genome for their potential roles in bovine mastitis. Gene-associated phenome profile and transcriptome profile in response to Escherichia coli infection in the mammary gland were integrated to make a global inference of bovine genes involved in mastitis. The top ranked genes were highly enriched for pathways and biological processes underlying inflammation and immune responses, which supports the validity of our approach for identifying genes that are relevant to animal health and disease. These gene-associated phenotypes were used for a local prioritization of candidate genes located in a QTL affecting the susceptibility to mastitis. Our study provides a general framework for prioritizing genes associated with various complex traits in different species. To our knowledge this is the first time that gene expression, ortholog mapping, protein interactions, and biomedical text data have been integrated systematically for ranking candidate genes in any livestock species.
Affiliation(s)
- Li Jiang: Dept. of Molecular Biology and Genetics, Aarhus Univ., Blichers Allé 20, PO Box 50, DK-8830 Tjele, Denmark.
|
44
|
Abstract
Gene duplication plays key roles in organismal evolution. Duplicate genes, if they survive, tend to diverge in regulatory and coding regions. Divergences in coding regions, especially those that can change the function of the gene, can be caused by amino acid-altering substitutions and/or alterations in exon-intron structure. Much has been learned about the mode, tempo, and consequences of nucleotide substitutions, yet relatively little is known about structural divergences. In this study, by analyzing 612 pairs of sibling paralogs from seven representative gene families and 300 pairs of one-to-one orthologs from different species, we investigated the occurrence and relative importance of structural divergences during the evolution of duplicate and nonduplicate genes. We found that structural divergences have been very prevalent in duplicate genes and, in many cases, have led to the generation of functionally distinct paralogs. Comparisons of the genomic sequences of these genes further indicated that the differences in exon-intron structure were actually accomplished by three main types of mechanisms (exon/intron gain/loss, exonization/pseudoexonization, and insertion/deletion), each of which contributed differently to structural divergence. Like nucleotide substitutions, insertion/deletion and exonization/pseudoexonization occurred more or less randomly, with the number of observable mutational events per gene pair being largely proportional to evolutionary time. Notably, however, compared with paralogs with similar evolutionary times, orthologs have accumulated significantly fewer structural changes, whereas the amounts of amino acid replacements accumulated did not show clear differences. This finding suggests that structural divergences have played a more important role during the evolution of duplicate than nonduplicate genes.
|