1
Comparisons of sampling methods for assessing intra- and inter-accession genetic diversity in three rice species using genotyping by sequencing. Sci Rep 2020; 10:13995. [PMID: 32814806 PMCID: PMC7438528 DOI: 10.1038/s41598-020-70842-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/03/2020] [Accepted: 07/27/2020] [Indexed: 11/09/2022]
Abstract
To minimize the cost of sample preparation and genotyping, most genebank genomics studies in self-pollinating species are conducted on a single individual to represent an accession, which may be heterogeneous with larger than expected intra-accession genetic variation. Here, we compared various population genetics parameters among six DNA (leaf) sampling methods on 90 accessions representing a wild species (O. barthii), cultivated and landraces (O. glaberrima, O. sativa), and improved varieties derived through interspecific hybridizations. A total of 1,527 DNA samples were genotyped with 46,818 polymorphic single nucleotide polymorphisms (SNPs) using DArTseq. Various statistical analyses were performed on eleven datasets corresponding to 5 plants per accession individually and in a bulk (two sets), 10 plants individually and in a bulk (two sets), all 15 plants individually (one set), and a randomly sampled individual repeated six times (six sets). Overall, we arrived at broadly similar conclusions across 11 datasets in terms of SNP polymorphism, heterozygosity/heterogeneity, diversity indices, concordance among genetic dissimilarity matrices, population structure, and genetic differentiation; there were, however, a few discrepancies between some pairs of datasets. Detailed results of each sampling method, the concordance in their outputs, and the technical and cost implications of each method were discussed.
2
Igolkina AA, Bazykin GA, Chizhevskaya EP, Provorov NA, Andronov EE. Matching population diversity of rhizobial nodA and legume NFR5 genes in plant-microbe symbiosis. Ecol Evol 2019; 9:10377-10386. [PMID: 31624556 PMCID: PMC6787799 DOI: 10.1002/ece3.5556] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/19/2018] [Revised: 07/07/2019] [Accepted: 07/15/2019] [Indexed: 12/31/2022]
Abstract
We hypothesized that population diversities of partners in nitrogen-fixing rhizobium-legume symbiosis can be matched for "interplaying" genes. We tested this hypothesis using data on nucleotide polymorphism of symbiotic genes encoding two components of the plant-bacteria signaling system: (a) the rhizobial nodA acyltransferase involved in the fatty acid tail decoration of the Nod factor (signaling molecule); (b) the plant NFR5 receptor required for Nod factor binding. We collected three wild-growing legume species together with soil samples adjacent to the roots from one large 25-year fallow: Vicia sativa, Lathyrus pratensis, and Trifolium hybridum nodulated by one of the two Rhizobium leguminosarum biovars (viciae and trifolii). For each plant species, we prepared three pools for DNA extraction and further sequencing: the plant pool (30 individual plants), the nodule pool (90 nodules), and the soil pool (30 samples). We obtained the following statistically significant results: (a) a monotonic relationship between the diversity in the plant NFR5 gene pools and the nodule rhizobial nodA gene pools; (b) higher topological similarity of the NFR5 gene tree with the nodA gene tree of the nodule pool than with the nodA gene tree of the soil pool. Both nonsynonymous diversity and Tajima's D were increased in the nodule pools compared with the soil pools, consistent with relaxation of negative selection and/or admixture of balancing selection. We propose that the observed genetic concordance between NFR5 gene pools and nodule nodA gene pools arises from the selection of particular genotypes of the nodA gene by the host plant.
Affiliation(s)
- Anna A. Igolkina
  - ARRIAM, All‐Russia Research Institute for Agricultural Microbiology, Pushkin, Russia
  - Peter the Great St. Petersburg Polytechnic University, Saint‐Petersburg, Russia
- Georgii A. Bazykin
  - Center for Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
  - Laboratory for Molecular Evolution, Kharkevich Institute of Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
- Nikolai A. Provorov
  - ARRIAM, All‐Russia Research Institute for Agricultural Microbiology, Pushkin, Russia
- Evgeny E. Andronov
  - ARRIAM, All‐Russia Research Institute for Agricultural Microbiology, Pushkin, Russia
  - Saint‐Petersburg State University, Saint‐Petersburg, Russia
  - Dokuchaev Soil Science Institute, Moscow, Russia
3
Hutchinson MC, Cagua EF, Balbuena JA, Stouffer DB, Poisot T. paco: implementing Procrustean Approach to Cophylogeny in R. Methods Ecol Evol 2017. [DOI: 10.1111/2041-210x.12736] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Indexed: 11/29/2022]
Affiliation(s)
- Matthew C. Hutchinson
  - Department of Ecology and Evolutionary Biology, Princeton University, 106A Guyot Hall, Princeton, NJ 08544, USA
  - Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- E. Fernando Cagua
  - Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Juan A. Balbuena
  - Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, 2 Professor José Beltrán Martínez Street, Paterna, Valencia 46980, Spain
- Daniel B. Stouffer
  - Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
- Timothée Poisot
  - Department of Biological Sciences, University of Montréal, Pavillon Marie‐Victorin, 90 Vincent‐d’Indy Avenue, Montréal, QC H2V 2S9, Canada
4
Ochoa D, Pazos F. Practical aspects of protein co-evolution. Front Cell Dev Biol 2014; 2:14. [PMID: 25364721 PMCID: PMC4207036 DOI: 10.3389/fcell.2014.00014] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 02/21/2014] [Accepted: 04/02/2014] [Indexed: 11/15/2022]
Abstract
Co-evolution is a fundamental aspect of Evolutionary Theory. At the molecular level, co-evolutionary linkages between protein families have long been used as indicators of protein interactions and functional relationships. Due to the complexity of the problem and the amount of genomic data required for these approaches to perform well, it took a relatively long time from the appearance of the first ideas and concepts to the everyday application of these approaches and their incorporation into the standard toolboxes of bioinformaticians and molecular biologists. Today, these methodologies are mature (both in terms of performance and usability/implementation), and the genomic information that feeds them is large enough to allow their general application. This review tries to summarize the current landscape of co-evolution-based methodologies, with a strong emphasis on describing interesting cases where their application to important biological systems, alone or in combination with other computational and experimental approaches, yielded new insight into those systems.
Affiliation(s)
- David Ochoa
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Florencio Pazos
  - Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
5
Ghyselinck J, Coorevits A, Van Landschoot A, Samyn E, Heylen K, De Vos P. An rpoD gene sequence based evaluation of cultured Pseudomonas diversity on different growth media. Microbiology (Reading) 2013; 159:2097-2108. [DOI: 10.1099/mic.0.068031-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Affiliation(s)
- Jonas Ghyselinck
  - Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K.L. Ledeganckstraat 35, Gent B-9000, Belgium
- An Coorevits
  - Faculty of Bioscience Engineering, Ghent University, Campus Schoonmeersen, Valentin Vaerwyckweg 1, Gent B-9000, Belgium
  - Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K.L. Ledeganckstraat 35, Gent B-9000, Belgium
- Anita Van Landschoot
  - Faculty of Bioscience Engineering, Ghent University, Campus Schoonmeersen, Valentin Vaerwyckweg 1, Gent B-9000, Belgium
- Emly Samyn
  - Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K.L. Ledeganckstraat 35, Gent B-9000, Belgium
- Kim Heylen
  - Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K.L. Ledeganckstraat 35, Gent B-9000, Belgium
- Paul De Vos
  - BCCM/LMG Bacteria Collection, K.L. Ledeganckstraat 35, Gent B-9000, Belgium
  - Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, K.L. Ledeganckstraat 35, Gent B-9000, Belgium
6
The effect of primer choice and short read sequences on the outcome of 16S rRNA gene based diversity studies. PLoS One 2013; 8:e71360. [PMID: 23977026 PMCID: PMC3747265 DOI: 10.1371/journal.pone.0071360] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.0] [Received: 03/12/2013] [Accepted: 06/30/2013] [Indexed: 11/19/2022]
Abstract
Different regions of the bacterial 16S rRNA gene evolve at different rates. The outcome of short read sequencing studies therefore varies with the gene region sequenced. We wanted to gain insight into the impact of primer choice on the outcome of short read sequencing efforts. All the unknowns associated with sequencing data, i.e. primer coverage rate, phylogeny, OTU-richness and taxonomic assignment, were therefore evaluated in one study for ten well established universal primers (338f/r, 518f/r, 799f/r, 926f/r and 1062f/r) targeting dispersed regions of the bacterial 16S rRNA gene. All analyses were performed on nearly full length and in silico generated short read sequence libraries containing 1175 sequences that were carefully chosen to provide a representative substitute for the SILVA SSU database. The 518f and 799r primers, targeting the V4 region of the 16S rRNA gene, were found to be particularly suited for short read sequencing studies, while the primer 1062r, targeting V6, seemed to be least reliable. Our results will assist scientists in considering whether the best option for their study is to select the most informative primer, or the primer that excludes interferences by host-organelle DNA. The methodology followed can be extrapolated to other primers, allowing their evaluation prior to the experiment.
7
Balbuena JA, Míguez-Lozano R, Blasco-Costa I. PACo: a novel procrustes application to cophylogenetic analysis. PLoS One 2013; 8:e61048. [PMID: 23580325 PMCID: PMC3620278 DOI: 10.1371/journal.pone.0061048] [Citation(s) in RCA: 170] [Impact Index Per Article: 15.5] [Received: 10/03/2012] [Accepted: 03/05/2013] [Indexed: 11/19/2022]
Abstract
We present Procrustean Approach to Cophylogeny (PACo), a novel statistical tool to test for congruence between phylogenetic trees, or between phylogenetic distance matrices of associated taxa. Unlike previous tests, PACo evaluates the dependence of one phylogeny upon the other. This makes it especially appropriate to test the classical coevolutionary model that assumes that parasites that spend part of their life in or on their hosts track the phylogeny of their hosts. The new method does not require fully resolved phylogenies and allows for multiple host-parasite associations. PACo produces a Procrustes superimposition plot enabling a graphical assessment of the fit of the parasite phylogeny onto the host phylogeny and a goodness-of-fit statistic, whose significance is established by randomization of the host-parasite association data. The contribution of each individual host-parasite association to the global fit is measured by means of jackknife estimation of their respective squared residuals and confidence intervals associated to each host-parasite link. We carried out different simulations to evaluate the performance of PACo in terms of Type I and Type II errors with respect to two similar published tests. In most instances, PACo performed at least as well as the other tests and showed higher overall statistical power. In addition, the jackknife estimation of squared residuals enabled more elaborate validations about the nature of individual links than the ParaFitLink1 test of the program ParaFit. In order to demonstrate how it can be used in real biological situations, we applied PACo to two published studies using a script written in the public-domain statistical software R.
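The workflow this abstract outlines (ordinate both distance matrices, superimpose the linked host and parasite configurations, then judge the fit by randomizing associations) can be sketched roughly as follows. This is an illustrative simplification, not the authors' R script: the names `pcoa`, `procrustes_m2` and `paco_like_test` are hypothetical, the statistic used here is the symmetric Procrustes sum of squares, the randomization simply shuffles host labels across links, and the jackknife of squared residuals is left out.

```python
import numpy as np


def pcoa(dist):
    """Principal coordinates (classical scaling) of a square distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigval, eigvec = np.linalg.eigh(gram)
    keep = eigval > 1e-10  # discard zero and negative axes
    return eigvec[:, keep] * np.sqrt(eigval[keep])


def procrustes_m2(x, y):
    """Symmetric Procrustes sum of squares between two point configurations."""
    width = max(x.shape[1], y.shape[1])
    # Pad with zero columns so both configurations have the same dimension.
    x = np.pad(x, ((0, 0), (0, width - x.shape[1])))
    y = np.pad(y, ((0, 0), (0, width - y.shape[1])))
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    singular = np.linalg.svd(x.T @ y, compute_uv=False)
    return 1.0 - singular.sum() ** 2


def paco_like_test(host_dist, parasite_dist, links, n_perm=999, seed=0):
    """Return (m2, p_value) for a list of (host_index, parasite_index) links.

    Assumption: significance comes from shuffling host labels across links,
    a simplification of the association randomization named in the abstract.
    """
    rng = np.random.default_rng(seed)
    host_xy = pcoa(host_dist)
    para_xy = pcoa(parasite_dist)
    hosts = np.array([h for h, _ in links])
    paras = np.array([p for _, p in links])
    observed = procrustes_m2(host_xy[hosts], para_xy[paras])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(hosts)
        if procrustes_m2(host_xy[shuffled], para_xy[paras]) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A small residual sum of squares with a low p-value would point toward congruence between the two distance structures; real analyses would more likely rely on the published R implementation.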
Affiliation(s)
- Juan Antonio Balbuena
  - Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, Valencia, Spain
8
Abstract
Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.
Affiliation(s)
- David de Juan
  - Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
9
Muley VY, Ranjan A. Evaluation of physical and functional protein-protein interaction prediction methods for detecting biological pathways. PLoS One 2013; 8:e54325. [PMID: 23349851 PMCID: PMC3547882 DOI: 10.1371/journal.pone.0054325] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/07/2012] [Accepted: 12/11/2012] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Cellular activities are governed by the physical and the functional interactions among several proteins involved in various biological pathways. With the availability of sequenced genomes and high-throughput experimental data one can identify genome-wide protein-protein interactions using various computational techniques. Comparative assessments of these techniques in predicting protein interactions have been frequently reported in the literature but not their ability to elucidate a particular biological pathway.
METHODS: Towards the goal of understanding the prediction capabilities of interactions among the specific biological pathway proteins, we report the analyses of 14 biological pathways of Escherichia coli catalogued in KEGG database using five protein-protein functional linkage prediction methods. These methods are phylogenetic profiling, gene neighborhood, co-presence of orthologous genes in the same gene clusters, a mirrortree variant, and expression similarity.
CONCLUSIONS: Our results reveal that the prediction of metabolic pathway protein interactions continues to be a challenging task for all methods which possibly reflect flexible/independent evolutionary histories of these proteins. These methods have predicted functional associations of proteins involved in amino acids, nucleotide, glycans and vitamins & co-factors pathways slightly better than the random performance on carbohydrate, lipid and energy metabolism. We also make similar observations for interactions involved among the environmental information processing proteins. On the contrary, genetic information processing or specialized processes such as motility related protein-protein linkages that occur in the subset of organisms are predicted with comparable accuracy. Metabolic pathways are best predicted by using neighborhood of orthologous genes whereas phyletic pattern is good enough to reconstruct central dogma pathway protein interactions. We have also shown that the effective use of a particular prediction method depends on the pathway under investigation. In case one is not focused on specific pathway, gene expression similarity method is the best option.
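Of the five methods listed in this abstract, phylogenetic profiling is the simplest to illustrate: two proteins are scored by how similar their presence/absence patterns are across reference genomes. The sketch below is a generic illustration under stated assumptions, not the scoring used in the study: the function name `profile_similarity` is hypothetical, and the Jaccard index is only one of several possible similarity measures.

```python
def profile_similarity(genomes_with_a, genomes_with_b):
    """Jaccard similarity between two phylogenetic profiles.

    Each argument is the collection of reference genomes in which a homolog
    of the protein is present. The choice of Jaccard is an assumption made
    for illustration; other measures can be substituted.
    """
    set_a, set_b = set(genomes_with_a), set(genomes_with_b)
    union = set_a | set_b
    if not union:
        return 0.0  # neither protein has a homolog in any reference genome
    return len(set_a & set_b) / len(union)
```

Protein pairs with scores near 1 co-occur in nearly the same genomes and would be candidate functional partners under this heuristic.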
Affiliation(s)
- Vijaykumar Yogesh Muley
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
- Akash Ranjan
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India
10
Muley VY, Ranjan A. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction. PLoS One 2012; 7:e42057. [PMID: 22844541 PMCID: PMC3406042 DOI: 10.1371/journal.pone.0042057] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 12/03/2011] [Accepted: 07/02/2012] [Indexed: 12/20/2022]
Abstract
Background: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions.
Methods: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods.
Conclusions: Higher performance for predicting protein-protein interactions was achievable even with 100–150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling allows for selecting 50–100 genomes for comparable accuracy of predictions when computational resources are limited.
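The mirrortree idea mentioned in this abstract, scoring two proteins by the similarity of their phylogenetic trees, is commonly approximated by correlating the two families' inter-species distance matrices. The following is a minimal sketch under that assumption: the function name `mirrortree_score` is hypothetical, both matrices are assumed to list the same reference species in the same order, and the variants evaluated in the paper may differ.

```python
import numpy as np


def mirrortree_score(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two distance matrices.

    Assumes both matrices are square, symmetric, and indexed by the same
    reference species in the same order.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("expected two square matrices of equal size")
    upper = np.triu_indices(a.shape[0], k=1)  # each species pair counted once
    return float(np.corrcoef(a[upper], b[upper])[0, 1])
```

Because only species shared by both families contribute, the choice of reference genomes directly determines how many pairwise distances enter the correlation.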
Affiliation(s)
- Vijaykumar Yogesh Muley
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
  - Department of Biotechnology, Dr. Babasaheb Ambedkar Marathwada University, Sub-centre, Osmanabad, Maharashtra, India
- Akash Ranjan
  - Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, Andhra Pradesh, India
11
Gao H, Dou Y, Yang J, Wang J. New methods to measure residues coevolution in proteins. BMC Bioinformatics 2011; 12:206. [PMID: 21612664 PMCID: PMC3123609 DOI: 10.1186/1471-2105-12-206] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/19/2010] [Accepted: 05/26/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The covariation of two sites in a protein is often used as the degree of their coevolution. To quantify the covariation many methods have been developed and most of them are based on residues position-specific frequencies by using the mutual information (MI) model.
RESULTS: In the paper, we proposed several new measures to incorporate new biological constraints in quantifying the covariation. The first measure is the mutual information with the amino acid background distribution (MIB), which incorporates the amino acid background distribution into the marginal distribution of the MI model. The modification is made to remove the effect of amino acid evolutionary pressure in measuring covariation. The second measure is the mutual information of residues physicochemical properties (MIP), which is used to measure the covariation of physicochemical properties of two sites. The third measure called MIBP is proposed by applying residues physicochemical properties into the MIB model. Moreover, scores of our new measures are applied to a robust indicator conn(k) in finding the covariation signal of each site.
CONCLUSIONS: We find that incorporating amino acid background distribution is effective in removing the effect of evolutionary pressure of amino acids. Thus the MIB measure describes more biological background information for the coevolution of residues. Besides, our analysis also reveals that the covariation of physicochemical properties is a new aspect of coevolution information.
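For orientation, the baseline MI model that these measures build on can be computed from two alignment columns as sketched below. This shows only plain mutual information, not the MIB, MIP or MIBP variants, whose exact definitions are not given in the abstract; the function name `column_mutual_information` and the gap-skipping rule are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b, gap="-"):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b hold one residue per sequence, in the same sequence
    order. Sequences with a gap at either site are skipped, which is an
    assumption of this sketch.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != gap and b != gap]
    total = len(pairs)
    if total == 0:
        return 0.0
    joint = Counter(pairs)
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / total
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((freq_a[a] / total) * (freq_b[b] / total)))
    return mi
```

Two perfectly covarying columns such as `"AAVV"` and `"LLII"` give 1 bit, while independent columns such as `"AAVV"` and `"LILI"` give 0.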
Affiliation(s)
- Hongyun Gao
  - School of Mathematical Sciences, Dalian University of Technology, Dalian, People’s Republic of China