1
Manrique-Carpintero NC, Berdugo-Cely JA, Cerón-Souza I, Lasso-Paredes Z, Reyes-Herrera PH, Yockteng R. Defining a diverse core collection of the Colombian Central Collection of potatoes: a tool to advance research and breeding. Front Plant Sci 2023; 14:1046400. [PMID: 37180391 PMCID: PMC10173156 DOI: 10.3389/fpls.2023.1046400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 03/14/2023] [Indexed: 05/16/2023]
Abstract
The highly diverse Colombian Central Collection (CCC) of cultivated potatoes is the most important source of genetic variation for breeding and the agricultural development of this staple crop in Colombia. Potato is the primary source of income for more than 100,000 farming families in Colombia. However, biotic and abiotic challenges limit crop production. Furthermore, climate change, food security, and malnutrition constraints call for adaptive crop development to be urgently addressed. The clonal CCC of potatoes contains 1,255 accessions, an extensive collection size that limits its optimal assessment and use. Our study evaluated different collection sizes from the whole clonal collection to define the best core collection that captures the total genetic diversity of this unique collection and supports more cost-effective characterization. Initially, we genotyped 1,141 accessions from the clonal collection and 20 breeding lines using 3,586 genome-wide polymorphic markers to study the CCC's genetic diversity. The analysis of molecular variance confirmed the CCC's diversity with a significant population structure (Phi=0.359; p-value=0.001). Three main genetic pools were identified within this collection (CCC_Group_A, CCC_Group_B1, and CCC_Group_B2), and the commercial varieties were located across the pools. The ploidy level was the main driver of pool identification, followed by a robust representation of accessions from Phureja and Andigenum cultivar groups based on former taxonomic classifications. We also found divergent heterozygosity values within genetic groups, with greater diversity in genetic groups with tetraploids (CCC_Group_B1: 0.37, and CCC_Group_B2: 0.53) than in diploid accessions (CCC_Group_A: 0.14). We subsequently generated one mini-core collection of 3 percent (39 entries) and three further core collections of 10, 15, and 20 percent (i.e., 129, 194, and 258 entries, respectively) from the total samples genotyped. As our results indicated that genetic diversity was similar across the sampled core collection sizes compared to the main collection, we selected the smallest core collection size of 10 percent. We expect this 10 percent core collection to be an optimal tool for discovering and evaluating functional diversity in the genebank to advance potato breeding and agricultural-related studies. This study also lays the foundations for continued CCC curation by evaluating duplication and admixture between accessions, completing the digitalization of data, and determining ploidy using chloroplast counts.
Affiliation(s)
- Jhon A. Berdugo-Cely
  - Corporación Colombiana de Investigación Agropecuaria-AGROSAVIA, Centro de Investigación Tibaitatá, Mosquera, Colombia
  - Corporación Colombiana de Investigación Agropecuaria-AGROSAVIA, Centro de Investigación Turipaná, Montería, Colombia
- Ivania Cerón-Souza
  - Corporación Colombiana de Investigación Agropecuaria-AGROSAVIA, Centro de Investigación Tibaitatá, Mosquera, Colombia
- Zahara Lasso-Paredes
  - Corporación Colombiana de Investigación Agropecuaria-AGROSAVIA, Centro de Investigación Tibaitatá, Mosquera, Colombia
- Paula H. Reyes-Herrera
  - Corporación Colombiana de Investigación Agropecuaria-AGROSAVIA, Centro de Investigación Tibaitatá, Mosquera, Colombia
- Roxana Yockteng
  - Corporación Colombiana de Investigación Agropecuaria-AGROSAVIA, Centro de Investigación Tibaitatá, Mosquera, Colombia
  - Institut de Systématique, Evolution, Biodiversité-UMR-CNRS 7205, National Museum of Natural History, Paris, France
  - *Correspondence: Roxana Yockteng,
2
Reyes-Herrera PH, Torres-Bedoya E, Lopez-Alvarez D, Burbano-David D, Carmona SL, Bebber DP, Studholme DJ, Betancourt M, Soto-Suarez M. Genome Sequence Data Reveal at Least Two Distinct Incursions of the Tropical Race 4 Variant of Fusarium Wilt into South America. Phytopathology 2023; 113:90-97. [PMID: 36095335 DOI: 10.1094/phyto-01-22-0034-r] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
The global banana industry is threatened by one of the most devastating diseases: Fusarium wilt of banana. Fusarium wilt of banana is caused by the soilborne fungus Fusarium oxysporum f. sp. cubense (Foc), which almost annihilated banana production in the late 1950s. A new strain of Foc, known as tropical race 4 (TR4), attacks a wide range of banana varieties, including Cavendish clones, which are the source of 99% of banana exports. In 2019, Foc TR4 was reported in Colombia, and more recently (2021) in Peru. In this study, we sequenced three fungal isolates identified as Foc TR4 from La Guajira (Colombia) and compared them against 19 publicly available whole-genome sequences of Foc TR4, including four genome sequences recently released from Peru. To understand the genetic relatedness of the Colombian Foc TR4 isolates and those from Peru, we conducted a phylogenetic analysis based on a genome-wide set of single nucleotide polymorphisms (SNPs). Additionally, we compared the genomes of the 22 available Foc TR4 isolates, looking for presence-absence polymorphisms of genes and genomic regions. Our results reveal that (i) the Colombian and Peruvian isolates are genetically distant, which could be better explained by independent incursions of the pathogen into the continent, and (ii) there is a high correspondence between the genetic relatedness and geographic origin of Foc TR4. The profile of present/absent genes and the distribution of missing genomic regions showed a high correspondence to the clades recovered in the phylogenetic analysis, supporting the results obtained by SNP-based phylogeny.
Affiliation(s)
- Paula H Reyes-Herrera
  - Corporación Colombiana de Investigación Agropecuaria-Agrosavia, C.I Tibaitatá, Km 14 vía, Mosquera-Bogotá, Cundinamarca, Colombia
- Eliana Torres-Bedoya
  - Corporación Colombiana de Investigación Agropecuaria-Agrosavia, C.I Tibaitatá, Km 14 vía, Mosquera-Bogotá, Cundinamarca, Colombia
  - Biosciences, University of Exeter, Geoffrey Pope Building, Exeter, United Kingdom
- Diana Lopez-Alvarez
  - Universidad Nacional de Colombia, Sede Palmira, Facultad de Ciencias Agropecuarias, Departamento de Ciencias Biológicas, Palmira, Colombia
- Diana Burbano-David
  - Corporación Colombiana de Investigación Agropecuaria-Agrosavia, C.I Tibaitatá, Km 14 vía, Mosquera-Bogotá, Cundinamarca, Colombia
- Sandra L Carmona
  - Corporación Colombiana de Investigación Agropecuaria-Agrosavia, C.I Tibaitatá, Km 14 vía, Mosquera-Bogotá, Cundinamarca, Colombia
- Daniel P Bebber
  - Biosciences, University of Exeter, Geoffrey Pope Building, Exeter, United Kingdom
- David J Studholme
  - Biosciences, University of Exeter, Geoffrey Pope Building, Exeter, United Kingdom
- Monica Betancourt
  - Corporación Colombiana de Investigación Agropecuaria-Agrosavia, C.I Tibaitatá, Km 14 vía, Mosquera-Bogotá, Cundinamarca, Colombia
- Mauricio Soto-Suarez
  - Corporación Colombiana de Investigación Agropecuaria-Agrosavia, C.I Tibaitatá, Km 14 vía, Mosquera-Bogotá, Cundinamarca, Colombia
3
Parra-Salazar A, Gomez J, Lozano-Arce D, Reyes-Herrera PH, Duitama J. Robust and efficient software for reference-free genomic diversity analysis of genotyping-by-sequencing data on diploid and polyploid species. Mol Ecol Resour 2021; 22:439-454. [PMID: 34288487 DOI: 10.1111/1755-0998.13477] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/05/2021] [Revised: 07/08/2021] [Accepted: 07/13/2021] [Indexed: 12/14/2022]
Abstract
Genotyping-by-sequencing (GBS) is a widely used and cost-effective technique for obtaining large numbers of genetic markers from populations by sequencing regions adjacent to restriction cut sites. Although a standard reference-based pipeline can be followed to analyse GBS reads, a reference genome is still not available for a large number of species. Hence, reference-free approaches are required to generate the genetic variability information that can be obtained from a GBS experiment. Unfortunately, available tools to perform de novo analysis of GBS reads face issues of usability, accuracy and performance. Furthermore, few available tools are suitable for analysing data sets from polyploid species. In this manuscript, we describe a novel algorithm to perform reference-free variant detection and genotyping from GBS reads. Nonexact searches on a dynamic hash table of consensus sequences allow for efficient read clustering and sorting. This algorithm was integrated into the Next Generation Sequencing Experience Platform (NGSEP) so that it can use the state-of-the-art variant detector already implemented in this tool. We performed benchmark experiments with three different empirical data sets of plants and animals with different population structures and ploidies, sequenced with different GBS protocols at different read depths. These experiments show that NGSEP has comparable and in some cases better accuracy, and always better computational efficiency, compared to existing solutions. We expect that this new development will be useful for many research groups conducting population genetic studies in a wide variety of species.
Affiliation(s)
- Andrea Parra-Salazar
  - Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
- Jorge Gomez
  - Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
- Daniela Lozano-Arce
  - Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
  - Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
4
Reyes-Herrera PH, Muñoz-Baena L, Velásquez-Zapata V, Patiño L, Delgado-Paz OA, Díaz-Diez CA, Navas-Arboleda AA, Cortés AJ. Inheritance of Rootstock Effects in Avocado (Persea americana Mill.) cv. Hass. Front Plant Sci 2020; 11:555071. [PMID: 33424874 PMCID: PMC7785968 DOI: 10.3389/fpls.2020.555071] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/23/2020] [Accepted: 11/17/2020] [Indexed: 05/16/2023]
Abstract
Grafting is typically used to merge adapted seedling rootstocks with highly productive clonal scions. This process implies the interaction of multiple genomes to produce a unique tree phenotype. However, the interconnection of both genotypes obscures individual contributions to phenotypic variation (rootstock-mediated heritability), hampering tree breeding. Therefore, our goal was to quantify the inheritance of seedling rootstock effects on scion traits using avocado (Persea americana Mill.) cv. Hass as a model fruit tree. We characterized 240 diverse rootstocks from 8 avocado cv. Hass orchards with similar management in three regions of the province of Antioquia, northwest Andes of Colombia, using 13 microsatellite markers (simple sequence repeats, SSRs). In parallel, we recorded 20 phenotypic traits (including morphological, biomass/reproductive, and fruit yield and quality traits) in the scions for 3 years (2015-2017). Relatedness among rootstocks was inferred from the genetic markers and fed into a "genetic prediction" model to calculate narrow-sense heritabilities (h²) on scion traits. We used three different randomization tests to highlight traits with consistently significant heritability estimates. This strategy allowed us to capture five traits with significant heritability values that ranged from 0.33 to 0.45 and model fits (r) that ranged between 0.58 and 0.73 across orchards. The results showed significant rootstock effects for four complex harvest and quality traits (i.e., total number of fruits, number of fruits with exportation quality, and number of fruits discarded because of low weight or thrips damage), whereas the only morphological trait that had a significant heritability value was overall trunk height (an emergent property of the rootstock-scion interaction). These findings suggest the inheritance of rootstock effects, beyond root phenotype, on a surprisingly wide spectrum of scion traits in "Hass" avocado. They also reinforce the utility of polymorphic SSRs for relatedness reconstruction and genetic prediction of complex traits. This research is, to date, the most cohesive evidence of narrow-sense inheritance of rootstock effects in a tropical fruit tree crop. Ultimately, our work highlights the importance of considering the rootstock-scion interaction to broaden the genetic basis of fruit tree breeding programs while enhancing our understanding of the consequences of grafting.
Affiliation(s)
- Paula H. Reyes-Herrera
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), CI Tibaitatá, Mosquera, Colombia
- Laura Muñoz-Baena
  - Department of Microbiology and Immunology, Western University, London, ON, Canada
- Valeria Velásquez-Zapata
  - Department of Plant Pathology and Microbiology, Interdepartmental Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States
- Laura Patiño
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), CI La Selva, Rionegro, Colombia
- Oscar A. Delgado-Paz
  - Facultad de Ingenierías, Universidad Católica de Oriente (UCO), Rionegro, Antioquia
- Cipriano A. Díaz-Diez
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), CI La Selva, Rionegro, Colombia
- Andrés J. Cortés
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), CI La Selva, Rionegro, Colombia
5
Abstract
High-throughput sequencing of reduced representation libraries obtained through digestion with restriction enzymes, generically known as restriction site associated DNA sequencing (RAD-seq), is a common strategy to generate genome-wide genotypic and sequence data from eukaryotes. A critical design element of any RAD-seq study is knowledge of the approximate number of genetic markers that can be obtained for a taxon using different restriction enzymes, as this number determines the scope of a project, and ultimately defines its success. This number can only be directly determined if a reference genome sequence is available, or it can be estimated if the genome size and restriction recognition sequence probabilities are known. However, both scenarios are uncommon for nonmodel species. Here, we performed systematic in silico surveys of recognition sequences, for diverse and commonly used type II restriction enzymes across the eukaryotic tree of life. Our observations reveal that recognition sequence frequencies for a given restriction enzyme are strikingly variable among broad eukaryotic taxonomic groups, being largely determined by phylogenetic relatedness. We demonstrate that genome sizes can be predicted from cleavage frequency data obtained with restriction enzymes targeting "neutral" elements. Models based on genomic compositions are also effective tools to accurately calculate probabilities of recognition sequences across taxa, and can be applied to species for which reduced representation data are available (including transcriptomes and neutral RAD-seq data sets). The analytical pipeline developed in this study, PredRAD (https://github.com/phrh/PredRAD), and the resulting databases constitute valuable resources that will help guide the design of any study using RAD-seq or related methods.
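The estimate described in this abstract (number of cut sites from genome size and recognition sequence probability) can be illustrated with a minimal sketch. It is not PredRAD's implementation: it assumes the simplest composition model, in which bases are independent and their frequencies are set by GC content alone, and it assumes a palindromic recognition sequence so that counting one strand is enough. The function names `site_probability` and `expected_sites` are hypothetical.

```python
from math import prod


def site_probability(recognition_seq: str, gc_content: float) -> float:
    """Probability that a given genome position starts the recognition sequence.

    Assumption: bases are independent, with G and C each at gc_content / 2
    and A and T each at (1 - gc_content) / 2 (a mononucleotide model).
    """
    base_prob = {
        "G": gc_content / 2,
        "C": gc_content / 2,
        "A": (1 - gc_content) / 2,
        "T": (1 - gc_content) / 2,
    }
    return prod(base_prob[base] for base in recognition_seq.upper())


def expected_sites(recognition_seq: str, gc_content: float, genome_size_bp: int) -> float:
    """Expected number of recognition sites in a genome of the given size.

    Assumption: the recognition sequence is palindromic, so a match on one
    strand is also a match on the other and only one strand is counted.
    """
    start_positions = genome_size_bp - len(recognition_seq) + 1
    return site_probability(recognition_seq, gc_content) * start_positions


if __name__ == "__main__":
    # Illustrative inputs only: a 6-bp cutter (GAATTC) in a 100 Mb genome at 40% GC.
    print(expected_sites("GAATTC", 0.40, 100_000_000))
```

Under this model a lower GC content raises the expected count for an AT-rich recognition sequence such as GAATTC. The abstract reports that observed frequencies vary strongly among taxonomic groups, so a composition-only estimate of this kind should be treated as a rough first approximation.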
Affiliation(s)
- Santiago Herrera
  - Biology Department, Woods Hole Oceanographic Institution
  - Biology Department, Massachusetts Institute of Technology
6
Reyes-Herrera PH, Speck-Hernandez CA, Sierra CA, Herrera S. BackCLIP: a tool to identify common background presence in PAR-CLIP datasets. Bioinformatics 2015; 31:3703-5. [PMID: 26227145 DOI: 10.1093/bioinformatics/btv442] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/17/2015] [Accepted: 07/19/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION: PAR-CLIP, a CLIP-seq protocol, derives a transcriptome-wide set of binding sites for RNA-binding proteins. Even though the protocol uses stringent washing to remove experimental noise, some of it remains. A recent study measured three sets of non-specific RNA backgrounds that are present in several PAR-CLIP datasets. However, a tool to identify the presence of common background in PAR-CLIP datasets is not yet available.
RESULTS: We used the measured sets of non-specific RNA backgrounds to build a common background set. Each element of the common background set has a score that reflects its presence in several PAR-CLIP datasets. We present a tool that uses this score to identify the amount of common background present in a PAR-CLIP dataset, and we give the user the option to keep or remove it. We applied the proposed strategy to 30 PAR-CLIP datasets from nine proteins. It is possible to identify the presence of common backgrounds in a dataset and to identify differences between datasets for the same protein. This method is the first step in the process of completely removing such backgrounds.
AVAILABILITY: The tool was implemented in Python. The common background set and the supplementary data are available at https://github.com/phrh/BackCLIP.
CONTACT: phreyes@gmail.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- P H Reyes-Herrera
  - Colombian Corporation for Agricultural Research (CORPOICA), 250047 Bogotá, Colombia
- C A Sierra
  - Universidad Antonio Nariño, 110311 Bogotá, Colombia
- S Herrera
  - Woods Hole Oceanographic Institution, 02543 Massachusetts, USA and Massachusetts Institute of Technology, 02139 Massachusetts, USA
7
Reyes-Herrera PH, Ficarra E. Computational Methods for CLIP-seq Data Processing. Bioinform Biol Insights 2014; 8:199-207. [PMID: 25336930 PMCID: PMC4196881 DOI: 10.4137/bbi.s16803] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 05/11/2014] [Revised: 07/29/2014] [Accepted: 08/01/2014] [Indexed: 12/25/2022]
Abstract
RNA-binding proteins (RBPs) are at the core of post-transcriptional regulation and thus of gene expression control at the RNA level. One of the principal challenges in the field of gene expression regulation is to understand the mechanisms of action of RBPs. Owing to recent advances in experimental techniques, it is now possible to obtain the RNA regions recognized by RBPs on a transcriptome-wide scale. In fact, CLIP-seq protocols combine crosslinking immunoprecipitation (CLIP) with high-throughput sequencing to recover the transcriptome-wide set of interaction regions for a particular protein. Nevertheless, computational methods are necessary to process CLIP-seq experimental data and are key to advancing the understanding of gene regulatory mechanisms. Considering the importance of computational methods in this area, we present a review of the current status of computational approaches used and proposed for CLIP-seq data.
Affiliation(s)
- Paula H Reyes-Herrera
  - Facultad de Ingeniería Electrónica y Biomédica, Universidad Antonio Nariño, Bogotá, Colombia
- Elisa Ficarra
  - Department of Control and Computer Engineering, Politecnico di Torino, TO, Italy
8
Abstract
Background: Computational methods for microRNA target prediction are fundamental to understanding the role of miRNAs in gene regulation, a key process in molecular biology. In this paper we present miREE, a novel microRNA target prediction tool. miREE is an ensemble of two parts with complementary but integrated roles in the prediction. The Ab-Initio module uses a genetic algorithm to generate a set of candidate sites on the basis of their microRNA-mRNA duplex stability properties. Then, a Support Vector Machine (SVM) learning module evaluates the impact of microRNA recognition elements on the target gene. As a result, the prediction takes into account information regarding both miRNA-target structural stability and accessibility.
Results: The proposed method significantly improves on state-of-the-art prediction tools in terms of accuracy, with a better balance between specificity and sensitivity, as demonstrated by experiments conducted on several large datasets across different species. miREE achieves this result by tackling two of the main challenges of current prediction tools: (1) the number of false positives from the Ab-Initio part is reduced thanks to the integration of a machine learning module, and (2) the specificity of the machine learning part is obtained through an innovative technique for generating rich and representative negative records. The validation was conducted on experimental datasets where the miRNA:mRNA interactions had been obtained through (1) direct validation, where even the binding site is provided, or (2) indirect validation, based on gene expression variations obtained from high-throughput experiments, where the specific interaction is not validated in detail and consequently the specific binding site is not provided.
Conclusions: The coupling of two parts, a sensitive Ab-Initio module and a selective machine learning part capable of recognizing false positives, leads to an improved balance between sensitivity and specificity. miREE obtains a reasonable trade-off between filtering false positives and identifying targets. The miREE tool is available online at http://didattica-online.polito.it/eda/miREE/
Affiliation(s)
- Paula H Reyes-Herrera
  - Department of Control and Computer Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 TO, Italy.