1
Hoyos Sanchez MC, Ospina Zapata HS, Suarez BD, Ospina C, Barbosa HJ, Carranza Martinez JC, Vallejo GA, Urrea Montes D, Duitama J. A phased genome assembly of a Colombian Trypanosoma cruzi TcI strain and the evolution of gene families. Sci Rep 2024; 14:2054. [PMID: 38267502 PMCID: PMC10808112 DOI: 10.1038/s41598-024-52449-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 01/18/2024] [Indexed: 01/26/2024] Open
Abstract
Chagas disease is endemic in tropical regions of Latin America and is caused by the parasite Trypanosoma cruzi. High intraspecies variability and genome complexity have made it challenging to assemble the high-quality genomes needed for studies in evolution, population genomics, diagnosis and drug development. Here we present a chromosome-level phased assembly of a TcI T. cruzi strain (Dm25). While 29 chromosomes show large collinearity with the assembly of the Brazil A4 strain, three chromosomes show both high heterozygosity and large divergence compared to previous assemblies of TcI T. cruzi strains. Nucleotide and protein evolution statistics indicate that T. cruzi Marinkellei separated before the diversification of T. cruzi into the known DTUs. Interchromosomal paralogs of dispersed gene families and histones appeared earlier but, at the same time, are under stricter purifying selection than other repeat families. Previously unreported large tandem arrays of protein kinases and histones were identified in this assembly. Over one million variants obtained from Illumina reads aligned to the primary assembly clearly separate the main DTUs. We expect that this new assembly will be a valuable resource for further studies on evolution and functional genomics of Trypanosomatids.
Affiliation(s)
- Maria Camila Hoyos Sanchez
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  - School of Veterinary Medicine, Texas Tech University, Amarillo, TX, 79106, USA
- Brayhan Dario Suarez
  - Laboratorio de Investigaciones en Parasitología Tropical (LIPT), Universidad del Tolima, Ibagué, Colombia
- Carlos Ospina
  - Laboratorio de Investigaciones en Parasitología Tropical (LIPT), Universidad del Tolima, Ibagué, Colombia
- Hamilton Julian Barbosa
  - Laboratorio de Investigaciones en Parasitología Tropical (LIPT), Universidad del Tolima, Ibagué, Colombia
- Gustavo Adolfo Vallejo
  - Laboratorio de Investigaciones en Parasitología Tropical (LIPT), Universidad del Tolima, Ibagué, Colombia
- Daniel Urrea Montes
  - Laboratorio de Investigaciones en Parasitología Tropical (LIPT), Universidad del Tolima, Ibagué, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
2
Gaitán N, Duitama J. A graph clustering algorithm for detection and genotyping of structural variants from long reads. Gigascience 2024; 13:giad112. [PMID: 38206589 PMCID: PMC10783151 DOI: 10.1093/gigascience/giad112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 08/02/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping is fundamental given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have been recently developed. FINDINGS We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts by collecting evidence (signatures) of SVs from read alignments. Then, signatures are clustered based on a Euclidean graph with coordinates calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which provides the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs, and a Bayesian model allows precise genotyping of SVs based on their supporting evidence. This algorithm is integrated into the single sample variants detector of the Next Generation Sequencing Experience Platform, which facilitates the integration with other functionalities for genomics analysis. We performed multiple benchmark experiments, including simulation and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths. CONCLUSION The results show that our approach outperformed state-of-the-art tools on germline SV calling and genotyping, especially at low depths, and in error-prone repetitive regions. We believe this work significantly contributes to the development of bioinformatic strategies to maximize the use of long-read sequencing technologies.
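The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the NGSEP implementation: the abstract does not say how the coordinates are computed, so the sketch assumes each signature is simply a (genomic position, length) pair, and the `eps` and `min_pts` values and the toy signatures are invented for illustration.

```python
from math import hypot

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within Euclidean distance eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical SV signatures as (position, length) pairs: two groups and one outlier.
signatures = [(1000, 300), (1005, 310), (998, 295),
              (5000, 80), (5003, 82), (5010, 78),
              (20000, 1200)]
labels = dbscan(signatures, eps=25, min_pts=2)
print(labels)
```

Signatures that agree in both position and length end up in the same cluster, while the isolated signature is labelled as noise. In a real caller each cluster would then be summarized into one candidate SV before genotyping.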
Affiliation(s)
- Nicolás Gaitán
  - Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
3
Lozano-Arce D, García T, Gonzalez-Garcia LN, Guyot R, Chacón-Sánchez MI, Duitama J. Selection signatures and population dynamics of transposable elements in lima bean. Commun Biol 2023; 6:803. [PMID: 37532823 PMCID: PMC10397206 DOI: 10.1038/s42003-023-05144-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/10/2023] [Accepted: 07/13/2023] [Indexed: 08/04/2023] Open
Abstract
The domestication process in lima bean (Phaseolus lunatus L.) involves two independent events, within the Mesoamerican and Andean gene pools. This makes lima bean an excellent model to understand convergent evolution. The mechanisms of adaptation followed by Mesoamerican and Andean landraces are largely unknown. Genes related to these adaptations can be selected by identification of selective sweeps within gene pools. Previous genetic analyses in lima bean have relied on Single Nucleotide Polymorphism (SNP) loci, and have ignored transposable elements (TEs). Here we show the analysis of whole-genome sequencing data from 61 lima bean accessions to characterize a genomic variation database including TEs and SNPs, to associate selective sweeps with variable TEs and to predict candidate domestication genes. A small percentage of genes under selection are shared among gene pools, suggesting that domestication followed different genetic avenues in both gene pools. About 75% of TEs are located close to genes, which shows their potential to affect gene functions. The genetic structure inferred from variable TEs is consistent with that obtained from SNP markers, suggesting that TE dynamics can be related to the demographic history of wild and domesticated lima bean and its adaptive processes, in particular selection processes during domestication.
Affiliation(s)
- Daniela Lozano-Arce
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Tatiana García
  - Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
- Laura Natalia Gonzalez-Garcia
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  - Institut de Recherche pour le Développement (IRD), UMR DIADE, Université de Montpellier, CIRAD, 34394, Montpellier, France
- Romain Guyot
  - Institut de Recherche pour le Développement (IRD), UMR DIADE, Université de Montpellier, CIRAD, 34394, Montpellier, France
- Maria Isabel Chacón-Sánchez
  - Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
4
Gonzalez‐García LN, Lozano‐Arce D, Londoño JP, Guyot R, Duitama J. Efficient homology-based annotation of transposable elements using minimizers. Appl Plant Sci 2023; 11:e11520. [PMID: 37601317 PMCID: PMC10439823 DOI: 10.1002/aps3.11520] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/06/2022] [Revised: 03/02/2023] [Accepted: 03/04/2023] [Indexed: 08/22/2023]
Abstract
Premise Transposable elements (TEs) make up more than half of the genomes of complex plant species and can modulate the expression of neighboring genes, producing significant variability of agronomically relevant traits. The availability of long-read sequencing technologies allows the building of genome assemblies for plant species with large and complex genomes. Unfortunately, TE annotation currently represents a bottleneck in the annotation of genome assemblies. Methods and Results We present a new functionality of the Next-Generation Sequencing Experience Platform (NGSEP) to perform efficient homology-based TE annotation. Sequences in a reference library are treated as long reads and mapped to an input genome assembly. A hierarchical annotation is then assigned by homology using the annotation of the reference library. We tested the performance of our algorithm on genome assemblies of different plant species, including Arabidopsis thaliana, Oryza sativa, Coffea humblotiana, and Triticum aestivum (bread wheat). Our algorithm outperforms traditional homology-based annotation tools in speed by a factor of three to >20, reducing the annotation time of the T. aestivum genome from months to hours, and recovering up to 80% of TEs annotated with RepeatMasker with a precision of up to 0.95. Conclusions NGSEP allows rapid analysis of TEs, especially in very large and TE-rich plant genomes.
Affiliation(s)
- Laura Natalia Gonzalez‐García
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  - UMR DIADE, Institut de Recherche pour le Développement, Université de Montpellier, CIRAD, 34394, Montpellier, France
- Daniela Lozano‐Arce
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Romain Guyot
  - UMR DIADE, Institut de Recherche pour le Développement, Université de Montpellier, CIRAD, 34394, Montpellier, France
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
5
Gonzalez-Garcia L, Guevara-Barrientos D, Lozano-Arce D, Gil J, Díaz-Riaño J, Duarte E, Andrade G, Bojacá JC, Hoyos-Sanchez MC, Chavarro C, Guayazan N, Chica LA, Buitrago Acosta MC, Bautista E, Trujillo M, Duitama J. New algorithms for accurate and efficient de novo genome assembly from long DNA sequencing reads. Life Sci Alliance 2023; 6:e202201719. [PMID: 36813568 PMCID: PMC9946810 DOI: 10.26508/lsa.202201719] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/10/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 02/24/2023] Open
Abstract
Building de novo genome assemblies for complex genomes is possible thanks to long-read DNA sequencing technologies. However, maximizing the quality of assemblies based on long reads is a challenging task that requires the development of specialized data analysis techniques. We present new algorithms for assembling long DNA sequencing reads from haploid and diploid organisms. The assembly algorithm builds an undirected graph with two vertices for each read based on minimizers selected by a hash function derived from the k-mer distribution. Statistics collected during the graph construction are used as features to build layout paths by selecting edges, ranked by a likelihood function. For diploid samples, we integrated a reimplementation of the ReFHap algorithm to perform molecular phasing. We ran the implemented algorithms on PacBio HiFi and Nanopore sequencing data taken from haploid and diploid samples of different species. Our algorithms showed competitive accuracy and computational efficiency, compared with other currently used software. We expect that this new development will be useful for researchers building genome assemblies for different species.
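The idea of selecting minimizers with a hash derived from the k-mer distribution can be sketched as follows. This is only an illustration under stated assumptions, not the published algorithm: the ranking function (rarer k-mers first, ties broken alphabetically), the values of `k` and `w`, and the two toy reads are all hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """All k-mers of seq in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def minimizers(seq, k, w, rank):
    """Sorted (position, k-mer) pairs: the lowest-ranked k-mer in every
    window of w consecutive k-mers (leftmost wins on ties)."""
    ks = kmers(seq, k)
    chosen = set()
    for start in range(len(ks) - w + 1):
        best = min(range(start, start + w), key=lambda i: (rank(ks[i]), i))
        chosen.add((best, ks[best]))
    return sorted(chosen)

# Two toy reads; the second one starts one base after the first.
reads = ["ACGTACGTGA", "CGTACGTGAC"]
k, w = 3, 4

# Hypothetical ranking built from the k-mer counts over all reads.
counts = Counter(km for read in reads for km in kmers(read, k))

def rank(km):
    return (counts[km], km)  # rarer k-mers rank first

mins_a = minimizers(reads[0], k, w, rank)
mins_b = minimizers(reads[1], k, w, rank)
print(mins_a)
print(mins_b)
```

The two reads share the minimizers GTA and GTG at positions that differ by exactly one base, which is the kind of evidence an overlap graph can use to connect reads without comparing them base by base.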
Affiliation(s)
- Laura Gonzalez-Garcia
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Daniela Lozano-Arce
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Juanita Gil
  - Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR, USA
- Jorge Díaz-Riaño
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Erick Duarte
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Germán Andrade
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Juan Camilo Bojacá
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Christian Chavarro
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Natalia Guayazan
  - Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Luis Alberto Chica
  - Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá, Colombia
- Edwin Bautista
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Miller Trujillo
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
6
Santacruz CA, Vincent JL, Duitama J, Bautista E, Imbault V, Bruneau M, Creteur J, Brimioulle S, Communi D, Taccone FS. vCSF Danger-associated Molecular Patterns After Traumatic and Nontraumatic Acute Brain Injury: A Prospective Study. J Neurosurg Anesthesiol 2023:00008506-990000000-00060. [PMID: 37188652 DOI: 10.1097/ana.0000000000000916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 03/14/2023] [Indexed: 05/17/2023]
Abstract
BACKGROUND Danger-associated molecular patterns (DAMPs) may be implicated in the pathophysiological pathways associated with an unfavorable outcome after acute brain injury (ABI). METHODS We collected samples of ventricular cerebrospinal fluid (vCSF) for 5 days in 50 consecutive patients at risk of intracranial hypertension after traumatic and nontraumatic ABI. Differences in vCSF protein expression over time were evaluated using linear models and selected for functional network analysis using the PANTHER and STRING databases. The primary exposure of interest was the type of brain injury (traumatic vs. nontraumatic), and the primary outcome was the vCSF expression of DAMPs. Secondary exposures of interest included the occurrence of intracranial pressure ≥20 or ≥30 mm Hg during the 5 days post-ABI, intensive care unit (ICU) mortality, and neurological outcome (assessed using the Glasgow Outcome Score) at 3 months post-ICU discharge. Secondary outcomes included associations of these exposures with the vCSF expression of DAMPs. RESULTS A network of 6 DAMPs (DAMP_trauma; protein-protein interaction [PPI] P=0.04) was differentially expressed in patients with ABI of traumatic origin compared with those with nontraumatic ABI. ABI patients with intracranial pressure ≥30 mm Hg differentially expressed a set of 38 DAMPs (DAMP_ICP30; PPI P<0.001). Proteins in DAMP_ICP30 are involved in cellular proteolysis, complement pathway activation, and post-translational modifications. There were no relationships between DAMP expression and ICU mortality or unfavorable versus favorable outcomes. CONCLUSIONS Specific patterns of vCSF DAMP expression differentiated between traumatic and nontraumatic types of ABI and were associated with increased episodes of severe intracranial hypertension.
Affiliation(s)
- Carlos A Santacruz
  - Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
  - Department of Intensive and Critical Care Medicine, Santa Fe de Bogotá Foundation
- Jean-Louis Vincent
  - Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
- Jorge Duitama
  - Systems and Computing Engineering Department, University of los Andes, Bogotá, Colombia
- Edwin Bautista
  - Department of Intensive and Critical Care Medicine, Santa Fe de Bogotá Foundation
- Virginie Imbault
  - Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire, Université Libre de Bruxelles, Brussels, Belgium
- Michael Bruneau
  - Department of Neurosurgery, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
- Jacques Creteur
  - Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
- Serge Brimioulle
  - Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
- David Communi
  - Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire, Université Libre de Bruxelles, Brussels, Belgium
- Fabio S Taccone
  - Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
7
Tello D, Gonzalez-Garcia LN, Gomez J, Zuluaga-Monares JC, Garcia R, Angel R, Mahecha D, Duarte E, Leon MDR, Reyes F, Escobar-Velásquez C, Linares-Vásquez M, Cardozo N, Duitama J. NGSEP 4: Efficient and accurate identification of orthogroups and whole-genome alignment. Mol Ecol Resour 2023; 23:712-724. [PMID: 36377253 DOI: 10.1111/1755-0998.13737] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/30/2022] [Revised: 10/26/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022]
Abstract
Whole-genome alignment allows researchers to understand the genomic structure and variation among genomes. Approaches based on direct pairwise comparisons of DNA sequences require large computational capacities. As a consequence, pipelines combining tools for orthologous gene identification and synteny have been developed. In this manuscript, we present the latest functionalities implemented in NGSEP 4, to identify orthogroups and perform whole genome alignments. NGSEP implements functionalities for identification of clusters of homologous genes, synteny analysis and whole genome alignment. Our results showed that the NGSEP algorithm for orthogroup identification has competitive accuracy and efficiency in comparison to commonly used tools. The implementation also includes a visualization of the whole genome alignment based on synteny of the orthogroups that were identified, and a reconstruction of the pangenome based on frequencies of the orthogroups among the genomes. NGSEP 4 also includes a new graphical user interface based on the JavaFX technology. We expect that these new developments will be very useful for several studies in evolutionary biology and population genomics.
Affiliation(s)
- Daniel Tello
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Jorge Gomez
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Rogelio Garcia
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Ricardo Angel
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Daniel Mahecha
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Erick Duarte
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Maria Del Rosario Leon
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Fernando Reyes
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Mario Linares-Vásquez
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Nicolas Cardozo
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
8
Herrera-Rocha F, Fernández-Niño M, Cala MP, Duitama J, Barrios AFG. Omics approaches to understand cocoa processing and chocolate flavor development: A review. Food Res Int 2023; 165:112555. [PMID: 36869541 DOI: 10.1016/j.foodres.2023.112555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 01/25/2023] [Accepted: 01/29/2023] [Indexed: 02/10/2023]
Abstract
The chocolate market has grown worldwide during the last decade and is expected to reach a value of USD 200 billion by 2028. Chocolate is obtained from different varieties of Theobroma cacao L, a plant domesticated more than 4000 years ago in the Amazon rainforest. However, chocolate production is a complex process requiring extensive post-harvesting, mainly involving cocoa bean fermentation, drying, and roasting. These steps have a critical impact on chocolate quality. Standardizing and better understanding cocoa processing is, therefore, a current challenge to boost the global production of high-quality cocoa. This knowledge can also help cocoa producers improve cocoa processing management and obtain a better chocolate. Several recent studies have been conducted to dissect cocoa processing via omics analysis. A vast amount of data has been produced regarding omics studies of cocoa processing performed worldwide. This review systematically analyzes the current data on cocoa omics using data mining techniques and discusses opportunities and gaps for cocoa processing standardization from this data. First, we observed a recurrent report in metagenomics studies of species of the fungal genera Candida and Pichia as well as bacteria from the genera Lactobacillus, Acetobacter, and Bacillus. Second, our analyses of the available metabolomics data showed clear differences in the identified metabolites in cocoa and chocolate of different geographical origins, cocoa types, and processing stages. Finally, our analysis of peptidomics data revealed characteristic patterns in the gathered data including higher diversity and lower size distribution of peptides in fine-flavor cocoa. In addition, we discuss the current challenges in cocoa omics research. More research is still required to fill gaps in central matters of chocolate production such as starter cultures for cocoa fermentation, flavor evolution of cocoa, and the role of peptides in the development of specific flavor notes. We also offer the most comprehensive collection of multi-omics data in cocoa processing gathered from different research articles.
Affiliation(s)
- Fabio Herrera-Rocha
  - Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical and Food Engineering, Universidad de los Andes, Bogotá 111711, Colombia
- Miguel Fernández-Niño
  - Leibniz-Institute of Plant Biochemistry, Department of Bioorganic Chemistry, Weinberg 3, D-06120 Halle, Germany
- Mónica P Cala
  - MetCore - Metabolomics Core Facility, Vice-Presidency for Research, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
- Andrés Fernando González Barrios
  - Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical and Food Engineering, Universidad de los Andes, Bogotá 111711, Colombia
9
Abstract
The ultimate goal of de novo assembly of reads sequenced from a diploid individual is the separate reconstruction of the sequences corresponding to the two copies of each chromosome. Unfortunately, the allele linkage information needed to perform phased genome assemblies has been difficult to generate. Hence, most current genome assemblies are a haploid mixture of the two underlying chromosome copies present in the sequenced individual. Sequencing technologies providing long (20 kb) and accurate reads are the basis to generate phased genome assemblies. This chapter provides a brief overview of the main milestones in traditional genome assembly, focusing on the bioinformatic techniques developed to generate haplotype information from different specialized protocols. Using these techniques as a knowledge background, the chapter reviews the current algorithms to generate phased assemblies from long reads with low error rates. Current techniques perform haplotype-aware error correction steps to increase the quality of the raw reads. In addition, variations on the traditional overlap-layout-consensus (OLC) graph have been developed in an effort to eliminate edges between reads sequenced from different chromosome copies. This allows for large presence-absence variants between the chromosome copies to be taken into account. The development of these algorithms, along with the improved sequencing technologies has been crucial to finish chromosome-level assemblies of complex genomes.
Affiliation(s)
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
10
Herrera-Rocha F, Cala MP, León-Inga AM, Aguirre Mejía JL, Rodríguez-López CM, Florez SL, Chica MJ, Olarte HH, Duitama J, González Barrios AF, Fernández-Niño M. Lipidomic profiling of bioactive lipids during spontaneous fermentations of fine-flavor cocoa. Food Chem 2022; 397:133845. [PMID: 35940096 DOI: 10.1016/j.foodchem.2022.133845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 07/24/2022] [Accepted: 07/31/2022] [Indexed: 11/04/2022]
Abstract
The impact of cocoa lipid content on chocolate quality has been extensively described. Nevertheless, few studies have elucidated the cocoa lipid composition and their bioactive properties, focusing only on specific lipids. In the present study the lipidome of fine-flavor cocoa fermentation was analyzed using LC-MS-QTOF and a Machine Learning model to assess potential bioactivity was developed. Our results revealed that the cocoa lipidome, comprised mainly of fatty acyls and glycerophospholipids, remains stable during fine-flavor cocoa fermentations. Also, several Machine Learning algorithms were trained to explore potential biological activity among the identified lipids. We found that K-Nearest Neighbors had the best performance. This model was used to classify the identified lipids as bioactive or non-bioactive, nominating 28 molecules as potential bioactive lipids. None of these compounds have been previously reported as bioactive. Our work is the first untargeted lipidomic study and systematic effort to investigate potential bioactivity in fine-flavor cocoa lipids.
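The K-Nearest Neighbors classification mentioned in this abstract works by majority vote among the closest labelled examples. The sketch below shows only that general technique: the two-dimensional feature vectors, the labels attached to them, and the choice of `k=3` are invented for illustration and are not data or settings from the study.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Return the majority label among the k training items closest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (descriptor vector, label) pairs; the values carry no chemical meaning.
train = [
    ((0.10, 0.20), "non-bioactive"),
    ((0.20, 0.10), "non-bioactive"),
    ((0.15, 0.25), "non-bioactive"),
    ((0.90, 0.80), "bioactive"),
    ((0.80, 0.90), "bioactive"),
    ((0.85, 0.75), "bioactive"),
]

print(knn_predict(train, (0.82, 0.80)))
print(knn_predict(train, (0.12, 0.18)))
```

In practice the descriptors would be scaled to comparable ranges first, because Euclidean distance is dominated by whichever feature has the largest numeric spread.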
Affiliation(s)
- Fabio Herrera-Rocha
  - Grupo de Diseño de Productos y Procesos (GDPP), Departamento de Ingeniería Química y de Alimentos, Universidad de los Andes, Bogotá, Colombia
- Mónica P Cala
  - MetCore - Metabolomics Core Facility, Vice-Presidency for Research, Universidad de los Andes, Bogotá, Colombia
- Ana Maria León-Inga
  - MetCore - Metabolomics Core Facility, Vice-Presidency for Research, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
- Andrés Fernando González Barrios
  - Grupo de Diseño de Productos y Procesos (GDPP), Departamento de Ingeniería Química y de Alimentos, Universidad de los Andes, Bogotá, Colombia
- Miguel Fernández-Niño
  - Grupo de Diseño de Productos y Procesos (GDPP), Departamento de Ingeniería Química y de Alimentos, Universidad de los Andes, Bogotá, Colombia
  - Leibniz-Institute of Plant Biochemistry, Department of Bioorganic Chemistry, Weinberg 3, D-06120 Halle, Germany
11
Parker TA, Cetz J, de Sousa LL, Kuzay S, Lo S, Floriani TDO, Njau S, Arunga E, Duitama J, Jernstedt J, Myers JR, Llaca V, Herrera-Estrella A, Gepts P. Loss of pod strings in common bean is associated with gene duplication, retrotransposon insertion and overexpression of PvIND. New Phytol 2022; 235:2454-2465. [PMID: 35708662 DOI: 10.1111/nph.18319] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/17/2022] [Accepted: 05/27/2022] [Indexed: 06/15/2023]
Abstract
Fruit development has been central in the evolution and domestication of flowering plants. In common bean (Phaseolus vulgaris), the principal global grain legume staple, two main production categories are distinguished by fibre deposition in pods: dry beans, with fibrous, stringy pods; and stringless snap/green beans, with reduced fibre deposition, which frequently revert to the ancestral stringy state. Here, we identify genetic and developmental patterns associated with pod fibre deposition. Transcriptional, anatomical, epigenetic and genetic regulation of pod strings were explored through RNA-seq, RT-qPCR, fluorescence microscopy, bisulfite sequencing and whole-genome sequencing. Overexpression of the INDEHISCENT ('PvIND') orthologue was observed in stringless types compared with isogenic stringy lines, associated with overspecification of weak dehiscence-zone cells throughout the pod vascular sheath. No differences in DNA methylation were correlated with this phenotype. Nonstringy varieties showed a tandemly direct duplicated PvIND and a Ty1-copia retrotransposon inserted between the two repeats. These sequence features are lost during pod reversion and are predictive of pod phenotype in diverse materials, supporting their role in PvIND overexpression and reversible string phenotype. Our results give insight into reversible gain-of-function mutations and possible genetic solutions to the reversion problem, of considerable economic value for green bean production.
Collapse
Affiliation(s)
- Travis A Parker
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
| | - Jose Cetz
- National Laboratory of Genomics for Biodiversity, CINVESTAV, Irapuato, Guanajuato, C.P. 36821, Mexico
| | - Lorenna Lopes de Sousa
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
| | - Saarah Kuzay
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
| | - Sassoum Lo
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
| | - Talissa de Oliveira Floriani
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
- Department of Genetics, Escola Superior de Agricultura 'Luiz de Queiroz', Universidade de São Paulo, Piracicaba, SP, 13418-900, Brazil
| | - Serah Njau
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
- Department of Water and Agricultural Resource Management, University of Embu, Embu, 60100, Kenya
| | - Esther Arunga
- Department of Water and Agricultural Resource Management, University of Embu, Embu, 60100, Kenya
| | - Jorge Duitama
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
| | - Judy Jernstedt
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
| | - James R Myers
- Department of Horticulture, Oregon State University, Corvallis, OR, 97331, USA
| | | | - Alfredo Herrera-Estrella
- National Laboratory of Genomics for Biodiversity, CINVESTAV, Irapuato, Guanajuato, C.P. 36821, Mexico
| | - Paul Gepts
- Department of Plant Sciences, University of California Davis, Davis, CA, 95616-8780, USA
| |
|
12
|
Santacruz CA, Vincent JL, Duitama J, Bautista E, Imbault V, Bruneau M, Creteur J, Brimioulle S, Communi D, Taccone FS. The Cerebrospinal Fluid Proteomic Response to Traumatic and Nontraumatic Acute Brain Injury: A Prospective Study. Neurocrit Care 2022; 37:463-470. [PMID: 35523916 DOI: 10.1007/s12028-022-01507-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2021] [Accepted: 04/01/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Quantitative analysis of ventricular cerebrospinal fluid (vCSF) proteins following acute brain injury (ABI) may help identify pathophysiological pathways and potential biomarkers that can predict unfavorable outcome. METHODS In this prospective proteomic analysis study, consecutive patients with severe ABI expected to require intraventricular catheterization for intracranial pressure (ICP) monitoring for at least 5 days and patients without ABI admitted for elective clipping of an unruptured cerebral aneurysm were included. vCSF samples were collected within the first 24 h after ABI and ventriculostomy insertion and then every 24 h for 5 days. In patients without ABI, a single vCSF sample was collected at the time of elective clipping. Data-independent acquisition and sequential window acquisition of all theoretical spectra (SWATH) mass spectrometry were used to compare differences in protein expression in patients with ABI and patients without ABI and in patients with traumatic and nontraumatic ABI. Differences in protein expression according to different ICP values, intensive care unit outcome, subarachnoid hemorrhage (SAH) versus traumatic brain injury (TBI), and good versus poor 3-month functional status (assessed by using the Glasgow Outcome Scale) were also evaluated. vCSF proteins with significant differences between groups were compared by using linear models and selected for gene ontology analysis using R Language and the Panther database. RESULTS We included 50 patients with ABI (SAH n = 23, TBI n = 15, intracranial hemorrhage n = 6, ischemic stroke n = 3, others n = 3) and 12 patients without ABI. There were significant differences in the expression of 255 proteins between patients with and without ABI (p < 0.01). There were intraday and interday differences in expression of seven proteins related to increased inflammation, apoptosis, oxidative stress, and cellular response to hypoxia and injury. 
Among these, glial fibrillary acidic protein expression was higher in patients with ABI with severe intracranial hypertension (ICH) (ICP ≥ 30 mm Hg) or death compared to those without (log2 fold change: +2.4; p < 0.001), suggesting extensive primary astroglial injury or death. There were differences in the expression of 96 proteins between patients with traumatic and nontraumatic ABI (p < 0.05); intraday and interday differences were observed for six proteins related to structural damage, complement activation, and cholesterol metabolism. Thirty-nine vCSF proteins were associated with an increased risk of severe ICH (ICP ≥ 30 mm Hg) in patients with traumatic compared with nontraumatic ABI (p < 0.05). No significant differences were found in protein expression between patients with SAH versus TBI or between those with good versus poor 3-month Glasgow Outcome Scale score. CONCLUSIONS Dysregulated vCSF protein expression after ABI may be associated with an increased risk of severe ICH and death.
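For readers less used to log-scale effect sizes, the log2 fold change of +2.4 reported in this abstract corresponds to roughly a five-fold higher abundance on the linear scale. The short Python sketch below only illustrates that conversion; the function names are hypothetical and are not taken from the study's analysis code.

```python
import math


def fold_change_from_log2(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


def log2_fold_change(mean_a: float, mean_b: float) -> float:
    """Log2 ratio of two positive mean abundances (group A over group B)."""
    return math.log2(mean_a / mean_b)


# Value quoted in the abstract: log2 fold change of +2.4
linear = fold_change_from_log2(2.4)
print(f"log2 FC 2.4 is about {linear:.2f}-fold")
```

The inverse helper shows the other direction: a protein with mean abundance 8 in one group and 2 in the other has a log2 fold change of 2.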
Affiliation(s)
- Carlos A Santacruz
- Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
- Department of Intensive and Critical Care Medicine, Academic Hospital Fundación Santa Fe de Bogota Foundation, Bogota, Colombia
| | - Jean-Louis Vincent
- Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium.
| | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Bogota, Colombia
| | - Edwin Bautista
- Systems and Computing Engineering Department, Universidad de los Andes, Bogota, Colombia
| | - Virginie Imbault
- Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire, Université Libre de Bruxelles, Brussels, Belgium
| | - Michaël Bruneau
- Department of Neurosurgery, Erasme Hospital, Université Libre de Bruxelles, Route De Lennik 808, 1070, Brussels, Belgium
| | - Jacques Creteur
- Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
| | - Serge Brimioulle
- Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
| | - David Communi
- Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
- Institut de Recherche Interdisciplinaire en Biologie Humaine et Moléculaire, Université Libre de Bruxelles, Brussels, Belgium
| | - Fabio S Taccone
- Department of Intensive Care, Erasme Hospital, Université Libre de Bruxelles, Brussels, Belgium
| |
|
13
|
Duitama J, Bartley LE, Guyot R, Sharma R. Editorial: Grass Genome Evolution and Domestication. Front Plant Sci 2022; 13:866201. [PMID: 35481135 PMCID: PMC9037283 DOI: 10.3389/fpls.2022.866201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 03/16/2022] [Indexed: 06/14/2023]
Affiliation(s)
- Jorge Duitama
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
| | - Laura E. Bartley
- Institute of Biological Chemistry, Washington State University, Pullman, WA, United States
| | - Romain Guyot
- Institut de Recherche pour le Développement, UMR DIADE, Montpellier, France
| | - Rita Sharma
- Department of Biological Sciences, Birla Institute of Technology and Science (BITS), Pilani, India
| |
|
14
|
García JM, Udenigwe CC, Duitama J, Barrios AFG. Peptidomic analysis of whey protein hydrolysates and prediction of their antioxidant peptides. Food Science and Human Wellness 2022. [DOI: 10.1016/j.fshw.2021.11.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
15
|
Mahecha D, Nuñez H, Lattig MC, Duitama J. Machine Learning Models for Accurate Prioritization of Variants of Uncertain Significance. Hum Mutat 2022; 43:449-460. [PMID: 35143088 DOI: 10.1002/humu.24339] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/24/2020] [Revised: 01/04/2022] [Accepted: 01/23/2022] [Indexed: 11/08/2022]
Abstract
The growing use of next generation sequencing technologies in genetic diagnosis has produced an exponential increase in the number of Variants of Uncertain Significance (VUS). In this manuscript we compare three machine learning methods to classify VUS as pathogenic or non-pathogenic: a Random Forest (RF), a Support Vector Machine (SVM), and a Multilayer Perceptron (MLP). To train the models, we extracted high quality variants from ClinVar that were previously classified as VUS. For each variant, we retrieved 9 conservation scores, the loss of function tool and allele frequencies. For the RF and SVM models, hyperparameters were tuned using cross validation with a grid search. The three models were tested on a non-overlapping set of variants that had been classified as VUS at some point during the previous three years but had been reclassified in August 2020. The three models yielded superior accuracy on this set compared to the benchmarked tools. The RF based model yielded the best performance across different variant types and was used to create VusPrize, an open source software tool for prioritization of variants of uncertain significance. We believe that our model can improve the process of genetic diagnosis in research and clinical settings.
Affiliation(s)
- Daniel Mahecha
- SIGEN, Alianza Universidad de los Andes - Fundación Santa Fe de Bogota, Colombia; Systems and Computing Engineering Department, Universidad de los Andes, Colombia
| | - Haydemar Nuñez
- Systems and Computing Engineering Department, Universidad de los Andes, Colombia
| | - Maria C Lattig
- SIGEN, Alianza Universidad de los Andes - Fundación Santa Fe de Bogota, Colombia; Facultad de Ciencias, Universidad de los Andes
| | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Colombia
| |
|
16
|
Bautista D, Guayazan-Palacios N, Buitrago MC, Cardenas M, Botero D, Duitama J, Bernal AJ, Restrepo S. Comprehensive Time-Series Analysis of the Gene Expression Profile in a Susceptible Cultivar of Tree Tomato (Solanum betaceum) During the Infection of Phytophthora betacei. Front Plant Sci 2021; 12:730251. [PMID: 34745164 PMCID: PMC8567061 DOI: 10.3389/fpls.2021.730251] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2021] [Accepted: 09/22/2021] [Indexed: 05/30/2023]
Abstract
Solanum betaceum is a tree from the Andean region bearing edible fruits, considered an exotic export. Although there has been renewed interest in its commercialization, sustainability and disease management have been limiting factors. Phytophthora betacei is a recently described species that causes late blight in S. betaceum. There is no general study of the response of S. betaceum to this pathogen, particularly of the changes in expression of pathogenesis-related genes. In this manuscript we present a comprehensive RNA-seq time-series study of the plant response to the infection of P. betacei. Following six time points of infection, the differentially expressed genes (DEGs) involved in the defense by the plant were contextualized in a sequential manner. We documented 5,628 DEGs across all time-points. From 6 to 24 h post-inoculation, we highlighted DEGs involved in the recognition of the pathogen by the likely activation of pattern-triggered immunity (PTI) genes. We also describe the possible effect of the pathogen effectors in the host during the effector-triggered response. Finally, we reveal genes related to the susceptible outcome of the interaction caused by the onset of necrotrophy and the sharp transcriptional changes as a response to the pathogen. This is the first report of the transcriptome of the tree tomato in response to the newly described pathogen P. betacei.
Affiliation(s)
- Daniel Bautista
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - Natalia Guayazan-Palacios
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Department of Biology, University of Washington, Seattle, WA, United States
| | | | - Martha Cardenas
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - David Botero
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - Jorge Duitama
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
| | - Adriana J. Bernal
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - Silvia Restrepo
- Department of Chemical and Food Engineering, Universidad de los Andes, Bogotá, Colombia
| |
|
17
|
Vásquez AF, Muñoz AR, Duitama J, González Barrios A. Non-Extensive Fragmentation of Natural Products and Pharmacophore-Based Virtual Screening as a Practical Approach to Identify Novel Promising Chemical Scaffolds. Front Chem 2021; 9:700802. [PMID: 34422762 PMCID: PMC8377161 DOI: 10.3389/fchem.2021.700802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Accepted: 06/28/2021] [Indexed: 11/25/2022] Open
Abstract
Fragment-based drug design (FBDD) and pharmacophore modeling have proven to be efficient tools to discover novel drugs. However, these approaches may become limited if the collection of fragments is highly repetitive, poorly diverse, or excessively simple. In this article, we propose that combining pharmacophore modeling and a non-classical type of fragmentation (herein called non-extensive) to screen a natural product (NP) library may provide fragments predicted as potent, diverse, and developable. Initially, we applied retrosynthetic combinatorial analysis procedure (RECAP) rules in two versions, extensive and non-extensive, in order to deconstruct a virtual library of NPs formed by the databases Traditional Chinese Medicine (TCM), AfroDb (African Medicinal Plants database), NuBBE (Nuclei of Bioassays, Biosynthesis, and Ecophysiology of Natural Products), and UEFS (Universidade Estadual de Feira de Santana). We then developed a virtual screening (VS) using two groups of natural-product-derived fragments (extensive and non-extensive NPDFs) and two overlapping pharmacophore models for each of 20 different proteins of therapeutic interest. Molecular weight, lipophilicity, and molecular complexity were estimated and compared for both types of NPDFs (and their original NPs) before and after the VS proceedings. As a result, we found that non-extensive NPDFs exhibited a much higher number of chemical entities compared to extensive NPDFs (45,355 vs. 11,525 compounds), accounting for the larger part of the hits recovered and being far less repetitive than extensive NPDFs. The structural diversity of both types of NPDFs and the NPs was shown to diminish slightly after VS procedures. Finally, and most interestingly, the pharmacophore fit score of the non-extensive NPDFs proved to be not only higher, on average, than extensive NPDFs (56% of cases) but also higher than their original NPs (69% of cases) when all of them were also recognized as hits after the VS.
The findings obtained in this study indicated that the proposed cascade approach was useful to enhance the probability of identifying innovative chemical scaffolds, which deserve further development to become drug-sized candidate compounds. We consider that the knowledge about the deconstruction degree required to produce NPDFs of interest represents a good starting point for eventual synthesis, characterization, and biological activity studies.
Affiliation(s)
- Andrés Felipe Vásquez
- Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical Engineering, Universidad de Los Andes, Bogotá, Colombia; Naturalius S.A.S, Bogotá, Colombia
| | - Alejandro Reyes Muñoz
- Grupo de Biología Computacional y Ecología Microbiana (BCEM), Department of Biological Sciences, Universidad de Los Andes, Bogotá, Colombia; Max Planck Tandem Group in Computational Biology, Universidad de Los Andes, Bogotá, Colombia
| | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá, Colombia
| | - Andrés González Barrios
- Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical Engineering, Universidad de Los Andes, Bogotá, Colombia
| |
|
18
|
Trujillo-Montenegro JH, Rodríguez Cubillos MJ, Loaiza CD, Quintero M, Espitia-Navarro HF, Salazar Villareal FA, Viveros Valens CA, González Barrios AF, De Vega J, Duitama J, Riascos JJ. Unraveling the Genome of a High Yielding Colombian Sugarcane Hybrid. Front Plant Sci 2021; 12:694859. [PMID: 34484261 PMCID: PMC8414525 DOI: 10.3389/fpls.2021.694859] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/13/2021] [Accepted: 06/07/2021] [Indexed: 05/04/2023]
Abstract
Recent developments in High Throughput Sequencing (HTS) technologies and bioinformatics, including improved read lengths and genome assemblers, allow the reconstruction of complex genomes with unprecedented quality and contiguity. Sugarcane has one of the most complicated genomes among grasses, with a haploid length of 1 Gbp and a ploidy between 8 and 12. In this work, we present a genome assembly of the Colombian sugarcane hybrid CC 01-1940. Three types of sequencing technologies were combined for this assembly: PacBio long reads, Illumina paired short reads, and Hi-C reads. We achieved a median contig length of 34.94 Mbp and a total genome assembly of 903.2 Mbp. We annotated a total of 63,724 protein coding genes and performed a reconstruction and comparative analysis of the sucrose metabolism pathway. Nucleotide evolution measurements between orthologs with close species suggest that divergence between Saccharum officinarum and Saccharum spontaneum occurred <2 million years ago. Synteny analysis between CC 01-1940 and the S. spontaneum genome confirms the presence of translocation events between the species and a random contribution throughout the entire genome in current sugarcane hybrids. Analysis of RNA-Seq data from leaf and root tissue of contrasting sugarcane genotypes subjected to water stress treatments revealed 17,490 differentially expressed genes, of which 3,633 correspond to genes expressed exclusively in tolerant genotypes. We expect the resources presented here to serve as a source of information to improve the selection processes of new varieties in sugarcane breeding programs.
Affiliation(s)
- Jhon Henry Trujillo-Montenegro
- Centro de Investigación de la Caña de Azúcar de Colombia (CENICAÑA), Cali, Colombia
- Research Group in Bioinformatics, Department of Computer Science, Faculty of Engineering, Universidad Del Valle, Cali, Colombia
| | - María Juliana Rodríguez Cubillos
- Grupo de Diseño de Productos y Procesos, Department of Chemical and Food Engineering, Faculty of Engineering, Universidad de los Andes, Bogotá, Colombia
| | | | - Manuel Quintero
- Centro de Investigación de la Caña de Azúcar de Colombia (CENICAÑA), Cali, Colombia
| | | | | | | | - Andrés Fernando González Barrios
- Grupo de Diseño de Productos y Procesos, Department of Chemical and Food Engineering, Faculty of Engineering, Universidad de los Andes, Bogotá, Colombia
| | - José De Vega
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
| | - John J. Riascos
- Centro de Investigación de la Caña de Azúcar de Colombia (CENICAÑA), Cali, Colombia
| |
|
19
|
Parra-Salazar A, Gomez J, Lozano-Arce D, Reyes-Herrera PH, Duitama J. Robust and efficient software for reference-free genomic diversity analysis of genotyping-by-sequencing data on diploid and polyploid species. Mol Ecol Resour 2021; 22:439-454. [PMID: 34288487 DOI: 10.1111/1755-0998.13477] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/05/2021] [Revised: 07/08/2021] [Accepted: 07/13/2021] [Indexed: 12/14/2022]
Abstract
Genotyping-by-sequencing (GBS) is a widely used and cost-effective technique for obtaining large numbers of genetic markers from populations by sequencing regions adjacent to restriction cut sites. Although a standard reference-based pipeline can be followed to analyse GBS reads, a reference genome is still not available for a large number of species. Hence, reference-free approaches are required to generate the genetic variability information that can be obtained from a GBS experiment. Unfortunately, available tools to perform de novo analysis of GBS reads face issues of usability, accuracy and performance. Furthermore, few available tools are suitable for analysing data sets from polyploid species. In this manuscript, we describe a novel algorithm to perform reference-free variant detection and genotyping from GBS reads. Nonexact searches on a dynamic hash table of consensus sequences allow for efficient read clustering and sorting. This algorithm was integrated in the Next Generation Sequencing Experience Platform (NGSEP) to make use of the state-of-the-art variant detector already implemented in this tool. We performed benchmark experiments with three different empirical data sets of plants and animals with different population structures and ploidies, and sequenced with different GBS protocols at different read depths. These experiments show that NGSEP has comparable and in some cases better accuracy and always better computational efficiency compared to existing solutions. We expect that this new development will be useful for many research groups conducting population genetic studies in a wide variety of species.
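The idea of clustering reads through nonexact lookups in a hash table of consensus sequences can be illustrated with a toy Python sketch. This is not the NGSEP algorithm: it assumes equal-length reads, tolerates at most one mismatch, and uses the first read of each cluster as a fixed "consensus". All names below are hypothetical.

```python
from typing import Dict, Iterator, List

ALPHABET = "ACGT"


def neighbors_within_one_mismatch(seq: str) -> Iterator[str]:
    """Yield seq itself, then every sequence at Hamming distance 1 from it."""
    yield seq
    for i, base in enumerate(seq):
        for alt in ALPHABET:
            if alt != base:
                yield seq[:i] + alt + seq[i + 1:]


def cluster_reads(reads: List[str]) -> Dict[str, List[str]]:
    """Group reads under a consensus key found by exact or 1-mismatch lookup.

    Assumption for this sketch: the consensus of a cluster is simply the
    first read assigned to it and is never recomputed.
    """
    clusters: Dict[str, List[str]] = {}
    for read in reads:
        for candidate in neighbors_within_one_mismatch(read):
            if candidate in clusters:  # constant-time hash lookup
                clusters[candidate].append(read)
                break
        else:
            # No consensus within one mismatch: the read seeds a new cluster.
            clusters[read] = [read]
    return clusters


reads = ["ACGTAC", "ACGTAC", "ACGAAC", "TTGCAA", "TTGCAT"]
clusters = cluster_reads(reads)
for consensus, members in clusters.items():
    print(consensus, len(members))
```

Enumerating the one-mismatch neighbourhood keeps every lookup a plain dictionary access, at the cost of about 3L probes per read of length L; a production tool would need a more scalable scheme and a real consensus update.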
Affiliation(s)
- Andrea Parra-Salazar
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
| | - Jorge Gomez
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
| | - Daniela Lozano-Arce
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
| | | | - Jorge Duitama
- Department of Systems and Computing Engineering, Universidad de los Andes, Bogotá, Colombia
| |
|
20
|
Chacón-Sánchez MI, Martínez-Castillo J, Duitama J, Debouck DG. Gene Flow in Phaseolus Beans and Its Role as a Plausible Driver of Ecological Fitness and Expansion of Cultigens. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.618709] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022] Open
Abstract
The genus Phaseolus, native to the Americas, is composed of more than eighty wild species, five of which were domesticated in pre-Columbian times. Since the beginning of domestication events in this genus, ample opportunities for gene flow with wild relatives have existed. The present work reviews the extent of gene flow in the genus Phaseolus in primary and secondary areas of domestication with the aim of illustrating how this evolutionary force may have conditioned ecological fitness and the widespread adoption of cultigens. We focus on the biological bases of gene flow in the genus Phaseolus from a spatial and time perspective, the dynamics of wild-weedy-crop complexes in the common bean and the Lima bean, the two most important domesticated species of the genus, and the usefulness of genomic tools to detect inter and intraspecific introgression events. In this review we discuss the reproductive strategies of several Phaseolus species, the factors that may favor outcrossing rates and evidence suggesting that interspecific gene flow may increase ecological fitness of wild populations. We also show that wild-weedy-crop complexes generate genetic diversity over which farmers are able to select and expand their cultigens outside primary areas of domestication. Ultimately, we argue that more studies are needed on the reproductive biology of the genus Phaseolus since for most species breeding systems are largely unknown. We also argue that there is an urgent need to preserve wild-weedy-crop complexes and characterize the genetic diversity generated by them, in particular the genome-wide effects of introgressions and their value for breeding programs. Recent technological advances in genomics, coupled with agronomic characterizations, may make a large contribution.
|
21
|
Lobaton J, Andrew R, Duitama J, Kirkland L, Macfadyen S, Rader R. Using RNA-seq to characterize pollen-stigma interactions for pollination studies. Sci Rep 2021; 11:6635. [PMID: 33758263 PMCID: PMC7988043 DOI: 10.1038/s41598-021-85887-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/07/2020] [Accepted: 03/08/2021] [Indexed: 11/18/2022] Open
Abstract
Insects are essential for the reproduction of pollinator-dependent crops and contribute to the pollination of 87% of wild plants and 75% of the world’s food crops. Understanding pollen flow dynamics between plants and pollinators is thus essential to manage and conserve wild plants and ensure yields are maximized in food crops. However, the determination of pollen transfer in the field is complex and laborious. We developed a field experiment in a pollinator-dependent crop and used high throughput RNA sequencing (RNA-seq) to quantify pollen flow by measuring changes in gene expression between pollination treatments across different apple (Malus domestica Borkh.) cultivars. We tested three potential molecular indicators of successful pollination and validated these results with field data by observing single and multiple visits by honey bees (Apis mellifera) to apple flowers and measuring fruit set in a commercial apple orchard. The first indicator of successful outcrossing was revealed via differential gene expression in the cross-pollination treatments after 6 h. The second indicator of successful outcrossing was revealed by the expression of specific genes related to pollen tube formation and defense response at three different time intervals in the stigma and the style following cross-pollination (i.e. after 6, 24, and 48 h). Finally, genotyping variants specific to donor pollen could be detected in cross-pollination treatments, providing a third indicator of successful outcrossing. Field data indicated that one or five flower visits by honey bees were insufficient and at least 10 honey bee flower visits were required to achieve a 25% probability of fruit set under orchard conditions. By combining the genotyping data, the differential expression analysis, and the traditional fruit set field experiments, it was possible to evaluate the pollination effectiveness of honey bee visits under orchard conditions.
This is the first time that pollen-stigma-style mRNA expression analysis has been conducted after a pollinator visit (honey bee) to a plant (in vivo apple flowers). This study provides evidence that mRNA sequencing can be used to address complex questions related to stigma–pollen interactions over time in pollination ecology.
Affiliation(s)
- Juan Lobaton
- School of Environmental and Rural Science, University of New England, Armidale, Australia; CSIRO, Clunies Ross St., Acton, ACT, Australia
| | - Rose Andrew
- School of Environmental and Rural Science, University of New England, Armidale, Australia
| | - Jorge Duitama
- Systems and Computing Engineering Department, Universidad de Los Andes, Bogota, Colombia
| | - Lindsey Kirkland
- School of Environmental and Rural Science, University of New England, Armidale, Australia
| | | | - Romina Rader
- School of Environmental and Rural Science, University of New England, Armidale, Australia
| |
|
22
|
Deparis Q, Duitama J, Foulquié-Moreno MR, Thevelein JM. Whole-Genome Transformation Promotes tRNA Anticodon Suppressor Mutations under Stress. mBio 2021; 12:e03649-20. [PMID: 33758086 PMCID: PMC8092322 DOI: 10.1128/mbio.03649-20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/23/2020] [Accepted: 02/16/2021] [Indexed: 11/20/2022] Open
Abstract
tRNAs are encoded by a large gene family, usually with several isogenic tRNAs interacting with the same codon. Mutations in the anticodon region of other tRNAs can overcome specific tRNA deficiencies. Phylogenetic analysis suggests that such mutations have occurred in evolution, but the driving force is unclear. We show that in yeast suppressor mutations in other tRNAs are able to overcome deficiency of the essential TRT2-encoded tRNAThrCGU at high temperature (40°C). Surprisingly, these tRNA suppressor mutations were obtained after whole-genome transformation with DNA from thermotolerant Kluyveromyces marxianus or Ogataea polymorpha strains, from which the mutations apparently did not originate. We suggest that transient presence of donor DNA in the host facilitates proliferation at high temperature and thus increases the chances for occurrence of spontaneous mutations suppressing defective growth at high temperature. Whole-genome sequence analysis of three transformants revealed only four to five nonsynonymous mutations of which one causing TRT2 anticodon stem stabilization and two anticodon mutations in non-threonyl-tRNAs, tRNALysCUU and tRNAeMetCAU, were causative. Both anticodon mutations suppressed lethality of TRT2 deletion and apparently caused the respective tRNAs to become novel substrates for threonyl-tRNA synthetase. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) data could not detect any significant mistranslation, and reverse transcription-quantitative PCR results contradicted induction of the unfolded protein response. We suggest that stress conditions have been a driving force in evolution for the selection of anticodon-switching mutations in tRNAs as revealed by phylogenetic analysis. IMPORTANCE In this work, we have identified for the first time the causative elements in a eukaryotic organism introduced by applying whole-genome transformation and responsible for the selectable trait of interest, i.e., high temperature tolerance.
Surprisingly, the whole-genome transformants contained just a few single nucleotide polymorphisms (SNPs), which were unrelated to the sequence of the donor DNA. In each of three independent transformants, we have identified a SNP in a tRNA, either stabilizing the essential tRNAThrCGU at high temperature or switching the anticodon of tRNALysCUU or tRNAeMetCAU into CGU, which is apparently enough for in vivo recognition by threonyl-tRNA synthetase. LC-MS/MS analysis indeed indicated absence of significant mistranslation. Phylogenetic analysis showed that similar mutations have occurred throughout evolution and we suggest that stress conditions may have been a driving force for their selection. The low number of SNPs introduced by whole-genome transformation may favor its application for improvement of industrial yeast strains.
Affiliation(s)
- Quinten Deparis
  - Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Belgium
  - Center for Microbiology, VIB, Leuven-Heverlee, Flanders, Belgium
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Maria R Foulquié-Moreno
  - Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Belgium
  - Center for Microbiology, VIB, Leuven-Heverlee, Flanders, Belgium
- Johan M Thevelein
  - Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Belgium
  - Center for Microbiology, VIB, Leuven-Heverlee, Flanders, Belgium
  - NovelYeast bv, Open Bio-Incubator, Erasmus High School, Brussels (Jette), Belgium
|
23
|
Gil J, Andrade-Martínez JS, Duitama J. Accurate, Efficient and User-Friendly Mutation Calling and Sample Identification for TILLING Experiments. Front Genet 2021; 12:624513. [PMID: 33613641 PMCID: PMC7886796 DOI: 10.3389/fgene.2021.624513] [Received: 10/31/2020] [Accepted: 01/08/2021] [Indexed: 11/13/2022]
Abstract
TILLING (Targeting Induced Local Lesions IN Genomes) is a powerful reverse genetics method in plant functional genomics and breeding to identify mutagenized individuals with improved behavior for a trait of interest. Pooled high throughput sequencing (HTS) of the targeted genes allows efficient identification and sample assignment of variants within genes of interest in hundreds of individuals. Although TILLING has been used successfully in different crops and even applied to natural populations, one of the main issues for a successful TILLING experiment is that most currently available bioinformatics tools for variant detection are not designed to identify mutations with low frequencies in pooled samples or to perform sample identification from variants identified in overlapping pools. Our research group maintains the Next Generation Sequencing Experience Platform (NGSEP), an open source solution for analysis of HTS data. In this manuscript, we present three novel components within NGSEP to facilitate the design and analysis of TILLING experiments: a pooled variants detector, a sample identifier from variants detected in overlapping pools and a simulator of TILLING experiments. A new implementation of the NGSEP calling model for variant detection allows accurate detection of low frequency mutations within pools. The samples identifier implements the process to triangulate the mutations called within overlapping pools in order to assign mutations to single individuals whenever possible. Finally, we developed a complete simulator of TILLING experiments to enable benchmarking of different tools and to facilitate the design of experimental alternatives varying the number of pools and individuals per pool. Simulation experiments based on genes from the common bean genome indicate that NGSEP provides similar accuracy and better efficiency than other tools to perform pooled variants detection. To the best of our knowledge, NGSEP is currently the only tool that generates individual assignments of the mutations discovered from the pooled data. We expect that this development will be of great use for different groups implementing TILLING as an alternative for plant breeding and even to research groups performing pooled sequencing for other applications.
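The idea of triangulating mutations across overlapping pools can be illustrated with a minimal Python sketch. This is not NGSEP's implementation; the function name `assign_mutations`, the pool layout and the mutation labels are all hypothetical, and the sketch assumes error-free calls and at most one carrier per mutation. A mutation is assigned to an individual only when exactly one individual belongs to precisely the set of pools in which the mutation was called.

```python
from collections import defaultdict


def assign_mutations(pool_members, pool_calls):
    """Assign each mutation to a single individual when possible.

    pool_members: dict mapping pool id -> set of individual ids (hypothetical design)
    pool_calls:   dict mapping pool id -> set of mutation labels called in that pool
    Returns a dict mapping mutation label -> individual id, or None when ambiguous.
    """
    # Index the pools each individual was placed in.
    pools_of = defaultdict(set)
    for pool, members in pool_members.items():
        for individual in members:
            pools_of[individual].add(pool)

    assignments = {}
    for mutation in set().union(*pool_calls.values()):
        # Pools in which this mutation was called.
        positive = {pool for pool, calls in pool_calls.items() if mutation in calls}
        # Under the single-carrier assumption, the carrier's pools equal the positive pools.
        candidates = [ind for ind, pools in pools_of.items() if pools == positive]
        assignments[mutation] = candidates[0] if len(candidates) == 1 else None
    return assignments


# Toy 2x2 design: each individual sits in one row pool and one column pool.
pools = {
    "row1": {"A", "B"},
    "row2": {"C", "D"},
    "col1": {"A", "C"},
    "col2": {"B", "D"},
}
calls = {
    "row1": {"m1", "m2"},
    "row2": {"m2"},
    "col1": {"m2"},
    "col2": {"m1"},
}
result = assign_mutations(pools, calls)
```

In this toy design, `m1` is called only in `row1` and `col2`, which intersect at individual `B`. `m2` appears in three pools, a pattern no single individual can explain, so it is left unassigned rather than guessed.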
Affiliation(s)
- Juanita Gil
  - Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá, Colombia
- Juan Sebastian Andrade-Martínez
  - Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de Los Andes, Bogotá, Colombia
  - Max Planck Tandem Group in Computational Biology, Universidad de Los Andes, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá, Colombia
|
24
|
Garcia T, Duitama J, Zullo SS, Gil J, Ariani A, Dohle S, Palkovic A, Skeen P, Bermudez-Santana CI, Debouck DG, Martínez-Castillo J, Gepts P, Chacón-Sánchez MI. Comprehensive genomic resources related to domestication and crop improvement traits in Lima bean. Nat Commun 2021; 12:702. [PMID: 33514713 PMCID: PMC7846787 DOI: 10.1038/s41467-021-20921-1] [Received: 10/20/2020] [Accepted: 12/22/2020] [Indexed: 01/30/2023]
Abstract
Lima bean (Phaseolus lunatus L.), one of the five domesticated Phaseolus bean crops, shows a wide range of ecological adaptations along its distribution range from Mexico to Argentina. These adaptations make it a promising crop for improving food security under predicted scenarios of climate change in Latin America and elsewhere. In this work, we combine long and short read sequencing technologies with a dense genetic map from a biparental population to obtain the chromosome-level genome assembly for Lima bean. Annotation of 28,326 gene models shows high diversity among 1917 genes with conserved domains related to disease resistance. Structural comparison across 22,180 orthologs with common bean reveals high genome synteny and five large intrachromosomal rearrangements. Population genomic analyses show that wild Lima bean is organized into six clusters with mostly non-overlapping distributions and that Mesoamerican landraces can be further subdivided into three subclusters. RNA-seq data reveal 4275 differentially expressed genes, which can be related to pod dehiscence and seed development. We expect the resources presented here to serve as a solid basis to achieve a comprehensive view of the degree of convergent evolution of Phaseolus species under domestication and provide tools and information for breeding for climate change resiliency.
Affiliation(s)
- Tatiana Garcia
  - Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Present Address: Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, USA
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Stephanie Smolenski Zullo
  - Department of Plant Sciences/MS1, University of California, Davis, CA, USA
- Juanita Gil
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
  - Present Address: Department of Entomology and Plant Pathology, University of Arkansas, Fayetteville, AR, USA
- Andrea Ariani
  - Department of Plant Sciences/MS1, University of California, Davis, CA, USA
  - Present Address: BASF BBCC - Innovation Center, Gent, Belgium
- Sarah Dohle
  - Department of Plant Sciences/MS1, University of California, Davis, CA, USA
- Antonia Palkovic
  - Department of Plant Sciences/MS1, University of California, Davis, CA, USA
- Paola Skeen
  - Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Present Address: Nunhems USA, Vegetable Seeds BASF, Acampo, CA, USA
- Clara Isabel Bermudez-Santana
  - Departamento de Biología, Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Daniel G. Debouck
  - Centro Internacional de Agricultura Tropical, Cali, Colombia
- Jaime Martínez-Castillo
  - Centro de Investigación Científica de Yucatán, Yucatán, Mexico
- Paul Gepts
  - Department of Plant Sciences/MS1, University of California, Davis, CA, USA
- Maria Isabel Chacón-Sánchez
  - Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
|
25
|
Diaz S, Ariza-Suarez D, Izquierdo P, Lobaton JD, de la Hoz JF, Acevedo F, Duitama J, Guerrero AF, Cajiao C, Mayor V, Beebe SE, Raatz B. Genetic mapping for agronomic traits in a MAGIC population of common bean (Phaseolus vulgaris L.) under drought conditions. BMC Genomics 2020; 21:799. [PMID: 33198642 PMCID: PMC7670608 DOI: 10.1186/s12864-020-07213-6] [Received: 03/30/2020] [Accepted: 11/05/2020] [Indexed: 01/06/2023]
Abstract
BACKGROUND Common bean is an important staple crop in the tropics of Africa, Asia and the Americas. Smallholder farmers in particular rely on bean as a source of calories, protein and micronutrients. Drought is a major production constraint for common bean, a situation that will be aggravated with current climate change scenarios. In this context, new tools designed to understand the genetic basis governing the phenotypic responses to abiotic stress are required to improve transfer of desirable traits into cultivated beans. RESULTS A multiparent advanced generation intercross (MAGIC) population of common bean was generated from eight Mesoamerican breeding lines representing the phenotypic and genotypic diversity of the CIAT Mesoamerican breeding program. This population was assessed under drought conditions in two field trials for yield, 100 seed weight, iron and zinc accumulation, phenology and pod harvest index. Transgressive segregation was observed for most of these traits. Yield was positively correlated with yield components and pod harvest index (PHI), and negative correlations were found with phenology traits and micromineral contents. Founder haplotypes in the population were identified using Genotyping by Sequencing (GBS). No major population structure was observed in the population. Whole Genome Sequencing (WGS) data from the founder lines were used to impute genotyping data for GWAS. Genetic mapping was carried out with two methods, using association mapping with GWAS, and linkage mapping with haplotype-based interval screening. Thirteen high confidence QTL were identified using both methods and several QTL hotspots were found controlling multiple traits. A major QTL hotspot located on chromosome Pv01 for phenology traits and yield was identified. Further hotspots affecting several traits were observed on chromosomes Pv03 and Pv08. A major QTL for seed Fe content was contributed by MIB778, the founder line with highest micromineral accumulation. Based on imputed WGS data, candidate genes are reported for the identified major QTL, and sequence changes were identified that could cause the phenotypic variation. CONCLUSIONS This work demonstrates the importance of this common bean MAGIC population for genetic mapping of agronomic traits, to identify trait associations for molecular breeding tool design and as a new genetic resource for the bean research community.
Affiliation(s)
- Santiago Diaz
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Daniel Ariza-Suarez
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Paulo Izquierdo
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
  - Present Address: Department of Plant Soil and Microbial Sciences, Michigan State University, East Lansing, MI, USA
- Juan David Lobaton
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
  - Present Address: School of Environmental and Rural Sciences, University of New England, Armidale, SA, Australia
- Juan Fernando de la Hoz
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
  - Present Address: Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, Los Angeles, CA, USA
- Fernando Acevedo
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
  - Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Bogotá, Colombia
- Jorge Duitama
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
  - Present Address: Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Alberto F Guerrero
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Cesar Cajiao
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Victor Mayor
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
  - Present Address: Progeny Breeding, Madrid, Colombia
- Stephen E Beebe
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Bodo Raatz
  - Bean Program, Agrobiodiversity Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
|
26
|
Gil J, Herrera M, Duitama J, Sarria G, Restrepo S, Romero HM. Genomic Variability of Phytophthora palmivora Isolates from Different Oil Palm Cultivation Regions in Colombia. Phytopathology 2020; 110:1553-1564. [PMID: 32314947 DOI: 10.1094/phyto-06-19-0209-r] [Indexed: 06/11/2023]
Abstract
Palm oil is the most consumed vegetable oil globally, and Colombia is the largest palm oil producer in South America and fourth worldwide. However, oil palm plantations in Colombia are affected by bud rot disease caused by the oomycete Phytophthora palmivora, leading to significant economic losses. Infection processes by plant pathogens involve the secretion of effector molecules, which alter the functioning or structure of host cells. Current long-read sequencing technologies provide the information needed to produce high-quality genome assemblies, enabling a comprehensive annotation of effectors. Here, we describe the development of genomic resources for P. palmivora, including a high-quality genome assembly based on long and short-read sequencing data, intraspecies variability for 12 isolates from different oil palm cultivation regions in Colombia, and a catalog of over 1,000 candidate effector proteins. A total of 45,416 genes were annotated from the new genome assembled in 2,322 contigs adding to 165.5 Mbp, which represents an improvement of two times more gene models, 33 times better contiguity, and 11 times less fragmentation compared with currently available genomic resources for the species. Analysis of nucleotide evolution in paralogs suggests a recent whole-genome duplication event. Genetic differences were identified among isolates showing variable virulence levels. We expect that these novel genomic resources contribute to the characterization of the species and the understanding of the interaction of P. palmivora with oil palm and could be further exploited as tools for the development of effective strategies for disease control.
Affiliation(s)
- Juanita Gil
  - Biology and Breeding Program, Colombian Oil Palm Research Center, Cenipalma, Calle 98 No. 70-91, Piso 14, 111121, Bogotá, Colombia
  - Systems and Computing Department, Universidad de Los Andes, Carrera 1 No. 18A-12, 111711, Bogotá, Colombia
  - Biological Sciences Department, Universidad de Los Andes, Carrera 1 No. 18A-12, 111711, Bogotá, Colombia
- Mariana Herrera
  - Biology and Breeding Program, Colombian Oil Palm Research Center, Cenipalma, Calle 98 No. 70-91, Piso 14, 111121, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Department, Universidad de Los Andes, Carrera 1 No. 18A-12, 111711, Bogotá, Colombia
- Greicy Sarria
  - Pests and Diseases Program, Colombian Oil Palm Research Center, Cenipalma, Calle 98 No. 70-91, Piso 14, 111121, Bogotá, Colombia
- Silvia Restrepo
  - Biological Sciences Department, Universidad de Los Andes, Carrera 1 No. 18A-12, 111711, Bogotá, Colombia
- Hernán Mauricio Romero
  - Biology and Breeding Program, Colombian Oil Palm Research Center, Cenipalma, Calle 98 No. 70-91, Piso 14, 111121, Bogotá, Colombia
  - Department of Biology, Universidad Nacional de Colombia, Carrera 45 No. 26-85, 111321, Bogotá, DC, Colombia
|
27
|
Tello D, Gil J, Loaiza CD, Riascos JJ, Cardozo N, Duitama J. NGSEP3: accurate variant calling across species and sequencing protocols. Bioinformatics 2020; 35:4716-4723. [PMID: 31099384 PMCID: PMC6853766 DOI: 10.1093/bioinformatics/btz275] [Received: 12/22/2018] [Revised: 03/16/2019] [Accepted: 04/17/2019] [Indexed: 01/09/2023]
Abstract
MOTIVATION Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features in modern production pipelines for genetic-based diagnosis in medicine or genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features. RESULTS Understanding that incorrect alignments around short tandem repeats are an important source of genotyping errors, we implemented in NGSEP new algorithms for realignment and haplotype clustering of reads spanning indels and short tandem repeats. We performed extensive benchmark experiments comparing NGSEP to state-of-the-art software using real data from three sequencing protocols and four species with different distributions of repetitive elements. NGSEP consistently shows comparative accuracy and better efficiency compared to the existing solutions. We expect that this work will contribute to the continuous improvement of quality in variant calling needed for modern applications in medicine and agriculture. AVAILABILITY AND IMPLEMENTATION NGSEP is available as open source software at http://ngsep.sf.net. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Tello
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia
- Juanita Gil
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia
- Cristian D Loaiza
  - Biotechnology lab, Centro de Investigación de la caña de azúcar de Colombia, CENICAÑA, Cali 760046, Colombia
  - Present address: Department of Plants, Soils, and Climate, Utah State University, Logan, UT, USA
- John J Riascos
  - Biotechnology lab, Centro de Investigación de la caña de azúcar de Colombia, CENICAÑA, Cali 760046, Colombia
- Nicolás Cardozo
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia
  - Agrobiodiversity Research Area, International Center for Tropical Agriculture, Cali 763537, Colombia
  - To whom correspondence should be addressed. E-mail:
|
28
|
Vásquez AF, Reyes Muñoz A, Duitama J, González Barrios A. Discovery of new potential CDK2/VEGFR2 type II inhibitors by fragmentation and virtual screening of natural products. J Biomol Struct Dyn 2020; 39:3285-3299. [DOI: 10.1080/07391102.2020.1763839] [Indexed: 01/06/2023]
Affiliation(s)
- Andrés Felipe Vásquez
  - Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical Engineering, Universidad de los Andes, Bogotá, Colombia
- Alejandro Reyes Muñoz
  - Grupo de Biología Computacional Ecología Microbiana (BCEM), Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá, Colombia
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Andrés González Barrios
  - Grupo de Diseño de Productos y Procesos (GDPP), Department of Chemical Engineering, Universidad de los Andes, Bogotá, Colombia
|
29
|
Fuentes RR, Chebotarov D, Duitama J, Smith S, De la Hoz JF, Mohiyuddin M, Wing RA, McNally KL, Tatarinova T, Grigoriev A, Mauleon R, Alexandrov N. Structural variants in 3000 rice genomes. Genome Res 2019; 29:870-880. [PMID: 30992303 PMCID: PMC6499320 DOI: 10.1101/gr.241240.118] [Received: 06/29/2018] [Accepted: 03/11/2019] [Indexed: 12/24/2022]
Abstract
Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 million allelic variants. We found enrichment of long SVs in promoters and an excess of shorter variants in 5′ UTRs. Across the rice genomes, we identified regions of high SV frequency enriched in stress response genes. We demonstrated how SVs may help in finding causative variants in genome-wide association analysis. These new insights into rice genome biology are valuable for understanding the effects SVs have on gene function, with the prospect of identifying novel agronomically important alleles that can be utilized to improve cultivated rice.
Affiliation(s)
- Roven Rommel Fuentes
  - International Rice Research Institute, Laguna 4031, Philippines
  - Bioinformatics Group, Wageningen University and Research, 6708 PB Wageningen, the Netherlands
- Jorge Duitama
  - Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia
  - Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali 6713, Colombia
- Sean Smith
  - Biology Department, Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey 08102, USA
- Juan Fernando De la Hoz
  - Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali 6713, Colombia
- Rod A Wing
  - International Rice Research Institute, Laguna 4031, Philippines
  - Arizona Genomics Institute, University of Arizona, Tucson, Arizona 85721, USA
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Tatiana Tatarinova
  - Department of Biology, University of La Verne, La Verne, California 91750, USA
  - Vavilov Institute of General Genetics, Moscow 119333, Russia
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow 127051, Russia
  - Laboratory of Forest Genomics, Siberian Federal University, Krasnoyarsk 660041, Russia
- Andrey Grigoriev
  - Biology Department, Center for Computational and Integrative Biology, Rutgers University, Camden, New Jersey 08102, USA
- Ramil Mauleon
  - International Rice Research Institute, Laguna 4031, Philippines
|
30
|
Worthington M, Ebina M, Yamanaka N, Heffelfinger C, Quintero C, Zapata YP, Perez JG, Selvaraj M, Ishitani M, Duitama J, de la Hoz JF, Rao I, Dellaporta S, Tohme J, Arango J. Translocation of a parthenogenesis gene candidate to an alternate carrier chromosome in apomictic Brachiaria humidicola. BMC Genomics 2019; 20:41. [PMID: 30642244 DOI: 10.1186/s12864-018-5392-4] [Indexed: 04/28/2023]
Abstract
BACKGROUND The apomictic reproductive mode of Brachiaria (syn. Urochloa) forage species allows breeders to faithfully propagate heterozygous genotypes through seed over multiple generations. In Brachiaria, reproductive mode segregates as a single dominant locus, the apospory-specific genomic region (ASGR). The ASGR has been mapped to an area of reduced recombination on Brachiaria decumbens chromosome 5. A primer pair designed within ASGR-BABY BOOM-like (BBML), the candidate gene for the parthenogenesis component of apomixis in Pennisetum squamulatum, was diagnostic for reproductive mode in the closely related species B. ruziziensis, B. brizantha, and B. decumbens. In this study, we used a mapping population of the distantly related commercial species B. humidicola to map the ASGR and test for conservation of ASGR-BBML sequences across Brachiaria species. RESULTS Dense genetic maps were constructed for the maternal and paternal genomes of a hexaploid (2n = 6x = 36) B. humidicola F1 mapping population (n = 102) using genotyping-by-sequencing, simple sequence repeat, amplified fragment length polymorphism, and transcriptome derived single nucleotide polymorphism markers. Comparative genomics with Setaria italica provided confirmation for x = 6 as the base chromosome number of B. humidicola. High resolution molecular karyotyping indicated that the six homologous chromosomes of the sexual female parent paired at random, whereas preferential pairing of subgenomes was observed in the apomictic male parent. Furthermore, evidence for compensated aneuploidy was found in the apomictic parent, with only five homologous linkage groups identified for chromosome 5 and seven homologous linkage groups of chromosome 6. The ASGR mapped to B. humidicola chromosome 1, a region syntenic with chromosomes 1 and 7 of S. italica. The ASGR-BBML specific PCR product cosegregated with the ASGR in the F1 mapping population, despite its location on a different carrier chromosome than in B. decumbens. CONCLUSIONS The first dense molecular maps of B. humidicola provide strong support for cytogenetic evidence indicating a base chromosome number of six in this species. Furthermore, these results show conservation of the ASGR across the Paniceae in different chromosomal backgrounds and support postulation of the ASGR-BBML as candidate genes for the parthenogenesis component of apomixis.
Affiliation(s)
- Margaret Worthington
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
  - Present address: Department of Horticulture, University of Arkansas, 306 Plant Sciences Bldg, Fayetteville, AR, 72701, USA
- Masumi Ebina
  - National Agriculture and Food Research Organization (NARO), Institute of Livestock and Grassland Science, Nasushiobara, Tochigi, 392-2793, Japan
- Naoki Yamanaka
  - Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki, 305-8686, Japan
- Christopher Heffelfinger
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, 06520, USA
- Constanza Quintero
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Michael Selvaraj
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Manabu Ishitani
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Jorge Duitama
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
  - Present address: Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Juan Fernando de la Hoz
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
  - Present address: Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Idupulapati Rao
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
  - Present address: Plant Polymer Research Unit (PPL), National Center for Agricultural Utilization Research (NCAUR), Agricultural Research Service, United States Department of Agriculture (ARS-USDA), 1815 N. University St., Peoria, IL, 61604, USA
- Stephen Dellaporta
  - Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, 06520, USA
- Joe Tohme
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Jacobo Arango
  - International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
|
31
|
Worthington M, Ebina M, Yamanaka N, Heffelfinger C, Quintero C, Zapata YP, Perez JG, Selvaraj M, Ishitani M, Duitama J, de la Hoz JF, Rao I, Dellaporta S, Tohme J, Arango J. Translocation of a parthenogenesis gene candidate to an alternate carrier chromosome in apomictic Brachiaria humidicola. BMC Genomics 2019; 20:41. [PMID: 30642244 PMCID: PMC6332668 DOI: 10.1186/s12864-018-5392-4] [Received: 01/21/2018] [Accepted: 12/18/2018] [Indexed: 12/05/2022]
Abstract
Background The apomictic reproductive mode of Brachiaria (syn. Urochloa) forage species allows breeders to faithfully propagate heterozygous genotypes through seed over multiple generations. In Brachiaria, reproductive mode segregates as a single dominant locus, the apospory-specific genomic region (ASGR). The ASGR has been mapped to an area of reduced recombination on Brachiaria decumbens chromosome 5. A primer pair designed within ASGR-BABY BOOM-like (BBML), the candidate gene for the parthenogenesis component of apomixis in Pennisetum squamulatum, was diagnostic for reproductive mode in the closely related species B. ruziziensis, B. brizantha, and B. decumbens. In this study, we used a mapping population of the distantly related commercial species B. humidicola to map the ASGR and test for conservation of ASGR-BBML sequences across Brachiaria species. Results Dense genetic maps were constructed for the maternal and paternal genomes of a hexaploid (2n = 6x = 36) B. humidicola F1 mapping population (n = 102) using genotyping-by-sequencing, simple sequence repeat, amplified fragment length polymorphism, and transcriptome derived single nucleotide polymorphism markers. Comparative genomics with Setaria italica provided confirmation for x = 6 as the base chromosome number of B. humidicola. High resolution molecular karyotyping indicated that the six homologous chromosomes of the sexual female parent paired at random, whereas preferential pairing of subgenomes was observed in the apomictic male parent. Furthermore, evidence for compensated aneuploidy was found in the apomictic parent, with only five homologous linkage groups identified for chromosome 5 and seven homologous linkage groups of chromosome 6. The ASGR mapped to B. humidicola chromosome 1, a region syntenic with chromosomes 1 and 7 of S. italica. The ASGR-BBML specific PCR product cosegregated with the ASGR in the F1 mapping population, despite its location on a different carrier chromosome than in B. decumbens. Conclusions The first dense molecular maps of B. humidicola provide strong support for cytogenetic evidence indicating a base chromosome number of six in this species. Furthermore, these results show conservation of the ASGR across the Paniceae in different chromosomal backgrounds and support postulation of the ASGR-BBML as candidate genes for the parthenogenesis component of apomixis. Electronic supplementary material The online version of this article (10.1186/s12864-018-5392-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Margaret Worthington
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia; Present address: Department of Horticulture, University of Arkansas, 306 Plant Sciences Bldg, Fayetteville, AR, 72701, USA.
- Masumi Ebina
- National Agriculture and Food Research Organization (NARO), Institute of Livestock and Grassland Science, Nasushiobara, Tochigi, 392-2793, Japan
- Naoki Yamanaka
- Japan International Research Center for Agricultural Sciences (JIRCAS), 1-1 Ohwashi, Tsukuba, Ibaraki, 305-8686, Japan
- Christopher Heffelfinger
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, 06520, USA
- Constanza Quintero
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Michael Selvaraj
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Manabu Ishitani
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Jorge Duitama
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia; Present address: Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Juan Fernando de la Hoz
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia; Present address: Bioinformatics Interdepartmental Ph.D. Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Idupulapati Rao
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia; Present address: Plant Polymer Research Unit (PPL), National Center for Agricultural Utilization Research (NCAUR), Agricultural Research Service, United States Department of Agriculture (ARS-USDA), 1815 N. University St., Peoria, IL, 61604, USA
- Stephen Dellaporta
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, 06520, USA
- Joe Tohme
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Jacobo Arango
- International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
|
32
|
Lobaton JD, Miller T, Gil J, Ariza D, de la Hoz JF, Soler A, Beebe S, Duitama J, Gepts P, Raatz B. Resequencing of Common Bean Identifies Regions of Inter-Gene Pool Introgression and Provides Comprehensive Resources for Molecular Breeding. Plant Genome 2018; 11. [PMID: 30025029 DOI: 10.3835/plantgenome2017.08.0068] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 05/05/2023]
Abstract
Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and is a major nutrition source in the tropics. Because bean production is reduced by both abiotic and biotic constraints, current breeding efforts are focused on the development of improved varieties with tolerance to these stresses. We characterized materials from different breeding programs spanning three continents to understand their sequence diversity and advance the development of molecular breeding tools. For this, 37 varieties belonging to P. vulgaris, P. acutifolius (A. Gray), and P. coccineus L. were sequenced by whole-genome sequencing, identifying more than 40 million genomic variants. Evaluation of nuclear DNA content and analysis of copy number variation revealed important differences in genomic content not only between P. vulgaris and the two other domesticated species, but also within P. vulgaris, affecting hundreds of protein-coding genomic regions. A large number of inter-gene pool introgressions were identified. Furthermore, interspecific introgressions for disease resistance in breeding lines were mapped. Evaluation of newly developed single nucleotide polymorphism markers within previously discovered quantitative trait loci for common bacterial blight and angular leaf spot provides improved specificity to tag sources of resistance to these diseases. We expect that this dataset will provide a deeper molecular understanding of breeding germplasm and deliver molecular tools for germplasm development, aiming to increase the efficiency of bean breeding programs.
|
33
|
Duitama J, Kafuri L, Tello D, Leiva AM, Hofinger B, Datta S, Lentini Z, Aranzales E, Till B, Ceballos H. Deep Assessment of Genomic Diversity in Cassava for Herbicide Tolerance and Starch Biosynthesis. Comput Struct Biotechnol J 2017; 15:185-194. [PMID: 28179981 PMCID: PMC5295625 DOI: 10.1016/j.csbj.2017.01.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/31/2016] [Revised: 12/26/2016] [Accepted: 01/10/2017] [Indexed: 12/16/2022] Open
Abstract
Cassava is one of the most important food security crops in tropical countries, and a competitive resource for the starch, food, feed and ethanol industries. However, genomics research in this crop is much less developed compared to other economically important crops such as rice or maize. The International Center for Tropical Agriculture (CIAT) maintains the largest cassava germplasm collection in the world. Unfortunately, the genetic potential of this diversity for breeding programs remains underexploited due to the difficulties in phenotypic screening and lack of deep genomic information about the different accessions. A chromosome-level assembly of the cassava reference genome was released this year and only a handful of studies have been made, mainly to find quantitative trait loci (QTL) on breeding populations with limited variability. This work presents the results of pooled targeted resequencing of more than 1500 cassava accessions from the CIAT germplasm collection to obtain a dataset of more than 2000 variants within genes related to starch functional properties and herbicide tolerance. Results of twelve bioinformatic pipelines for variant detection in pooled samples were compared to ensure the quality of the variant calling process. Predictions of functional impact were performed using two separate methods to prioritize interesting variation for genotyping and cultivar selection. Targeted resequencing, either by pooled samples or by similar approaches such as Ecotilling or capture, emerges as a cost effective alternative to whole genome sequencing to identify interesting alleles of genes related to relevant traits within large germplasm collections.
Affiliation(s)
- Jorge Duitama
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Systems and Computing Engineering Department, Universidad de los Andes, Bogotá, Colombia
- Lina Kafuri
- Plant Breeding and Genetics Laboratory, Joint FAO/IAEA Division, International Atomic Energy Agency, Seibersdorf, Austria
- Department of Biological Sciences, School of Natural Sciences, Universidad Icesi, Cali, Colombia
- Daniel Tello
- Plant Breeding and Genetics Laboratory, Joint FAO/IAEA Division, International Atomic Energy Agency, Seibersdorf, Austria
- Department of Biological Sciences, School of Natural Sciences, Universidad Icesi, Cali, Colombia
- Ana María Leiva
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Bernhard Hofinger
- Plant Breeding and Genetics Laboratory, Joint FAO/IAEA Division, International Atomic Energy Agency, Seibersdorf, Austria
- Sneha Datta
- Plant Breeding and Genetics Laboratory, Joint FAO/IAEA Division, International Atomic Energy Agency, Seibersdorf, Austria
- Zaida Lentini
- Department of Biological Sciences, School of Natural Sciences, Universidad Icesi, Cali, Colombia
- Ericson Aranzales
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Bradley Till
- Plant Breeding and Genetics Laboratory, Joint FAO/IAEA Division, International Atomic Energy Agency, Seibersdorf, Austria
- Hernán Ceballos
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
|
34
|
Ho PW, Swinnen S, Duitama J, Nevoigt E. The sole introduction of two single-point mutations establishes glycerol utilization in Saccharomyces cerevisiae CEN.PK derivatives. Biotechnol Biofuels 2017; 10:10. [PMID: 28053667 PMCID: PMC5209837 DOI: 10.1186/s13068-016-0696-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/08/2016] [Accepted: 12/23/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND Glycerol is an abundant by-product of biodiesel production and has several advantages as a substrate in biotechnological applications. Unfortunately, the popular production host Saccharomyces cerevisiae can barely metabolize glycerol by nature. RESULTS In this study, two evolved derivatives of the strain CEN.PK113-1A were created that were able to grow in synthetic glycerol medium (strains PW-1 and PW-2). Their growth performances on glycerol were compared with that of the previously published evolved CEN.PK113-7D derivative JL1. As JL1 showed a higher maximum specific growth rate on glycerol (0.164 h⁻¹ compared to 0.119 h⁻¹ for PW-1 and 0.127 h⁻¹ for PW-2), its genomic DNA was subjected to whole-genome resequencing. Two point mutations in the coding sequences of the genes UBR2 and GUT1 were identified to be crucial for growth in synthetic glycerol medium and subsequently verified by reverse engineering of the wild-type strain CEN.PK113-7D. The growth rate of the resulting reverse-engineered strain was 0.130 h⁻¹. Sanger sequencing of the GUT1 and UBR2 alleles of the above-mentioned evolved strains PW-1 and PW-2 also revealed one single-point mutation in these two genes, and both mutations were demonstrated to be also crucial and sufficient for obtaining a maximum specific growth rate on glycerol of ~0.120 h⁻¹. CONCLUSIONS The current work confirmed the importance of UBR2 and GUT1 as targets for establishing glycerol utilization in strains of the CEN.PK family. In addition, it shows that a growth rate on glycerol of 0.130 h⁻¹ can be established in reverse-engineered CEN.PK strains by solely replacing a single amino acid in the coding sequences of both Ubr2 and Gut1.
Affiliation(s)
- Ping-Wei Ho
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
- Steve Swinnen
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
- Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Cra 1 Este No 19A-40, Bogotá, Colombia
- Elke Nevoigt
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
|
35
|
Suk EK, Schulz S, Mentrup B, Huebsch T, Duitama J, Hoehe MR. A Fosmid Pool-Based Next Generation Sequencing Approach to Haplotype-Resolve Whole Genomes. Methods Mol Biol 2017; 1551:223-269. [PMID: 28138850 DOI: 10.1007/978-1-4939-6750-6_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Haplotype resolution of human genomes is essential to describe and interpret genetic variation and its impact on biology and disease. Our approach to haplotyping relies on converting genomic DNA into a fosmid library, which represents the entire diploid genome as a collection of haploid DNA clones of ~40 kb in size. These can be partitioned into pools such that the probability that the same pool contains both parental haplotypes is reduced to ~1%. This is the key principle of this method, allowing entire pools of fosmids to be massively parallel sequenced, yielding haploid sequence output. Here, we present a detailed protocol for fosmid pool-based next generation sequencing to haplotype-resolve whole genomes including the following steps: (1) generation of high molecular weight DNA fragments of ~40 kb in size from genomic DNA; (2) fosmid cloning and partitioning into 96-well plates; (3) barcoded sequencing library preparation from fosmid pools for next generation sequencing; and (4) computational analysis of fosmid sequences and assembly into contiguous haploid sequences. This method can be used in combination with, but also without, whole genome shotgun sequencing to extensively resolve heterozygous SNPs and structural variants within genomic regions, resulting in haploid contigs of several hundred kb up to several Mb. This method has a broad range of applications including population and ancestry genetics, the clinical interpretation of mutations in personal genomes, the analysis of cancer genomes and highly complex disease gene regions such as MHC. Moreover, haplotype-resolved genome sequencing allows description and interpretation of the diploid nature of genome biology, for example through the analysis of haploid gene forms and allele-specific phenomena. Application of this method has enabled the production of most of the molecular haplotype-resolved genomes reported to date.
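The pooling principle in this abstract (a low chance that one pool holds clones from both parental haplotypes at the same locus) can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the protocol: it assumes clones land uniformly at random on the genome, that each clone derives from either parental haplotype with equal probability, and that per-locus coverage is approximately Poisson. The function name, the pool sizes tried, and the 3.1 Gb haploid genome size are illustrative assumptions; only the ~40 kb insert size comes from the abstract.

```python
import math


def prob_both_haplotypes(fosmids_per_pool, insert_bp=40_000, haploid_genome_bp=3.1e9):
    """Approximate chance that one pool covers a given locus with BOTH parental haplotypes.

    Assumptions (illustrative, not from the protocol): uniform random clone
    placement, equal contribution of the two haplotypes, Poisson coverage.
    """
    # Expected number of clones in the pool that cover the locus on ONE haplotype:
    # total cloned bases divided by the diploid genome length.
    lam = fosmids_per_pool * insert_bp / (2 * haploid_genome_bp)
    # Probability that at least one clone from that haplotype covers the locus.
    p_one = 1.0 - math.exp(-lam)
    # The two haplotypes are treated as independent.
    return p_one ** 2


if __name__ == "__main__":
    # Hypothetical pool sizes, chosen only to show how the probability scales.
    for n in (5_000, 15_000, 30_000):
        print(f"{n:>6} fosmids/pool -> P(both haplotypes) = {prob_both_haplotypes(n):.4f}")
```

Under these assumptions, a pool of roughly 15,000 clones of 40 kb gives a co-occurrence probability a little under 1%, in line with the order of magnitude quoted in the abstract; larger pools raise the probability roughly quadratically while it remains small.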
Affiliation(s)
- Eun-Kyung Suk
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195, Berlin, Germany
- Sabrina Schulz
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195, Berlin, Germany
- Birgit Mentrup
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195, Berlin, Germany
- Thomas Huebsch
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195, Berlin, Germany
- Jorge Duitama
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195, Berlin, Germany
- International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Margret R Hoehe
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195, Berlin, Germany.
|
36
|
Perea C, De La Hoz JF, Cruz DF, Lobaton JD, Izquierdo P, Quintero JC, Raatz B, Duitama J. Bioinformatic analysis of genotype by sequencing (GBS) data with NGSEP. BMC Genomics 2016; 17 Suppl 5:498. [PMID: 27585926 PMCID: PMC5009557 DOI: 10.1186/s12864-016-2827-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 12/22/2022] Open
Abstract
Background The recent development and availability of different genotype by sequencing (GBS) protocols provided a cost-effective approach to perform high-resolution genomic analysis of entire populations in different species. The central component of all these protocols is the digestion of the initial DNA with known restriction enzymes, to generate sequencing fragments at predictable and reproducible sites. This allows genotyping of thousands of genetic markers in populations with hundreds of individuals. Because GBS protocols achieve parallel genotyping through high throughput sequencing (HTS), every GBS protocol must include a bioinformatics pipeline for analysis of HTS data. Our bioinformatics group recently developed the Next Generation Sequencing Eclipse Plugin (NGSEP) for accurate, efficient, and user-friendly analysis of HTS data. Results Here we present the latest functionalities implemented in NGSEP in the context of the analysis of GBS data. We implemented a one-step wizard to perform parallel read alignment, variant identification and genotyping from HTS reads sequenced from entire populations. We added different filters for variants, samples and genotype calls as well as calculation of summary statistics overall and per sample, and diversity statistics per site. NGSEP includes a module to translate genotype calls to some of the most widely used input formats for integration with several tools to perform downstream analyses such as population structure analysis, construction of genetic maps, genetic mapping of complex traits and phenotype prediction for genomic selection. We assessed the accuracy of NGSEP on two highly heterozygous F1 cassava populations and on an inbred common bean population, and we showed that NGSEP provides similar or better accuracy compared to other widely used software packages for variant detection such as GATK, Samtools and Tassel.
Conclusions NGSEP is a powerful, accurate and efficient bioinformatics software tool for analysis of HTS data, and also one of the best bioinformatic packages to facilitate the analysis and to maximize the genomic variability information that can be obtained from GBS experiments for population genomics. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2827-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Claudia Perea
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia
- Juan Fernando De La Hoz
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia
- Daniel Felipe Cruz
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, 9052, Belgium
- Juan David Lobaton
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia
- Paulo Izquierdo
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia
- Juan Camilo Quintero
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia; Gerencia de Procesos, Centro Médico Imbanaco, Cali, 760033, Colombia
- Bodo Raatz
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia
- Jorge Duitama
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, 763537, Colombia.
|
37
|
Vervoort Y, Herrera-Malaver B, Mertens S, Guadalupe Medina V, Duitama J, Michiels L, Derdelinckx G, Voordeckers K, Verstrepen KJ. Characterization of the recombinant Brettanomyces anomalus β-glucosidase and its potential for bioflavouring. J Appl Microbiol 2016; 121:721-33. [PMID: 27277532 PMCID: PMC6680314 DOI: 10.1111/jam.13200] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 01/06/2016] [Revised: 04/11/2016] [Accepted: 06/03/2016] [Indexed: 01/20/2023]
Abstract
AIM Plant materials used in the food industry contain up to five times more aromas bound to glucose (glucosides) than free, unbound aromas, making these bound aromas an unused flavouring potential. The aim of this study was to identify and purify a novel β-glucosidase from Brettanomyces yeasts that are capable of releasing bound aromas present in various food products. METHODS AND RESULTS We screened 428 different yeast strains for β-glucosidase activity and are the first to sequence the whole genome of two Brettanomyces yeasts (Brettanomyces anomalus and Brettanomyces bruxellensis) with exceptionally high β-glucosidase activity. Heterologous expression and purification of the identified B. anomalus β-glucosidase showed that it has an optimal activity at a higher pH (5·75) and lower temperature (37°C) than commercial β-glucosidases. Adding this B. anomalus β-glucosidase to cherry beers and forest fruit milks resulted in increased amounts of benzyl alcohol, eugenol, linalool and methyl salicylate compared to Aspergillus niger and Almond glucosidase. CONCLUSIONS The newly identified B. anomalus β-glucosidase offers new possibilities for food bioflavouring. SIGNIFICANCE AND IMPACT OF THE STUDY This study is the first to sequence the B. anomalus genome and to identify the β-glucosidase-encoding genes of two Brettanomyces species, and reports a new bioflavouring enzyme.
Affiliation(s)
- Y Vervoort
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- B Herrera-Malaver
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- S Mertens
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- V Guadalupe Medina
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- J Duitama
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- L Michiels
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- G Derdelinckx
- Leuven Food Science and Nutrition Research Centre, Leuven, Belgium
- K Voordeckers
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- K J Verstrepen
- VIB Laboratory of Systems Biology, Leuven, Belgium; CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
|
38
|
Pulido-Tamayo S, Duitama J, Marchal K. EXPLoRA-web: linkage analysis of quantitative trait loci using bulk segregant analysis. Nucleic Acids Res 2016; 44:W142-6. [PMID: 27105844 PMCID: PMC4987886 DOI: 10.1093/nar/gkw298] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 01/29/2016] [Accepted: 04/11/2016] [Indexed: 11/13/2022] Open
Abstract
Identification of genomic regions associated with a phenotype of interest is a fundamental step toward solving questions in biology and improving industrial research. Bulk segregant analysis (BSA) combined with high-throughput sequencing is a technique to efficiently identify these genomic regions associated with a trait of interest. However, distinguishing true from spuriously linked genomic regions and accurately delineating the genomic positions of these truly linked regions requires the use of complex statistical models currently implemented in software tools that are generally difficult to operate for non-expert users. To facilitate the exploration and analysis of data generated by bulked segregant analysis, we present EXPLoRA-web, a web service wrapped around our previously published algorithm EXPLoRA, which exploits linkage disequilibrium to increase the power and accuracy of quantitative trait loci identification in BSA analysis. EXPLoRA-web provides a user friendly interface that enables easy data upload and parallel processing of different parameter configurations. Results are provided graphically and as BED file and/or text file and the input is expected in widely used formats, enabling straightforward BSA data analysis. The web server is available at http://bioinformatics.intec.ugent.be/explora-web/.
Affiliation(s)
- Sergio Pulido-Tamayo
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Department of Microbial and Molecular Systems, KU Leuven, Kasteelpark Arenberg 20, B-3001 Leuven, Belgium
- Jorge Duitama
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), 763537 Cali, Colombia
- Kathleen Marchal
- Department of Information Technology, iGent Toren, Technologiepark 15, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, UGent, Technologiepark 927, 9052 Gent, Belgium; Bioinformatics Institute Ghent, Technologiepark 927, 9052 Gent, Belgium; Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria 0028, South Africa
|
39
|
Abt TD, Souffriau B, Foulquié-Moreno MR, Duitama J, Thevelein JM. Genomic saturation mutagenesis and polygenic analysis identify novel yeast genes affecting ethyl acetate production, a non-selectable polygenic trait. Microb Cell 2016; 3:159-175. [PMID: 28357348 PMCID: PMC5349090 DOI: 10.15698/mic2016.04.491] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/28/2022]
Abstract
Isolation of mutants in populations of microorganisms has been a valuable tool in experimental genetics for decades. The main disadvantage, however, is the inability of isolating mutants in non-selectable polygenic traits. Most traits of organisms, however, are non-selectable and polygenic, including industrially important properties of microorganisms. The advent of powerful technologies for polygenic analysis of complex traits has allowed simultaneous identification of multiple causative mutations among many thousands of irrelevant mutations. We now show that this also applies to haploid strains of which the genome has been loaded with induced mutations so as to affect as many non-selectable, polygenic traits as possible. We have introduced about 900 mutations into single haploid yeast strains using multiple rounds of EMS mutagenesis, while maintaining the mating capacity required for genetic mapping. We screened the strains for defects in flavor production, an important non-selectable, polygenic trait in yeast alcoholic beverage production. A haploid strain with multiple induced mutations showing reduced ethyl acetate production in semi-anaerobic fermentation, was selected and the underlying quantitative trait loci (QTLs) were mapped using pooled-segregant whole-genome sequence analysis after crossing with an unrelated haploid strain. Reciprocal hemizygosity analysis and allele exchange identified PMA1 and CEM1 as causative mutant alleles and TPS1 as a causative genetic background allele. The case of CEM1 revealed that relevant mutations without observable effect in the haploid strain with multiple induced mutations (in this case due to defective mitochondria) can be identified by polygenic analysis as long as the mutations have an effect in part of the segregants (in this case those that regained fully functional mitochondria). 
Our results show that genomic saturation mutagenesis combined with complex trait polygenic analysis could be used successfully to identify causative alleles underlying many non-selectable, polygenic traits in small collections of haploid strains with multiple induced mutations.
Affiliation(s)
- Tom Den Abt
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium
- Ben Souffriau
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium
- Maria R Foulquié-Moreno
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium
- Jorge Duitama
- Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Johan M Thevelein
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium
|
40
|
Rebolledo MC, Peña AL, Duitama J, Cruz DF, Dingkuhn M, Grenier C, Tohme J. Combining Image Analysis, Genome Wide Association Studies and Different Field Trials to Reveal Stable Genetic Regions Related to Panicle Architecture and the Number of Spikelets per Panicle in Rice. Front Plant Sci 2016; 7:1384. [PMID: 27703460 PMCID: PMC5029283 DOI: 10.3389/fpls.2016.01384] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/30/2016] [Accepted: 08/30/2016] [Indexed: 05/19/2023]
Abstract
Number of spikelets per panicle (NSP) is a key trait to increase yield potential in rice (O. sativa). The architecture of the rice inflorescence, which is mainly determined by the length and number of primary (PBL and PBN) and secondary (SBL and SBN) branches, can influence NSP. Although several genes controlling panicle architecture and NSP in rice have been identified, there is little evidence of (i) the genetic control of panicle architecture and NSP in different environments and (ii) the presence of stable genetic associations with panicle architecture across environments. This study combines image phenotyping of 225 accessions belonging to a genetic diversity array of indica rice grown under irrigated field conditions in two different environments and Genome Wide Association Studies (GWAS) based on the genotyping of the diversity panel, providing 83,374 SNPs. Accessions sown under direct seeding in one environment had reduced Panicle Length (PL), NSP, PBN, PBL, SBN, and SBL compared to those established under transplanting in the second environment. Across environments, NSP was significantly and positively correlated with PBN, SBN and PBL. However, the length of branches (PBL and SBL) was not significantly correlated with variables related to number of branches (PBN and SBN), suggesting independent genetic control. Twenty-three GWAS sites were detected with P ≤ 1.0E-04 and 27 GWAS sites with P ≤ 5.9E-04. We found 17 GWAS sites related to NSP, 10 for PBN and 11 for SBN, 7 for PBL and 11 for SBL. This study revealed new regions related to NSP, but only three associations were related to both branching number (PBN and SBN) and NSP. Two GWAS sites associated with SBL and SBN were stable across contrasting environments and were not related to genes previously reported. The new regions reported in this study can help improve NSP in rice for both direct seeded and transplanted conditions.
The integrated approach of high-throughput phenotyping, multi-environment field trials and GWAS has the potential to dissect complex traits, such as NSP, into less complex traits and to match single nucleotide polymorphisms with relevant function under different environments, offering a potential use for molecular breeding.
Affiliation(s)
- Maria C. Rebolledo
  - Agrobiodiversity, International Center for Tropical Agriculture, Palmira, Colombia
  - *Correspondence: Maria C. Rebolledo
- Alexandra L. Peña
  - Agrobiodiversity, International Center for Tropical Agriculture, Palmira, Colombia
- Jorge Duitama
  - Agrobiodiversity, International Center for Tropical Agriculture, Palmira, Colombia
- Daniel F. Cruz
  - Agrobiodiversity, International Center for Tropical Agriculture, Palmira, Colombia
- Michael Dingkuhn
  - Agrobiodiversity, International Center for Tropical Agriculture, Palmira, Colombia
  - Agricultural Research for Development - CIRAD, Unités Mixtes de Recherche - Amélioration Génétique et Adaptation des Plantes, Montpellier, France
- Cecile Grenier
  - Agrobiodiversity, International Center for Tropical Agriculture, Palmira, Colombia
  - Agricultural Research for Development - CIRAD, Unités Mixtes de Recherche - Amélioration Génétique et Adaptation des Plantes, Montpellier, France
- Joe Tohme
  - Agrobiodiversity, International Center for Tropical Agriculture, Palmira, Colombia
41
Rebolledo MC, Dingkuhn M, Courtois B, Gibon Y, Clément-Vidal A, Cruz DF, Duitama J, Lorieux M, Luquet D. Phenotypic and genetic dissection of component traits for early vigour in rice using plant growth modelling, sugar content analyses and association mapping. J Exp Bot 2015; 66:5555-66. [PMID: 26022255 PMCID: PMC4585419 DOI: 10.1093/jxb/erv258] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 05/04/2023]
Abstract
Early vigour of rice, defined as seedling capacity to accumulate shoot dry weight (SDW) rapidly, is a complex trait. It depends on a genotype's propensity to assimilate, store, and/or use non-structural carbohydrates (NSC) for producing large and/or numerous leaves, involving physiological trade-offs in the expression of component traits and, possibly, physiological and genetic linkages. This study explores a plant-model-assisted phenotyping approach to dissect the genetic architecture of rice early vigour, applying the Genome Wide Association Study (GWAS) to morphological and NSC measurements, as well as fitted parameters for the functional-structural plant model, Ecomeristem. Leaf size, number, SDW, and source-leaf NSC concentration were measured on a panel of 123 japonica accessions. The data were used to estimate Ecomeristem genotypic parameters driving organ appearance rate, size, and carbon dynamics. GWAS was performed based on 12 221 single-nucleotide polymorphisms (SNPs). Twenty-three associations were detected at P < 1×10⁻⁴ and 64 at P < 5×10⁻⁴. Associations for NSC and model parameters revealed new regions related to early vigour that had greater significance than morphological traits, providing additional information on the genetic control of early vigour. Plant model parameters were used to characterize physiological and genetic trade-offs among component traits. Twelve associations were related to loci for cloned genes, with nine related to organogenesis, plant height, cell size or cell number. The potential use of these associations as markers for breeding is discussed.
Affiliation(s)
- M Dingkuhn
  - IRRI, CESD Division, DAPO Box 7777, Metro Manila, Philippines
  - CIRAD, UMR AGAP, F-34398 Montpellier, France
- B Courtois
  - CIRAD, UMR AGAP, F-34398 Montpellier, France
- Y Gibon
  - INRA, Metabolome Platform of UMR 1332, Bordeaux, France
- D F Cruz
  - CIAT, Agrobiodiversity, AA 6713, Cali, Colombia
- J Duitama
  - CIAT, Agrobiodiversity, AA 6713, Cali, Colombia
- M Lorieux
  - CIAT, Agrobiodiversity, AA 6713, Cali, Colombia
  - IRD, DIADE Research Unit, Institut de Recherche pour le Développement, 34394 Montpellier Cedex 5, France
- D Luquet
  - CIRAD, UMR AGAP, F-34398 Montpellier, France
42
Duitama J, Silva A, Sanabria Y, Cruz DF, Quintero C, Ballen C, Lorieux M, Scheffler B, Farmer A, Torres E, Oard J, Tohme J. Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection. PLoS One 2015; 10:e0124617. [PMID: 25923345 PMCID: PMC4414565 DOI: 10.1371/journal.pone.0124617] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 10/21/2014] [Accepted: 03/02/2015] [Indexed: 01/08/2023] Open
Abstract
Current advances in sequencing technologies and bioinformatics revealed the genomic background of rice, a staple food for poor people, and provided the basis to develop large genomic variation databases for thousands of cultivars. Proper analysis of this massive resource is expected to give novel insights into the structure, function, and evolution of the rice genome, and to aid the development of rice varieties through marker assisted selection or genomic selection. In this work we present sequencing and bioinformatics analyses of 104 rice varieties belonging to the major subspecies of Oryza sativa. We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. Based on a reconstruction of the alleles for the gene GBSSI, we could identify novel genetic markers for selection of varieties with high amylose content. We expect that both the analysis methods and the genomic information described here will be of great use for the rice research community and for other groups carrying out similar sequencing efforts in other crops.
Affiliation(s)
- Jorge Duitama
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
  - * E-mail:
- Alexander Silva
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
- Yamid Sanabria
  - Rice Research Station, Louisiana State University Agricultural Center, Rayne, Louisiana, United States of America
- Daniel Felipe Cruz
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
- Constanza Quintero
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
- Carolina Ballen
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
- Mathias Lorieux
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
  - Plant Diversity Adaptation and Development Research Unit, Institut de Recherche pour le Développement, Montpellier, France
- Brian Scheffler
  - Genomics and Bioinformatics Research Unit, Agricultural Research Service, United States Department of Agriculture, Jamie Whitten Delta States Research Center, Stoneville, Mississippi, United States of America
- Andrew Farmer
  - National Center for Genome Resources, Santa Fe, New Mexico, United States of America
- Edgar Torres
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
- James Oard
  - Rice Research Station, Louisiana State University Agricultural Center, Rayne, Louisiana, United States of America
- Joe Tohme
  - Agrobiodiversity research area, International Center for Tropical Agriculture, Cali, Colombia
43
Duan F, Duitama J, Al Seesi S, Ayres CM, Corcelli SA, Pawashe AP, Blanchard T, McMahon D, Sidney J, Sette A, Baker BM, Mandoiu II, Srivastava PK. Genomic and bioinformatic profiling of mutational neoepitopes reveals new rules to predict anticancer immunogenicity. J Exp Med 2014; 211:2231-48. [PMID: 25245761 PMCID: PMC4203949 DOI: 10.1084/jem.20141308] [Citation(s) in RCA: 263] [Impact Index Per Article: 26.3] [Received: 07/11/2014] [Accepted: 09/05/2014] [Indexed: 12/23/2022] Open
Abstract
The mutational repertoire of cancers creates the neoepitopes that make cancers immunogenic. Here, we introduce two novel tools that identify, with relatively high accuracy, the small proportion of neoepitopes (among the hundreds of potential neoepitopes) that protect the host through an antitumor T cell response. The two tools consist of (a) the numerical difference in NetMHC scores between the mutated sequences and their unmutated counterparts, termed the differential agretopic index, and (b) the conformational stability of the MHC I-peptide interaction. Mechanistically, these tools identify neoepitopes that are mutated to create new anchor residues for MHC binding, and render the overall peptide more rigid. Surprisingly, the protective neoepitopes identified here elicit CD8-dependent immunity, even though their affinity for K(d) is orders of magnitude lower than the 500-nM threshold considered reasonable for such interactions. These results greatly expand the universe of target cancer antigens and identify new tools for human cancer immunotherapy.
Affiliation(s)
- Fei Duan
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut School of Medicine, Farmington, CT 06030
- Jorge Duitama
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269
- Sahar Al Seesi
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269
- Cory M Ayres
  - Department of Chemistry and Biochemistry and Harper Cancer Research Institute, University of Notre Dame, Notre Dame, IN 46556
- Steven A Corcelli
  - Department of Chemistry and Biochemistry and Harper Cancer Research Institute, University of Notre Dame, Notre Dame, IN 46556
- Arpita P Pawashe
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut School of Medicine, Farmington, CT 06030
- Tatiana Blanchard
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut School of Medicine, Farmington, CT 06030
- David McMahon
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut School of Medicine, Farmington, CT 06030
- John Sidney
  - La Jolla Institute of Allergy and Immunology, La Jolla, CA 92037
- Brian M Baker
  - Department of Chemistry and Biochemistry and Harper Cancer Research Institute, University of Notre Dame, Notre Dame, IN 46556
- Ion I Mandoiu
  - Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269
- Pramod K Srivastava
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut School of Medicine, Farmington, CT 06030
44
Fory PA, Triplett L, Ballen C, Abello JF, Duitama J, Aricapa MG, Prado GA, Correa F, Hamilton J, Leach JE, Tohme J, Mosquera GM. Comparative analysis of two emerging rice seed bacterial pathogens. Phytopathology 2014; 104:436-444. [PMID: 24261408 DOI: 10.1094/phyto-07-13-0186-r] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 06/02/2023]
Abstract
Seed sterility and grain discoloration limit rice production in Colombia and several Central American countries. In samples of discolored rice seed grown in Colombian fields, the species Burkholderia glumae and B. gladioli were isolated, and field isolates were compared phenotypically. An artificial inoculation assay was used to determine that, although both bacterial species cause symptoms on rice grains, B. glumae is a more aggressive pathogen, causing yield reduction and higher levels of grain sterility. To identify putative virulence genes differing between B. glumae and B. gladioli, four previously sequenced genomes of Asian and U.S. strains of the two pathogens were compared with each other and with two draft genomes of Colombian B. glumae and B. gladioli isolates generated for this study. Whereas previously characterized Burkholderia virulence factors are highly conserved between the two species, B. glumae and B. gladioli strains are predicted to encode distinct groups of genes encoding type VI secretion systems, transcriptional regulators, and membrane-sensing proteins. This study shows that both B. glumae and B. gladioli can threaten grain quality, although only one species affects yield. Furthermore, genotypic differences between the two strains are identified that could contribute to disease phenotypic differences.
45
Duan F, Duitama J, Al Seesi S, Blanchard T, McMahon D, Sidney J, Sette A, Mandoiu I, Srivastava P. A mutation in Transportin3 (Tnpo3) leads to generation of an individually distinct tumor-specific Kd-restricted epitope in the Meth A fibrosarcoma (TUM2P.899). The Journal of Immunology 2014. [DOI: 10.4049/jimmunol.192.supp.71.23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
The mutational repertoire of cancers creates the epitopes that make cancers immunogenic. Here, through comprehensive genomic, bioinformatic and immunological analyses, we uncover over a hundred new neo-epitopes in the Meth A fibrosarcoma of BALB/c mice. Interestingly, the predicted affinities of neo-epitopes for MHC I have no bearing on protective anti-tumor immunogenicity; instead, the numerical difference of such affinities between the mutated and un-mutated sequences, named the Differential Agretopic Index (DAI), is a significant although imperfect predictor. The mutated Transportin 3 (Tnpo3)-derived epitope, the highest ranking (by DAI) epitope of Meth A, is naturally presented by Meth A cells. Immunization with whole Meth A cells induces mutated Tnpo3 epitope-specific CD8+ T-cell responses. Immunization with mutated Tnpo3 peptide elicits CD8+ T-cell responses that recognize Meth A cells ex vivo, as well as significant protection from a tumor challenge. This tumor immunity is further enhanced by combining immunization with mutant Tnpo3 with a Toll-like receptor 9 (TLR9) ligand or anti-cytotoxic T-lymphocyte antigen 4 (CTLA-4) blocking antibody.
Affiliation(s)
- Fei Duan
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut Health Center, Farmington, CT
- Jorge Duitama
  - International Center for Tropical Agriculture, Cali, Colombia
- Sahar Al Seesi
  - Department of Computer Science & Engineering, University of Connecticut Health Center, Storrs, CT
- Tatiana Blanchard
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut Health Center, Farmington, CT
- David McMahon
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut Health Center, Farmington, CT
- John Sidney
  - La Jolla Institute of Allergy and Immunology, La Jolla, CA
- Ion Mandoiu
  - Department of Computer Science & Engineering, University of Connecticut Health Center, Storrs, CT
- Pramod Srivastava
  - Department of Immunology and Carole and Ray Neag Comprehensive Cancer Center, University of Connecticut Health Center, Farmington, CT
46
Duitama J, Zablotskaya A, Gemayel R, Jansen A, Belet S, Vermeesch JR, Verstrepen KJ, Froyen G. Large-scale analysis of tandem repeat variability in the human genome. Nucleic Acids Res 2014; 42:5728-41. [PMID: 24682812 PMCID: PMC4027155 DOI: 10.1093/nar/gku212] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 01/27/2023] Open
Abstract
Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats.
Affiliation(s)
- Jorge Duitama
  - VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium
  - Agrobiodiversity Research Area, International Center for Tropical Agriculture (CIAT), Cali, Colombia
- Alena Zablotskaya
  - Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium
  - Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
- Rita Gemayel
  - VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium
- An Jansen
  - VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium
  - Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium
  - Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
- Stefanie Belet
  - Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium
  - Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
- Joris R Vermeesch
  - Center for Human Genetics, University Hospitals Leuven, KU Leuven, B-3000 Leuven, Belgium
- Kevin J Verstrepen
  - VIB lab for Systems Biology & CMPG Lab for Genetics and Genomics, KU Leuven, B-3001 Leuven, Belgium
- Guy Froyen
  - Human Genome Laboratory, VIB Center for the Biology of Disease, Leuven, Belgium
  - Human Genome Laboratory, Department of Human Genetics, KU Leuven, B-3000 Leuven, Belgium
47
Duitama J, Sánchez-Rodríguez A, Goovaerts A, Pulido-Tamayo S, Hubmann G, Foulquié-Moreno MR, Thevelein JM, Verstrepen KJ, Marchal K. Improved linkage analysis of Quantitative Trait Loci using bulk segregants unveils a novel determinant of high ethanol tolerance in yeast. BMC Genomics 2014; 15:207. [PMID: 24640961 PMCID: PMC4003806 DOI: 10.1186/1471-2164-15-207] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 08/14/2013] [Accepted: 03/10/2014] [Indexed: 12/21/2022] Open
Abstract
Background: Bulk segregant analysis (BSA) coupled to high throughput sequencing is a powerful method to map genomic regions related to phenotypes of interest. It relies on crossing two parents, one inferior and one superior for a trait of interest. Segregants displaying the trait of the superior parent are pooled, and the DNA is extracted and sequenced. Genomic regions linked to the trait of interest are identified by searching the pool for overrepresented alleles that normally originate from the superior parent. BSA data analysis is non-trivial due to sequencing, alignment and screening errors. Results: To increase the power of the BSA technology and obtain a better distinction between spuriously and truly linked regions, we developed EXPLoRA (EXtraction of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that explicitly models the dependency between neighboring marker sites by exploiting the properties of linkage disequilibrium through a Hidden Markov Model (HMM). Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed reliable identification of QTLs linked to this phenotype that could not be identified with statistical significance in the original study. Experimental validation of one of the least pronounced linked regions, by identifying its causative gene VPS70, confirmed the potential of our method. Conclusions: EXPLoRA has a performance at least as good as the state-of-the-art and it is robust even at low signal-to-noise ratios, i.e. when the true linkage signal is diluted by sampling or screening errors, or when few segregants are available.
Affiliation(s)
- Johan M Thevelein
  - VIB Laboratory of Systems Biology & Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, Leuven B-3001, Belgium.
48
Duitama J, Quintero JC, Cruz DF, Quintero C, Hubmann G, Foulquié-Moreno MR, Verstrepen KJ, Thevelein JM, Tohme J. An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments. Nucleic Acids Res 2014; 42:e44. [PMID: 24413664 PMCID: PMC3973327 DOI: 10.1093/nar/gkt1381] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Indexed: 12/30/2022] Open
Abstract
Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.
Affiliation(s)
- Jorge Duitama
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
  - *To whom correspondence should be addressed. Tel: +57 2 4450000; Fax: +57 2 4450073
- Juan Camilo Quintero
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- Daniel Felipe Cruz
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- Constanza Quintero
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- Georg Hubmann
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- Maria R. Foulquié-Moreno
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- Kevin J. Verstrepen
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- Johan M. Thevelein
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
- Joe Tohme
  - Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), Km 17 Recta Cali-Palmira, A.A. 6713 Cali, Colombia; Laboratory of Molecular Cell Biology, Department of Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, B-3001 Leuven-Heverlee, Flanders, Belgium; VIB Laboratory of Systems Biology, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium; and Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics, KU Leuven, Gaston Geenslaan 1, B-3001 Leuven-Heverlee, Flanders, Belgium
49
Ballén-Taborda C, Plata G, Ayling S, Rodríguez-Zapata F, Becerra Lopez-Lavalle LA, Duitama J, Tohme J. Identification of Cassava MicroRNAs under Abiotic Stress. Int J Genomics 2013; 2013:857986. [PMID: 24328029 PMCID: PMC3845235 DOI: 10.1155/2013/857986] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 08/02/2013] [Accepted: 10/11/2013] [Indexed: 11/18/2022] Open
Abstract
The study of microRNAs (miRNAs) in plants has gained significant attention in recent years due to their regulatory role during development and in response to biotic and abiotic stresses. Although cassava (Manihot esculenta Crantz) is tolerant to drought and other adverse conditions, most cassava miRNAs have been predicted using bioinformatics alone or through sequencing of plants challenged by biotic stress. Here, we use high-throughput sequencing and different bioinformatics methods to identify potential cassava miRNAs expressed in different tissues subject to heat and drought conditions. We identified 60 miRNAs conserved in other plant species and 821 potential cassava-specific miRNAs. We also predicted 134 and 1002 potential target genes for these two sets of sequences. Using real time PCR, we verified the condition-specific expression of 5 cassava small RNAs relative to a non-stress control. We also found, using publicly available expression data, a significantly lower expression of the predicted target genes of conserved and nonconserved miRNAs under drought stress compared to other cassava genes. Gene Ontology enrichment analysis along with condition specific expression of predicted miRNA targets, allowed us to identify several interesting miRNAs which may play a role in stress-induced posttranscriptional regulation in cassava and other plants.
Affiliation(s)
- Carolina Ballén-Taborda
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Germán Plata
- Department of Systems Biology, Columbia University, 1130 Saint Nicholas Avenue, New York, NY 10032, USA
- Sarah Ayling
- The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Fausto Rodríguez-Zapata
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Jorge Duitama
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Joe Tohme
- Agrobiodiversity and Biotechnology Project, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
|
50
|
Hubmann G, Mathé L, Foulquié-Moreno MR, Duitama J, Nevoigt E, Thevelein JM. Identification of multiple interacting alleles conferring low glycerol and high ethanol yield in Saccharomyces cerevisiae ethanolic fermentation. Biotechnol Biofuels 2013; 6:87. [PMID: 23759206 PMCID: PMC3687583 DOI: 10.1186/1754-6834-6-87] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 03/31/2013] [Accepted: 05/29/2013] [Indexed: 05/09/2023]
Abstract
BACKGROUND Genetic engineering of industrial microorganisms often suffers from undesirable side effects on essential functions. Reverse engineering is an alternative strategy to improve multifactorial traits like low glycerol/high ethanol yield in yeast fermentation. Previous rational engineering of this trait always affected essential functions like growth and stress tolerance. We have screened Saccharomyces cerevisiae biodiversity for specific alleles causing lower glycerol/higher ethanol yield, assuming higher compatibility with normal cellular functionality. Previous work identified ssk1E330N…K356N as the causative allele in strain CBS6412, which displayed the lowest glycerol/ethanol ratio. RESULTS We have now identified a unique segregant, 26B, that shows low glycerol/high ethanol production similar to that of the superior parent, but lacks the ssk1E330N…K356N allele. Using segregants from the backcross of 26B with the inferior parent strain, we applied pooled-segregant whole-genome sequence analysis and identified three minor quantitative trait loci (QTLs) linked to low glycerol/high ethanol production. Within these QTLs, we identified three novel alleles of known regulatory and structural genes of glycerol metabolism, smp1R110Q,P269Q, hot1P107S,H274Y and gpd1L164P, as causative genes. All three genes separately caused a significant drop in the glycerol/ethanol production ratio, while gpd1L164P appeared to be epistatically suppressed by other alleles in the superior parent. The order of potency in reducing the glycerol/ethanol ratio of the three alleles was: gpd1L164P > hot1P107S,H274Y ≥ smp1R110Q,P269Q. CONCLUSIONS Our results show that natural yeast strains harbor multiple specific alleles of genes controlling essential functions that are apparently compatible with survival in the natural environment. These newly identified alleles can be used as gene tools for engineering industrial yeast strains with multiple subtle changes, minimizing the risk of negatively affecting other essential functions. The gene tools act at the transcriptional, regulatory or structural gene level, distributing the impact over multiple targets and thus further minimizing possible side effects. In addition, the results suggest polygenic analysis of complex traits as a promising new avenue to identify novel components involved in cellular functions, including those important in industrial applications.
Affiliation(s)
- Georg Hubmann
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
- Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
- Lotte Mathé
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
- Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
- Maria R Foulquié-Moreno
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
- Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
- Jorge Duitama
- Agrobiodiversity research area, International Center for Tropical Agriculture (CIAT), A.A. 6713, Cali, Colombia
- Elke Nevoigt
- School of Engineering and Science, Jacobs University Bremen gGmbH, Campus Ring 1, Bremen 28759, Germany
- Johan M Thevelein
- Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
- Department of Molecular Microbiology, VIB, Kasteelpark Arenberg 31, Leuven-Heverlee, Flanders B-3001, Belgium
|