1
Noll NW, Scherber C, Schäffler L. taxalogue: a toolkit to create comprehensive CO1 reference databases. PeerJ 2023; 11:e16253. [PMID: 38077427 PMCID: PMC10702336 DOI: 10.7717/peerj.16253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 09/18/2023] [Indexed: 12/18/2023]
Abstract
Background: Taxonomic identification through DNA barcodes gained considerable traction through the invention of next-generation sequencing and DNA metabarcoding. Metabarcoding allows for the simultaneous identification of thousands of organisms from bulk samples with high taxonomic resolution. However, reliable identifications can only be achieved with comprehensive and curated reference databases. Therefore, custom reference databases are often created to meet the needs of specific research questions. Due to taxonomic inconsistencies, formatting issues, and technical difficulties, building a custom reference database requires tremendous effort. Here, we present taxalogue, an easy-to-use software for creating comprehensive and customized reference databases that provide clean and taxonomically harmonized records. In combination with extensive geographical filtering options, taxalogue opens up new possibilities for generating and testing evolutionary hypotheses.
Methods: taxalogue collects DNA sequences from several online sources and combines them into a reference database. Taxonomic incongruencies between the different data sources can be harmonized according to available taxonomies. Dereplication and various filtering options are available regarding sequence quality or metadata information. taxalogue is implemented in the open-source Ruby programming language, and the source code is available at https://github.com/nwnoll/taxalogue. We benchmark four reference databases by sequence identity against eight queries from different localities and trapping devices. Subsamples from each reference database were used to compare how well another one is covered.
Results: taxalogue produces reference databases with the best coverage at high identities for most tested queries, enabling more accurate, reliable predictions with higher certainty than the other benchmarked reference databases. Additionally, the performance of taxalogue is more consistent while providing good coverage for a variety of habitats, regions, and sampling methods. taxalogue simplifies the creation of reference databases and makes the process reproducible and transparent. Multiple available output formats for commonly used downstream applications facilitate the easy adoption of taxalogue in many different software pipelines. The resulting reference databases improve the taxonomic classification accuracy through high coverage of the query sequences at high identities.
Affiliation(s)
- Niklas W. Noll
  - Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
- Christoph Scherber
  - Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
- Livia Schäffler
  - Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
2
Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput Biol 2022; 18:e1009123. [PMID: 35639788 PMCID: PMC9286226 DOI: 10.1371/journal.pcbi.1009123] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 06/06/2021] [Revised: 07/15/2022] [Accepted: 04/11/2022] [Indexed: 11/30/2022]
Abstract
Since its introduction in 2011 the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies—as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complimentary free and open source software tools and libraries, we wrote and made available through the multiple vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of files variants. These tools run everyday in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We shortly discuss the design of VCF, lessons learnt, and how we can address more complex variation through pangenome graph formats, variation that can not easily be represented by the VCF format. Most bioinformatics workflows deal with DNA/RNA variations that are typically represented in the variant call format (VCF)—a file format that describes mutations (SNP and MNP), insertions and deletions (INDEL) against a reference genome. Here we present a wide range of free and open source software tools that are used in biomedical sequencing workflows around the world today.
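As an illustration of the variant classes the abstract names (SNP, MNP, INDEL), the following is a minimal Python sketch that reads the fixed leading columns of a VCF data line and labels each alternate allele by comparing allele lengths. It is not taken from vcflib, bio-vcf, cyvcf2, hts-nim or slivar; the names `VcfRecord`, `parse_vcf_line` and `classify` are hypothetical, and the length-based rule is a simplification that ignores normalisation and treats symbolic alleles such as `<DEL>` separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VcfRecord:
    """Hypothetical container for the fixed columns used in this sketch."""
    chrom: str
    pos: int
    ref: str
    alts: List[str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse CHROM, POS, ID, REF and ALT from one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # Multiple alternate alleles are comma-separated in the ALT column.
    return VcfRecord(chrom, int(pos), ref, alt.split(","))


def classify(ref: str, alt: str) -> str:
    """Label an allele pair by length only (a simplifying assumption)."""
    if alt.startswith("<"):
        return "SYMBOLIC"  # e.g. <DEL>; structural variants are not sized here
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"


record = parse_vcf_line("chr1\t12345\t.\tA\tG,AT\t50\tPASS\t.")
labels = [classify(record.ref, alt) for alt in record.alts]
```

For production work the libraries listed in the title handle header parsing, genotype fields and normalisation, none of which this sketch attempts.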
Affiliation(s)
- Erik Garrison
  - Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Zev N. Kronenberg
  - Pacific Biosciences, San Diego, California, United States of America
- Eric T. Dawson
  - NVIDIA Corporation, Santa Clara, California, United States of America
- Brent S. Pedersen
  - Center for Molecular Medicine, University Medical Center, Utrecht, The Netherlands
- Pjotr Prins
  - Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
3
Garrison E, Kronenberg ZN, Dawson ET, Pedersen BS, Prins P. A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput Biol 2022. [PMID: 35639788 DOI: 10.1101/2021.05.21.445151] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 05/14/2023]
Abstract
Since its introduction in 2011 the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies-as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complimentary free and open source software tools and libraries, we wrote and made available through the multiple vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of files variants. These tools run everyday in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices. We shortly discuss the design of VCF, lessons learnt, and how we can address more complex variation through pangenome graph formats, variation that can not easily be represented by the VCF format.
Affiliation(s)
- Erik Garrison
  - Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Zev N Kronenberg
  - Pacific Biosciences, San Diego, California, United States of America
- Eric T Dawson
  - NVIDIA Corporation, Santa Clara, California, United States of America
- Brent S Pedersen
  - Center for Molecular Medicine, University Medical Center, Utrecht, The Netherlands
- Pjotr Prins
  - Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
4
Wang S, Luo H. Estimating the Divergence Times of Alphaproteobacteria Based on Mitochondrial Endosymbiosis and Eukaryotic Fossils. Methods Mol Biol 2022; 2569:95-116. [PMID: 36083445 DOI: 10.1007/978-1-0716-2691-7_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Alphaproteobacteria is one of the most abundant bacterial lineages that successfully colonize diverse marine and terrestrial environments on Earth. In addition, many alphaproteobacterial lineages have established close association with eukaryotes. This makes Alphaproteobacteria a promising system to test the link between the emergence of ecologically important bacteria and related geological events and the co-evolution between symbiotic bacteria and their hosts. Understanding the timescale of evolution of Alphaproteobacteria is key to testing these hypotheses, which is limited by the scarcity of bacterial fossils, however. Based on the mitochondrial endosymbiosis which posits that the mitochondrion originated from an alphaproteobacterial lineage, we propose a new strategy to estimate the divergence times of lineages within the Alphaproteobacteria by leveraging the fossil records of eukaryotes. In this chapter, we describe the workflow of the mitochondria-based method to date Alphaproteobacteria evolution by detailing the software, methods, and commands used for each step. Visualization of data and results is also described. We also provide related notes with background information and alternative options. All codes used to build this protocol are made available to the public, and we strive to make this protocol user-friendly in particular to microbiologists with limited practical skills in bioinformatics.
Affiliation(s)
- Sishuo Wang
  - Simon F. S. Li Marine Science Laboratory, School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
- Haiwei Luo
  - School of Life Sciences, Earth and Environmental Sciences Programme, and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
5
Evolutionary origin and ecological implication of a unique nif island in free-living Bradyrhizobium lineages. ISME J 2021; 15:3195-3206. [PMID: 33990706 PMCID: PMC8528876 DOI: 10.1038/s41396-021-01002-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/03/2021] [Revised: 04/21/2021] [Accepted: 04/28/2021] [Indexed: 02/03/2023]
Abstract
The alphaproteobacterial genus Bradyrhizobium has been best known as N2-fixing members that nodulate legumes, supported by the nif and nod gene clusters. Recent environmental surveys show that Bradyrhizobium represents one of the most abundant free-living bacterial lineages in the world's soils. However, our understanding of Bradyrhizobium comes largely from symbiotic members, biasing the current knowledge of their ecology and evolution. Here, we report the genomes of 88 Bradyrhizobium strains derived from diverse soil samples, including both nif-carrying and non-nif-carrying free-living (nod free) members. Phylogenomic analyses of these and 252 publicly available Bradyrhizobium genomes indicate that nif-carrying free-living members independently evolved from symbiotic ancestors (carrying both nif and nod) multiple times. Intriguingly, the nif phylogeny shows that the vast majority of nif-carrying free-living members comprise an independent cluster, indicating that horizontal gene transfer promotes nif expansion among the free-living Bradyrhizobium. Comparative genomics analysis identifies that the nif genes found in free-living Bradyrhizobium are located on a unique genomic island of ~50 kb equipped with genes potentially involved in coping with oxygen tension. We further analyze amplicon sequencing data to show that Bradyrhizobium members presumably carrying this nif island are widespread in a variety of environments. Given the dominance of Bradyrhizobium in world's soils, our findings have implications for global nitrogen cycles and agricultural research.
6
Simm D, Hatje K, Waack S, Kollmar M. Critical assessment of coiled-coil predictions based on protein structure data. Sci Rep 2021; 11:12439. [PMID: 34127723 PMCID: PMC8203680 DOI: 10.1038/s41598-021-91886-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/03/2021] [Accepted: 05/28/2021] [Indexed: 02/05/2023]
Abstract
Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank, down to each amino acid and its secondary structure. Apart from the 30-fold difference in minimum and maximum number of coiled coils predicted the tools strongly vary in where they predict coiled-coil regions. Accordingly, there is a high number of false predictions and missed, true coiled-coil regions. The evaluation of the binary classification metrics in comparison with naïve coin-flip models and the calculation of the Matthews correlation coefficient, the most reliable performance metric for imbalanced data sets, suggests that the tested tools' performance is close to random. This implicates that the tools' predictions have only limited informative value. Coiled-coil predictions are often used to interpret biochemical data and are part of in-silico functional genome annotation. Our results indicate that these predictions should be treated very cautiously and need to be supported and validated by experimental evidence.
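The abstract's point about imbalanced data can be made concrete with a small Python sketch of the Matthews correlation coefficient, computed from confusion-matrix counts. The counts below are invented for illustration (1000 residues, 50 of them in true coiled coils) and are not results from the paper; the function names `mcc` and `accuracy` are likewise hypothetical.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventionally reported as 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all calls that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical data set: 1000 residues, 50 in coiled-coil regions.
always_negative = dict(tp=0, fp=0, tn=950, fn=50)   # predicts no coiled coil anywhere
coin_flip = dict(tp=25, fp=475, tn=475, fn=25)      # expected counts for a fair coin
perfect = dict(tp=50, fp=0, tn=950, fn=0)

acc_negative = accuracy(**always_negative)
mcc_negative = mcc(**always_negative)
mcc_coin = mcc(**coin_flip)
mcc_perfect = mcc(**perfect)
```

Under these assumed counts the always-negative predictor reaches high accuracy while its MCC stays at zero, the same value as the coin flip, which is why a chance-level MCC is informative where accuracy is not.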
Affiliation(s)
- Dominic Simm
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Göttingen, Germany
- Klas Hatje
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
  - Present Address: Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland
- Stephan Waack
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Göttingen, Germany
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-Based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Göttingen, Germany
7
Wang S, Luo H. Dating Alphaproteobacteria evolution with eukaryotic fossils. Nat Commun 2021; 12:3324. [PMID: 34083540 PMCID: PMC8175736 DOI: 10.1038/s41467-021-23645-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/30/2020] [Accepted: 05/10/2021] [Indexed: 11/12/2022]
Abstract
Elucidating the timescale of the evolution of Alphaproteobacteria, one of the most prevalent microbial lineages in marine and terrestrial ecosystems, is key to testing hypotheses on their co-evolution with eukaryotic hosts and Earth's systems, which, however, is largely limited by the scarcity of bacterial fossils. Here, we incorporate eukaryotic fossils to date the divergence times of Alphaproteobacteria, based on the mitochondrial endosymbiosis that mitochondria evolved from an alphaproteobacterial lineage. We estimate that Alphaproteobacteria arose ~1900 million years (Ma) ago, followed by rapid divergence of their major clades. We show that the origin of Rickettsiales, an order of obligate intracellular bacteria whose hosts are mostly animals, predates the emergence of animals for ~700 Ma but coincides with that of eukaryotes. This, together with reconstruction of ancestral hosts, strongly suggests that early Rickettsiales lineages had established previously underappreciated interactions with unicellular eukaryotes. Moreover, the mitochondria-based approach displays higher robustness to uncertainties in calibrations compared with the traditional strategy using cyanobacterial fossils. Further, our analyses imply the potential of dating the (bacterial) tree of life based on endosymbiosis events, and suggest that previous applications using divergence times of the modern hosts of symbiotic bacteria to date bacterial evolution might need to be revisited.
Affiliation(s)
- Sishuo Wang
  - Simon F. S. Li Marine Science Laboratory, School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, SAR, Hong Kong
- Haiwei Luo
  - Simon F. S. Li Marine Science Laboratory, School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, SAR, Hong Kong
  - Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
  - Hong Kong Branch of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, SAR, Hong Kong
8
Jauss RT, Solf N, Kolora SRR, Schaffer S, Wolf R, Henle K, Fritz U, Schlegel M. Mitogenome evolution in the Lacerta viridis complex (Lacertidae, Squamata) reveals phylogeny of diverging clades. Syst Biodivers 2021. [DOI: 10.1080/14772000.2021.1912205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Robin-Tobias Jauss
  - Institute of Biology, Biodiversity & Evolution, University of Leipzig, Talstraße 33, Leipzig, 04103, Germany
- Nadiné Solf
  - Institute of Biology, Biodiversity & Evolution, University of Leipzig, Talstraße 33, Leipzig, 04103, Germany
- Sree Rohit Raj Kolora
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Stefan Schaffer
  - Institute of Biology, Molecular Evolution & Animal Systematics, University of Leipzig, Talstraße 33, Leipzig, 04103, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle Jena Leipzig, Deutscher Platz 5e, Leipzig, 04103, Germany
- Ronny Wolf
  - Institute of Biology, Molecular Evolution & Animal Systematics, University of Leipzig, Talstraße 33, Leipzig, 04103, Germany
- Klaus Henle
  - German Centre for Integrative Biodiversity Research (iDiv) Halle Jena Leipzig, Deutscher Platz 5e, Leipzig, 04103, Germany
  - Department of Conservation Biology, UFZ – Helmholtz Centre for Environmental Research, Permoserstr. 15, 04318, Leipzig, Germany
- Uwe Fritz
  - Museum of Zoology, Senckenberg Dresden, A. B. Meyer Building, 01109, Dresden, Germany
- Martin Schlegel
  - Institute of Biology, Biodiversity & Evolution, University of Leipzig, Talstraße 33, Leipzig, 04103, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle Jena Leipzig, Deutscher Platz 5e, Leipzig, 04103, Germany
9
Abstract
Chromosome replication is an essential process for cell division. The mode of chromosome replication has important impacts on the structure of the chromosome and replication speed. As typical bacterial replicons, circular chromosomes replicate bidirectionally and circular plasmids replicate either bidirectionally or unidirectionally. Whereas the finding of chromids (plasmid-derived chromosomes) in multiple bacterial lineages provides circumstantial evidence that chromosomes likely evolved from plasmids, all experimentally assayed chromids were shown to use bidirectional replication. Here, we employed a model system, the marine bacterial genus Pseudoalteromonas, members of which consistently carry a chromosome and a chromid. We provide experimental and bioinformatic evidence that while chromids in a few strains replicate bidirectionally, most replicate unidirectionally. This is the first experimental demonstration of the unidirectional replication mode in bacterial chromids. Phylogenomic and comparative genomic analyses showed that the bidirectional replication evolved only once from a unidirectional ancestor and that this transition was associated with insertions of exogenous DNA and relocation of the replication terminus region (ter2) from near the origin site (ori2) to a position roughly opposite it. This process enables a plasmid-derived chromosome to increase its size and expand the bacterium’s metabolic versatility while keeping its replication synchronized with that of the main chromosome. A major implication of our study is that the uni- and bidirectionally replicating chromids may represent two stages on the evolutionary trajectory from unidirectionally replicating plasmids to bidirectionally replicating chromosomes in bacteria. 
Further bioinformatic analyses predicted unidirectionally replicating chromids in several unrelated bacterial phyla, suggesting that evolution from unidirectionally to bidirectionally replicating replicons occurred multiple times in bacteria.
10
A haplotype-led approach to increase the precision of wheat breeding. Commun Biol 2020; 3:712. [PMID: 33239669 PMCID: PMC7689427 DOI: 10.1038/s42003-020-01413-2] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 07/13/2020] [Accepted: 10/15/2020] [Indexed: 12/11/2022]
Abstract
Crop productivity must increase at unprecedented rates to meet the needs of the growing worldwide population. Exploiting natural variation for the genetic improvement of crops plays a central role in increasing productivity. Although current genomic technologies can be used for high-throughput identification of genetic variation, methods for efficiently exploiting this genetic potential in a targeted, systematic manner are lacking. Here, we developed a haplotype-based approach to identify genetic diversity for crop improvement using genome assemblies from 15 bread wheat (Triticum aestivum) cultivars. We used stringent criteria to identify identical-by-state haplotypes and distinguish these from near-identical sequences (~99.95% identity). We showed that each cultivar shares ~59 % of its genome with other sequenced cultivars and we detected the presence of extended haplotype blocks containing hundreds to thousands of genes across all wheat chromosomes. We found that genic sequence alone was insufficient to fully differentiate between haplotypes, as were commonly used array-based genotyping chips due to their gene centric design. We successfully used this approach for focused discovery of novel haplotypes from a landrace collection and documented their potential for trait improvement in modern bread wheat. This study provides a framework for defining and exploiting haplotypes to increase the efficiency and precision of wheat breeding towards optimising the agronomic performance of this crucial crop. Brinton, Uauy and colleagues utilize genomic data from the 10+ Wheat Genome Project to develop a useful tool for studying and generating new wheat cultivars. This framework uses advanced exploitation of wheat haplotypes to bring newfound precision and efficiency to wheat breeding.
11
Jiménez-García B, Teixeira JMC, Trellet M, Rodrigues JPGLM, Bonvin AMJJ. PDB-tools web: A user-friendly interface for the manipulation of PDB files. Proteins 2020; 89:330-335. [PMID: 33111403 PMCID: PMC7855443 DOI: 10.1002/prot.26018] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/04/2020] [Revised: 10/20/2020] [Accepted: 10/26/2020] [Indexed: 01/06/2023]
Abstract
The Protein Data Bank (PDB) file format remains a popular format used and supported by many software to represent coordinates of macromolecular structures. It however suffers from drawbacks such as error‐prone manual editing. Because of that, various software toolkits have been developed to facilitate its editing and manipulation, but, to date, there is no online tool available for this purpose. Here we present PDB‐Tools Web, a flexible online service for manipulating PDB files. It offers a rich and user‐friendly graphical user interface that allows users to mix‐and‐match more than 40 individual tools from the pdb‐tools suite. Those can be combined in a few clicks to perform complex pipelines, which can be saved and uploaded. The resulting processed PDB files can be visualized online and downloaded. The web server is freely available at https://wenmr.science.uu.nl/pdbtools.
Affiliation(s)
- Brian Jiménez-García
  - Bijvoet Centre for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
- João M C Teixeira
  - Program in Molecular Medicine, The Hospital for Sick Children, Toronto, Ontario, Canada
- Mikael Trellet
  - Bijvoet Centre for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
- João P G L M Rodrigues
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, California, USA
12
Greener JG, Selvaraj J, Ward BJ. BioStructures.jl: read, write and manipulate macromolecular structures in Julia. Bioinformatics 2020; 36:4206-4207. [PMID: 32407511 DOI: 10.1093/bioinformatics/btaa502] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/05/2020] [Revised: 04/20/2020] [Accepted: 05/07/2020] [Indexed: 11/13/2022]
Abstract
SUMMARY Robust, flexible and fast software to read, write and manipulate macromolecular structures is a prerequisite for productively doing structural bioinformatics. We present BioStructures.jl, the first dedicated package in the Julia programming language for dealing with macromolecular structures and the Protein Data Bank. BioStructures.jl builds on the lessons learned with similar packages to provide a large feature set, a flexible object representation and high performance. AVAILABILITY AND IMPLEMENTATION BioStructures.jl is freely available under the MIT license. Source code and documentation are available at https://github.com/BioJulia/BioStructures.jl. BioStructures.jl is compatible with Julia versions 0.6 and later and is system-independent. CONTACT j.greener@ucl.ac.uk.
Affiliation(s)
- Joe G Greener
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Joel Selvaraj
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
- Ben J Ward
  - The Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
13
BMT: Bioinformatics mini toolbox for comprehensive DNA and protein analysis. Genomics 2020; 112:4561-4566. [PMID: 32791200 DOI: 10.1016/j.ygeno.2020.08.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/08/2020] [Revised: 08/01/2020] [Accepted: 08/07/2020] [Indexed: 01/05/2023]
Abstract
Background: Bioinformatics tools are of great significance and are used in different spheres of life sciences. There is a wide variety of tools available to perform primary analysis of DNA and protein, but most of them are available on different platforms and many remain undetected. Accessing these tools separately to perform individual tasks is uneconomical and inefficient.
Objective: Our aim is to bring different bioinformatics models onto a single platform to ameliorate scientific research. Hence, our objective is to make a tool for comprehensive DNA and protein analysis.
Methods: To develop a reliable, straightforward and standalone desktop application we used state-of-the-art Python packages and libraries. Bioinformatics Mini Toolbox (BMT) is a combination of seven tools including FastqTrimmer, Gene Prediction, DNA Analysis, Translation, Protein Analysis, and Pairwise and Multiple Alignment.
Results: FastqTrimmer assists in quality assurance of NGS data. Gene Prediction predicts genes by homology in a novel genome on the basis of a reference sequence. DNA Analysis and Protein Analysis calculate physicochemical properties of nucleotide and protein sequences, respectively. Translation translates the DNA sequence in six reading frames. Pairwise Alignment performs pairwise global and local alignment of DNA and protein sequences on the basis of multiple matrices. Multiple Alignment aligns multiple sequences and generates a phylogenetic tree.
Conclusion: We developed a tool for comprehensive DNA and protein analysis. The link to download BMT is https://github.com/nasiriqbal012/BMT_SETUP.git.
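The six-frame translation step mentioned in the Results can be sketched in a few lines of Python. This is not BMT's code: the function names are hypothetical, the standard genetic code is assumed, and codons containing anything other than A, C, G or T are rendered as `X`.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; '*' marks a stop, 'X' an unknown codon."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
```

Here frame `+1` reads ATG GCC TAA and frame `-1` reads the reverse complement TTA GGC CAT; finding open reading frames would additionally require scanning each frame for start and stop codons, which this sketch leaves out.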
14
Abstract
Members of the order Rhizobiales include those capable of nitrogen fixation in nodules as well as pathogens of animals and plants. This lifestyle diversity has important implications for agricultural and medical research. Leveraging large-scale genomic data, we infer that Rhizobiales originated as a free-living ancestor ∼1,500 million years ago (Mya) and that the later emergence of host-associated lifestyles broadly coincided with the rise of their eukaryotic hosts. In particular, the first nodulating lineage arose from either Azorhizobium or Bradyrhizobium 150 to 80 Mya, a time range in general concurrent with the emergence of legumes. The rates of lifestyle transitions are highly variable; nodule association is more likely to be lost than gained, whereas animal association likely represents an evolutionary dead end. 
We searched for statistical correlations between gene presence and lifestyle and identified genes likely contributing to the transition and adaptation to the same lifestyle in divergent lineages. Among the genes potentially promoting successful transitions to major nodulation lineages, the nod and nif clusters for nodulation and nitrogen fixation, respectively, were repeatedly acquired during each transition; the fix, dct, and phb clusters involved in energy conservation under micro-oxic conditions were present in the nonnodulating ancestors; and the secretion systems were acquired in lineage-specific patterns. Our study data suggest that increased eukaryote diversity drives lifestyle diversification of bacteria and highlight both acquired and preexisting traits facilitating the origin of host association. IMPORTANCE Bacteria form diverse interactions with eukaryotic hosts. This is well represented by the Rhizobiales, a clade of Alphaproteobacteria strategically important for their large diversity of lifestyles with implications for agricultural and medical research. To investigate their lifestyle evolution, we compiled a comprehensive data set of genomes and lifestyle information for over 1,000 Rhizobiales genomes. We show that the origins of major host-associated lineages in Rhizobiales broadly coincided with the emergences of their host plants/animals, suggesting bacterium-host interactions as a driving force in the evolution of Rhizobiales. We further found that, in addition to gene gains, preexisting traits and recurrent losses of specific genomic traits may have played underrecognized roles in the origin of host-associated lineages, providing clues to genetic engineering of microbial agricultural inoculants and prevention of the emergence of potential plant/animal pathogens.
15
Macnar JM, Szulc NA, Kryś JD, Badaczewska-Dawid AE, Gront D. BioShell 3.0: Library for Processing Structural Biology Data. Biomolecules 2020; 10:biom10030461. [PMID: 32188163 PMCID: PMC7175226 DOI: 10.3390/biom10030461] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/12/2020] [Revised: 03/05/2020] [Accepted: 03/10/2020] [Indexed: 01/11/2023]
Abstract
BioShell is an open-source package for processing biological data, particularly focused on structural applications. The package provides parsers, data structures and algorithms for handling and analyzing macromolecular sequences, structures and sequence profiles. The most frequently used routines are accessible through a set of easy-to-use command-line utilities for a Linux environment. Using the full functionality of the package requires knowledge of C++ or Python to assemble an application from this software library. Since the last publication, which announced version 2.0, the package has been greatly expanded and rewritten in the C++11 standard to improve its modularity and efficiency. A new testing platform has been implemented to continuously test the correctness and integrity of the package. More than two hundred test programs have been published to provide simple examples that can be used as templates. This makes BioShell an easy-to-use library that greatly speeds up development of bioinformatics applications and web services without compromising computational efficiency.
Affiliation(s)
- Joanna M. Macnar
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
  - College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, Stefana Banacha 2C, 02-097 Warsaw, Poland
- Natalia A. Szulc
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
  - Laboratory of Protein Metabolism, International Institute of Molecular and Cell Biology in Warsaw, 4 Ks. Trojdena Street, 02-109 Warsaw, Poland
- Justyna D. Kryś
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Aleksandra E. Badaczewska-Dawid
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Dominik Gront
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
16
Thole V, Bassard JE, Ramírez-González R, Trick M, Ghasemi Afshar B, Breitel D, Hill L, Foito A, Shepherd L, Freitag S, Nunes dos Santos C, Menezes R, Bañados P, Naesby M, Wang L, Sorokin A, Tikhonova O, Shelenga T, Stewart D, Vain P, Martin C. RNA-seq, de novo transcriptome assembly and flavonoid gene analysis in 13 wild and cultivated berry fruit species with high content of phenolics. BMC Genomics 2019; 20:995. [PMID: 31856735 PMCID: PMC6924045 DOI: 10.1186/s12864-019-6183-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/02/2019] [Accepted: 10/15/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND Flavonoids are produced in all flowering plants in a wide range of tissues including in berry fruits. These compounds are of considerable interest for their biological activities, health benefits and potential pharmacological applications. However, transcriptomic and genomic resources for wild and cultivated berry fruit species are often limited, despite their value in underpinning the in-depth study of metabolic pathways, fruit ripening as well as in the identification of genotypes rich in bioactive compounds. RESULTS To access the genetic diversity of wild and cultivated berry fruit species that accumulate high levels of phenolic compounds in their fleshy berry(-like) fruits, we selected 13 species from Europe, South America and Asia representing eight genera, seven families and seven orders within three clades of the kingdom Plantae. RNA from either ripe fruits (ten species) or three ripening stages (two species) as well as leaf RNA (one species) were used to construct, assemble and analyse de novo transcriptomes. The transcriptome sequences are deposited in the BacHBerryGEN database (http://jicbio.nbi.ac.uk/berries) and were used, as a proof of concept, via its BLAST portal (http://jicbio.nbi.ac.uk/berries/blast.html) to identify candidate genes involved in the biosynthesis of phenylpropanoid compounds. Genes encoding regulatory proteins of the anthocyanin biosynthetic pathway (MYB and basic helix-loop-helix (bHLH) transcription factors and WD40 repeat proteins) were isolated using the transcriptomic resources of wild blackberry (Rubus genevieri) and cultivated red raspberry (Rubus idaeus cv. Prestige) and were shown to activate anthocyanin synthesis in Nicotiana benthamiana. Expression patterns of candidate flavonoid gene transcripts were also studied across three fruit developmental stages via the BacHBerryEXP gene expression browser (http://www.bachberryexp.com) in R. genevieri and R. idaeus cv. Prestige. 
CONCLUSIONS We report a transcriptome resource that includes data for a wide range of berry(-like) fruit species that has been developed for gene identification and functional analysis to assist in berry fruit improvement. These resources will enable investigations of metabolic processes in berries beyond the phenylpropanoid biosynthetic pathway analysed in this study. The RNA-seq data will be useful for studies of berry fruit development and to select wild plant species useful for plant breeding purposes.
Affiliation(s)
- Vera Thole
  - Department of Metabolic Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- Jean-Etienne Bassard
  - Department of Plant and Environmental Science, University of Copenhagen, 1871 Frederiksberg, Denmark
  - Present address: Institute of Plant Molecular Biology, CNRS, University of Strasbourg, 12 Rue General Zimmer, 67084 Strasbourg, France
- Martin Trick
  - Department of Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- Bijan Ghasemi Afshar
  - Department of Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- Dario Breitel
  - Department of Metabolic Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
  - Present address: Tropic Biosciences UK LTD, Norwich Research Park, Norwich, NR4 7UG UK
- Lionel Hill
  - Department of Metabolic Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- Sabine Freitag
  - The James Hutton Institute, Invergowrie, Dundee, DD2 5DA UK
- Cláudia Nunes dos Santos
  - Instituto de Biologia Experimental e Tecnológica, Av. República, Qta. do Marquês, 2780-157 Oeiras, Portugal
  - CEDOC, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Rua Câmara Pestana 6, 1150-082 Lisbon, Portugal
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Regina Menezes
  - Instituto de Biologia Experimental e Tecnológica, Av. República, Qta. do Marquês, 2780-157 Oeiras, Portugal
  - CEDOC, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Rua Câmara Pestana 6, 1150-082 Lisbon, Portugal
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Pilar Bañados
  - Facultad De Agronomía e Ingeniería Forestal, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna Ote, 4860 Macul, Chile
- Liangsheng Wang
  - Institute of Botany, The Chinese Academy of Sciences, 20 Nanxincun, Xiangshan, Beijing, 100093 China
- Artem Sorokin
  - Fruit Crops Genetic Resources Department, N. I. Vavilov Research Institute of Plant Industry, B. Morskaya Street 42-44, St. Petersburg, 190000 Russia
- Olga Tikhonova
  - Fruit Crops Genetic Resources Department, N. I. Vavilov Research Institute of Plant Industry, B. Morskaya Street 42-44, St. Petersburg, 190000 Russia
- Tatiana Shelenga
  - Fruit Crops Genetic Resources Department, N. I. Vavilov Research Institute of Plant Industry, B. Morskaya Street 42-44, St. Petersburg, 190000 Russia
- Derek Stewart
  - The James Hutton Institute, Invergowrie, Dundee, DD2 5DA UK
  - Institute of Mechanical, Process and Energy Engineering, School of Engineering and Physical Sciences, Heriot Watt University, Edinburgh, UK
- Philippe Vain
  - Department of Metabolic Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- Cathie Martin
  - Department of Metabolic Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
17
Wang S, Chen Y. Fine-Tuning the Expression of Duplicate Genes by Translational Regulation in Arabidopsis and Maize. FRONTIERS IN PLANT SCIENCE 2019; 10:534. [PMID: 31156655 PMCID: PMC6530396 DOI: 10.3389/fpls.2019.00534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/04/2019] [Accepted: 04/05/2019] [Indexed: 06/01/2023]
Abstract
Plant genomes are extensively shaped by various types of gene duplication. However, in this active area of investigation, the vast majority of studies focus on the sequence and transcription of duplicate genes, leaving open the question of how translational regulation impacts the expression and evolution of duplicate genes. We explored this issue by analyzing the ribo- and mRNA-seq data sets across six tissue types and stress conditions in Arabidopsis thaliana and maize (Zea mays). We dissected the relative contributions of transcriptional and translational regulation to the divergence in the abundance of ribosome footprint (RF) for different types of duplicate genes. We found that the divergence in RF abundance was largely programmed at the transcription level and that translational regulation plays more of a modulatory role. Intriguingly, translational regulation is characterized by its strong directionality, with the divergence in translational efficiency (TE) globally counteracting the divergence in mRNA abundance, indicating partial buffering of the transcriptional divergence between paralogs by translational regulation. Divergence in TE was associated with several sequence features. The faster-evolving copy in a duplicate pair was more likely to show lower RF abundance, which possibly results from relaxed purifying selection compared with its paralog. A considerable proportion of duplicates displayed differential TE across tissue types and stress conditions, most of which were enriched in photosynthesis, energy production, and translation-related processes. Additionally, we constructed a database TDPDG-DB (http://www.plantdupribo.tk), providing an online platform for data exploration. Overall, our study illustrates the roles of translational regulation in fine-tuning duplicate gene expression in plants.
Affiliation(s)
- Sishuo Wang
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China
  - Department of Botany, Faculty of Science, The University of British Columbia, Vancouver, BC, Canada
  - School of Life Sciences, The Chinese University of Hong Kong, Sha Tin, Hong Kong
- Youhua Chen
  - Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, China
18
Katayama T, Kawashima S, Okamoto S, Moriya Y, Chiba H, Naito Y, Fujisawa T, Mori H, Takagi T. TogoGenome/TogoStanza: modularized Semantic Web genome database. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2019; 2019:5277251. [PMID: 30624651 PMCID: PMC6323299 DOI: 10.1093/database/bay132] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/08/2018] [Accepted: 11/26/2018] [Indexed: 11/12/2022]
Abstract
TogoGenome is a genome database that is purely based on the Semantic Web technology, which enables the integration of heterogeneous data and flexible semantic searches.
All the information is stored as Resource Description Framework (RDF) data, and the reporting web pages are generated on the fly using SPARQL Protocol and RDF Query Language (SPARQL) queries. TogoGenome provides a semantic-faceted search system by gene functional annotation, taxonomy, phenotypes and environment based on the relevant ontologies. TogoGenome also serves as an interface to conduct semantic comparative genomics by which a user can observe pan-organism or organism-specific genes based on the functional aspect of gene annotations and the combinations of organisms from different taxa. The TogoGenome database exhibits a modularized structure, and each module in the report pages is separately served as TogoStanza, which is a generic framework for rendering an information block as IFRAME/Web Components, which can, unlike several other monolithic databases, also be reused to construct other databases. TogoGenome and TogoStanza have been under development since 2012 and are freely available along with their source codes on the GitHub repositories at https://github.com/togogenome/ and https://github.com/togostanza/, respectively, under the MIT license.
Affiliation(s)
- Toshiaki Katayama
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
- Shuichi Kawashima
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
- Shinobu Okamoto
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
- Yuki Moriya
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
- Hirokazu Chiba
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
- Yuki Naito
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
- Hiroshi Mori
  - National Institute of Genetics, Mishima, Shizuoka, Japan
- Toshihisa Takagi
  - National Institute of Genetics, Mishima, Shizuoka, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Yayoi, Bunkyo-ku, Tokyo, Japan
19
Kohl TA, Utpatel C, Schleusener V, De Filippo MR, Beckert P, Cirillo DM, Niemann S. MTBseq: a comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates. PeerJ 2018; 6:e5895. [PMID: 30479891 PMCID: PMC6238766 DOI: 10.7717/peerj.5895] [Citation(s) in RCA: 125] [Impact Index Per Article: 20.8] [Received: 04/23/2018] [Accepted: 10/09/2018] [Indexed: 01/02/2023]
Abstract
Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance at the highest resolution, up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference-mapping-based workflow, MTBseq reports detected variant positions annotated with known associations to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.
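One common way to identify groups of related isolates from an alignment of SNP positions is to link any two isolates whose pairwise SNP distance falls below a cutoff. The Python sketch below illustrates that idea only; it is not MTBseq's implementation, and the function names, the single-linkage rule, and the distance cutoff are all assumptions made for illustration.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing positions between two aligned SNP strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def group_isolates(alignment, max_distance):
    """Single-linkage grouping: isolates within max_distance share a group.

    alignment maps isolate name -> string of concatenated SNP positions.
    """
    parent = {name: name for name in alignment}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in alignment:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical isolates and an arbitrary cutoff of 1 SNP.
example = {
    "iso1": "ACGTAC",
    "iso2": "ACGTAT",
    "iso3": "TTTTTT",
    "iso4": "ACGAAT",
}
print(group_isolates(example, max_distance=1))
```

With single linkage, `iso1` and `iso4` end up in the same group even though they differ by two SNPs, because both are within one SNP of `iso2`; a stricter rule would require every pair in a group to satisfy the cutoff.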
Affiliation(s)
- Thomas Andreas Kohl
  - Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
- Christian Utpatel
  - Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
- Viola Schleusener
  - Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
- Maria Rosaria De Filippo
  - Emerging Bacterial Pathogens Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Patrick Beckert
  - Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
  - German Center for Infection Research (DZIF), partner site Hamburg-Lübeck-Borstel-Riems, Borstel, Germany
- Daniela Maria Cirillo
  - Emerging Bacterial Pathogens Unit, Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Stefan Niemann
  - Molecular and Experimental Mycobacteriology, Research Center Borstel, Borstel, Germany
  - German Center for Infection Research (DZIF), partner site Hamburg-Lübeck-Borstel-Riems, Borstel, Germany
20
Ramírez-González RH, Borrill P, Lang D, Harrington SA, Brinton J, Venturini L, Davey M, Jacobs J, van Ex F, Pasha A, Khedikar Y, Robinson SJ, Cory AT, Florio T, Concia L, Juery C, Schoonbeek H, Steuernagel B, Xiang D, Ridout CJ, Chalhoub B, Mayer KFX, Benhamed M, Latrasse D, Bendahmane A, Wulff BBH, Appels R, Tiwari V, Datla R, Choulet F, Pozniak CJ, Provart NJ, Sharpe AG, Paux E, Spannagl M, Bräutigam A, Uauy C. The transcriptional landscape of polyploid wheat. Science 2018; 361:eaar6089. [PMID: 30115782 DOI: 10.1126/science.aar6089] [Citation(s) in RCA: 540] [Impact Index Per Article: 90.0] [Received: 11/30/2017] [Accepted: 07/11/2018] [Indexed: 12/14/2022]
Abstract
The coordinated expression of highly related homoeologous genes in polyploid species underlies the phenotypes of many of the world's major crops. Here we combine extensive gene expression datasets to produce a comprehensive, genome-wide analysis of homoeolog expression patterns in hexaploid bread wheat. Bias in homoeolog expression varies between tissues, with ~30% of wheat homoeologs showing nonbalanced expression. We found expression asymmetries along wheat chromosomes, with homoeologs showing the largest inter-tissue, inter-cultivar, and coding sequence variation, most often located in high-recombination distal ends of chromosomes. These transcriptionally dynamic genes potentially represent the first steps toward neo- or subfunctionalization of wheat homoeologs. Coexpression networks reveal extensive coordination of homoeologs throughout development and, alongside a detailed expression atlas, provide a framework to target candidate genes underpinning agronomic traits in wheat.
21
Khomtchouk BB, Weitz E, Karp PD, Wahlestedt C. How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications. Brief Bioinform 2018; 19:537-543. [PMID: 28040748 PMCID: PMC5952920 DOI: 10.1093/bib/bbw130] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/18/2016] [Revised: 11/16/2016] [Indexed: 11/14/2022]
Abstract
We present a rationale for expanding the presence of the Lisp family of programming languages in bioinformatics and computational biology research. Put simply, Lisp-family languages enable programmers to more quickly write programs that run faster than in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology. We will point out several important key features that distinguish languages of the Lisp family from other programming languages, and we will explain how these features can aid researchers in becoming more productive and creating better code. We will also show how these features make these languages ideal tools for artificial intelligence and machine learning applications. We will specifically stress the advantages of domain-specific languages (DSLs): languages that are specialized to a particular area, and thus not only facilitate easier research problem formulation, but also aid in the establishment of standards and best programming practices as applied to the specific research field at hand. DSLs are particularly easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred to as the 'programmable programming language'. We are convinced that Lisp grants programmers unprecedented power to build increasingly sophisticated artificial intelligence systems that may ultimately transform machine learning and artificial intelligence research in bioinformatics and computational biology.
Affiliation(s)
- Bohdan B Khomtchouk
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Edmund Weitz
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Peter D Karp
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
- Claes Wahlestedt
  - Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1120 NW 14th St., Miami, FL, USA
22
Ohta T, Nakazato T, Bono H. Calculating the quality of public high-throughput sequencing data to obtain a suitable subset for reanalysis from the Sequence Read Archive. Gigascience 2018; 6:1-8. [PMID: 28449062 PMCID: PMC5459929 DOI: 10.1093/gigascience/gix029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/16/2017] [Accepted: 04/11/2017] [Indexed: 11/15/2022]
Abstract
It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories makes it hard for users to choose a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information for all of the archived sequencing data, which enables users to obtain sequencing data of sufficient quality for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party.
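The idea of setting a quality threshold to narrow a large result set can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline or FastQC itself: it assumes FASTQ quality strings in Phred+33 encoding, uses the mean per-read Phred score as a stand-in quality value, and the experiment identifiers and the cutoff are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def experiment_quality(quality_strings):
    """Average of the per-read mean Phred scores for one experiment."""
    per_read = [mean_phred(q) for q in quality_strings]
    return sum(per_read) / len(per_read)


def select_experiments(experiments, min_quality):
    """Return the IDs of experiments whose quality value meets the threshold."""
    return sorted(
        exp_id
        for exp_id, quals in experiments.items()
        if experiment_quality(quals) >= min_quality
    )


# Hypothetical experiments with toy quality strings.
experiments = {
    "EXP_A": ["IIII", "IIII"],  # 'I' encodes Phred 40
    "EXP_B": ["!!!!", "5555"],  # '!' encodes Phred 0, '5' encodes Phred 20
}
print(select_experiments(experiments, min_quality=30))
```

A single summary number hides per-position quality decay along reads, which is why tools such as FastQC report several complementary metrics; a real filter would likely combine more than one of them.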
Affiliation(s)
- Tazro Ohta
  - Correspondence address: Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Yata 1111, Mishima, Shizuoka 411-8540, Japan
- Hidemasa Bono
  - Correspondence address: Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Yata 1111, Mishima, Shizuoka 411-8540, Japan
23
Simm D, Kollmar M. Waggawagga-CLI: A command-line tool for predicting stable single α-helices (SAH-domains), and the SAH-domain distribution across eukaryotes. PLoS One 2018; 13:e0191924. [PMID: 29444145 PMCID: PMC5812594 DOI: 10.1371/journal.pone.0191924] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/27/2017] [Accepted: 01/12/2018] [Indexed: 12/15/2022]
Abstract
Stable single-alpha helices (SAH-domains) function as rigid connectors and constant force springs between structural domains, and can provide contact surfaces for protein-protein and protein-RNA interactions. SAH-domains mainly consist of charged amino acids and are monomeric and stable in polar solutions, characteristics which distinguish them from coiled-coil domains and intrinsically disordered regions. Although the number of reported SAH-domains is steadily increasing, genome-wide analyses of SAH-domains in eukaryotic genomes are still missing. Here, we present Waggawagga-CLI, a command-line tool for predicting and analysing SAH-domains in protein sequence datasets. Using Waggawagga-CLI we predicted SAH-domains in 24 datasets from eukaryotes across the tree of life. SAH-domains were predicted in 0.5 to 3.5% of the protein-coding content per species. SAH-domains are particularly common in longer proteins, supporting their function as structural building blocks in multi-domain proteins. In human, SAH-domains are mainly used as alternative building blocks that are not present in all transcripts of a gene. Gene ontology analysis showed that yeast proteins with SAH-domains are particularly enriched in macromolecular complex subunit organization, cellular component biogenesis and RNA metabolic processes, and that they have a strong nuclear and ribonucleoprotein complex localization and function in ribosome and nucleic acid binding. Human proteins with SAH-domains have roles in all types of RNA processing and cytoskeleton organization, and are predicted to function in RNA binding, protein binding involved in cell and cell-cell adhesion, and cytoskeletal protein binding. Waggawagga-CLI allows the user to adjust the stabilizing and destabilizing contribution of amino acid interactions in i,i+3 and i,i+4 spacings, and provides extensive flexibility for user-designed analyses.
Affiliation(s)
- Dominic Simm
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Göttingen, Germany
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Göttingen, Germany
24
Phylogenomic analysis demonstrates a pattern of rare and long-lasting concerted evolution in prokaryotes. Commun Biol 2018; 1:12. [PMID: 30271899 PMCID: PMC6053082 DOI: 10.1038/s42003-018-0014-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/20/2017] [Accepted: 01/11/2018] [Indexed: 12/15/2022]
Abstract
Concerted evolution, where paralogs in the same species show higher sequence similarity to each other than to orthologs in other species, is widely found in many species. However, cases of concerted evolution that last for hundreds of millions of years are very rare. By genome-wide analysis of a broad selection of prokaryotes, we provide strong evidence of recurrent concerted evolution in 26 genes, most of which have lasted more than ~500 million years. We find that most concertedly evolving genes are key members of important pathways, and encode proteins from the same complexes and/or pathways, suggesting coevolution of genes via concerted evolution to maintain gene balance. We also present LRCE-DB, a comprehensive online repository of long-lasting concerted evolution. Collectively, our study reveals that although most duplicated genes may diverge in sequence over a long period, on rare occasions this constraint can be breached, leading to unexpected long-lasting concerted evolution in a recurrent manner. Sishuo Wang and Youhua Chen present an analysis of concerted evolution in prokaryotes using a new computational pipeline, iSeeCE. They find evidence in 26 genes for recurrent concerted evolution, most of which last more than ~500 million years, and provide a database, LRCE-DB, for data exploration.
|
25
|
Carmona R, Arroyo M, Jiménez-Quesada MJ, Seoane P, Zafra A, Larrosa R, Alché JDD, Claros MG. Automated identification of reference genes based on RNA-seq data. Biomed Eng Online 2017; 16:65. [PMID: 28830520 PMCID: PMC5568602 DOI: 10.1186/s12938-017-0356-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 01/03/2023] Open
Abstract
Background Gene expression analyses demand appropriate reference genes (RGs) for normalization, in order to obtain reliable assessments. Ideally, RG expression levels should remain constant in all cells, tissues or experimental conditions under study. Housekeeping genes traditionally fulfilled this requirement, but they have been reported to be less invariant than expected; therefore, RGs should be tested and validated for every particular situation. Microarray data have been used to propose new RGs, but only a limited set of model species and conditions is available; in contrast, RNA-seq experiments are increasingly frequent and constitute a new source of candidate RGs. Results An automated workflow based on mapped NGS reads has been constructed to obtain highly and invariantly expressed RGs based on a normalized expression in reads per mapped million and the coefficient of variation. This workflow has been tested with Roche/454 reads from reproductive tissues of olive tree (Olea europaea L.), as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana and three different human cancers (prostate, small-cell lung cancer and lung adenocarcinoma). Candidate RGs have been proposed for each species and many of them have been previously reported as RGs in literature. Experimental validation of significant RGs in olive tree is provided to support the algorithm. Conclusion Regardless of sequencing technology, number of replicates, and library sizes, when RNA-seq experiments are designed and performed, the same datasets can be analyzed with our workflow to extract suitable RGs for subsequent PCR validation. Moreover, different subsets of experimental conditions can provide different suitable RGs.
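The selection criterion described in this abstract (high expression in reads per mapped million, low coefficient of variation across samples) can be sketched roughly as follows. This is an illustrative sketch, not the authors' workflow: the function names, the input layout, the two thresholds, and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def reads_per_million(counts: dict, mapped_reads: int) -> dict:
    """Normalize raw read counts per gene to reads per mapped million (RPM)."""
    return {gene: n * 1_000_000 / mapped_reads for gene, n in counts.items()}


def candidate_reference_genes(samples, min_mean_rpm=100.0, max_cv=0.2):
    """Return (gene, mean RPM, CV) tuples sorted by increasing CV.

    `samples` is a list of (counts_by_gene, total_mapped_reads) pairs.
    The thresholds are hypothetical defaults chosen for illustration.
    """
    normalized = [reads_per_million(counts, total) for counts, total in samples]
    # Only genes observed in every sample can be compared across samples.
    shared_genes = set.intersection(*(set(n) for n in normalized))
    selected = []
    for gene in sorted(shared_genes):
        values = [n[gene] for n in normalized]
        average = mean(values)
        if average == 0 or average < min_mean_rpm:
            continue  # not expressed highly enough
        cv = pstdev(values) / average  # coefficient of variation
        if cv <= max_cv:
            selected.append((gene, average, cv))
    return sorted(selected, key=lambda item: item[2])


# Toy example with made-up counts (not data from the study).
samples = [
    ({"geneA": 1000, "geneB": 10, "geneC": 500}, 1_000_000),
    ({"geneA": 2000, "geneB": 20, "geneC": 4000}, 2_000_000),
]
print(candidate_reference_genes(samples))
```

In this toy input only geneA passes: geneB is expressed too weakly, and geneC varies too much between the two samples after normalization.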
Affiliation(s)
- Rosario Carmona
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, Granada, Spain
| | - Macarena Arroyo
- Servicio de Neumología, Hospital Regional Universitario de Málaga, Avda Carlos Haya s/n, Malaga, Spain
| | - María José Jiménez-Quesada
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, Granada, Spain
| | - Pedro Seoane
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Malaga, Spain
| | - Adoración Zafra
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, Granada, Spain
| | - Rafael Larrosa
- Departamento de Arquitectura de Computadores, Universidad de Málaga, Malaga, Spain
| | - Juan de Dios Alché
- Plant Reproductive Biology Laboratory, Department of Biochemistry, Cell and Molecular Biology of Plants, Estación Experimental del Zaidín, CSIC, Granada, Spain
| | - M Gonzalo Claros
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Malaga, Spain.
| |
|
26
|
ProtozoaDB 2.0: A Trypanosoma Brucei Case Study. Pathogens 2017; 6:pathogens6030032. [PMID: 28726736 PMCID: PMC5617989 DOI: 10.3390/pathogens6030032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/20/2017] [Revised: 07/16/2017] [Accepted: 07/16/2017] [Indexed: 01/12/2023] Open
Abstract
Over the last decade, new species of Protozoa have been sequenced and deposited in GenBank. Analyzing large amounts of genomic data, especially using Next Generation Sequencing (NGS), is not a trivial task, considering that researchers used to focus their studies on a few genes or gene families, or even small genomes. To facilitate the information extraction process from genomic data, we developed a database system called ProtozoaDB that included five genomes of Protozoa in its first version. In the present study, we present a new version of ProtozoaDB called ProtozoaDB 2.0, now with the genomes of 22 pathogenic Protozoa. The system has been fully remodeled to allow for new tools and a more expanded view of data, and now includes a number of analyses such as: (i) similarities with other databases (model organisms, the Conserved Domains Database, and the Protein Data Bank); (ii) visualization of KEGG metabolic pathways; (iii) the protein structure from PDB; (iv) homology inferences; (v) the search for related publications in PubMed; (vi) superfamily classification; and (vii) phenotype inferences based on comparisons with model organisms. ProtozoaDB 2.0 supports RESTful Web Services to make data access easier. Those services were written in the Ruby language using Ruby on Rails (RoR). This new version also allows a more detailed analysis of the object of study, as well as expanding the number of genomes and proteomes available to the scientific community. In our case study, a group of prenyltransferase proteins already described in the literature was found to be a good drug target for Trypanosomatids.
|
27
|
Lipowski D, Popiel M, Perlejewski K, Nakamura S, Bukowska-Osko I, Rzadkiewicz E, Dzieciatkowski T, Milecka A, Wenski W, Ciszek M, Debska-Slizien A, Ignacak E, Cortes KC, Pawelczyk A, Horban A, Radkowski M, Laskus T. A Cluster of Fatal Tick-borne Encephalitis Virus Infection in Organ Transplant Setting. J Infect Dis 2017; 215:896-901. [PMID: 28453842 DOI: 10.1093/infdis/jix040] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 09/30/2016] [Accepted: 01/17/2017] [Indexed: 12/15/2022] Open
Abstract
Background Tick-borne encephalitis virus (TBEV) infection has become a major health problem in Europe and is currently a common cause of viral brain infection in many countries. Encephalitis in transplant recipients, although rare, is becoming a recognized complication. Our study provides the first description of transmission of TBEV through transplantation of solid organs. Methods Three patients who received solid organ transplants from a single donor (2 received kidney, and 1 received liver) developed encephalitis 17-49 days after transplantation and subsequently died. Blood and autopsy tissue samples were tested by next-generation sequencing (NGS) and reverse transcription polymerase chain reaction (RT-PCR). Results All 3 recipients were first analyzed in autopsy brain tissue samples and/or cerebrospinal fluid by NGS, which yielded 24-52 million sequences per sample and 9-988 matched TBEV sequences in each patient. The presence of TBEV was confirmed by RT-PCR in all recipients and in the donor, and direct sequencing of amplification products corroborated the presence of the same viral strain. Conclusions We demonstrated transmission of TBEV by transplantation of solid organs. In such a setting, TBEV infection may be fatal, probably due to pharmacological immunosuppression. Organ donors should be screened for TBEV when coming from or visiting endemic areas.
Affiliation(s)
- Dariusz Lipowski
- Department of Infectious Diseases, Warsaw Medical University, Warsaw, Poland
| | - Marta Popiel
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, Warsaw, Poland
| | - Karol Perlejewski
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, Warsaw, Poland
| | - Shota Nakamura
- Department of Infection Metagenomics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
| | - Iwona Bukowska-Osko
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, Warsaw, Poland
| | - Ewa Rzadkiewicz
- Department of Infectious Diseases, Warsaw Medical University, Warsaw, Poland
| | | | - Anna Milecka
- Department of General and Endocrine Surgery and Transplantation Medical University of Gdansk, Gdansk, Poland
| | | | - Michal Ciszek
- Department of Immunology, Warsaw Medical University, Warsaw, Poland
| | - Alicja Debska-Slizien
- Department of Nephrology, Transplantation and Internal Diseases, Gdansk Medical University, Gdansk, Poland
| | - Ewa Ignacak
- Department of Nephrology, Kraków Medical University Hospital, Poland
| | - Kamila Caraballo Cortes
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, Warsaw, Poland
| | - Agnieszka Pawelczyk
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, Warsaw, Poland
| | - Andrzej Horban
- Department of Infectious Diseases, Warsaw Medical University, Warsaw, Poland
| | - Marek Radkowski
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, Warsaw, Poland
| | - Tomasz Laskus
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, Warsaw, Poland
| |
|
28
|
Ezra Tsur E. Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces. BioData Min 2017; 10:11. [PMID: 28293298 PMCID: PMC5346198 DOI: 10.1186/s13040-017-0130-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/06/2016] [Accepted: 02/14/2017] [Indexed: 11/18/2022] Open
Abstract
Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistence and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Center for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistence agent and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at: http://nbel-lab.com.
Affiliation(s)
- Elishai Ezra Tsur
- Neuro-Biomorphic Engineering lab, Faculty of Engineering, Jerusalem College of Technology, Jerusalem, Israel
| |
|
29
|
Sturmberger L, Chappell T, Geier M, Krainer F, Day KJ, Vide U, Trstenjak S, Schiefer A, Richardson T, Soriaga L, Darnhofer B, Birner-Gruenberger R, Glick BS, Tolstorukov I, Cregg J, Madden K, Glieder A. Refined Pichia pastoris reference genome sequence. J Biotechnol 2016; 235:121-31. [PMID: 27084056 PMCID: PMC5089815 DOI: 10.1016/j.jbiotec.2016.04.023] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 02/10/2016] [Revised: 04/08/2016] [Accepted: 04/11/2016] [Indexed: 11/16/2022]
Abstract
Strains of the species Komagataella phaffii are the most frequently used "Pichia pastoris" strains employed for recombinant protein production as well as studies on peroxisome biogenesis, autophagy and secretory pathway analyses. Genome sequencing of several different P. pastoris strains has provided the foundation for understanding these cellular functions in recent genomics, transcriptomics and proteomics experiments. This experimentation has identified mistakes, gaps and incorrectly annotated open reading frames in the previously published draft genome sequences. Here, a refined reference genome is presented, generated with genome and transcriptome sequencing data from multiple P. pastoris strains. Twelve major sequence gaps from 20 to 6000 base pairs were closed and 5111 out of 5256 putative open reading frames were manually curated and confirmed by RNA-seq and published LC-MS/MS data, including the addition of new open reading frames (ORFs) and a reduction in the number of spliced genes from 797 to 571. One chromosomal fragment of 76kbp between two previous gaps on chromosome 1 and another 134kbp fragment at the end of chromosome 4, as well as several shorter fragments needed re-orientation. In total more than 500 positions in the genome have been corrected. This reference genome is presented with new chromosomal numbering, positioning ribosomal repeats at the distal ends of the four chromosomes, and includes predicted chromosomal centromeres as well as the sequence of two linear cytoplasmic plasmids of 13.1 and 9.5kbp found in some strains of P. pastoris.
Affiliation(s)
- Lukas Sturmberger
- Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria
| | - Thomas Chappell
- BioGrammatics Inc., 2120 Las Palmas Drive, Carlsbad, CA 92011, United States
| | - Martina Geier
- Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria
| | - Florian Krainer
- Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria
| | - Kasey J Day
- Department of Molecular Genetics and Cell Biology, University of Chicago, 920 East 58th St., Chicago, IL 60637, United States
| | - Ursa Vide
- Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria
| | - Sara Trstenjak
- Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria
| | - Anja Schiefer
- Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria
| | - Toby Richardson
- Synthetic Genomics, Inc., 11149 North Torrey Pines Rd., La Jolla, CA 92037, United States
| | - Leah Soriaga
- Synthetic Genomics, Inc., 11149 North Torrey Pines Rd., La Jolla, CA 92037, United States
| | - Barbara Darnhofer
- Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria; Institute of Pathology, Research Unit Functional Proteomics and Metabolic Pathways, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010 Graz, Austria
| | - Ruth Birner-Gruenberger
- Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria; Institute of Pathology, Research Unit Functional Proteomics and Metabolic Pathways, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010 Graz, Austria
| | - Benjamin S Glick
- Department of Molecular Genetics and Cell Biology, University of Chicago, 920 East 58th St., Chicago, IL 60637, United States
| | - Ilya Tolstorukov
- BioGrammatics Inc., 2120 Las Palmas Drive, Carlsbad, CA 92011, United States; Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, United States
| | - James Cregg
- BioGrammatics Inc., 2120 Las Palmas Drive, Carlsbad, CA 92011, United States; Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, United States
| | - Knut Madden
- BioGrammatics Inc., 2120 Las Palmas Drive, Carlsbad, CA 92011, United States
| | - Anton Glieder
- Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria; Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria; bisy e.U., Wetzawinkel 20, 8200 Hofstaetten/Raab, Austria.
| |
|
30
|
Smith-Unna R, Boursnell C, Patro R, Hibberd JM, Kelly S. TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res 2016. [PMID: 27252236 DOI: 10.1101/021626v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/28/2023]
Abstract
TransRate is a tool for reference-free quality assessment of de novo transcriptome assemblies. Using only the sequenced reads and the assembly as input, we show that multiple common artifacts of de novo transcriptome assembly can be readily detected. These include chimeras, structural errors, incomplete assembly, and base errors. TransRate evaluates these errors to produce a diagnostic quality score for each contig, and these contig scores are integrated to evaluate whole assemblies. Thus, TransRate can be used for de novo assembly filtering and optimization as well as comparison of assemblies generated using different methods from the same input reads. Applying the method to a data set of 155 published de novo transcriptome assemblies, we deconstruct the contribution that assembly method, read length, read quantity, and read quality make to the accuracy of de novo transcriptome assemblies and reveal that variance in the quality of the input data explains 43% of the variance in the quality of published de novo transcriptome assemblies. Because TransRate is reference-free, it is suitable for assessment of assemblies of all types of RNA, including assemblies of long noncoding RNA, rRNA, mRNA, and mixed RNA samples.
Affiliation(s)
- Richard Smith-Unna
- Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
| | - Chris Boursnell
- Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
| | - Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, New York 11794-4400, USA
| | - Julian M Hibberd
- Department of Plant Sciences, University of Cambridge, Cambridge CB2 3EA, United Kingdom
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, United Kingdom
| |
|
31
|
Smith-Unna R, Boursnell C, Patro R, Hibberd JM, Kelly S. TransRate: reference-free quality assessment of de novo transcriptome assemblies. Genome Res 2016; 26:1134-44. [PMID: 27252236 PMCID: PMC4971766 DOI: 10.1101/gr.196469.115] [Citation(s) in RCA: 460] [Impact Index Per Article: 57.5] [Received: 07/02/2015] [Accepted: 05/27/2016] [Indexed: 11/24/2022]
|
32
|
The Widespread Prevalence and Functional Significance of Silk-Like Structural Proteins in Metazoan Biological Materials. PLoS One 2016; 11:e0159128. [PMID: 27415783 PMCID: PMC4944945 DOI: 10.1371/journal.pone.0159128] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/23/2016] [Accepted: 06/28/2016] [Indexed: 01/05/2023] Open
Abstract
In nature, numerous mechanisms have evolved by which organisms fabricate biological structures with an impressive array of physical characteristics. Some examples of metazoan biological materials include the highly elastic byssal threads by which bivalves attach themselves to rocks, biomineralized structures that form the skeletons of various animals, and spider silks that are renowned for their exceptional strength and elasticity. The remarkable properties of silks, which are perhaps the best studied biological materials, are the result of the highly repetitive, modular, and biased amino acid composition of the proteins that compose them. Interestingly, similar levels of modularity/repetitiveness and similar bias in amino acid compositions have been reported in proteins that are components of structural materials in other organisms; however, the exact nature and extent of this similarity, and its functional and evolutionary relevance, are unknown. Here, we investigate this similarity and use sequence features common to silks and other known structural proteins to develop a bioinformatics-based method to identify similar proteins from large-scale transcriptome and whole-genome datasets. We show that a large number of proteins identified using this method have roles in biological material formation throughout the animal kingdom. Despite the similarity in sequence characteristics, most of the silk-like structural proteins (SLSPs) identified in this study appear to have evolved independently and are restricted to a particular animal lineage. Although the exact function of many of these SLSPs is unknown, the apparent independent evolution of proteins with similar sequence characteristics in divergent lineages suggests that these features are important for the assembly of biological materials. The identification of these characteristics enables the generation of testable hypotheses regarding the mechanisms by which these proteins assemble and direct the construction of biological materials with diverse morphologies. The SilkSlider predictor software developed here is available at https://github.com/wwood/SilkSlider.
|
33
|
Bolleman JT, Mungall CJ, Strozzi F, Baran J, Dumontier M, Bonnal RJP, Buels R, Hoehndorf R, Fujisawa T, Katayama T, Cock PJA. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation. J Biomed Semantics 2016; 7:39. [PMID: 27296299 PMCID: PMC4907002 DOI: 10.1186/s13326-016-0067-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 02/05/2014] [Accepted: 03/17/2016] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. DESCRIPTION We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. CONCLUSIONS Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
Affiliation(s)
- Jerven T Bolleman
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel, Servet, Geneva 4, 1211, Switzerland.
| | | | | | - Joachim Baran
- CODAMONO, 5-121 Marion Street, Toronto, M6R 1E6, Ontario, Canada
| | - Michel Dumontier
- Stanford Center for Biomedical Informatics Research, 1265 Welch Road, Room X223, Stanford, 94305-5479, CA, US
| | - Raoul J P Bonnal
- Integrative Biology Program, Istituto Nazionale Genetica Molecolare, Milan, Italy
| | - Robert Buels
- University of California, Berkeley, Berkeley, CA, USA
| | | | - Takatomo Fujisawa
- Center for Information Biology, National Institute of Genetics, Research Organization of Information and Systems, 1111 Yata, Mishima, Shizuoka, 411-08540, Japan
| | - Toshiaki Katayama
- Database Center for Life Science, Research Organization of Information and Systems, 2-11-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan
| | | |
|
34
|
Veltri D, Wight MM, Crouch JA. SimpleSynteny: a web-based tool for visualization of microsynteny across multiple species. Nucleic Acids Res 2016; 44:W41-5. [PMID: 27141960 PMCID: PMC4987899 DOI: 10.1093/nar/gkw330] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 02/19/2016] [Accepted: 04/17/2016] [Indexed: 11/14/2022] Open
Abstract
Defining syntenic relationships among orthologous gene clusters is a frequent undertaking of biologists studying organismal evolution through comparative genomic approaches. With the increasing availability of genome data made possible through next-generation sequencing technology, there is a growing need for user-friendly tools capable of assessing synteny. Here we present SimpleSynteny, a new web-based platform capable of directly interrogating collinearity of local genomic neighbors across multiple species in a targeted manner. SimpleSynteny provides a pipeline for evaluating the synteny of a preselected set of gene targets across multiple organismal genomes. An emphasis has been placed on ease-of-use, and users are only required to submit FASTA files for their genomes and genes of interest. SimpleSynteny then guides the user through an iterative process of exploring and customizing genomes individually before combining them into a final high-resolution figure. Because the process is iterative, it allows the user to customize the organization of multiple contigs and incorporate knowledge from additional sources, rather than forcing complete dependence on the computational predictions. Additional tools are provided to help the user identify which contigs in a genome assembly contain gene targets and to optimize analyses of circular genomes. SimpleSynteny is freely available at: http://www.SimpleSynteny.com.
Affiliation(s)
- Daniel Veltri
- Systematic Mycology and Microbiology Laboratory, U.S. Department of Agriculture (USDA), Agricultural Research Service (ARS), 10300 Baltimore Avenue, Building 10A, Beltsville, MD 20705, USA Oak Ridge Institute for Science and Education ARS Research Program, MC-100-44 P.O. Box 117, Oak Ridge, TN 37831, USA
| | - Martha Malapi Wight
- Systematic Mycology and Microbiology Laboratory, U.S. Department of Agriculture (USDA), Agricultural Research Service (ARS), 10300 Baltimore Avenue, Building 10A, Beltsville, MD 20705, USA
| | - Jo Anne Crouch
- Systematic Mycology and Microbiology Laboratory, U.S. Department of Agriculture (USDA), Agricultural Research Service (ARS), 10300 Baltimore Avenue, Building 10A, Beltsville, MD 20705, USA
| |
|
35
|
Milicchio F, Rose R, Bian J, Min J, Prosperi M. Visual programming for next-generation sequencing data analytics. BioData Min 2016; 9:16. [PMID: 27127540 PMCID: PMC4848821 DOI: 10.1186/s13040-016-0095-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/22/2016] [Accepted: 04/21/2016] [Indexed: 11/10/2022] Open
Abstract
Background High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a ‘cultural’ gap between the end user and the developer. Text Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal developing environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithread, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the run of standardized NGS pipelines and the development of new workflows based on technological advancements or users’ needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations. Conclusion In the long term, a proper, well-developed (although not necessarily unique) software framework will bridge the current gap between data generation and hypothesis testing. This will eventually facilitate the development of novel diagnostic tools embedded in routine healthcare.
Affiliation(s)
- Jiang Bian
  - Department of Health Outcomes and Policy, University of Florida, Gainesville, FL USA
- Jae Min
  - Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, 32610-0231 FL USA
- Mattia Prosperi
  - Department of Epidemiology, College of Public Health and Health Professions & College of Medicine, University of Florida, 2004 Mowry Road, Gainesville, 32610-0231 FL USA
|
36
|
Cromer D, Schlub TE, Smyth RP, Grimm AJ, Chopra A, Mallal S, Davenport MP, Mak J. HIV-1 Mutation and Recombination Rates Are Different in Macrophages and T-cells. Viruses 2016; 8:118. [PMID: 27110814 PMCID: PMC4848610 DOI: 10.3390/v8040118] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/07/2015] [Revised: 04/05/2016] [Accepted: 04/19/2016] [Indexed: 11/16/2022] Open
Abstract
High rates of mutation and recombination help human immunodeficiency virus (HIV) to evade the immune system and develop resistance to antiretroviral therapy. Macrophages and T-cells are the natural target cells of HIV-1 infection. A consensus has not been reached as to whether HIV replication results in differential recombination between primary T-cells and macrophages. Here, we used HIV with silent mutation markers along with next-generation sequencing to compare the mutation and the recombination rates of HIV directly in T lymphocytes and macrophages. We observed a more than four-fold higher recombination rate of HIV in macrophages compared to T-cells (p < 0.001) and demonstrated that this difference is not due to different reliance on C-X-C chemokine receptor type 4 (CXCR4) and C-C chemokine receptor type 5 (CCR5) co-receptors between T-cells and macrophages. We also found that the pattern of recombination across the HIV genome (hot and cold spots) remains constant between T-cells and macrophages despite a three-fold increase in the overall recombination rate. This indicates that the difference in rates is a general feature of HIV DNA synthesis during macrophage infection. In contrast to HIV recombination, we found that T-cells have a 30% higher mutation rate than macrophages (p < 0.001) and that the mutational profile is similar between these cell types. Unexpectedly, we found no association between mutation and recombination in macrophages, in contrast to T-cells. Our data highlight some of the fundamental differences in HIV recombination and mutation between these two major target cells of infection. Understanding these differences will provide invaluable insights into HIV evolution and how the virus evades immune surveillance and anti-retroviral therapeutics.
Affiliation(s)
- Deborah Cromer
  - Infection Analytics Program, Kirby Institute, UNSW Australia, Sydney NSW 2052, Australia.
  - Centre for Vascular Research, UNSW Australia, Sydney NSW 2052, Australia.
- Timothy E Schlub
  - Sydney School of Public Health, Sydney Medical School, University of Sydney, Sydney NSW 2006, Australia.
- Redmond P Smyth
  - Centre for Virology, Burnet Institute, Melbourne VIC 3004, Australia.
  - Architecture et Réactivité de l'ARN, IBMC, CNRS, Université de Strasbourg, 67084 Strasbourg, France.
- Andrew J Grimm
  - Infection Analytics Program, Kirby Institute, UNSW Australia, Sydney NSW 2052, Australia.
- Abha Chopra
  - Institute for Immunology and Infectious Diseases (IIID), Murdoch University, Perth WA 6150, Australia.
- Simon Mallal
  - Institute for Immunology and Infectious Diseases (IIID), Murdoch University, Perth WA 6150, Australia.
- Miles P Davenport
  - Infection Analytics Program, Kirby Institute, UNSW Australia, Sydney NSW 2052, Australia.
  - Centre for Vascular Research, UNSW Australia, Sydney NSW 2052, Australia.
- Johnson Mak
  - Biosecurity Flagship, CSIRO (AAHL), Geelong VIC 3220, Australia.
  - School of Medicine, Deakin University and CSIRO (AAHL), Geelong VIC 3216, Australia.
|
37
|
Alves C, Iacovelli F, Falconi M, Cardamone F, Morozzo Della Rocca B, de Oliveira CLP, Desideri A. A Simple and Fast Semiautomatic Procedure for the Atomistic Modeling of Complex DNA Polyhedra. J Chem Inf Model 2016; 56:941-9. [PMID: 27050675 DOI: 10.1021/acs.jcim.5b00586] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 01/11/2023]
Abstract
A semiautomatic procedure to build complex atomistic covalently linked DNA nanocages has been implemented in a user-friendly, free, and fast program. As a test set, seven different truncated DNA polyhedra, composed of B-DNA double helices connected through short single-stranded linkers, have been generated. The atomistic structures, including a tetrahedron, a cube, an octahedron, a dodecahedron, a triangular prism, a pentagonal prism, and a hexagonal prism, have been probed through classical molecular dynamics and analyzed to evaluate their structural and dynamical properties and to highlight possible building faults. The analysis of the simulated trajectories also allows us to investigate the role of the different geometries in defining nanocage stability and flexibility. The data indicate that the cages are stable and that their structural and dynamical parameters measured along the trajectories are only slightly affected by the different geometries. These results demonstrate that the constraints imposed by the covalent links induce an almost identical conformational variability independently of the three-dimensional geometry and that the program presented here is a reliable and valid tool to engineer DNA nanostructures.
Affiliation(s)
- Cassio Alves
  - Instituto de Fisica, Grupo de Fluidos Complexos, Universidade de São Paulo, Caixa Postal 66318, 05314-970 Sao Paulo, Brazil
  - Department of Engineering and Sciences, Federal University of Paraná, 85950-000 Palotina, Paraná, Brazil
- Federico Iacovelli
  - Department of Biology, University of Rome "Tor Vergata", Via della Ricerca Scientifica, 00133 Rome, Italy
- Mattia Falconi
  - Department of Biology, University of Rome "Tor Vergata", Via della Ricerca Scientifica, 00133 Rome, Italy
- Francesca Cardamone
  - Department of Biology, University of Rome "Tor Vergata", Via della Ricerca Scientifica, 00133 Rome, Italy
- Blasco Morozzo Della Rocca
  - Department of Biology, University of Rome "Tor Vergata", Via della Ricerca Scientifica, 00133 Rome, Italy
- Cristiano L P de Oliveira
  - Instituto de Fisica, Grupo de Fluidos Complexos, Universidade de São Paulo, Caixa Postal 66318, 05314-970 Sao Paulo, Brazil
- Alessandro Desideri
  - Department of Biology, University of Rome "Tor Vergata", Via della Ricerca Scientifica, 00133 Rome, Italy
|
38
|
Cuadrat RRC, Ferrera I, Grossart HP, Dávila AMR. Picoplankton Bloom in Global South? A High Fraction of Aerobic Anoxygenic Phototrophic Bacteria in Metagenomes from a Coastal Bay (Arraial do Cabo--Brazil). OMICS 2016; 20:76-87. [PMID: 26871866 PMCID: PMC4770915 DOI: 10.1089/omi.2015.0142] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Marine habitats harbor a great diversity of microorganisms from the three domains of life, only a small fraction of which can be cultivated. Metagenomic approaches are increasingly popular for addressing microbial diversity without culture, serving as sensitive and relatively unbiased methods for identifying and cataloging the diversity of nucleic acid sequences derived from organisms in environmental samples. Aerobic anoxygenic phototrophic bacteria (AAP) play important roles in carbon and energy cycling in aquatic systems. In oceans, those bacteria are widely distributed; however, their abundance and importance are still poorly understood. The aim of this study was to estimate abundance and diversity of AAPs in metagenomes from an upwelling-affected coastal bay in Arraial do Cabo, Brazil, using in silico screening for the anoxygenic photosynthesis core genes. Metagenomes from the Global Ocean Sample Expedition (GOS) were screened for comparative purposes. AAPs were highly abundant in the free-living bacterial fraction from Arraial do Cabo: 23.88% of total bacterial cells, compared with 15% in the GOS dataset. Of the ten most AAP-abundant samples from GOS, eight were collected close to the Equator where solar irradiation is high year-round. We were able to assign most retrieved sequences to phylo-groups, with a particularly high abundance of Roseobacter in Arraial do Cabo samples. The high abundance of AAP in this tropical bay may be related to the upwelling phenomenon and subsequent picoplankton bloom. These results suggest a link between upwelling and light abundance and demonstrate the presence of AAP even in oligotrophic tropical and subtropical environments. Longitudinal studies in the Arraial do Cabo region are warranted to understand the dynamics of AAP at different locations and seasons, and the ecological role of these unique bacteria for biogeochemical and energy cycling in the ocean.
Affiliation(s)
- Rafael R C Cuadrat
  - Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, Fiocruz, Brazil
  - Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
  - Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
- Isabel Ferrera
  - Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
  - Institut de Ciències del Mar, CSIC, Barcelona, Spain
- Hans-Peter Grossart
  - Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
  - Potsdam University, Institute for Biochemistry and Biology, Potsdam, Germany
- Alberto M R Dávila
  - Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, Fiocruz, Brazil
|
39
|
Syme RA, Tan KC, Hane JK, Dodhia K, Stoll T, Hastie M, Furuki E, Ellwood SR, Williams AH, Tan YF, Testa AC, Gorman JJ, Oliver RP. Comprehensive Annotation of the Parastagonospora nodorum Reference Genome Using Next-Generation Genomics, Transcriptomics and Proteogenomics. PLoS One 2016; 11:e0147221. [PMID: 26840125 PMCID: PMC4739733 DOI: 10.1371/journal.pone.0147221] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 09/05/2015] [Accepted: 12/30/2015] [Indexed: 11/29/2022] Open
Abstract
Parastagonospora nodorum, the causal agent of Septoria nodorum blotch (SNB), is an economically important pathogen of wheat (Triticum spp.), and a model for the study of necrotrophic pathology and genome evolution. The reference P. nodorum strain SN15 was the first Dothideomycete with a published genome sequence, and has been used as the basis for comparison within and between species. Here we present an updated reference genome assembly with corrections of SNP and indel errors in the underlying genome assembly from deep resequencing data as well as extensive manual annotation of gene models using transcriptomic and proteomic sources of evidence (https://github.com/robsyme/Parastagonospora_nodorum_SN15). The updated assembly and annotation include 8,366 genes with modified protein sequence and 866 new genes. This study shows the benefits of using a wide variety of experimental methods allied to expert curation to generate a reliable set of gene models.
Affiliation(s)
- Robert A. Syme
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
- Kar-Chun Tan
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
- James K. Hane
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
  - Curtin Institute for Computation, Curtin University, Bentley, WA, Australia
- Kejal Dodhia
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
- Thomas Stoll
  - Protein Discovery Centre, QIMR Berghofer Medical Research Institute, Herston, Qld, Australia
- Marcus Hastie
  - Protein Discovery Centre, QIMR Berghofer Medical Research Institute, Herston, Qld, Australia
- Eiko Furuki
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
- Simon R. Ellwood
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
- Angela H. Williams
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
- Alison C. Testa
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
- Jeffrey J. Gorman
  - Protein Discovery Centre, QIMR Berghofer Medical Research Institute, Herston, Qld, Australia
- Richard P. Oliver
  - Centre for Crop & Disease Management, Department of Environment and Agriculture, Curtin University, Bentley, WA, Australia
  - * E-mail:
|
40
|
Fukushima A, Nakamura M, Suzuki H, Yamazaki M, Knoch E, Mori T, Umemoto N, Morita M, Hirai G, Sodeoka M, Saito K. Comparative Characterization of the Leaf Tissue of Physalis alkekengi and Physalis peruviana Using RNA-seq and Metabolite Profiling. Front Plant Sci 2016; 7:1883. [PMID: 28066454 PMCID: PMC5167740 DOI: 10.3389/fpls.2016.01883] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/13/2016] [Accepted: 11/29/2016] [Indexed: 05/07/2023]
Abstract
The genus Physalis in the Solanaceae family contains several species of benefit to humans. Examples include P. alkekengi (Chinese-lantern plant, hôzuki in Japanese) used for medicinal and decorative purposes, and P. peruviana, also known as Cape gooseberry, which bears an edible, vitamin-rich fruit. Members of the Physalis genus are a valuable resource for phytochemicals needed for the development of medicines and functional foods. To fully utilize the potential of these phytochemicals we need to understand their biosynthesis, and for this we need genomic data, especially comprehensive transcriptome datasets for gene discovery. We report the de novo assembly of the transcriptome from leaves of P. alkekengi and P. peruviana using Illumina RNA-seq technologies. We identified 75,221 unigenes in P. alkekengi and 54,513 in P. peruviana. All unigenes were annotated with gene ontology (GO), Enzyme Commission (EC) numbers, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG). We classified unigenes encoding enzyme candidates putatively involved in the secondary metabolism and identified more than one unigene for each step in terpenoid backbone and steroid biosynthesis in P. alkekengi and P. peruviana. To measure the variability of the withanolides including physalins and provide insights into their chemical diversity in Physalis, we also analyzed the metabolite content in leaves of P. alkekengi and P. peruviana at five different developmental stages by liquid chromatography-mass spectrometry. We discuss how comprehensive transcriptome approaches within a family can yield clues for gene discovery in Physalis and provide insights into their complex chemical diversity. The transcriptome information we submit here will serve as an important public resource for further studies of the specialized metabolism of Physalis species.
Affiliation(s)
- Atsushi Fukushima
  - RIKEN Center for Sustainable Resource Science, Yokohama, Japan
  - *Correspondence: Atsushi Fukushima, Kazuki Saito
- Michimi Nakamura
  - Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Hideyuki Suzuki
  - Department of Biotechnology Research, Kazusa DNA Research Institute, Chiba, Japan
- Mami Yamazaki
  - Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Eva Knoch
  - RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Tetsuya Mori
  - RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Naoyuki Umemoto
  - RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Masaki Morita
  - Synthetic Organic Chemistry Laboratory, RIKEN, Saitama, Japan
- Go Hirai
  - Synthetic Organic Chemistry Laboratory, RIKEN, Saitama, Japan
  - RIKEN Center for Sustainable Resource Science, Saitama, Japan
- Mikiko Sodeoka
  - Synthetic Organic Chemistry Laboratory, RIKEN, Saitama, Japan
  - RIKEN Center for Sustainable Resource Science, Saitama, Japan
- Kazuki Saito
  - RIKEN Center for Sustainable Resource Science, Yokohama, Japan
  - Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan
  - *Correspondence: Atsushi Fukushima, Kazuki Saito
|
41
|
Perlejewski K, Bukowska-Ośko I, Nakamura S, Motooka D, Stokowy T, Płoski R, Rydzanicz M, Zakrzewska-Pniewska B, Podlecka-Piętowska A, Nojszewska M, Gogol A, Caraballo Cortés K, Demkow U, Stępień A, Laskus T, Radkowski M. Metagenomic Analysis of Cerebrospinal Fluid from Patients with Multiple Sclerosis. Adv Exp Med Biol 2016; 935:89-98. [PMID: 27311319 DOI: 10.1007/5584_2016_25] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 01/19/2023]
Abstract
Multiple sclerosis (MS) is a chronic inflammatory demyelinating disease of the central nervous system of unknown etiology. However, some infectious agents have been suggested to play a significant role in its pathogenesis. Next-generation sequencing (NGS) and metagenomics can be employed to characterize the microbiome of MS patients and to identify potential causative pathogens. In this study, 12 patients with idiopathic inflammatory demyelinating disorders (IIDD) of the central nervous system were studied: one patient had clinically isolated syndrome, one patient had recurrent optic neuritis, and ten patients had multiple sclerosis (MS). In addition, there was one patient with another non-inflammatory neurological disease. Cerebrospinal fluid (CSF) was sampled from all patients. RNA was extracted from CSF and subjected to a single-primer isothermal amplification followed by NGS and comprehensive data analysis. Altogether 441,608,474 reads were obtained and mapped using blastn. In a CSF sample from the patient with clinically isolated syndrome, 11 varicella-zoster virus reads were found. Other than that, similar bacterial, fungal, parasitic, and protozoan reads were identified in all samples, indicating a common presence of contamination in metagenomics. In conclusion, we identified varicella-zoster virus sequences in one out of the 12 patients with IIDD, which suggests that this virus could be occasionally related to MS pathogenesis. A widespread bacterial contamination seems inherent to NGS and complicates the interpretation of results.
Affiliation(s)
- Karol Perlejewski
  - Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawińskiego Street, Warsaw, 02-106, Poland
- Iwona Bukowska-Ośko
  - Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawińskiego Street, Warsaw, 02-106, Poland.
- Shota Nakamura
  - Department of Infection Metagenomics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Osaka, Japan
- Daisuke Motooka
  - Department of Infection Metagenomics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Osaka, Japan
- Tomasz Stokowy
  - Department of Clinical Science, University of Bergen, Bergen, 5021, Norway
- Rafał Płoski
  - Department of the Medical Genetics, Warsaw Medical University, 3C Pawińskiego Street, Warsaw, 02-106, Poland
- Małgorzata Rydzanicz
  - Department of the Medical Genetics, Warsaw Medical University, 3C Pawińskiego Street, Warsaw, 02-106, Poland
- Monika Nojszewska
  - Department of Neurology, Warsaw Medical University, 1A Banacha, Warsaw, 02-097, Poland
- Anna Gogol
  - Department of Neurology, Warsaw Medical University, 1A Banacha, Warsaw, 02-097, Poland
- Kamila Caraballo Cortés
  - Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawińskiego Street, Warsaw, 02-106, Poland
- Urszula Demkow
  - Department of Laboratory Medicine and Clinical Immunology of Developmental Age, Medical University of Warsaw, 24 Marszałkowska Street, Warsaw, 00-576, Poland
- Adam Stępień
  - Department of Neurology, Military Institute of Medicine, 128 Szaserów Street, Warsaw, 04-141, Poland
- Tomasz Laskus
  - Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawińskiego Street, Warsaw, 02-106, Poland
- Marek Radkowski
  - Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawińskiego Street, Warsaw, 02-106, Poland
|
42
|
Kensche PR, Hoeijmakers WAM, Toenhake CG, Bras M, Chappell L, Berriman M, Bártfai R. The nucleosome landscape of Plasmodium falciparum reveals chromatin architecture and dynamics of regulatory sequences. Nucleic Acids Res 2015; 44:2110-24. [PMID: 26578577 PMCID: PMC4797266 DOI: 10.1093/nar/gkv1214] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 07/29/2015] [Accepted: 10/28/2015] [Indexed: 11/13/2022] Open
Abstract
In eukaryotes, the chromatin architecture has a pivotal role in regulating all DNA-associated processes and it is central to the control of gene expression. For Plasmodium falciparum, a causative agent of human malaria, the nucleosome positioning profile of regulatory regions deserves particular attention because of their extreme AT-content. With the aid of a highly controlled MNase-seq procedure we reveal how positioning of nucleosomes provides a structural and regulatory framework to the transcriptional unit by demarcating landmark sites (transcription/translation start and end sites). In addition, our analysis provides strong indications for the function of positioned nucleosomes in splice site recognition. Transcription start sites (TSSs) are bordered by a small nucleosome-depleted region, but lack the stereotypic downstream nucleosome arrays, highlighting a key difference in chromatin organization compared to model organisms. Furthermore, we observe transcription-coupled eviction of nucleosomes on strong TSSs during intraerythrocytic development and demonstrate that nucleosome positioning and dynamics can be predictive for the functionality of regulatory DNA elements. Collectively, the strong nucleosome positioning over splice sites and surrounding putative transcription factor binding sites highlights the regulatory capacity of the nucleosome landscape in this deadly human pathogen.
Affiliation(s)
- Philip Reiner Kensche
  - Department of Molecular Biology, Radboud University, 6525GA Nijmegen, The Netherlands
- Maaike Bras
  - Department of Molecular Biology, Radboud University, 6525GA Nijmegen, The Netherlands
- Lia Chappell
  - Parasite Genomics Group, Wellcome Trust Sanger Institute, CB10 1SA Hinxton, UK
- Matthew Berriman
  - Parasite Genomics Group, Wellcome Trust Sanger Institute, CB10 1SA Hinxton, UK
- Richárd Bártfai
  - Department of Molecular Biology, Radboud University, 6525GA Nijmegen, The Netherlands
|
43
|
Ison J, Rapacki K, Ménager H, Kalaš M, Rydza E, Chmura P, Anthon C, Beard N, Berka K, Bolser D, Booth T, Bretaudeau A, Brezovsky J, Casadio R, Cesareni G, Coppens F, Cornell M, Cuccuru G, Davidsen K, Vedova GD, Dogan T, Doppelt-Azeroual O, Emery L, Gasteiger E, Gatter T, Goldberg T, Grosjean M, Grüning B, Helmer-Citterich M, Ienasescu H, Ioannidis V, Jespersen MC, Jimenez R, Juty N, Juvan P, Koch M, Laibe C, Li JW, Licata L, Mareuil F, Mičetić I, Friborg RM, Moretti S, Morris C, Möller S, Nenadic A, Peterson H, Profiti G, Rice P, Romano P, Roncaglia P, Saidi R, Schafferhans A, Schwämmle V, Smith C, Sperotto MM, Stockinger H, Vařeková RS, Tosatto SCE, de la Torre V, Uva P, Via A, Yachdav G, Zambelli F, Vriend G, Rost B, Parkinson H, Løngreen P, Brunak S. Tools and data services registry: a community effort to document bioinformatics resources. Nucleic Acids Res 2015; 44:D38-47. [PMID: 26538599 PMCID: PMC4702812 DOI: 10.1093/nar/gkv1116] [Citation(s) in RCA: 94] [Impact Index Per Article: 10.4] [Received: 09/17/2015] [Accepted: 10/13/2015] [Indexed: 01/24/2023] Open
Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Affiliation(s)
- Jon Ison
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Kristoffer Rapacki
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Hervé Ménager
  - Centre d'Informatique pour la Biologie, C3BI, Institut Pasteur, France
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, Norway
- Emil Rydza
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Piotr Chmura
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Christian Anthon
  - Department of Veterinary Clinical and Animal Sciences, Faculty for Health and Medical Sciences, University of Copenhagen, Denmark
- Niall Beard
  - School of Computer Science, University of Manchester, UK
- Karel Berka
  - Department of Physical Chemistry, RCPTM, Faculty of Science, Palacky University, Czech Republic
- Dan Bolser
  - The European Bioinformatics Institute (EMBL-EBI), UK
- Tim Booth
  - NEBC Wallingford, Centre for Ecology and Hydrology, UK
- Anthony Bretaudeau
  - INRA, UMR Institut de Génétique, Environnement et Protection des Plantes (IGEPP), BioInformatics Platform for Agroecosystems Arthropods (BIPAA), France
  - INRIA, IRISA, GenOuest Core Facility, France
- Jan Brezovsky
  - Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, Czech Republic
- Rita Casadio
  - Bologna Biocomputing Group, University of Bologna, Italy
- Frederik Coppens
  - Department of Plant Systems Biology, VIB, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Belgium
- Kristian Davidsen
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Tunca Dogan
  - UniProt, European Bioinformatics Institute (EMBL-EBI), UK
- Laura Emery
  - The European Bioinformatics Institute (EMBL-EBI), UK
- Thomas Gatter
  - Faculty of Technology and Center for Biotechnology, Universität Bielefeld, Germany
- Marie Grosjean
  - Institut Français de Bioinformatique (French Institute of Bioinformatics), CNRS, UMS3601, France
- Björn Grüning
  - Albert-Ludwigs-Universität Freiburg, Fahnenbergplatz, 79085 Freiburg
- Hans Ienasescu
  - Bioinformatics Centre, Department of Biology, University of Copenhagen, Denmark
- Martin Closter Jespersen
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Nick Juty
  - The European Bioinformatics Institute (EMBL-EBI), UK
- Peter Juvan
  - Centre for Functional Genomics and Biochips, Faculty of Medicine, University of Ljubljana, Slovenia
- Camille Laibe
  - The European Bioinformatics Institute (EMBL-EBI), UK
- Jing-Woei Li
  - Faculty of Medicine, The Chinese University of Hong Kong, China
  - Hong Kong Bioinformatics Centre, School of Life Sciences, The Chinese University of Hong Kong, China
- Luana Licata
  - Dept. of Biology, University of Rome Tor Vergata, Italy
- Fabien Mareuil
  - Centre d'Informatique pour la Biologie, C3BI, Institut Pasteur, France
- Ivan Mičetić
  - Department of Biomedical Sciences, University of Padua, Italy
- Sebastien Moretti
  - SIB Swiss Institute of Bioinformatics, Switzerland
  - Department of Ecology and Evolution, Biophore, Evolutionary Bioinformatics group, University of Lausanne, Switzerland
- Steffen Möller
  - Department of Dermatology, University of Lübeck, Germany
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Germany
- Hedi Peterson
  - Institute of Computer Science, University of Tartu, Estonia
- Peter Rice
  - Department of Computing, William Penney Laboratory, Imperial College London, UK
- Rabie Saidi
  - UniProt, European Bioinformatics Institute (EMBL-EBI), UK
- Veit Schwämmle
  - Protein Research Group, Department for Biochemistry and Molecular Biology, University of Southern Denmark, Denmark
- Maria Maddalena Sperotto
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Victor de la Torre
  - National Bioinformatics Institute Unit (INB), Fundacion Centro Nacional de Investigaciones Oncologicas, Spain
- Allegra Via
  - Dept. of Physics, Sapienza University, Italy
- Guy Yachdav
  - Department of Informatics, Bioinformatics-I12, TUM, Germany
- Federico Zambelli
  - Institute of Biomembranes and Bioenergetics, National Research Council (CNR), and Dept. of Biosciences, University of Milano, Italy
- Gert Vriend
  - Radboud University Medical Centre, CMBI, Netherlands
- Burkhard Rost
  - Department of Informatics, Bioinformatics-I12, TUM, Germany
- Peter Løngreen
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
- Søren Brunak
  - Center for Biological Sequence Analysis Department of Systems Biology, Technical University of Denmark, Denmark
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
|
44
|
Mathelier A, Fornes O, Arenillas DJ, Chen CY, Denay G, Lee J, Shi W, Shyr C, Tan G, Worsley-Hunt R, Zhang AW, Parcy F, Lenhard B, Sandelin A, Wasserman WW. JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2015; 44:D110-5. [PMID: 26531826 PMCID: PMC4702842 DOI: 10.1093/nar/gkv1176] [Citation(s) in RCA: 737] [Impact Index Per Article: 81.9] [Received: 09/12/2015] [Accepted: 10/22/2015] [Indexed: 11/28/2022] Open
Abstract
JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.
Affiliation(s)
- Anthony Mathelier
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- David J Arenillas
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- Chih-Yu Chen
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- Grégoire Denay
- Laboratoire Physiologie Cellulaire & Végétale, Université Grenoble Alpes, CNRS, CEA, iRTSV, INRA, 38054 Grenoble, France
- Jessica Lee
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- Wenqiang Shi
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- Casper Shyr
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- Ge Tan
- Computational Regulatory Genomics, MRC Clinical Sciences Centre, Imperial College London, Du Cane Road, London W12 0NN, UK
- Rebecca Worsley-Hunt
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- Allen W Zhang
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
- François Parcy
- Laboratoire Physiologie Cellulaire & Végétale, Université Grenoble Alpes, CNRS, CEA, iRTSV, INRA, 38054 Grenoble, France
- Boris Lenhard
- Computational Regulatory Genomics, MRC Clinical Sciences Centre, Imperial College London, Du Cane Road, London W12 0NN, UK
- Albin Sandelin
- The Bioinformatics Centre, Department of Biology and Biotech Research and Innovation Centre, Copenhagen University, Ole Maaloes Vej 5, DK-2200, Denmark
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics at the Child and Family Research Institute, Department of Medical Genetics, University of British Columbia, Vancouver, V5Z 4H4, BC, Canada
|
45
|
Köster J. Rust-Bio: a fast and safe bioinformatics library. Bioinformatics 2015; 32:444-6. [DOI: 10.1093/bioinformatics/btv573] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/20/2015] [Accepted: 09/28/2015] [Indexed: 11/13/2022] Open
|
46
|
Diaz A, Enomoto S, Romagosa A, Sreevatsan S, Nelson M, Culhane M, Torremorell M. Genome plasticity of triple-reassortant H1N1 influenza A virus during infection of vaccinated pigs. J Gen Virol 2015; 96:2982-2993. [PMID: 26251306 PMCID: PMC4857448 DOI: 10.1099/jgv.0.000258] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/26/2015] [Accepted: 08/04/2015] [Indexed: 12/18/2022] Open
Abstract
To gain insight into the evolution of influenza A viruses (IAVs) during infection of vaccinated pigs, we experimentally infected a 3-week-old naive pig with a triple-reassortant H1N1 IAV and placed the seeder pig in direct contact with a group of age-matched vaccinated pigs (n = 10). We indexed the genetic diversity and evolution of the virus at an intra-host level by deep sequencing the entire genome directly from nasal swabs collected at two separate samplings during infection. We obtained 13 IAV metagenomes from 13 samples, which included the virus inoculum and two samples from each of the six pigs that tested positive for IAV during the study. The infection produced a population of heterogeneous alleles (sequence variants) that was dynamic over time. Overall, 794 polymorphisms were identified amongst all samples, which yielded 327 alleles, 214 of which were unique sequences. A total of 43 distinct haemagglutinin proteins were translated, two of which were observed in multiple pigs, whereas the neuraminidase (NA) was conserved and only one dominant NA was found throughout the study. The genetic diversity of IAVs changed dynamically within and between pigs. However, most of the substitutions observed in the internal gene segments were synonymous. Our results demonstrated remarkable IAV diversity, and the complex, rapid and dynamic evolution of IAV during infection of vaccinated pigs that can only be appreciated with repeated sampling of individual animals and deep sequence analysis.
Affiliation(s)
- Andres Diaz
- College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Anna Romagosa
- College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Srinand Sreevatsan
- College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
- Martha Nelson
- Fogarty International Center, National Institutes of Health, Bethesda, Maryland, USA
- Marie Culhane
- College of Veterinary Medicine, University of Minnesota, Saint Paul, Minnesota, USA
|
47
|
Perlejewski K, Popiel M, Laskus T, Nakamura S, Motooka D, Stokowy T, Lipowski D, Pollak A, Lechowicz U, Caraballo Cortés K, Stępień A, Radkowski M, Bukowska-Ośko I. Next-generation sequencing (NGS) in the identification of encephalitis-causing viruses: Unexpected detection of human herpesvirus 1 while searching for RNA pathogens. J Virol Methods 2015; 226:1-6. [PMID: 26424618 DOI: 10.1016/j.jviromet.2015.09.010] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 05/18/2015] [Revised: 09/24/2015] [Accepted: 09/24/2015] [Indexed: 11/18/2022]
Abstract
BACKGROUND Encephalitis is a severe neurological syndrome usually caused by viruses. Despite significant progress in diagnostic techniques, the causative agent remains unidentified in the majority of cases. The aim of the present study was to test an alternative approach for the detection of putative pathogens in encephalitis using next-generation sequencing (NGS). METHODS RNA was extracted from cerebrospinal fluid (CSF) from a 60-year-old male patient with encephalitis and subjected to isothermal linear nucleic acid amplification (Ribo-SPIA, NuGen) followed by next-generation sequencing using MiSeq (Illumina) system and metagenomics data analysis. RESULTS The sequencing run yielded 1,578,856 reads overall and 2579 reads matched human herpesvirus I (HHV-1) genome; the presence of this pathogen in CSF was confirmed by specific PCR. In subsequent experiments we found that the DNAse I treatment, while lowering the background of host-derived sequences, lowered the number of detectable HHV-1 sequences by a factor of 4. Furthermore, we found that the routine extraction of total RNA by the Chomczynski method could be used for identification of both DNA and RNA pathogens in typical clinical settings, as it results in retention of a significant amount of DNA. CONCLUSION In summary, it seems that NGS preceded by nucleic acid amplification could supplement currently used diagnostic methods in encephalitis.
Affiliation(s)
- Karol Perlejewski
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
- Marta Popiel
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland; Postgraduate School of Molecular Medicine, 61 Żwirki i Wigury Street, 02-091 Warsaw, Poland.
- Tomasz Laskus
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
- Shota Nakamura
- Department of Infection Metagenomics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita-City, Osaka, Japan.
- Daisuke Motooka
- Department of Infection Metagenomics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita-City, Osaka, Japan.
- Tomasz Stokowy
- Department of Clinical Science, University of Bergen, 5021 Bergen, Norway.
- Dariusz Lipowski
- Municipal Hospital for Infectious Diseases, 37 Wolska Street, 01-201 Warsaw, Poland.
- Agnieszka Pollak
- Department of Genetics, Institute of Physiology and Pathology of Hearing, Mochnackiego 10, 02-042 Warsaw, Poland.
- Urszula Lechowicz
- Department of Genetics, Institute of Physiology and Pathology of Hearing, Mochnackiego 10, 02-042 Warsaw, Poland.
- Kamila Caraballo Cortés
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
- Adam Stępień
- Department of Neurology, Military Institute of Medicine, 128 Szaserów Street, 04-141 Warsaw, Poland.
- Marek Radkowski
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
- Iwona Bukowska-Ośko
- Department of Immunopathology of Infectious and Parasitic Diseases, Warsaw Medical University, 3C Pawinskiego Street, 02-106 Warsaw, Poland.
|
48
|
Long-Lasting Gene Conversion Shapes the Convergent Evolution of the Critical Methanogenesis Genes. G3-GENES GENOMES GENETICS 2015; 5:2475-86. [PMID: 26384370 PMCID: PMC4632066 DOI: 10.1534/g3.115.020180] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Methanogenesis and its key small-molecule methyltransferase Mtr complex are poorly understood despite their pivotal role in Earth’s global carbon cycle. Mtr complex is encoded by a conserved mtrEDCBAFGH operon in most methanogens. Here we report that two discrete lineages, Methanococcales and Methanomicrobiales, have a noncanonical mtr operon carrying two copies of mtrA resulting from an ancient duplication. Compared to mtrA-1, mtrA-2 acquires a distinct transmembrane domain through domain shuffling and gene fusion. However, the nontransmembrane domains (MtrA domain) of mtrA-1 and mtrA-2 are homogenized by gene conversion events lasting throughout the long history of these extant methanogens (over 2410 million years). Furthermore, we identified a possible recruitment of ancient nonmethanogenic methyltransferase genes to establish the methanogenesis pathway. These results not only provide novel evolutionary insight into the methanogenesis pathway and methyltransferase superfamily but also suggest an unanticipated long-lasting effect of gene conversion on gene evolution in a convergent pattern.
|
49
|
Burt C, Steed A, Gosman N, Lemmens M, Bird N, Ramirez-Gonzalez R, Holdgate S, Nicholson P. Mapping a Type 1 FHB resistance on chromosome 4AS of Triticum macha and deployment in combination with two Type 2 resistances. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2015; 128:1725-1738. [PMID: 26040404 PMCID: PMC4540761 DOI: 10.1007/s00122-015-2542-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 06/04/2023]
Abstract
Markers closely flanking a Type 1 FHB resistance have been produced and the potential of combining this with Type 2 resistances to improve control of FHB has been demonstrated. Two categories of resistance to Fusarium head blight (FHB) in wheat are generally recognised: resistance to initial infection (Type 1) and resistance to spread within the head (Type 2). While numerous sources of Type 2 resistance have been reported, relatively fewer Type 1 resistances have been characterised. Previous study identified a Type 1 FHB resistance (QFhs.jic-4AS) on chromosome 4A in Triticum macha. Little is known about the effect of combining Type 1 and Type 2 resistances on overall FHB symptoms or accumulation of the mycotoxin deoxynivalenol (DON). QFhs.jic-4AS was combined independently with two Type 2 FHB resistances (Fhb1 and one associated with the 1BL/1RS translocation). While combining Type 1 and Type 2 resistances generally reduced visual symptom development, the effect on DON accumulation was marginal. A lack of polymorphic markers and a limited number of recombinants had originally prevented accurate mapping of the QFhs.jic-4AS resistance. Using an array of recently produced markers in combination with new populations, the position of QFhs.jic-4AS has been determined to allow this resistance to be followed in breeding programmes.
Affiliation(s)
- C. Burt
- John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- A. Steed
- John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- N. Gosman
- John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- M. Lemmens
- IFA-Tulln, University of Natural Resources and Life Sciences, Konrad Lorenz Strasse 20, 3430 Tulln, Austria
- N. Bird
- John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
- S. Holdgate
- RAGT, Grange Road, Ickleton, Essex, CB10 1TA UK
- P. Nicholson
- John Innes Centre, Norwich Research Park, Norwich, NR4 7UH UK
|
50
|
Cock PJA, Chilton JM, Grüning B, Johnson JE, Soranzo N. NCBI BLAST+ integrated into Galaxy. Gigascience 2015; 4:39. [PMID: 26336600 PMCID: PMC4557756 DOI: 10.1186/s13742-015-0080-7] [Citation(s) in RCA: 148] [Impact Index Per Article: 16.4] [Received: 12/31/2014] [Accepted: 08/18/2015] [Indexed: 01/29/2023] Open
Abstract
Background The NCBI BLAST suite has become ubiquitous in modern molecular biology and is used for small tasks such as checking capillary sequencing results of single PCR products, genome annotation or even larger scale pan-genome analyses. For early adopters of the Galaxy web-based biomedical data analysis platform, integrating BLAST into Galaxy was a natural step for sequence comparison workflows. Findings The command line NCBI BLAST+ tool suite was wrapped for use within Galaxy. Appropriate datatypes were defined as needed. The integration of the BLAST+ tool suite into Galaxy has the goal of making common BLAST tasks easy and advanced tasks possible. Conclusions This project is an informal international collaborative effort, and is deployed and used on Galaxy servers worldwide. Several examples of applications are described here.
Affiliation(s)
- Peter J A Cock
- Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, DD2 5DA Scotland UK
- John M Chilton
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant St. SE, 55455 Minneapolis, MN USA
- Björn Grüning
- Department of Computer Science, Albert-Ludwigs-University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110 Germany
- James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant St. SE, 55455 Minneapolis, MN USA
|