1
Leo S, Crusoe MR, Rodríguez-Navas L, Sirvent R, Kanitz A, De Geest P, Wittner R, Pireddu L, Garijo D, Fernández JM, Colonnelli I, Gallo M, Ohta T, Suetake H, Capella-Gutierrez S, de Wit R, Kinoshita BP, Soiland-Reyes S. Recording provenance of workflow runs with RO-Crate. PLoS One 2024; 19:e0309210. [PMID: 39255315 PMCID: PMC11386446 DOI: 10.1371/journal.pone.0309210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Accepted: 08/08/2024] [Indexed: 09/12/2024] Open
Abstract
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
Affiliation(s)
- Simone Leo
  - Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula (CA), Italy
- Michael R. Crusoe
  - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - DTL Projects, Utrecht, The Netherlands
  - Forschungszentrum Jülich, Jülich, Germany
- Raül Sirvent
  - Barcelona Supercomputing Center, Barcelona, Spain
- Alexander Kanitz
  - Biozentrum, University of Basel, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Rudolf Wittner
  - Faculty of Informatics, Masaryk University, Brno, Czech Republic
  - Institute of Computer Science, Masaryk University, Brno, Czech Republic
  - BBMRI-ERIC, Graz, Austria
- Luca Pireddu
  - Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula (CA), Italy
- Daniel Garijo
  - Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
- Iacopo Colonnelli
  - Computer Science Department, Università degli Studi di Torino, Torino, Italy
- Matej Gallo
  - Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Tazro Ohta
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, Japan
  - Institute for Advanced Academic Research, Chiba University, Chiba, Japan
- Renske de Wit
  - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Stian Soiland-Reyes
  - Department of Computer Science, The University of Manchester, Manchester, United Kingdom
  - Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
2
Martín del Pico E, Gelpí JL, Capella-Gutierrez S. FAIRsoft-a practical implementation of FAIR principles for research software. Bioinformatics 2024; 40:btae464. [PMID: 39037960 PMCID: PMC11330317 DOI: 10.1093/bioinformatics/btae464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 05/26/2024] [Accepted: 07/20/2024] [Indexed: 07/24/2024] Open
Abstract
MOTIVATION Software plays a crucial and growing role in research. Unfortunately, the computational component in Life Sciences research is often challenging to reproduce and verify. It could be undocumented, opaque, contain unknown errors that affect the outcome, or be directly unavailable and impossible to use for others. These issues are detrimental to the overall quality of scientific research. One step to address this problem is the formulation of principles that research software in the domain should meet to ensure its quality and sustainability, resembling the FAIR (findable, accessible, interoperable, and reusable) data principles. RESULTS We present here a comprehensive series of quantitative indicators based on a pragmatic interpretation of the FAIR Principles and their implementation on OpenEBench, ELIXIR's open platform providing both support for scientific benchmarking and an active observatory of quality-related features for Life Sciences research software. The results serve to understand the current practices around research software quality-related features and provide objective indications for improving them. AVAILABILITY AND IMPLEMENTATION Software metadata, from 11 different sources, collected, integrated, and analysed in the context of this manuscript are available at https://doi.org/10.5281/zenodo.7311067. Code used for software metadata retrieval and processing is available in the following repository: https://gitlab.bsc.es/inb/elixir/software-observatory/FAIRsoft_ETL.
Affiliation(s)
- Josep Lluís Gelpí
  - Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
  - Biochemistry and Molecular Biomedicine Department, University of Barcelona, 08028 Barcelona, Spain
3
Murai T, Yanagi S, Hori Y, Kobayashi T. Replication fork blocking deficiency leads to a reduction of rDNA copy number in budding yeast. iScience 2024; 27:109120. [PMID: 38384843 PMCID: PMC10879690 DOI: 10.1016/j.isci.2024.109120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/27/2023] [Accepted: 01/31/2024] [Indexed: 02/23/2024] Open
Abstract
The ribosomal RNA genes are encoded as hundreds of tandem repeats, known as the rDNA, in eukaryotes. Maintaining these copies seems to be necessary, but copy number changes in an active manner have been reported in only frogs, flies, Neurospora, and yeast. In the best-studied system, yeast, a protein (Fob1) binds to the rDNA and unidirectionally blocks the replication fork. This block stimulates rDNA double-strand breaks (DSBs) leading to recombination and copy number change. To date, copy number maintenance and concerted evolution mediated by rDNA repeat turnover were the proposed benefits of Fob1-dependent replication fork arrest. In this study, we tested whether Fob1 provides these benefits and found that rDNA copy number decreases when FOB1 is deleted, suggesting that Fob1 is important for recovery from low copy number. We suppose that replication fork stalling at rDNA is necessary for recovering from rDNA copy number loss in other species as well.
Affiliation(s)
- Taichi Murai
  - Laboratory of Genome Regeneration, Institute for Quantitative Biosciences (IQB), The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Shuichi Yanagi
  - Laboratory of Genome Regeneration, Institute for Quantitative Biosciences (IQB), The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
- Yutaro Hori
  - Laboratory of Genome Regeneration, Institute for Quantitative Biosciences (IQB), The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
- Takehiko Kobayashi
  - Laboratory of Genome Regeneration, Institute for Quantitative Biosciences (IQB), The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
4
Dhandhanya UK, Mukhopadhyay K, Kumar M. An accretive detection method for in silico identification and validation of circular RNAs in wheat (Triticum aestivum L.) using RT-qPCR. Mol Biol Rep 2024; 51:162. [PMID: 38252357 DOI: 10.1007/s11033-023-09138-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 12/11/2023] [Indexed: 01/23/2024]
Abstract
BACKGROUND Circular RNAs (circRNAs) are a novel class of non-coding RNAs, which are involved in various functions at the transcriptional and post-transcriptional level in response to a fungal pathogen (Puccinia triticina), including microRNA (miRNA) sponge, RNA binding proteins sponge, regulation of parental gene and biomarkers. Detailed analysis of wheat circRNAs is essential to accelerate the regulated expression of fungal miRNAs. Therefore, we suggest a protocol to aid circRNA identification through RNA-Seq data using various algorithms based on Perl scripts, followed by validation through divergent primer designing, standard PCR, and RT-qPCR assays. METHODS AND RESULT The divergent primer has been widely used to detect, validate, and quantify the back-spliced junction (BSJ) of circRNAs. The procedure covers index file formation, circRNA identification and BSJ detection. The laboratory validation of circRNAs includes wheat genomic DNA isolation, RNA isolation and its cDNA conversion up to validation. In this study, we identified 28 circRNAs from RNA-Seq of S0 and R0, wherein six circRNAs are commonly present; 75% of the identified circRNAs were intergenic, 14% were exonic and 11% were intronic. The divergent primer designing method successfully validated two circRNAs via RT-qPCR assay, where circRNA_2 showed a lower relative expression than circRNA_1 in contrast with housekeeping genes. CONCLUSION Thus, our results on identified and validated circRNAs show that this protocol is helpful, relatively easy, reliable, and accurate for large datasets, as other algorithms need various dependencies and have complex scripts with a high chance of errors. Additionally, analysis time will vary depending on the expertise level and the amount of RNA-Seq data. This proposed protocol can also be used for a wide range of monocotyledons belonging to the Poaceae plant family.
Affiliation(s)
- Umang Kumar Dhandhanya
  - Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, 835215, India
- Kunal Mukhopadhyay
  - Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, 835215, India
- Manish Kumar
  - Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, 835215, India
5
Liu Q, Hu Q, Liu S, Hutson A, Morgan M. ReUseData: an R/Bioconductor tool for reusable and reproducible genomic data management. BMC Bioinformatics 2024; 25:8. [PMID: 38172657 PMCID: PMC10765726 DOI: 10.1186/s12859-023-05626-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/20/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND The increasing volume and complexity of genomic data pose significant challenges for effective data management and reuse. Public genomic data often undergo similar preprocessing across projects, leading to redundant or inconsistent datasets and inefficient use of computing resources. This is especially pertinent for bioinformaticians engaged in multiple projects. Tools have been created to address challenges in managing and accessing curated genomic datasets; however, the practical utility of such tools becomes especially beneficial for users who seek to work with specific types of data or are technically inclined toward a particular programming language. Currently, there exists a gap in the availability of an R-specific solution for efficient data management and versatile data reuse. RESULTS Here we present ReUseData, an R software tool that overcomes some of the limitations of existing solutions and provides a versatile and reproducible approach to effective data management within R. ReUseData facilitates the transformation of ad hoc scripts for data preprocessing into Common Workflow Language (CWL)-based data recipes, allowing for the reproducible generation of curated data files in their generic formats. The data recipes are standardized and self-contained, enabling them to be easily portable and reproducible across various computing platforms. ReUseData also streamlines the reuse of curated data files and their integration into downstream analysis tools and workflows with different frameworks. CONCLUSIONS ReUseData provides a reliable and reproducible approach for genomic data management within the R environment to enhance the accessibility and reusability of genomic data. The package is available at Bioconductor (https://bioconductor.org/packages/ReUseData/) with additional information on the project website (https://rcwl.org/dataRecipes/).
Affiliation(s)
- Qian Liu
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Qiang Hu
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Song Liu
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Alan Hutson
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Martin Morgan
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
6
Puillandre N, Miralles A, Brouillet S, Fedosov A, Fischell F, Patmanidis S, Vences M. Species Delimitation and Exploration of Species Partitions with ASAP and LIMES. Methods Mol Biol 2024; 2744:313-334. [PMID: 38683328 DOI: 10.1007/978-1-0716-3581-0_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
DNA barcoding plays an important role in exploring undescribed biodiversity and is increasingly used to delimit lineages at the species level (see Chap. 4 by Miralles et al.). Although several approaches and programs have been developed to perform species delimitation from datasets of single-locus DNA sequences, such as DNA barcodes, most of these were not initially provided as user-friendly GUI-driven executables. In spite of their differences, most of these tools share the same goal, i.e., inferring de novo a partition of subsets, potentially each representing a distinct species. More recently, a proposed common exchange format for the resulting species partitions (SPART) has been implemented by several of these tools, paving the way toward developing an interoperable digital environment entirely dedicated to integrative and comparative species delimitation. In this chapter, we provide detailed protocols for the use of two bioinformatic tools, one for single locus molecular species delimitation (ASAP) and one for statistical comparison of species partitions resulting from any kind of species delimitation analyses (LIMES).
Affiliation(s)
- Nicolas Puillandre
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Aurélien Miralles
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
- Sophie Brouillet
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Alexander Fedosov
  - Department of Zoology, Swedish Museum of Natural History, Stockholm, Sweden
- Frank Fischell
  - Institute of Zoology, University of Cologne, Köln, Germany
- Stefanos Patmanidis
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Miguel Vences
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
7
Fedosov A, Puillandre N, Fischell F, Patmanidis S, Miralles A, Vences M. DNA Barcode-Based Species Diagnosis with MolD. Methods Mol Biol 2024; 2744:297-311. [PMID: 38683327 DOI: 10.1007/978-1-0716-3581-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
Rapid biodiversity loss sets new requirements for taxonomic research, prompting updates to some long-established practices to maximize timely documentation of species before they go extinct. One of the crucial procedures associated with the description of new taxa in Linnean taxonomy is assigning them a diagnosis, which is an account of the specific features of the taxon, differentiating it from already described species. Traditionally, diagnostic characters have been morphological, but especially in the case of morphologically cryptic species, molecular diagnoses become increasingly important. In this chapter, we provide detailed protocols for molecular taxon diagnosis with the bioinformatic tool MolD, which is available as open-source Python code, command-line driven binary, GUI-driven executable for Windows and Mac, and Galaxy implementation. MolD identifies diagnostic combinations of nucleotides (DNCs) in addition to single (pure) diagnostic sites, enabling users to base DNA diagnoses on a minimal number of diagnostic sites necessary for reliable differentiation of taxa.
Affiliation(s)
- Alexander Fedosov
  - Department of Zoology, Swedish Museum of Natural History, Stockholm, Sweden
- Nicolas Puillandre
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Frank Fischell
  - Institute of Zoology, University of Cologne, Köln, Germany
- Stefanos Patmanidis
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Aurélien Miralles
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
- Miguel Vences
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
8
Black JG, van Rooyen ARJ, Heinze D, Gaffney R, Hoffmann AA, Schmidt TL, Weeks AR. Heterogeneous patterns of heterozygosity loss in isolated populations of the threatened eastern barred bandicoot (Perameles gunnii). Mol Ecol 2023. [PMID: 38013623 DOI: 10.1111/mec.17224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/06/2023] [Accepted: 11/14/2023] [Indexed: 11/29/2023]
Abstract
Identifying and analysing isolated populations is critical for conservation. Isolation can make populations vulnerable to local extinction due to increased genetic drift and inbreeding, both of which should leave imprints of decreased genome-wide heterozygosity. While decreases in heterozygosity among populations are frequently investigated, fewer studies have analysed how heterozygosity varies among individuals, including whether heterozygosity varies geographically along lines of discrete population structure or with continuous patterns analogous to isolation by distance. Here we explore geographical patterns of differentiation and individual heterozygosity in the threatened eastern barred bandicoot (Perameles gunnii) in Tasmania, Australia, using genomic data from 85 samples collected between 2008 and 2011. Our analyses identified two isolated demes undergoing significant genetic drift, and several areas of fine-scale differentiation across Tasmania. We observed discrete genetic structures across geographical barriers and continuous patterns of isolation by distance, with little evidence of recent or historical migration. Using a recently developed analytical pipeline for estimating autosomal heterozygosity, we found individual heterozygosities varied within demes by up to a factor of two, and demes with low-heterozygosity individuals also still contained those with high heterozygosity. Spatial interpolation of heterozygosity scores clarified these patterns and identified the isolated Tasman Peninsula as a location where low-heterozygosity individuals were more common than elsewhere. Our results provide novel insights into the relationship between isolation-driven genetic structure and local heterozygosity patterns. These may help improve translocation efforts, by identifying populations in need of assistance, and by providing an individualised metric for identifying source animals for translocation.
Affiliation(s)
- John G Black
  - School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Dean Heinze
  - Research Centre of Applied Alpine Ecology, La Trobe University, Melbourne, Victoria, Australia
- Robbie Gaffney
  - Department of Natural Resources and Environment, Hobart, Tasmania, Australia
- Ary A Hoffmann
  - School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Thomas L Schmidt
  - School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Andrew R Weeks
  - School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
  - Cesar Australia, Brunswick, Victoria, Australia
9
Lobiuc A, Pavăl NE, Dimian M, Covașă M. Nanopore Sequencing Assessment of Bacterial Pathogens and Associated Antibiotic Resistance Genes in Environmental Samples. Microorganisms 2023; 11:2834. [PMID: 38137978 PMCID: PMC10745997 DOI: 10.3390/microorganisms11122834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 11/07/2023] [Accepted: 11/17/2023] [Indexed: 12/24/2023] Open
Abstract
As seen in earlier and present pandemics, monitoring pathogens in the environment can offer multiple insights on their spread, evolution, and even future outbreaks. The present paper assesses the opportunity to detect microbial pathogens and associated antibiotic resistance genes, in relation to specific pathogen sources, by using nanopore sequencing in municipal waters and wastewaters in Romania. The main results indicated that waters collecting effluents from a meat processing facility exhibit altered communities' diversity and abundance, with reduced values (101-108 and 0.86-0.91) of Chao1 and, respectively, Simpson diversity indices and Campylobacterales as main order, compared with other types of municipal waters where the same diversity index had much higher values of 172-214 and 0.97-0.98, and Burkholderiaceae and Pseudomonadaceae were the most abundant families. Moreover, the incidence and type of antibiotic resistance genes were significantly influenced by the proximity of antibiotic sources, with either tetracycline (up to 45% of total reads) or neomycin, streptomycin and tobramycin (up to 3.8% total reads) resistance incidence being shaped by the sampling site. As such, nanopore sequencing proves to be an easy-to-use, accessible molecular technique for environmental pathogen surveillance and associated antibiotic resistance genes.
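For readers unfamiliar with the two indices quoted in this abstract, the following is a minimal Python sketch of how Chao1 richness and a Simpson diversity index are commonly computed from per-taxon read counts. The abstract does not state which variants the authors used: the sketch assumes the classic Chao1 estimator (with the bias-corrected form when there are no doubletons) and the Gini-Simpson form 1 - sum(p_i^2), which is consistent with reported values close to 1. The function names and the example counts are illustrative and are not taken from the paper.

```python
def chao1(counts):
    """Chao1 richness estimate from per-taxon read counts (assumed variant)."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                       # number of taxa seen at least once
    f1 = sum(1 for c in observed if c == 1)     # singletons
    f2 = sum(1 for c in observed if c == 2)     # doubletons
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)     # classic estimator
    return s_obs + f1 * (f1 - 1) / 2            # bias-corrected form when f2 == 0


def simpson_diversity(counts):
    """Gini-Simpson index, 1 - sum(p_i^2); higher values mean a more even community."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one read")
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    example_counts = [10, 5, 2, 2, 1, 1, 1]     # hypothetical reads per taxon
    print(chao1(example_counts))                # 7 observed + 3^2 / (2 * 2) = 9.25
    print(round(simpson_diversity(example_counts), 4))
```

Under these assumptions, a sample dominated by one order (as reported for the meat-processing effluent) gives a lower Simpson value than a sample whose reads are spread across many taxa.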
Affiliation(s)
- Andrei Lobiuc
  - Department of Biomedical Sciences, Faculty of Medicine and Biological Sciences, “Ştefan cel Mare” University, 720229 Suceava, Romania
- Naomi-Eunicia Pavăl
  - Department of Biomedical Sciences, Faculty of Medicine and Biological Sciences, “Ştefan cel Mare” University, 720229 Suceava, Romania
- Mihai Dimian
  - Department of Computers, Electronics and Automation, Stefan cel Mare University of Suceava, 720229 Suceava, Romania
- Mihai Covașă
  - Department of Biomedical Sciences, Faculty of Medicine and Biological Sciences, “Ştefan cel Mare” University, 720229 Suceava, Romania
10
Biguenet A, Bordy A, Atchon A, Hocquet D, Valot B. Introduction and benchmarking of pyMLST: open-source software for assessing bacterial clonality using core genome MLST. Microb Genom 2023; 9. [PMID: 37966168 DOI: 10.1099/mgen.0.001126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023] Open
Abstract
Core genome multilocus sequence typing (cgMLST) has gained in popularity for bacterial typing since whole-genome sequencing (WGS) has become affordable. We introduce here pyMLST, a new complete, stand-alone, free and open source pipeline for cgMLST analysis. pyMLST can create or import a core genome database. For each gene, the first allele is aligned against the bacterial genome of interest using BLAT. Incomplete genes are aligned using MAFFT. All data are stored in a SQLite database. pyMLST accepts assembled genomes or raw data (with the option pyMLST-KMA) as input. To evaluate our new tool, we selected three genome collections of major bacterial pathogens (Escherichia coli, Pseudomonas aeruginosa and Staphylococcus aureus) and compared them with pyMLST, pyMLST-KMA, ChewBBACA, SeqSphere and the variant calling approach. We compared the sensitivity, precision and false-positive rate for each method with those of the variant calling approach. Minimal spanning trees were generated with each type of software to evaluate their interest in the context of a bacterial outbreak. We found that pyMLST-KMA is a convenient screening method to avoid assembling large bacterial collections. Our data showed that pyMLST (free, open source, available in Galaxy and pipeline ready) performed similarly to the commercial SeqSphere and performed better than ChewBBACA and pyMLST-KMA.
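The three benchmark metrics named in this abstract have standard definitions in terms of true/false positives and negatives. The sketch below shows those definitions in Python. How the authors defined a "positive" (for example, a call that agrees with the variant calling reference) is not stated in the abstract, so the function name, its arguments and the example counts are assumptions made purely for illustration.

```python
def typing_metrics(tp, fp, fn, tn):
    """Return sensitivity, precision and false-positive rate from confusion counts.

    tp/fp/fn/tn are counts of calls judged against a reference method;
    what counts as a positive call is an assumption left to the caller.
    """
    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN, not as 0.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),          # TP / (TP + FN)
        "precision": ratio(tp, tp + fp),            # TP / (TP + FP)
        "false_positive_rate": ratio(fp, fp + tn),  # FP / (FP + TN)
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(typing_metrics(tp=90, fp=10, fn=10, tn=890))
```

Reporting NaN for an empty denominator is a design choice of this sketch: it keeps a method that made no positive calls from silently appearing to have zero precision.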
Affiliation(s)
- Adrien Biguenet
  - CHU de Besançon, Hygiène Hospitalière, F-25030 Besançon, France
  - Université de Franche-Comté, CNRS, Chrono-environnement, F-25000 Besançon, France
- Augustin Bordy
  - Université de Franche-Comté, CNRS, Chrono-environnement, F-25000 Besançon, France
- Alban Atchon
  - Bioinformatique et Big Data Au Service de La Santé, Université de Franche-Comté, F-25000 Besançon, France
- Didier Hocquet
  - CHU de Besançon, Hygiène Hospitalière, F-25030 Besançon, France
  - Université de Franche-Comté, CNRS, Chrono-environnement, F-25000 Besançon, France
- Benoit Valot
  - Université de Franche-Comté, CNRS, Chrono-environnement, F-25000 Besançon, France
  - Bioinformatique et Big Data Au Service de La Santé, Université de Franche-Comté, F-25000 Besançon, France
11
Maroilley T, Rahit KMTH, Chida AR, Cotra F, Rodrigues Alves Barbosa V, Tarailo-Graovac M. Model Organism Modifier (MOM): a user-friendly Galaxy workflow to detect modifiers from genome sequencing data using Caenorhabditis elegans. G3 (Bethesda, Md.) 2023; 13:jkad184. [PMID: 37585487 PMCID: PMC10627290 DOI: 10.1093/g3journal/jkad184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 04/21/2023] [Accepted: 08/05/2023] [Indexed: 08/18/2023]
Abstract
Genetic modifiers are variants modulating phenotypic outcomes of a primary detrimental variant. They contribute to the phenotypic variability of rare diseases, but their identification is challenging. Genetic screening with model organisms is a widely used method for demystifying genetic modifiers. Forward genetics screening followed by whole genome sequencing allows the detection of variants throughout the genome but typically produces thousands of candidate variants, making the interpretation and prioritization process very time-consuming and tedious. Although whole genome sequencing is more time- and cost-efficient, usage of computational pipelines specific to modifier identification remains a challenge for biological-experiment-focused laboratories doing research with model organisms. To facilitate a broader implementation of whole genome sequencing in genetic screens, we have developed Model Organism Modifier or MOM, a pipeline as a user-friendly Galaxy workflow. Model Organism Modifier analyses raw short-read whole genome sequencing data and implements tailored filtering to provide a Candidate Variant List short enough to be further manually curated. We provide a detailed tutorial to run the Galaxy workflow Model Organism Modifier and guidelines to manually curate the Candidate Variant Lists. We have tested Model Organism Modifier on published and validated Caenorhabditis elegans modifiers screening datasets. As whole genome sequencing facilitates high-throughput identification of genetic modifiers in model organisms, Model Organism Modifier provides a user-friendly solution to implement the bioinformatics analysis of the short-read datasets in laboratories without expertise or support in Bioinformatics.
Affiliation(s)
- Tatiana Maroilley
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Department of Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- K M Tahsin Hassan Rahit
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Department of Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Afiya Razia Chida
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Department of Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Filip Cotra
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Department of Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Victoria Rodrigues Alves Barbosa
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Department of Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Maja Tarailo-Graovac
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
  - Department of Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
12
|
Chicco D, Cumbo F, Angione C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Comput Biol 2023; 19:e1011224. [PMID: 37410704 DOI: 10.1371/journal.pcbi.1011224] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/08/2023] Open
Abstract
Data are the most important elements of bioinformatics: computational analysis of bioinformatics data can help researchers infer new knowledge about biology, chemistry, biophysics, and sometimes even medicine, influencing treatments and therapies for patients. Bioinformatics and high-throughput biological data coming from different sources can be even more helpful, because each of these data chunks can provide alternative, complementary information about a specific biological phenomenon, similar to multiple photos of the same subject taken from different angles. In this context, the integration of bioinformatics and high-throughput biological data plays a pivotal role in running a successful bioinformatics study. In the last decades, data originating from proteomics, metabolomics, metagenomics, phenomics, transcriptomics, and epigenomics have been collectively labelled omics data, and the integration of these omics data has gained importance in all biological areas. Although omics data integration is useful and relevant, the heterogeneity of the data makes mistakes during the integration phases common. We therefore decided to present these ten quick tips for performing omics data integration correctly, avoiding common mistakes we experienced or noticed in published studies in the past. Although we designed our ten guidelines for beginners, using simple language that (we hope) can be understood by anyone, we believe our ten recommendations should be taken into account by all bioinformaticians performing omics data integration, including experts.
Affiliation(s)
- Davide Chicco
- Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Fabio Cumbo
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Claudio Angione
- School of Computing Engineering and Digital Technologies, Teesside University, Middlesbrough, United Kingdom
13
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include the availability of software that spans diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and the associated training resources to empower researcher-driven analyses. EXPERT OPINION: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Affiliation(s)
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Matthias Bernt
- Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
- Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Valentin Loux
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
- Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- W C Mike Thang
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Cameron Hyde
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Sippy Downs, University of the Sunshine Coast, Australia
- Gareth Price
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
14
Lewis DC, Stevens DM, Little H, Coaker GL, Bostock RM. Overlapping Local and Systemic Defense Induced by an Oomycete Fatty Acid MAMP and Brown Seaweed Extract in Tomato. Mol Plant Microbe Interact 2023; 36:359-371. [PMID: 36802868 PMCID: PMC10754052 DOI: 10.1094/mpmi-09-22-0192-r] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Eicosapolyenoic fatty acids are integral components of oomycete pathogens that can act as microbe-associated molecular patterns to induce disease resistance in plants. Defense-inducing eicosapolyenoic fatty acids include arachidonic acid (AA) and eicosapentaenoic acid and are strong elicitors in solanaceous plants, with bioactivity in other plant families. Similarly, extracts of a brown seaweed, Ascophyllum nodosum, used in sustainable agriculture as a biostimulant of plant growth, may also induce disease resistance. A. nodosum, similar to other macroalgae, is rich in eicosapolyenoic fatty acids, which comprise as much as 25% of total fatty acid composition. We investigated the response of roots and leaves of tomatoes root-treated with AA or a commercial A. nodosum extract (ANE) via RNA sequencing, phytohormone profiling, and disease assays. AA and ANE significantly altered transcriptional profiles relative to control plants, inducing numerous defense-related genes with both substantial overlap and differences in gene expression patterns. Root treatment with AA and, to a lesser extent, ANE also altered both salicylic acid and jasmonic acid levels while inducing local and systemic resistance to oomycete and bacterial pathogen challenge. Thus, our study highlights overlap in both local and systemic defense induced by AA and ANE, with potential for inducing broad-spectrum resistance against pathogens. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Domonique C. Lewis
- Department of Plant Pathology, University of California, Davis, CA 95616, U.S.A
- Danielle M. Stevens
- Department of Plant Pathology, University of California, Davis, CA 95616, U.S.A
- Holly Little
- Acadian Plant Health, Acadian Seaplants Limited, Dartmouth, Nova Scotia, Canada
- Gitta L. Coaker
- Department of Plant Pathology, University of California, Davis, CA 95616, U.S.A
- Richard M. Bostock
- Department of Plant Pathology, University of California, Davis, CA 95616, U.S.A
15
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Allegra Via
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Claudio Carta
- National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
- Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Andrea Cicconardi
- Department of Physics, University of Genova, Genova, Italy
- Istituto Italiano di Tecnologia (IIT), Genova, Italy
- Angelo Facchiano
- National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
- National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
16
El-Sawalhi S, Revol O, Chamieh A, Lacoste A, Annessi A, La Scola B, Rolain JM, Pagnier I. Epidemiological Description and Detection of Antimicrobial Resistance in Various Aquatic Sites in Marseille, France. Microbiol Spectr 2023; 11:e0142622. [PMID: 36976002 PMCID: PMC10101087 DOI: 10.1128/spectrum.01426-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 12/22/2022] [Indexed: 03/29/2023] Open
Abstract
Antibiotic resistance is a worldwide public health concern and has been associated with reports of elevated mortality. According to the One Health concept, antibiotic resistance genes are transferrable to organisms, and organisms are shared among humans, animals, and the environment. Consequently, aquatic environments are a possible reservoir of bacteria harboring antibiotic resistance genes. In our study, we screened water and wastewater samples for antibiotic resistance genes by culturing samples on different types of agar media. Then, we performed real-time PCR to detect the presence of genes conferring resistance to beta-lactams and colistin, followed by standard PCR and gene sequencing for verification. We mainly isolated Enterobacteriaceae from all samples. In water samples, 36 Gram-negative bacterial strains were isolated and identified. We found three extended-spectrum β-lactamase (ESBL)-producing bacteria (Escherichia coli and Enterobacter cloacae strains) harboring the CTX-M and TEM groups. In wastewater samples, we isolated 114 Gram-negative bacterial strains, mainly E. coli, Klebsiella pneumoniae, Citrobacter freundii and Proteus mirabilis strains. Forty-two bacterial strains were ESBL-producing bacteria, and they harbored at least one gene belonging to the CTX-M, SHV, and TEM groups. We also detected carbapenem resistance genes, including NDM, KPC, and OXA-48, in four isolates of E. coli. This short epidemiological study allowed us to identify new antibiotic resistance genes present in bacterial strains isolated from water in Marseille. This type of surveillance shows the importance of tracking bacterial resistance in aquatic environments. IMPORTANCE: Antibiotic-resistant bacteria are involved in serious infections in humans. The dissemination of these bacteria in water, which is in close contact with human activities, is a serious problem, especially under the concept of One Health. This study was done to survey and localize the circulation of bacterial strains, along with their antibiotic resistance genes, in the aquatic environment in Marseille, France. The importance of this study is to monitor the frequency of these circulating bacteria by creating and surveying water treatments.
Affiliation(s)
- Sabah El-Sawalhi
- Aix-Marseille Université, IRD, APHM, MEPHI, Faculté de Médecine et de Pharmacie, Marseille CEDEX 05, France
- IHU Méditerranée Infection, Marseille CEDEX 05, France
- Océane Revol
- Aix-Marseille Université, IRD, APHM, MEPHI, Faculté de Médecine et de Pharmacie, Marseille CEDEX 05, France
- IHU Méditerranée Infection, Marseille CEDEX 05, France
- Amanda Chamieh
- Aix-Marseille Université, IRD, APHM, MEPHI, Faculté de Médecine et de Pharmacie, Marseille CEDEX 05, France
- IHU Méditerranée Infection, Marseille CEDEX 05, France
- Alexandre Lacoste
- Bataillon des Marins Pompiers de Marseille, CIS BMPM, Marseille, France
- Alexandre Annessi
- Bataillon des Marins Pompiers de Marseille, CIS BMPM, Marseille, France
- Bernard La Scola
- Aix-Marseille Université, IRD, APHM, MEPHI, Faculté de Médecine et de Pharmacie, Marseille CEDEX 05, France
- IHU Méditerranée Infection, Marseille CEDEX 05, France
- Jean-Marc Rolain
- Aix-Marseille Université, IRD, APHM, MEPHI, Faculté de Médecine et de Pharmacie, Marseille CEDEX 05, France
- IHU Méditerranée Infection, Marseille CEDEX 05, France
- Isabelle Pagnier
- Aix-Marseille Université, IRD, APHM, MEPHI, Faculté de Médecine et de Pharmacie, Marseille CEDEX 05, France
- IHU Méditerranée Infection, Marseille CEDEX 05, France
17
Djaffardjy M, Marchment G, Sebe C, Blanchet R, Bellajhame K, Gaignard A, Lemoine F, Cohen-Boulakia S. Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems. Comput Struct Biotechnol J 2023; 21:2075-2085. [PMID: 36968012 PMCID: PMC10030817 DOI: 10.1016/j.csbj.2023.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 03/03/2023] [Accepted: 03/03/2023] [Indexed: 03/09/2023] Open
Abstract
Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.
18
Bray S, Chilton J, Bernt M, Soranzo N, van den Beek M, Batut B, Rasche H, Čech M, Cock PJA, Grüning B, Nekrutenko A. The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond. Genome Res 2023; 33:261-268. [PMID: 36828587 PMCID: PMC10069471 DOI: 10.1101/gr.276963.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 01/11/2023] [Indexed: 02/26/2023]
Abstract
There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.
Affiliation(s)
- Simon Bray
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany
- John Chilton
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Matthias Bernt
- Department of Computational Biology, Helmholtz Centre for Environmental Research GmbH-UFZ, 04318 Leipzig, Germany
- Nicola Soranzo
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, United Kingdom
- Marius van den Beek
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bérénice Batut
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany
- Helena Rasche
- Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN, Rotterdam, The Netherlands; Academie voor de Technologie van Gezondheid en Milieu, Avans Hogeschool, 4818 AJ Breda, The Netherlands
- Martin Čech
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Peter J A Cock
- James Hutton Institute, Invergowrie, Dundee DD2 5DA, United Kingdom
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
19
Wafula EK, Zhang H, Von Kuster G, Leebens-Mack JH, Honaas LA, dePamphilis CW. PlantTribes2: Tools for comparative gene family analysis in plant genomics. Front Plant Sci 2023; 13:1011199. [PMID: 36798801 PMCID: PMC9928214 DOI: 10.3389/fpls.2022.1011199] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/03/2022] [Accepted: 12/02/2022] [Indexed: 05/12/2023]
Abstract
Plant genome-scale resources are being generated at an increasing rate as sequencing technologies continue to improve and raw data costs continue to fall; however, the cost of downstream analyses remains large. This has resulted in a considerable range of genome assembly and annotation qualities across plant genomes due to their varying sizes, complexity, and the technology used for the assembly and annotation. To effectively work across genomes, researchers increasingly rely on comparative genomic approaches that integrate across plant community resources and data types. Such efforts have aided the genome annotation process and yielded novel insights into the evolutionary history of genomes and gene families, including complex non-model organisms. The essential tools to achieve these insights rely on gene family analysis at a genome scale, but they are not well integrated for rapid analysis of new data, and the learning curve can be steep. Here we present PlantTribes2, a scalable, easily accessible, highly customizable, and broadly applicable gene family analysis framework with multiple entry points, including user-provided data. It uses objective classifications of annotated protein sequences from existing, high-quality plant genomes for comparative and evolutionary studies. PlantTribes2 can improve transcript models and then sort them, either genome-scale annotations or individual gene coding sequences, into pre-computed orthologous gene family clusters with rich functional annotation information. Then, for gene families of interest, PlantTribes2 performs downstream analyses and customizable visualizations including (1) multiple sequence alignment, (2) gene family phylogeny, (3) estimation of synonymous and non-synonymous substitution rates among homologous sequences, and (4) inference of large-scale duplication events. We give examples of PlantTribes2 applications in functional genomic studies of economically important plant families, namely transcriptomics in the weedy Orobanchaceae and a core orthogroup analysis (CROG) in Rosaceae. PlantTribes2 is freely available for use within the main public Galaxy instance and can be downloaded from GitHub or Bioconda. Importantly, PlantTribes2 can be readily adapted for use with genomic and transcriptomic data from any kind of organism.
Affiliation(s)
- Eric K Wafula
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Huiting Zhang
- Tree Fruit Research Laboratory, United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Wenatchee, WA, United States
- Department of Horticulture, Washington State University, Pullman, WA, United States
- Gregory Von Kuster
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, United States
- Loren A Honaas
- Tree Fruit Research Laboratory, United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Wenatchee, WA, United States
- Claude W dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, United States
20
Apollonio N, Blankenberg D, Cumbo F, Franciosa PG, Santoni D. Evaluating homophily in networks via HONTO (HOmophily Network TOol): a case study of chromosomal interactions in human PPI networks. Bioinformatics 2023; 39:6849517. [PMID: 36440918 PMCID: PMC9805585 DOI: 10.1093/bioinformatics/btac763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 11/04/2022] [Accepted: 11/24/2022] [Indexed: 11/30/2022] Open
Abstract
SUMMARY: A typical behavior inspired by the general principle 'similarity breeds connections' has been observed in different kinds of networks, such as social or biological ones. These networks are defined as homophilic, as nodes belonging to the same class preferentially interact with each other. In this work, we present HONTO (HOmophily Network TOol), a user-friendly open-source Python3 package designed to evaluate and analyze homophily in complex networks. The tool takes as input the network along with a partition of its nodes into classes and yields a matrix whose entries are the homophily/heterophily z-score values. To complement the analysis, the tool also provides z-score values of nodes that do not interact with any other node of the same class. Homophily/heterophily z-score values are presented as a heatmap, allowing a visual at-a-glance interpretation of results. AVAILABILITY AND IMPLEMENTATION: The tool's source code is available at https://github.com/cumbof/honto under the MIT license, is installable as a package from PyPI (pip install honto) and conda-forge (conda install -c conda-forge honto), and has a wrapper for the Galaxy platform available on the official Galaxy ToolShed (Blankenberg et al., 2014) at https://toolshed.g2.bx.psu.edu/view/fabio/honto.
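To make the idea of a class-by-class z-score matrix concrete, here is a minimal sketch. It does not use the honto package and is not claimed to reproduce its method: the null model (shuffling class labels over nodes) is an assumption chosen for illustration, and the function names `class_pair_counts` and `homophily_z_scores`, as well as the toy network, are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def class_pair_counts(edges, labels):
    """Count edges for each unordered pair of node classes."""
    counts = Counter()
    for u, v in edges:
        counts[tuple(sorted((labels[u], labels[v])))] += 1
    return counts


def homophily_z_scores(edges, labels, n_permutations=200, seed=0):
    """Z-score of observed edge counts per class pair.

    Assumed null model (not taken from the paper): class labels are
    shuffled over the nodes while the network topology stays fixed.
    """
    rng = random.Random(seed)
    nodes = list(labels)
    classes = sorted(set(labels.values()))
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i:]]
    observed = class_pair_counts(edges, labels)
    samples = {pair: [] for pair in pairs}
    for _ in range(n_permutations):
        shuffled = [labels[n] for n in nodes]
        rng.shuffle(shuffled)
        counts = class_pair_counts(edges, dict(zip(nodes, shuffled)))
        for pair in pairs:
            samples[pair].append(counts.get(pair, 0))
    z_scores = {}
    for pair in pairs:
        mu, sigma = mean(samples[pair]), pstdev(samples[pair])
        z_scores[pair] = (observed.get(pair, 0) - mu) / sigma if sigma > 0 else 0.0
    return z_scores


# Hypothetical toy network: two dense groups joined by a single edge.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
labels = {n: "A" for n in group_a} | {n: "B" for n in group_b}
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2)) + [("a1", "b1")]

z_scores = homophily_z_scores(edges, labels)
for pair, z in z_scores.items():
    print(pair, round(z, 2))
```

In this toy case the within-class pairs should score positive (homophily) and the between-class pair negative (heterophily), which is the pattern a heatmap of such a matrix would make visible at a glance.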
Affiliation(s)
- Nicola Apollonio
- Institute for Applied Mathematics “Mauro Picone”, National Research Council of Italy, Rome 00185, Italy
- Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Fabio Cumbo
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Daniele Santoni
- Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council of Italy, Rome 00185, Italy
21
Sarwal V, Brito J, Mangul S, Koslicki D. TAMPA: interpretable analysis and visualization of metagenomics-based taxon abundance profiles. Gigascience 2022; 12:giad008. [PMID: 36852763 PMCID: PMC9972184 DOI: 10.1093/gigascience/giad008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 11/12/2022] [Accepted: 02/02/2023] [Indexed: 03/01/2023] Open
Abstract
BACKGROUND: Metagenomic taxonomic profiling aims to predict the identity and relative abundance of taxa in a given whole-genome sequencing metagenomic sample. A recent surge in computational methods that aim to accurately estimate taxonomic profiles, called taxonomic profilers, has motivated community-driven efforts to create standardized benchmarking datasets and platforms, standardized taxonomic profile formats, and a benchmarking platform to assess tool performance. While this standardization is essential, there is currently a lack of tools to visualize the standardized output of the many existing taxonomic profilers. Thus, benchmarking studies rely on single-value metrics to compare the performance of tools against each other and against benchmarking datasets. This is one of the major problems in analyzing metagenomic profiling data, since single metrics, such as the F1 score, fail to capture the biological differences between the datasets. FINDINGS: Here we report the development of TAMPA (Taxonomic metagenome profiling evaluation), a robust and easy-to-use method that allows scientists to easily interpret and interact with taxonomic profiles produced by the many different taxonomic profiler methods beyond the standard metrics used by the scientific community. We demonstrate the unique ability of TAMPA to generate a novel biological hypothesis by highlighting the taxonomic differences between samples otherwise missed by commonly utilized metrics. CONCLUSION: In this study, we show that TAMPA can help visualize the output of taxonomic profilers, enabling biologists to effectively choose the most appropriate profiling method to use on their metagenomics data. TAMPA is available on GitHub, Bioconda, and Galaxy Toolshed at https://github.com/dkoslicki/TAMPA and is released under the MIT license.
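The point about single-value metrics can be illustrated with a small worked example. This is a sketch only and does not use TAMPA: the reference community and the two predicted profiles below are invented for illustration, and `f1_score` is a plain presence/absence F1 written here, not a function from any profiling tool.

```python
def f1_score(truth, predicted):
    """Presence/absence F1 between two sets of taxon names."""
    true_positives = len(truth & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference community and two hypothetical profiler outputs.
truth = {"Escherichia", "Bacteroides", "Lactobacillus", "Clostridium"}
profile_a = {"Escherichia", "Bacteroides", "Lactobacillus", "Prevotella"}
profile_b = {"Escherichia", "Bacteroides", "Clostridium", "Streptococcus"}

for name, profile in (("profiler A", profile_a), ("profiler B", profile_b)):
    print(
        name,
        "F1 =", f1_score(truth, profile),
        "| missed:", sorted(truth - profile),
        "| spurious:", sorted(profile - truth),
    )
```

Both profiles obtain the same F1 score, yet they miss different genera and report different false positives, which is the kind of biological difference a single number hides.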
Affiliation(s)
- Varuni Sarwal
  - Department of Computer Science, University of California–Los Angeles, Los Angeles, CA 90095, USA
- Jaqueline Brito
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Serghei Mangul
  - Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Quantitative and Computational Biology, USC Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA 90089, USA
- David Koslicki
  - Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
  - Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
|
22
|
Joshi C, Chaudhari A, Joshi C, Joshi M, Bagatharia S. Repurposing of the herbal formulations: molecular docking and molecular dynamics simulation studies to validate the efficacy of phytocompounds against SARS-CoV-2 proteins. J Biomol Struct Dyn 2022; 40:8405-8419. [PMID: 33988079 PMCID: PMC8127611 DOI: 10.1080/07391102.2021.1922095] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/11/2020] [Accepted: 03/26/2021] [Indexed: 12/15/2022]
Abstract
Herbal formulations mentioned in traditional medicinal texts were investigated for in silico effect against SARS-CoV-2 proteins involved in various functions of a virus such as attachment, entry, replication, transcription, etc. To repurpose and validate polyherbal formulations, molecular docking was performed to study the interactions of more than 150 compounds from various formulations against the SARS-CoV-2 proteins. Molecular dynamics (MD) simulation was performed to evaluate the interaction of top scored ligands with the various receptor proteins. The docking results showed that Liquiritic acid, Liquorice acid, Terchebulin, Glabrolide, Casuarinin, Corilagin, Chebulagic acid, Neochebulinic acid, Daturataturin A, and Taraxerol were effective against SARS-CoV-2 proteins with higher binding affinities with different proteins. Results of MD simulations validated the stability of ligands from potent formulations with various receptors of SARS-CoV-2. Binding free energy analysis suggested the favourable interactions of phytocompounds with the receptors. Besides, in silico comparison of the various formulations determined that Pathyadi kwath, Sanjeevani vati, Yashtimadhu, Tribhuvan Keeratiras, and Septillin were more effective than Samshamni vati, AYUSH-64, and Trikatu. Polyherbal formulations having anti-COVID-19 potential can be used for the treatment with adequate monitoring. New formulations may also be developed for systematic trials based on ranking from these studies. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Chinmayi Joshi
  - Gujarat Biotechnology Research Centre, Gandhinagar, Gujarat, India
- Armi Chaudhari
  - Gujarat Biotechnology Research Centre, Gandhinagar, Gujarat, India
- Chaitanya Joshi
  - Gujarat Biotechnology Research Centre, Gandhinagar, Gujarat, India
- Madhvi Joshi
  - Gujarat Biotechnology Research Centre, Gandhinagar, Gujarat, India
|
23
|
Lao J, Lacroix T, Guédon G, Coluzzi C, Payot S, Leblond-Bourget N, Chiapello H. ICEscreen: a tool to detect Firmicute ICEs and IMEs, isolated or enclosed in composite structures. NAR Genom Bioinform 2022; 4:lqac079. [PMID: 36285285 PMCID: PMC9585547 DOI: 10.1093/nargab/lqac079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 10/03/2022] [Accepted: 10/06/2022] [Indexed: 11/23/2022] Open
Abstract
Mobile Genetic Elements (MGEs) are integrated in bacterial genomes and are key elements that drive prokaryote genome evolution. Among them are Integrative and Conjugative Elements (ICEs) and Integrative Mobilizable Elements (IMEs), which are important for bacterial fitness since they frequently carry genes participating in important bacterial adaptation phenotypes such as antibiotic resistance, virulence or specialized metabolic pathways. Although ICEs and IMEs are widespread, they are as yet almost never annotated in public bacterial genomes. To address the need for dedicated strategies for the annotation of these elements, we developed ICEscreen, a tool that introduces two new features to detect ICEs and IMEs in Firmicute genomes. First, ICEscreen uses an efficient strategy to detect Signature Proteins of ICEs and IMEs based on a database dedicated to Firmicutes and composed of manually curated proteins and Hidden Markov Model (HMM) profiles. Second, ICEscreen includes a new algorithm that detects composite structures of ICEs and IMEs that are frequent in genomes of Firmicutes but are currently not resolved by any other tool. We benchmarked ICEscreen on experimentally supported elements and on a public dataset of 246 manually annotated elements including the genomes of 40 Firmicutes and demonstrate its efficiency in detecting ICEs and IMEs.
Affiliation(s)
- Gérard Guédon
  - Université de Lorraine, INRAE, DynAMic, F-54000 Nancy, France
- Charles Coluzzi
  - Université Paris-Saclay, INRAE, MaIAGE, F-78350 Jouy-en-Josas, France
  - Université de Lorraine, INRAE, DynAMic, F-54000 Nancy, France
- Sophie Payot
  - Université de Lorraine, INRAE, DynAMic, F-54000 Nancy, France
- Hélène Chiapello
  - To whom correspondence should be addressed. Tel: +33 1 34652884; Fax: +33 1 34652217;
|
24
|
Vasu K, Khan D, Ramachandiran I, Blankenberg D, Fox P. Analysis of nested alternate open reading frames and their encoded proteins. NAR Genom Bioinform 2022; 4:lqac076. [PMID: 36267124 PMCID: PMC9580016 DOI: 10.1093/nargab/lqac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 08/14/2022] [Accepted: 09/27/2022] [Indexed: 11/22/2022] Open
Abstract
Transcriptional and post-transcriptional mechanisms diversify the proteome beyond gene number, while maintaining a sequence relationship between original and altered proteins. A new mechanism breaks this paradigm, generating novel proteins by translating alternative open reading frames (Alt-ORFs) within canonical host mRNAs. Uniquely, ‘alt-proteins’ lack sequence homology with host ORF-derived proteins. We show global amino acid frequencies, and consequent biochemical characteristics of Alt-ORFs nested within host ORFs (nAlt-ORFs), are genetically-driven, and predicted by summation of frequencies of hundreds of encompassing host codon-pairs. Analysis of 101 human nAlt-ORFs of length ≥150 codons confirms the theoretical predictions, revealing an extraordinarily high median isoelectric point (pI) of 11.68, due to anomalous charged amino acid levels. Also, nAlt-ORF proteins exhibit a >2-fold preference for reading frame 2 versus 3, predicted mitochondrial and nuclear localization, and elevated codon adaptation index indicative of natural selection. Our results provide a theoretical and conceptual framework for exploration of these largely unannotated, but potentially significant, alternative ORFs and their encoded proteins.
Affiliation(s)
- Kommireddy Vasu
  - Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Debjit Khan
  - Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Iyappan Ramachandiran
  - Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Daniel Blankenberg
  - Correspondence may also be addressed to Daniel Blankenberg. Tel: +1 216 444 4336;
- Paul L Fox
  - To whom correspondence should be addressed. Tel: +1 216 444 8053; Fax: +1 216 444 9404;
|
25
|
The automated Galaxy-SynBioCAD pipeline for synthetic biology design and engineering. Nat Commun 2022; 13:5082. [PMID: 36038542 PMCID: PMC9424320 DOI: 10.1038/s41467-022-32661-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/04/2022] [Accepted: 08/11/2022] [Indexed: 11/27/2022] Open
Abstract
Here we introduce the Galaxy-SynBioCAD portal, a toolshed for synthetic biology, metabolic engineering, and industrial biotechnology. The tools and workflows currently shared on the portal enable one to build libraries of strains producing desired chemical targets, covering an end-to-end metabolic pathway design and engineering process from the selection of strains and targets, through the design of DNA parts to be assembled, to the generation of scripts driving liquid handlers for plasmid assembly and strain transformations. Standard formats like SBML and SBOL are used throughout to enforce the compatibility of the tools. In a study carried out at four different sites, we illustrate the link between pathway design and engineering with the building of a library of E. coli lycopene-producing strains. We also benchmark our workflows on literature and expert validated pathways. Overall, we find an 83% success rate in retrieving the validated pathways among the top 10 pathways generated by the workflows.
|
26
|
Soudier P, Zúñiga A, Duigou T, Voyvodic PL, Bazi-Kabbaj K, Kushwaha M, Vendrell JA, Solassol J, Bonnet J, Faulon JL. PeroxiHUB: A Modular Cell-Free Biosensing Platform Using H2O2 as Signal Integrator. ACS Synth Biol 2022; 11:2578-2588. [PMID: 35913043 DOI: 10.1021/acssynbio.2c00138] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Cell-free systems have great potential for delivering robust, inexpensive, and field-deployable biosensors. Many cell-free biosensors rely on transcription factors responding to small molecules, but their discovery and implementation still remain challenging. Here we report the engineering of PeroxiHUB, an optimized H2O2-centered sensing platform supporting cell-free detection of different metabolites. H2O2 is a central metabolite and a byproduct of numerous enzymatic reactions. PeroxiHUB uses enzymatic transducers to convert metabolites of interest into H2O2, enabling rapid reprogramming of sensor specificity using alternative transducers. We first screen several transcription factors and optimize OxyR for the transcriptional response to H2O2 in a cell-free system, highlighting the need for preincubation steps to obtain suitable signal-to-noise ratios. We then demonstrate modular detection of metabolites of clinical interest (lactate, sarcosine, and choline) using different transducers mined via a custom retrosynthesis workflow publicly available on the SynBioCAD Galaxy portal. We find that expressing the transducer during the preincubation step is crucial for optimal sensor operation. We then show that different reporters can be connected to PeroxiHUB, providing high adaptability for various applications. Finally, we demonstrate that a PeroxiHUB lactate biosensor can detect endogenous levels of this metabolite in clinical samples. Given the wide range of enzymatic reactions producing H2O2, the PeroxiHUB platform will support cell-free detection of a large number of metabolites in a modular and scalable fashion.
Affiliation(s)
- Paul Soudier
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78352 Jouy-en-Josas, France
  - Université de Montpellier, INSERM, CNRS, Centre de Biologie Structurale, 34090 Montpellier, France
- Ana Zúñiga
  - Université de Montpellier, INSERM, CNRS, Centre de Biologie Structurale, 34090 Montpellier, France
- Thomas Duigou
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78352 Jouy-en-Josas, France
- Peter L Voyvodic
  - Université de Montpellier, INSERM, CNRS, Centre de Biologie Structurale, 34090 Montpellier, France
- Kenza Bazi-Kabbaj
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78352 Jouy-en-Josas, France
- Manish Kushwaha
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78352 Jouy-en-Josas, France
- Julie A Vendrell
  - Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France
- Jerome Solassol
  - Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France
  - IRCM, INSERM, Univ Montpellier, ICM, 34298 Montpellier, France
- Jerome Bonnet
  - Université de Montpellier, INSERM, CNRS, Centre de Biologie Structurale, 34090 Montpellier, France
- Jean-Loup Faulon
  - Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78352 Jouy-en-Josas, France
|
27
|
Lelwala RV, LeBlanc Z, Gauthier MEA, Elliott CE, Constable FE, Murphy G, Tyle C, Dinsdale A, Whattam M, Pattemore J, Barrero RA. Implementation of GA-VirReport, a Web-Based Bioinformatics Toolkit for Post-Entry Quarantine Screening of Virus and Viroids in Plants. Viruses 2022; 14:v14071480. [PMID: 35891459 PMCID: PMC9317486 DOI: 10.3390/v14071480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 06/29/2022] [Accepted: 06/29/2022] [Indexed: 02/01/2023] Open
Abstract
High-throughput sequencing (HTS) of host plant small RNA (sRNA) is a popular approach for plant virus and viroid detection. The major bottlenecks for implementing this approach in routine virus screening of plants in quarantine include lack of computational resources and/or expertise in command-line environments and limited availability of curated plant virus and viroid databases. We developed: (1) virus and viroid report web-based bioinformatics workflows on Galaxy Australia called GA-VirReport and GA-VirReport-Stats for detecting viruses and viroids from host plant sRNA extracts and (2) a curated higher plant virus and viroid database (PVirDB). We implemented sRNA sequencing with unique dual indexing on a set of plants with known viruses. Sequencing data were analyzed using GA-VirReport and PVirDB to validate these resources. We detected all known viruses in this pilot study with no cross-sample contamination. We then conducted a large-scale diagnosis of 105 imported plants processed at the post-entry quarantine facility (PEQ), Australia. We detected various pathogens in 14 imported plants and discovered that de novo assembly using 21–22 nt sRNA fraction and the megablast algorithm yielded better sensitivity and specificity. This study reports the successful, large-scale implementation of HTS and a user-friendly bioinformatics workflow for virus and viroid screening of imported plants at the PEQ.
Affiliation(s)
- Ruvini V. Lelwala
  - eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
  - Science and Surveillance Group, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (C.E.E.); (J.P.)
- Zacharie LeBlanc
  - eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
- Marie-Emilie A. Gauthier
  - eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
- Candace E. Elliott
  - Science and Surveillance Group, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (C.E.E.); (J.P.)
- Fiona E. Constable
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia;
- Greg Murphy
  - Technology Infrastructure Branch, Information Services Division, Department of Agriculture, Fisheries and Forestry, Canberra, ACT 2601, Australia; (G.M.); (C.T.)
- Callum Tyle
  - Technology Infrastructure Branch, Information Services Division, Department of Agriculture, Fisheries and Forestry, Canberra, ACT 2601, Australia; (G.M.); (C.T.)
- Adrian Dinsdale
  - Plant Innovation Centre, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (A.D.); (M.W.)
- Mark Whattam
  - Plant Innovation Centre, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (A.D.); (M.W.)
- Julie Pattemore
  - Science and Surveillance Group, Post Entry Quarantine, Department of Agriculture, Fisheries and Forestry, Mickleham, VIC 3064, Australia; (C.E.E.); (J.P.)
- Roberto A. Barrero
  - eResearch, Research Infrastructure, Academic Division, Queensland University of Technology, Brisbane, QLD 4001, Australia; (R.V.L.); (Z.L.); (M.-E.A.G.)
  - Correspondence:
|
28
|
PDAUG: a Galaxy based toolset for peptide library analysis, visualization, and machine learning modeling. BMC Bioinformatics 2022; 23:197. [PMID: 35643441 PMCID: PMC9148462 DOI: 10.1186/s12859-022-04727-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/29/2021] [Accepted: 05/11/2022] [Indexed: 11/28/2022] Open
Abstract
Background Computational methods based on initial screening and prediction of peptides for desired functions have proven to be effective alternatives to lengthy and expensive biochemical experimental methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in using programming libraries, of access to computational resources, and of flexible pipelines is a major hurdle to adopting these advanced methods.
Results To address the above-mentioned barriers, we have implemented the peptide design and analysis under Galaxy (PDAUG) package, a Galaxy-based, Python-powered collection of tools, workflows, and datasets for rapid in-silico peptide library analysis. In contrast to existing methods like standard programming libraries or rigid single-function web-based tools, PDAUG offers an integrated GUI-based toolset, providing flexibility to build and distribute reproducible pipelines and workflows without programming expertise. Finally, we demonstrate the usability of PDAUG in predicting anticancer properties of peptides using four different feature sets and assess the suitability of various ML algorithms.
Conclusion PDAUG offers tools for peptide library generation, data visualization, built-in and public database peptide sequence retrieval, peptide feature calculation, and machine learning (ML) modeling. Additionally, this toolset enables researchers to combine PDAUG with hundreds of compatible existing Galaxy tools for limitless analytic strategies.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04727-6.
|
29
|
Babu VMP, Sankari S, Ghosal A, Walker GC. A Mutant Era GTPase Suppresses Phenotypes Caused by Loss of Highly Conserved YbeY Protein in Escherichia coli. Front Microbiol 2022; 13:896075. [PMID: 35663862 PMCID: PMC9159920 DOI: 10.3389/fmicb.2022.896075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Accepted: 04/13/2022] [Indexed: 12/03/2022] Open
Abstract
Ribosome assembly is a complex fundamental cellular process that involves assembling multiple ribosomal proteins and several ribosomal RNA species in a highly coordinated yet flexible and resilient manner. The highly conserved YbeY protein is a single-strand specific endoribonuclease, important for ribosome assembly, 16S rRNA processing, and ribosome quality control. In Escherichia coli, ybeY deletion results in pleiotropic phenotypes including slow growth, temperature sensitivity, accumulation of precursors of 16S rRNA, and impaired formation of fully assembled 70S subunits. Era, an essential highly conserved GTPase protein, interacts with many ribosomal proteins, and its depletion results in ribosome assembly defects. YbeY has been shown to interact with Era together with ribosomal protein S11. In this study, we have analyzed a suppressor mutation, era(T99I), that can partially suppress a subset of the multiple phenotypes of ybeY deletion. The era(T99I) allele was able to improve 16S rRNA processing and ribosome assembly at 37°C. However, it failed to suppress the temperature sensitivity and did not improve 16S rRNA stability. The era(T99I) allele was also unable to improve the 16S rRNA processing defects caused by the loss of ribosome maturation factors. We also show that era(T99I) increases the GroEL levels in the 30S ribosome fractions independent of YbeY. We propose that the mechanism of suppression is that the changes in Era's structure caused by the era(T99I) mutation affect its GTP/GDP cycle in a way that increases the half-life of RNA binding to Era, thereby facilitating alternative processing of the 16S RNA precursor. Taken together, this study offers insights into the role of Era and YbeY in ribosome assembly and 16S rRNA processing events.
Affiliation(s)
- Graham C. Walker
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, United States
|
30
|
Moreira RS, Filho VB, Calomeno NA, Wagner G, Miletti LC. EpiBuilder: A Tool for Assembling, Searching, and Classifying B-Cell Epitopes. Bioinform Biol Insights 2022; 16:11779322221095221. [PMID: 35571557 PMCID: PMC9102138 DOI: 10.1177/11779322221095221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/07/2022] [Accepted: 03/29/2022] [Indexed: 11/16/2022] Open
Abstract
Epitopes are portions of a protein that are recognized by antibodies. These small amino acid sequences represent a significant breakthrough in a branch of bioinformatics called immunoinformatics. Various software tools are available for linear B-cell epitope (BCE) prediction, such as ABCPred, SVMTrip, EpiDope, and EpitopeVec; a well-known BCE predictor is BepiPred-2.0. However, beyond prediction itself, several essential steps remain, such as epitope assembly, evaluation, and searching for epitopes in other proteomes. Here, we present EpiBuilder (https://epibuilder.sourceforge.io), user-friendly software that assists in epitope assembly, classification, and searching, using BepiPred-2.0 results as input. EpiBuilder generates several output results from these data and supports a proteome-wide processing approach. In addition, this software provides the following features: Chou & Fasman beta-turn prediction, Emini surface accessibility prediction, Karplus and Schulz flexibility prediction, Kolaskar and Tongaonkar antigenicity, Parker hydrophilicity prediction, N-glycosylation domains, and hydropathy. This information generates a unique topology for each epitope, visually demonstrating its characteristics. The software can search the entire epitope sequence in various FASTA files, and it allows the use of BLASTP to identify epitopes that may have sequence variations. As an EpiBuilder application, we developed an epitope dataset from the protozoan Trypanosoma brucei gambiense, the gram-positive bacterium Clostridioides difficile, and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
Affiliation(s)
- Renato Simões Moreira
  - Laboratório de Hemoparasitas e Vetores, Departamento de Produção Animal e Alimentos, Centro de Ciências Agroveterinárias (CAV), Universidade do Estado de Santa Catarina (UDESC), Lages, Brazil
  - Instituto Federal de Santa Catarina (IFSC), Lages, Brazil
- Vilmar Benetti Filho
  - Laboratório de Bioinformática, Universidade Federal de Santa Catarina, Florianópolis, Brazil
- Nathália Anderson Calomeno
  - Laboratório de Hemoparasitas e Vetores, Departamento de Produção Animal e Alimentos, Centro de Ciências Agroveterinárias (CAV), Universidade do Estado de Santa Catarina (UDESC), Lages, Brazil
- Glauber Wagner
  - Laboratório de Bioinformática, Universidade Federal de Santa Catarina, Florianópolis, Brazil
- Luiz Claudio Miletti
  - Laboratório de Hemoparasitas e Vetores, Departamento de Produção Animal e Alimentos, Centro de Ciências Agroveterinárias (CAV), Universidade do Estado de Santa Catarina (UDESC), Lages, Brazil
|
31
|
Kolpakov F, Akberdin I, Kiselev I, Kolmykov S, Kondrakhin Y, Kulyashov M, Kutumova E, Pintus S, Ryabova A, Sharipov R, Yevshin I, Zhatchenko S, Kel A. BioUML-towards a universal research platform. Nucleic Acids Res 2022; 50:W124-W131. [PMID: 35536253 PMCID: PMC9252820 DOI: 10.1093/nar/gkac286] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/06/2022] [Revised: 04/04/2022] [Accepted: 04/13/2022] [Indexed: 12/12/2022] Open
Abstract
BioUML (https://www.biouml.org) is a web-based integrated platform for systems biology and data analysis. It supports visual modelling and construction of hierarchical biological models that allow us to construct the most complex modular models of blood pressure regulation, skeletal muscle metabolism, and COVID-19 epidemiology. BioUML has been integrated with git repositories where users can store their models and other data. We have also expanded the capabilities of BioUML for data analysis and visualization of biomedical data: (i) any programs and Jupyter kernels can be plugged into the BioUML platform using Docker technology; (ii) BioUML is integrated with the Galaxy and Galaxy Tool Shed; (iii) BioUML provides two-way integration with R and Python (Jupyter notebooks): scripts can be executed on the BioUML web pages, and BioUML functions can be called from scripts; (iv) using plug-in architecture, specialized viewers and editors can be added. For example, powerful genome browsers as well as viewers for molecular 3D structure are integrated in this way; (v) BioUML supports data analyses using workflows (own format, Galaxy, CWL, BPMN, Nextflow). Using these capabilities, we have initiated a new branch of the BioUML development, u-science: a universal scientific platform that can be configured for specific research requirements.
Affiliation(s)
- Fedor Kolpakov
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
  - Budker Institute of Nuclear Physics SB RAS, Novosibirsk 630090, Russian Federation
- Ilya Akberdin
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Biosoft.ru, LLC, Novosibirsk 630058, Russian Federation
  - Novosibirsk State University, Novosibirsk 630090, Russian Federation
- Ilya Kiselev
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
  - Budker Institute of Nuclear Physics SB RAS, Novosibirsk 630090, Russian Federation
- Semyon Kolmykov
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Biosoft.ru, LLC, Novosibirsk 630058, Russian Federation
- Yury Kondrakhin
  - Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
  - Biosoft.ru, LLC, Novosibirsk 630058, Russian Federation
- Elena Kutumova
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Sergey Pintus
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
- Anna Ryabova
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
- Ruslan Sharipov
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Biosoft.ru, LLC, Novosibirsk 630058, Russian Federation
  - Novosibirsk State University, Novosibirsk 630090, Russian Federation
- Ivan Yevshin
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Biosoft.ru, LLC, Novosibirsk 630058, Russian Federation
- Sergey Zhatchenko
  - Sirius University of Science and Technology, Sochi 354340, Russian Federation
  - Biosoft.ru, LLC, Novosibirsk 630058, Russian Federation
- Alexander Kel
  - Biosoft.ru, LLC, Novosibirsk 630058, Russian Federation
  - geneXplain GmbH, Wolfenbüttel 38302, Germany
|
32
|
Pinter N, Glätzer D, Fahrner M, Fröhlich K, Johnson J, Grüning BA, Warscheid B, Drepper F, Schilling O, Föll MC. MaxQuant and MSstats in Galaxy Enable Reproducible Cloud-Based Analysis of Quantitative Proteomics Experiments for Everyone. J Proteome Res 2022; 21:1558-1565. [PMID: 35503992 DOI: 10.1021/acs.jproteome.2c00051] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
Quantitative mass spectrometry-based proteomics has become a high-throughput technology for the identification and quantification of thousands of proteins in complex biological samples. Two frequently used tools, MaxQuant and MSstats, allow for the analysis of raw data and finding proteins with differential abundance between conditions of interest. To enable accessible and reproducible quantitative proteomics analyses in a cloud environment, we have integrated MaxQuant (including TMTpro 16/18plex), Proteomics Quality Control (PTXQC), MSstats, and MSstatsTMT into the open-source Galaxy framework. This enables the web-based analysis of label-free and isobaric labeling proteomics experiments via Galaxy's graphical user interface on public clouds. MaxQuant and MSstats in Galaxy can be applied in conjunction with thousands of existing Galaxy tools and integrated into standardized, sharable workflows. Galaxy tracks all metadata and intermediate results in analysis histories, which can be shared privately for collaborations or publicly, allowing full reproducibility and transparency of published analysis. To further increase accessibility, we provide detailed hands-on training materials. The integration of MaxQuant and MSstats into the Galaxy framework enables their usage in a reproducible way on accessible large computational infrastructures, hence realizing the foundation for high-throughput proteomics data science for everyone.
Affiliation(s)
- Niko Pinter
  - Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
  - Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- Damian Glätzer
  - Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
- Matthias Fahrner
  - Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
  - Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
  - Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
- Klemens Fröhlich
  - Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
  - Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
  - Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
  - Spemann Graduate School of Biology and Medicine (SGBM), Albert-Ludwigs-University Freiburg, 79104 Freiburg, Germany
- James Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Bettina Warscheid
  - Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
  - Faculty of Chemistry and Pharmacy, Department of Biochemistry, Julius Maximilian University of Würzburg, 97074 Würzburg, Germany
- Friedel Drepper
  - Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
- Oliver Schilling
  - Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
  - Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
  - German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), 79106 Freiburg, Germany
- Melanie Christine Föll
  - Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
  - Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, United States
|
33
|
Shao D, Kellogg GD, Nematbakhsh A, Kuntala PK, Mahony S, Pugh BF, Lai WKM. PEGR: a flexible management platform for reproducible epigenomic and genomic research. Genome Biol 2022; 23:99. [PMID: 35440038 PMCID: PMC9016988 DOI: 10.1186/s13059-022-02671-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2021] [Accepted: 04/07/2022] [Indexed: 11/27/2022] Open
Abstract
Reproducibility is a significant challenge in (epi)genomic research due to the complexity of experiments composed of traditional biochemistry and informatics. Recent advances have exacerbated this as high-throughput sequencing data is generated at an unprecedented pace. Here, we report the development of a Platform for Epi-Genomic Research (PEGR), a web-based project management platform that tracks and quality controls experiments from conception to publication-ready figures, compatible with multiple assays and bioinformatic pipelines. It supports rigor and reproducibility for biochemists working at the bench, while fully supporting reproducibility and reliability for bioinformaticians through integration with the Galaxy platform.
Affiliation(s)
- Danying Shao
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- Gretta D Kellogg
- Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, 14850, USA
- Ali Nematbakhsh
- Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, 14850, USA
- Prashant K Kuntala
- Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
- Shaun Mahony
- Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, 16802, USA
- B Franklin Pugh
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14850, USA
- William K M Lai
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14850, USA; Department of Computational Biology, Cornell University, Ithaca, NY, 14850, USA
34
Soiland-Reyes S, Bayarri G, Andrio P, Long R, Lowe D, Niewielska A, Hospital A, Groth P. Making Canonical Workflow Building Blocks Interoperable across Workflow Languages. DATA INTELLIGENCE 2022. [DOI: 10.1162/dint_a_00135] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022] Open
Abstract
We introduce the concept of Canonical Workflow Building Blocks (CWBB), a methodology of describing and wrapping computational tools, in order for them to be utilised in a reproducible manner from multiple workflow languages and execution platforms. The concept is implemented and demonstrated with the BioExcel Building Blocks library (BioBB), a collection of tool wrappers in the field of computational biomolecular simulation. Interoperability across different workflow languages is showcased through a protein Molecular Dynamics setup transversal workflow, built using this library and run with 5 different Workflow Manager Systems (WfMS). We argue such practice is a necessary requirement for FAIR Computational Workflows and an element of Canonical Workflow Frameworks for Research (CWFR) in order to improve widespread adoption and reuse of computational methods across workflow language barriers.
Affiliation(s)
- Stian Soiland-Reyes
- Department of Computer Science, The University of Manchester, Manchester, Manchester M13 9PL, UK
- Informatics Institute, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
- Genís Bayarri
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Pau Andrio
- The Spanish National Bioinformatics Institute (INB), Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Robin Long
- Data Science Institute, Lancaster University, Lancaster, Lancashire LA1 4YW, UK
- Research IT, IT Services, The University of Manchester, Manchester, Manchester M13 9PL, UK
- Douglas Lowe
- Research IT, IT Services, The University of Manchester, Manchester, Manchester M13 9PL, UK
- Ania Niewielska
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Adam Hospital
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Paul Groth
- Informatics Institute, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
35
Salazar R, Arbeithuber B, Ivankovic M, Heinzl M, Moura S, Hartl I, Mair T, Lahnsteiner A, Ebner T, Shebl O, Pröll J, Tiemann-Boege I. Discovery of an unusually high number of de novo mutations in sperm of older men using duplex sequencing. Genome Res 2022; 32:499-511. [PMID: 35210354 PMCID: PMC8896467 DOI: 10.1101/gr.275695.121] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/26/2021] [Accepted: 01/14/2022] [Indexed: 11/25/2022]
Abstract
De novo mutations (DNMs) are important players in heritable diseases and evolution. Of particular interest are highly recurrent DNMs associated with congenital disorders that have been described as selfish mutations expanding in the male germline, thus becoming more frequent with age. Here, we have adapted duplex sequencing (DS), an ultradeep sequencing method that renders sequence information on both DNA strands; thus, one mutation can be reliably called in millions of sequenced bases. With DS, we examined ∼4.5 kb of the FGFR3 coding region in sperm DNA from older and younger donors. We identified sites with variant allele frequencies (VAFs) of 10⁻⁴ to 10⁻⁵, with an overall mutation frequency of the region of ∼6 × 10⁻⁷. Some of the substitutions are recurrent and are found at a higher VAF in older donors than in younger ones or are found exclusively in older donors. Also, older donors harbor more mutations associated with congenital disorders. Other mutations are present in both age groups, suggesting that these might result from a different mechanism (e.g., postzygotic mosaicism). We also observe that independent of age, the frequency and deleteriousness of the mutational spectra are more similar to COSMIC than to gnomAD variants. Our approach is an important strategy to identify mutations that could be associated with a gain of function of the receptor tyrosine kinase activity, with unexplored consequences in a society with delayed fatherhood.
Affiliation(s)
- Renato Salazar
- Institute of Biophysics, Johannes Kepler University, Linz, Austria 4020
- Maja Ivankovic
- Institute of Biophysics, Johannes Kepler University, Linz, Austria 4020
- Monika Heinzl
- Institute of Biophysics, Johannes Kepler University, Linz, Austria 4020
- Sofia Moura
- Institute of Biophysics, Johannes Kepler University, Linz, Austria 4020
- Ingrid Hartl
- Institute of Biophysics, Johannes Kepler University, Linz, Austria 4020
- Theresa Mair
- Institute of Biophysics, Johannes Kepler University, Linz, Austria 4020
- Thomas Ebner
- Department of Gynecology, Obstetrics and Gynecological Endocrinology, Kepler University Hospital, Linz, Austria 4020
- Omar Shebl
- Department of Gynecology, Obstetrics and Gynecological Endocrinology, Kepler University Hospital, Linz, Austria 4020
- Johannes Pröll
- Center for Medical Research, Faculty of Medicine, Johannes Kepler University, Linz, Austria 4020
36
Sun Q, Nematbakhsh A, Kuntala PK, Kellogg G, Pugh BF, Lai WKM. STENCIL: A web templating engine for visualizing and sharing life science datasets. PLoS Comput Biol 2022; 18:e1009859. [PMID: 35139076 PMCID: PMC8863220 DOI: 10.1371/journal.pcbi.1009859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Revised: 02/22/2022] [Accepted: 01/24/2022] [Indexed: 11/25/2022] Open
Abstract
The ability to aggregate experimental data analysis and results into a concise and interpretable format is a key step in evaluating the success of an experiment. This critical step determines baselines for reproducibility and is a key requirement for data dissemination. However, in practice it can be difficult to consolidate data analyses that encapsulate the broad range of datatypes available in the life sciences. We present STENCIL, a web templating engine designed to organize, visualize, and enable the sharing of interactive data visualizations. STENCIL leverages a flexible web framework for creating templates to render highly customizable visual front ends. This flexibility enables researchers to render small or large sets of experimental outcomes, producing high-quality downloadable and editable figures that retain their original relationship to the source data. REST API-based back ends provide programmatic data access and support easy data sharing. STENCIL is a lightweight tool that can stream data from Galaxy, a popular bioinformatic analysis web platform. STENCIL has been used to support the analysis and dissemination of two large-scale genomic projects containing the complete data analysis for over 2,400 distinct datasets. Code and implementation details are available on GitHub: https://github.com/CEGRcode/stencil.
Affiliation(s)
- Qi Sun
- Cornell Institute of Biotechnology, Cornell University, Ithaca, New York, United States of America
- Ali Nematbakhsh
- Cornell Institute of Biotechnology, Cornell University, Ithaca, New York, United States of America
- Prashant K. Kuntala
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Gretta Kellogg
- Cornell Institute of Biotechnology, Cornell University, Ithaca, New York, United States of America
- B. Franklin Pugh
- Department of Molecular Biology and Genetics, Cornell University, New York, United States of America
- William K. M. Lai
- Department of Molecular Biology and Genetics, Cornell University, New York, United States of America
- Department of Computational Biology, Cornell University, New York, United States of America
37
Raghavan V, Kraft L, Mesny F, Rigerte L. A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform 2022; 23:6514404. [PMID: 35076693 PMCID: PMC8921630 DOI: 10.1093/bib/bbab563] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 09/09/2021] [Revised: 12/03/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022] Open
Abstract
A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.
Affiliation(s)
- Venket Raghavan
- Corresponding author. Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
- Louis Kraft
- Corresponding author. Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
38
Go AC, Civetta A. Divergence of X-linked trans regulatory proteins and the misexpression of gene targets in sterile Drosophila pseudoobscura hybrids. BMC Genomics 2022; 23:30. [PMID: 34991488 PMCID: PMC8740060 DOI: 10.1186/s12864-021-08267-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/13/2021] [Accepted: 12/20/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: The genetic basis of hybrid incompatibilities is characterized by pervasive cases of gene interactions. Sex chromosomes play a major role in speciation and X-linked hybrid male sterility (HMS) genes have been identified. Interestingly, some of these genes code for proteins with DNA binding domains, suggesting a capability to act as trans-regulatory elements and disturb the expression of a large number of gene targets. To understand how interactions between trans- and cis-regulatory elements contribute to speciation, we aimed to map putative X-linked trans-regulatory elements and to identify gene targets with disrupted gene expression in sterile hybrids between the subspecies Drosophila pseudoobscura pseudoobscura and D. p. bogotana.
RESULTS: We find six putative trans-regulatory proteins within previously mapped X chromosome HMS loci with sequence changes that differentiate the two subspecies. Among them, the previously characterized HMS gene Overdrive (Ovd) had the largest number of amino acid changes between subspecies, with some substitutions localized within the protein's DNA binding domain. Using an introgression approach, we detected transcriptional responses associated with a sterility/fertility Ovd allele swap. We found a network of 52 targets of Ovd and identified cis-regulatory effects among target genes with disrupted expression in sterile hybrids. However, a combined analysis of polymorphism and divergence in non-coding sequences immediately upstream of target genes found no evidence of changes in candidate regulatory proximal cis-elements. Finally, peptidases were over-represented among target genes.
CONCLUSIONS: We provide evidence of divergence between subspecies within the DNA binding domain of the HMS protein Ovd and identify trans effects on the expression of 52 gene targets. Our results identify a network of trans-cis interactions with possible effects on HMS. This network provides molecular evidence of gene × gene incompatibilities as contributors to hybrid dysfunction.
Affiliation(s)
- Alwyn C Go
- Department of Biology, University of Winnipeg, 515 Portage Ave, Winnipeg, MB, R3B 2E9, Canada
- Alberto Civetta
- Department of Biology, University of Winnipeg, 515 Portage Ave, Winnipeg, MB, R3B 2E9, Canada
39
VijayKrishna N, Joshi J, Coraor N, Hillman-Jackson J, Bouvier D, van den Beek M, Eguinoa I, Coppens F, Davis J, Stolarczyk M, Sheffield NC, Gladman S, Cuccuru G, Grüning B, Soranzo N, Rasche H, Langhorst BW, Bernt M, Fornika D, de Lima Morais DA, Barrette M, van Heusden P, Petrillo M, Puertas-Gallardo A, Patak A, Hotz HR, Blankenberg D. Expanding the Galaxy's reference data. BIOINFORMATICS ADVANCES 2022; 2:vbac030. [PMID: 35669346 PMCID: PMC9155181 DOI: 10.1093/bioadv/vbac030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 04/01/2022] [Accepted: 04/26/2022] [Indexed: 01/27/2023]
Abstract
Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy.
Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.
Affiliation(s)
- Jayadev Joshi
- Genomic Medicine Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Nate Coraor
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Jennifer Hillman-Jackson
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Dave Bouvier
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Marius van den Beek
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Ignacio Eguinoa
- VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Frederik Coppens
- VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- John Davis
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Michał Stolarczyk
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Nathan C Sheffield
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Björn Grüning
- University of Freiburg, Freiburg im Breisgau, Germany
- Helena Rasche
- Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN Rotterdam, The Netherlands
- Matthias Bernt
- Department Computational Biology, Helmholtz Centre for Environmental Research, UFZ, 04318 Leipzig, Germany
- Dan Fornika
- BC Centre for Disease Control Public Health Laboratory, Vancouver, BC, Canada
- Michel Barrette
- Centre de Calcul Scientifique, Université de Sherbrooke, Sherbrooke, QC, Canada
- Peter van Heusden
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Mauro Petrillo
- European Commission, Joint Research Centre (JRC), Ispra, Italy
- Alex Patak
- European Commission, Joint Research Centre (JRC), Ispra, Italy
- Hans-Rudolf Hotz
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Daniel Blankenberg
- Genomic Medicine Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
- To whom correspondence should be addressed.
40
Glogovitis I, Yahubyan G, Würdinger T, Koppers-Lalic D, Baev V. miRGalaxy: Galaxy-Based Framework for Interactive Analysis of microRNA and isomiR Sequencing Data. Cancers (Basel) 2021; 13:cancers13225663. [PMID: 34830818 PMCID: PMC8616193 DOI: 10.3390/cancers13225663] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2021] [Revised: 10/25/2021] [Accepted: 11/08/2021] [Indexed: 12/13/2022] Open
Abstract
Simple Summary: MicroRNAs are essential regulators of gene expression and potential non-invasive biomarker candidates for various human cancers as they can be detected in bodily fluids. Several tools have been developed to analyze small RNA-sequencing data; however, they have limitations and restrictions such as lack of optimal configuration, parameterization, and interoperability with other tools and platforms. miRGalaxy is an open-source, Galaxy-based framework for analyzing NGS data focusing on microRNAs and their sequence variants—isomiRs. Galaxy is a web-based platform for data-intensive biomedical research, allowing user-friendly analysis and accessibility to hundreds of tools. miRGalaxy is designed specifically for identifying and classifying human microRNAs and isomiRs, as well as detecting deregulated microRNAs and isomiRs between two test groups, summarized by output visualization. By examining the differential expression of individual isomiR species across samples, miRGalaxy can help discover novel biomarkers.
Abstract: Tools for microRNA (miR) sequencing data analyses are broadly used in biomedical research. However, the complexity of computational approaches still remains a challenge for biologists with scarce experience in data analytics and bioinformatics. Here, we present miRGalaxy, a Galaxy-based framework for comprehensive analysis of miRs and their sequence variants—miR isoforms (isomiRs). Though isomiRs are commonly reported in deep-sequencing experiments, their detailed structure complexity and specific differential expression (DE) remain not fully examined by the majority of the available analysis tools. miRGalaxy encompasses biologist-user-friendly tools and workflows dedicated to the analysis of the isomiR-ome and its complex behavior in various biological samples. miRGalaxy is developed as a modular, accessible, redistributable, shareable, and user-friendly framework for scientists working with small RNA (sRNA)-seq data. Due to its modular workflow, advanced users can customize the steps and tools for their needs. In addition, the framework provides an analysis report where the significant output results are summarized in charts and visualizations. miRGalaxy can be accessed via preconfigured Docker image flavor and a Toolshed installation if the user already has a running Galaxy instance. Over the last decade, studies on the expression of miRs and isomiRs in normal and deregulated tissues have led to the discovery of their potential as diagnostic biomarkers. The detection of miRs in biofluids further expanded the exploration of the miR repertoire as a source of liquid biopsy biomarkers. Here we show the miRGalaxy framework application for in-depth analysis of the sRNA-seq data from two different biofluids, milk and plasma, to identify, annotate, and discover specific differentially expressed miRs and isomiRs.
Affiliation(s)
- Ilias Glogovitis
- Faculty of Biology, University of Plovdiv, Tzar Assen 24, 4000 Plovdiv, Bulgaria
- Department of Neurosurgery, Cancer Center Amsterdam, Amsterdam University Medical Centers, VU University Medical Center, De Boelelaan 1117, 1081 HV Amsterdam, The Netherlands
- Galina Yahubyan
- Faculty of Biology, University of Plovdiv, Tzar Assen 24, 4000 Plovdiv, Bulgaria
- Thomas Würdinger
- Department of Neurosurgery, Cancer Center Amsterdam, Amsterdam University Medical Centers, VU University Medical Center, De Boelelaan 1117, 1081 HV Amsterdam, The Netherlands
- Danijela Koppers-Lalic
- Department of Neurosurgery, Cancer Center Amsterdam, Amsterdam University Medical Centers, VU University Medical Center, De Boelelaan 1117, 1081 HV Amsterdam, The Netherlands
- Vesselin Baev
- Faculty of Biology, University of Plovdiv, Tzar Assen 24, 4000 Plovdiv, Bulgaria
- Corresponding author
41
Yuen D, Cabansay L, Duncan A, Luu G, Hogue G, Overbeck C, Perez N, Shands W, Steinberg D, Reid C, Olunwa N, Hansen R, Sheets E, O’Farrell A, Cullion K, O’Connor B, Paten B, Stein L. The Dockstore: enhancing a community platform for sharing reproducible and accessible computational protocols. Nucleic Acids Res 2021; 49:W624-W632. [PMID: 33978761 PMCID: PMC8218198 DOI: 10.1093/nar/gkab346] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/02/2021] [Revised: 04/01/2021] [Accepted: 04/26/2021] [Indexed: 11/24/2022] Open
Abstract
Dockstore (https://dockstore.org/) is an open source platform for publishing, sharing, and finding bioinformatics tools and workflows. The platform has facilitated large-scale biomedical research collaborations by using cloud technologies to increase the Findability, Accessibility, Interoperability and Reusability (FAIR) of computational resources, thereby promoting the reproducibility of complex bioinformatics analyses. Dockstore supports a variety of source repositories, analysis frameworks, and language technologies to provide a seamless publishing platform for authors to create a centralized catalogue of scientific software. The ready-to-use packaging of hundreds of tools and workflows, combined with the implementation of interoperability standards, enables users to launch analyses across multiple environments. Dockstore is widely used: more than twenty-five high-profile organizations share analysis collections through the platform in a variety of workflow languages, including the Broad Institute's GATK best practice and COVID-19 workflows (WDL), nf-core workflows (Nextflow), the Intergalactic Workflow Commission tools (Galaxy), and workflows from Seven Bridges (CWL), to highlight just a few. Here we describe the improvements made over the last four years, including the expansion of system integrations supporting authors, the addition of collaboration features and analysis platform integrations supporting users, and other enhancements that improve the overall scientific reproducibility of Dockstore content.
Affiliation(s)
- Denis Yuen
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, Ontario M5V 3S1, Canada
- Louise Cabansay
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Andrew Duncan
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, Ontario M5V 3S1, Canada
- Gary Luu
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, Ontario M5V 3S1, Canada
- Gregory Hogue
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, Ontario M5V 3S1, Canada
- Charles Overbeck
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Natalie Perez
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Walt Shands
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- David Steinberg
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Chaz Reid
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Nneka Olunwa
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Richard Hansen
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Elizabeth Sheets
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Ash O’Farrell
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Kim Cullion
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, Ontario M5V 3S1, Canada
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95060, USA
- Lincoln Stein
- Adaptive Oncology, Ontario Institute for Cancer Research, Toronto, Ontario M5V 3S1, Canada
42
Harthern-Flint SL, Dolfing J, Mrozik W, Meynet P, Eland LE, Sim M, Davenport RJ. Experimental and Genomic Evaluation of the Oestrogen Degrading Bacterium Rhodococcus equi ATCC13557. Front Microbiol 2021; 12:670928. [PMID: 34276604 PMCID: PMC8281962 DOI: 10.3389/fmicb.2021.670928] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/22/2021] [Accepted: 05/27/2021] [Indexed: 12/12/2022] Open
Abstract
Rhodococcus equi ATCC13557 was selected as a model organism to study oestrogen degradation based on its previous ability to degrade 17α-ethinylestradiol (EE2). Biodegradation experiments revealed that R. equi ATCC13557 was unable to metabolise EE2. However, it was able to metabolise E2 with the major metabolite being E1 with no further degradation of E1. However, the conversion of E2 into E1 was incomplete, with 11.2 and 50.6% of E2 degraded in mixed (E1-E2-EE2) and E2-only conditions, respectively. Therefore, the metabolic pathway of E2 degradation by R. equi ATCC13557 may have two possible pathways. The genome of R. equi ATCC13557 was sequenced, assembled, and mapped for the first time. The genome analysis allowed the identification of genes possibly responsible for the observed biodegradation characteristics of R. equi ATCC13557. Several genes within R. equi ATCC13557 are similar, but not identical in sequence, to those identified within the genomes of other oestrogen degrading bacteria, including Pseudomonas putida strain SJTE-1 and Sphingomonas strain KC8. Homologous gene sequences coding for enzymes potentially involved in oestrogen degradation, most commonly a cytochrome P450 monooxygenase (oecB), extradiol dioxygenase (oecC), and 17β-hydroxysteroid dehydrogenase (oecA), were identified within the genome of R. equi ATCC13557. These searches also revealed a gene cluster potentially coding for enzymes involved in steroid/oestrogen degradation; 3-carboxyethylcatechol 2,3-dioxygenase, 2-hydroxymuconic semialdehyde hydrolase, 3-alpha-(or 20-beta)-hydroxysteroid dehydrogenase, 3-(3-hydroxy-phenyl)propionate hydroxylase, cytochrome P450 monooxygenase, and 3-oxosteroid 1-dehydrogenase. Further, the searches revealed steroid hormone metabolism gene clusters from the 9, 10-seco pathway, therefore R. equi ATCC13557 also has the potential to metabolise other steroid hormones such as cholesterol.
Affiliation(s)
- Jan Dolfing
- School of Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom
- Faculty Engineering and Environment, Northumbria University, Newcastle upon Tyne, United Kingdom
- Wojciech Mrozik
- School of Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom
- Department of Inorganic Chemistry, Faculty of Pharmacy, Medical University of Gdańsk, Gdańsk, Poland
- Paola Meynet
- School of Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom
- Lucy E Eland
- School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
- Martin Sim
- School of Computing Science, Newcastle University, Newcastle upon Tyne, United Kingdom
- Russell J Davenport
- School of Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom
|
43
|
Gu Q, Kumar A, Bray S, Creason A, Khanteymoori A, Jalili V, Grüning B, Goecks J. Galaxy-ML: An accessible, reproducible, and scalable machine learning toolkit for biomedicine. PLoS Comput Biol 2021; 17:e1009014. [PMID: 34061826 PMCID: PMC8213174 DOI: 10.1371/journal.pcbi.1009014] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/17/2021] [Revised: 06/18/2021] [Accepted: 04/27/2021] [Indexed: 11/25/2022] Open
Abstract
Supervised machine learning is an essential but difficult to use approach in biomedical data analysis. The Galaxy-ML toolkit (https://galaxyproject.org/community/machine-learning/) makes supervised machine learning more accessible to biomedical scientists by enabling them to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy (https://galaxyproject.org), a biomedical computational workbench used by tens of thousands of scientists across the world, with a suite of tools for all aspects of supervised machine learning.
Affiliation(s)
- Qiang Gu
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Anup Kumar
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Simon Bray
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Allison Creason
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Alireza Khanteymoori
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Vahid Jalili
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
- Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Jeremy Goecks
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, Oregon, United States of America
- The Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
|
44
|
Whole-Genome Sequencing and Annotation of 10 Endophytic and Epiphytic Bacteria Isolated from Lolium arundinaceum. Microbiol Resour Announc 2021; 10:10/19/e00317-21. [PMID: 33986094 PMCID: PMC8142580 DOI: 10.1128/mra.00317-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
We report the whole-genome sequence and annotation of 10 endophytic and epiphytic bacteria isolated from the grass Lolium arundinaceum as part of a laboratory exercise in a Fundamentals of Plant Biochemistry and Pathology undergraduate course (BIOL403) at the Rochester Institute of Technology in Rochester, New York.
|
45
|
Saif R, Mahmood T, Ejaz A, Zia S, Qureshi AR. Whole genome comparison of Pakistani Corona virus with Chinese and US Strains along with its predictive severity of COVID-19. GENE REPORTS 2021; 23:101139. [PMID: 33875973 PMCID: PMC8046707 DOI: 10.1016/j.genrep.2021.101139] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/18/2020] [Revised: 03/07/2021] [Accepted: 04/08/2021] [Indexed: 11/27/2022]
Abstract
Initially submitted 784 SARS-nCoV2 whole genome sequences on NCBI Virus database were selected for phylogenetic analysis to look into their similarities with two of Pakistani sequenced coronavirus strains having accessions of MT240479 and MT262993. The MT240479 named (Gilgit1-Pak) was found in close proximity to MT184913 named (CruiseA-USA), while MT262993 named (Manga-Pak) was in neighboring to MT039887 named (WI-USA) strain, which were further chosen for variant calling analysis along with reference genome NC_045512 as out-group to construct concluding cladogram and looked for evolutionary distance with PAUP software in this article. Aforementioned Pakistani strains each of having 29,836 bases were compared with MT263429 (WI-USA) of 29,889 bases and MT259229 (Wuhan-P.R. China) of 29,864 bases. Whole genome variant calling pipeline revealed 31 variants in both Pakistani strains collectively (Manga-Pak vs USA having 2del & 7SNPs, while different from Chinese strain with 2del & 2SNPs, similarly Gilgit1-Pak vs USA having 10SNPs, while different from Chinese strains having 8SNPs). These variants harbour ORF1ab, ORF1a and N genes having their role is viral replication/translation, host innate immunity and viral capsid formation respectively. These novel variants may be one of the reasons for low mortality rate in Pakistan with 385 deaths as compared to USA with 63,871 and P.R. China with 4633 by May 01, 2020. However functional characterization of these variants and their integrations with other viral proteins including variability of human receptors (ACE2 & NRP1) may be the other reasons for unlikely COVID-19 statistics in Pakistan which need further confirmatory studies. Moreover, mutated N and ORF1a proteins in Pakistani strains were also analyzed by 3D structure modeling, which give another dimension of comparing these alterations at amino acid level. 
In a nutshell, these novel variants are correlated with reduced mortality of COVID-19 severity in Pakistan while more robust results can be obtained by wet lab experimentation. This also gives insight of genomic landscape of these indigenous strains to develop diagnostics kits, vaccines and therapeutic interventions.
Affiliation(s)
- Rashid Saif
- Decode Genomics, 323-D, Punjab University Employees Housing Scheme (II), Lahore, Pakistan
- Tania Mahmood
- Decode Genomics, 323-D, Punjab University Employees Housing Scheme (II), Lahore, Pakistan
- Aniqa Ejaz
- Decode Genomics, 323-D, Punjab University Employees Housing Scheme (II), Lahore, Pakistan
- Saeeda Zia
- Department of Sciences and Humanities, National University of Computer and Emerging Sciences, Lahore, Pakistan
- Abdul Rasheed Qureshi
- Out Patients Department-Pulmonology, Gulab Devi Chest Hospital, Ferozepur Road, Lahore, Pakistan
|
46
|
Cormier MJ, Belyeu JR, Pedersen BS, Brown J, Köster J, Quinlan AR. Go Get Data (GGD) is a framework that facilitates reproducible access to genomic data. Nat Commun 2021; 12:2151. [PMID: 33846313 PMCID: PMC8041854 DOI: 10.1038/s41467-021-22381-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/14/2020] [Accepted: 03/09/2021] [Indexed: 12/05/2022] Open
Abstract
The rapid increase in the amount of genomic data provides researchers with an opportunity to integrate diverse datasets and annotations when addressing a wide range of biological questions. However, genomic datasets are deposited on different platforms and are stored in numerous formats from multiple genome builds, which complicates the task of collecting, annotating, transforming, and integrating data as needed. Here, we developed Go Get Data (GGD) as a fast, reproducible approach to installing standardized data recipes. GGD is available on Github ( https://gogetdata.github.io/ ), is extendable to other data types, and can streamline the complexities typically associated with data integration, saving researchers time and improving research reproducibility.
Affiliation(s)
- Michael J Cormier
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Jonathan R Belyeu
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Joseph Brown
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Johannes Köster
- Institute of Human Genetics, University of Duisburg-Essen, Essen, NRW, Germany
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
|
47
|
Direct Nanopore Sequencing of mRNA Reveals Landscape of Transcript Isoforms in Apicomplexan Parasites. mSystems 2021; 6:6/2/e01081-20. [PMID: 33688018 PMCID: PMC8561664 DOI: 10.1128/msystems.01081-20] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 12/19/2022] Open
Abstract
Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterized in apicomplexans, a major phylum of some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AU content of Plasmodium RNA, but also the limitations of short-read sequencing in deciphering complex splicing events. In this study, we utilized the long read direct RNA sequencing platform developed by Oxford Nanopore Technologies to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or nearly full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This study highlights a strategy in using long-read sequencing to understand splicing events at the whole-transcript level and has implications in the future interpretation of transcriptome sequencing studies.
IMPORTANCE We have used a novel nanopore sequencing technology to directly analyze parasite transcriptomes. The very long reads of this technology reveal the full-length genes of the parasites that cause malaria and toxoplasmosis. Gene transcripts must be processed in a process called splicing before they can be translated to protein. Our analysis reveals that these parasites very frequently only partially process their gene products, in a manner that departs dramatically from their human hosts.
|
49
|
Contribution of Mitochondrial DNA Heteroplasmy to the Congenital Cardiac and Palatal Phenotypic Variability in Maternally Transmitted 22q11.2 Deletion Syndrome. Genes (Basel) 2021; 12:genes12010092. [PMID: 33450921 PMCID: PMC7828421 DOI: 10.3390/genes12010092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2020] [Revised: 12/24/2020] [Accepted: 01/11/2021] [Indexed: 11/25/2022] Open
Abstract
Congenital heart disease (CHD) and palatal anomalies (PA), are among the most common characteristics of 22q11.2 deletion syndrome (22q11.2DS), but they show incomplete penetrance, suggesting the presence of additional factors. The 22q11.2 deleted region contains nuclear encoded mitochondrial genes, and since mitochondrial function is critical during development, we hypothesized that changes in the mitochondrial DNA (mtDNA) could be involved in the intrafamilial variability of CHD and PA in cases of maternally inherited 22q11.2DS. To investigate this, we studied the transmission of heteroplasmic mtDNA alleles in seventeen phenotypically concordant and discordant mother-offspring 22q11.2DS pairs. We sequenced their mtDNA and identified 26 heteroplasmic variants at >1% frequency, representing 18 transmissions. The median allele frequency change between a mother and her child was twice as much, with a wider distribution range, in PA discordant pairs, p-value = 0.039 (permutation test, 11 concordant vs. 7 discordant variants), but not in CHD discordant pairs, p-value = 0.441 (9 vs. 9). Only the variant m.9507T>C was considered to be pathogenic, but it was unrelated to the structural phenotypes. Our study is novel, yet our results are not consistent with mtDNA variation contributing to PA or CHD in 22q11.2DS. Larger cohorts and additional factors should be considered moving forward.
|
50
|
Folding Keratin Gene Clusters during Skin Regional Specification. Dev Cell 2020; 53:561-576.e9. [PMID: 32516596 DOI: 10.1016/j.devcel.2020.05.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/21/2019] [Revised: 02/19/2020] [Accepted: 05/11/2020] [Indexed: 02/08/2023]
Abstract
Regional specification is critical for skin development, regeneration, and evolution. The contribution of epigenetics in this process remains unknown. Here, using avian epidermis, we find two major strategies regulate β-keratin gene clusters. (1) Over the body, macro-regional specificities (scales, feathers, claws, etc.) established by typical enhancers control five subclusters located within the epidermal differentiation complex on chromosome 25; (2) within a feather, micro-regional specificities are orchestrated by temporospatial chromatin looping of the feather β-keratin gene cluster on chromosome 27. Analyses suggest a three-factor model for regional specification: competence factors (e.g., AP1) make chromatin accessible, regional specifiers (e.g., Zic1) target specific genome regions, and chromatin regulators (e.g., CTCF and SATBs) establish looping configurations. Gene perturbations disrupt morphogenesis and histo-differentiation. This chicken skin paradigm advances our understanding of how regulation of big gene clusters can set up a two-dimensional body surface map.
|