1
Prediction of polyspecificity from antibody sequence data by machine learning. FRONTIERS IN BIOINFORMATICS 2024; 3:1286883. [PMID: 38651055 PMCID: PMC11033685 DOI: 10.3389/fbinf.2023.1286883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/06/2023] [Indexed: 04/25/2024]
Abstract
Antibodies are generated with great diversity in nature, resulting in a set of molecules, each optimized to bind a specific target. Owing to their diversity and specificity, antibodies make up a large part of recently developed biologic drugs. For therapeutic use, antibodies need to fulfill several criteria to be safe and effective. Polyspecific antibodies can bind structurally unrelated molecules in addition to their main target, which can lead to side effects and decreased efficacy in a therapeutic setting, for example via reduction of effective drug levels. Therefore, we created a neural-network-based model to predict polyspecificity of antibodies using the heavy chain variable region sequence as input. We devised a strategy for enriching antibodies from an immunization campaign either for antigen-specific or polyspecific binding properties, followed by generation of a large sequencing data set for training and cross-validation of the model. We identified important physico-chemical features influencing polyspecificity by investigating the behaviour of this model. This work presents a machine-learning-based approach to polyspecificity prediction and, besides increasing our understanding of polyspecificity, might contribute to therapeutic antibody development.
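The abstract does not say how the heavy chain variable region (VH) sequence is presented to the network. A common choice, assumed here, is one-hot encoding over the 20 standard amino acids with zero-padding to a fixed length. The helper name `one_hot_vh` and the padding length of 140 residues are illustrative assumptions, not details from the paper.

```python
from typing import List

# Assumption: 20 standard amino acids; the paper's actual encoding is not stated.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_vh(sequence: str, max_len: int = 140) -> List[List[int]]:
    """Encode a VH sequence as a max_len x 20 one-hot matrix (hypothetical helper).

    Positions past the end of the sequence stay all-zero (padding).
    """
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(sequence.upper()):
        if aa not in AA_INDEX:
            raise ValueError(f"non-standard residue {aa!r} at position {pos}")
        matrix[pos][AA_INDEX[aa]] = 1
    return matrix


encoded = one_hot_vh("EVQLVESGGG")  # toy N-terminal fragment
print(len(encoded), len(encoded[0]), sum(map(sum, encoded)))  # 140 20 10
```

Padding gives every antibody the same input shape regardless of CDR lengths; the padding length must cover the longest sequence in the data set.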
2
ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis. PLoS One 2016; 11:e0149263. [PMID: 26882475 PMCID: PMC4801062 DOI: 10.1371/journal.pone.0149263] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/27/2015] [Accepted: 01/30/2016] [Indexed: 12/20/2022]
Abstract
Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons include dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, and nonstandard input and output formats. To make bioinformatics software easily accessible to researchers, we present a web-based platform here. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of the community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users need only a web browser and an active internet connection to benefit from this service. The web platform is designed to facilitate the use of the bioinformatics tools by researchers without an advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/.
3
JSBML 1.0: providing a smorgasbord of options to encode systems biology models. Bioinformatics 2015; 31:3383-6. [PMID: 26079347 PMCID: PMC4595895 DOI: 10.1093/bioinformatics/btv341] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 12/17/2014] [Accepted: 05/29/2015] [Indexed: 11/16/2022]
Abstract
Summary: JSBML, the official pure Java programming library for the Systems Biology Markup Language (SBML) format, has evolved with the advent of different modeling formalisms in systems biology and their ability to be exchanged and represented via extensions of SBML. JSBML has matured into a major, active open-source project with contributions from a growing, international team of developers who not only maintain compatibility with SBML, but also drive steady improvements to the Java interface and promote ease-of-use with end users. Availability and implementation: Source code, binaries and documentation for JSBML can be freely obtained under the terms of the LGPL 2.1 from the website http://sbml.org/Software/JSBML. More information about JSBML can be found in the user guide at http://sbml.org/Software/JSBML/docs/. Contact: jsbml-development@googlegroups.com or andraeger@eng.ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
4
Evaluation of toxicogenomics approaches for assessing the risk of nongenotoxic carcinogenicity in rat liver. PLoS One 2014; 9:e97678. [PMID: 24828355 PMCID: PMC4020844 DOI: 10.1371/journal.pone.0097678] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 10/15/2013] [Accepted: 04/22/2014] [Indexed: 02/03/2023]
Abstract
The current gold-standard method for cancer safety assessment of drugs is a rodent two-year bioassay, which is associated with significant costs and requires testing a large number of animals over their lifetime. In the absence of a comprehensive set of short-term assays predicting carcinogenicity, new approaches are currently being evaluated. One promising approach is toxicogenomics, which, through genome-wide molecular profiling after compound treatment, can increase mechanistic understanding and potentially allow carcinogenic potential to be predicted via mathematical modeling. The latter typically involves extracting informative genes from omics datasets, which can be used to construct generalizable models that allow early classification of compounds with unknown carcinogenic potential. Here we formally describe and compare two novel methodologies for the reproducible extraction of characteristic mRNA signatures, which were employed to capture specific gene expression changes observed for nongenotoxic carcinogens. While the first method integrates multiple gene rankings, generated by diverse algorithms applied to data from different subsamplings of the training compounds, the second approach employs a statistical ratio for the identification of informative genes. Both methods were evaluated on a dataset obtained from the toxicogenomics database TG-GATEs to predict the outcome of a two-year bioassay based on profiles from 14-day treatments. Additionally, we applied our methods to datasets from previous studies and showed that the derived prediction models are on average more accurate than those built from the original signatures. The selected genes were mostly related to p53 signaling and to specific changes in anabolic processes or energy metabolism, which are typically observed in tumor cells. Among the genes most frequently incorporated into prediction models were Phlda3, Cdkn1a, Akr7a3, Ccng1 and Abcb4.
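The first method integrates several gene rankings obtained on different subsamplings of the training compounds, but the abstract does not give the aggregation rule. The sketch below uses simple mean-rank (Borda-style) aggregation as a stand-in, with made-up gene names and rankings; giving a gene that is missing from a ranking the rank `len(ranking) + 1` is also an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_rankings(rankings: List[List[str]]) -> List[str]:
    """Combine several gene rankings (best first) by mean rank.

    A gene absent from a ranking is assigned rank len(ranking) + 1 there.
    """
    all_genes = {gene for ranking in rankings for gene in ranking}
    totals: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        position = {gene: i + 1 for i, gene in enumerate(ranking)}
        missing_rank = len(ranking) + 1
        for gene in all_genes:
            totals[gene] += position.get(gene, missing_rank)
    n = len(rankings)
    # Sort by mean rank; ties are broken alphabetically for reproducibility.
    return sorted(all_genes, key=lambda g: (totals[g] / n, g))


rankings = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA", "geneD"],
    ["geneA", "geneD", "geneB"],
]
print(aggregate_rankings(rankings))  # ['geneA', 'geneB', 'geneD', 'geneC']
```

Genes that rank consistently well across subsamplings rise to the top, which is the kind of stability a reproducible signature needs; the top-k genes of the aggregate list would then serve as model features.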
5
Ha-ras and β-catenin oncoproteins orchestrate metabolic programs in mouse liver tumors. Int J Cancer 2014; 135:1574-85. [PMID: 24535843 DOI: 10.1002/ijc.28798] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 10/31/2013] [Accepted: 02/06/2014] [Indexed: 01/08/2023]
Abstract
The process of hepatocarcinogenesis in the diethylnitrosamine (DEN) initiation/phenobarbital (PB) promotion mouse model involves the selective clonal outgrowth of cells harboring oncogene mutations in Ctnnb1, while spontaneous or DEN-only-induced tumors are often Ha-ras- or B-raf-mutated. The molecular mechanisms and pathways underlying these different tumor subtypes are not well characterized. Identifying them may yield markers that distinguish xenobiotic-promoted from spontaneously occurring liver tumors. Here, we have characterized mouse liver tumors harboring either Ctnnb1 or Ha-ras mutations via integrated molecular profiling at the transcriptional, translational and post-translational levels. In addition, metabolites of the intermediary metabolism were quantified by high-resolution ¹H magic angle nuclear magnetic resonance. We have identified tumor genotype-specific differences in mRNA and miRNA expression, protein levels, post-translational modifications, and metabolite levels that facilitate the molecular and biochemical stratification of tumor phenotypes. Bioinformatic integration of these data at the pathway level led to novel insights into tumor genotype-specific aberrant cell signaling and, in particular, to a better understanding of alterations in pathways of the cell intermediary metabolism, which are driven by the constitutive activation of the β-catenin and Ha-ras oncoproteins in tumors of the two genotypes.
6
TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One 2013; 8:e82238. [PMID: 24349230 PMCID: PMC3861411 DOI: 10.1371/journal.pone.0082238] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/13/2013] [Accepted: 10/21/2013] [Indexed: 11/18/2022]
Abstract
One of the key mechanisms of transcriptional control is the specific binding of transcription factors (TFs) to cis-regulatory elements in gene promoters. Elucidating these specific protein-DNA interactions is crucial for gaining insight into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and have mostly been applied to selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which, for a given protein sequence, (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for the latter two prediction steps, the first two steps are based on a novel numeric sequence representation that allows existing knowledge from a BLAST scan to be combined with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.
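The abstract describes a numeric representation that combines BLAST results with machine learning but does not define it. One plausible form, assumed here and not taken from TFpredict, is a fixed-length vector whose i-th entry is the best BLAST bit score of the query against the i-th protein of a labeled reference set. The helper name, reference IDs and hit values below are invented for illustration.

```python
from typing import Dict, List, Tuple


def blast_feature_vector(
    hits: List[Tuple[str, float]], reference_ids: List[str]
) -> List[float]:
    """Map BLAST hits (subject_id, bit_score) to one score per reference protein.

    Entry i is the best bit score against reference_ids[i], or 0.0 if there
    was no hit. Hits against proteins outside the reference set are ignored.
    """
    best: Dict[str, float] = {}
    for subject_id, bit_score in hits:
        if bit_score > best.get(subject_id, 0.0):
            best[subject_id] = bit_score
    return [best.get(ref, 0.0) for ref in reference_ids]


references = ["TF_1", "TF_2", "nonTF_1"]  # hypothetical labeled reference proteins
hits = [("TF_2", 85.1), ("TF_2", 120.4), ("nonTF_1", 40.0), ("other", 99.0)]
print(blast_feature_vector(hits, references))  # [0.0, 120.4, 40.0]
```

Because every query maps to a vector of the same length, this kind of representation can be passed to any classifier that expects fixed-size input.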
7
Parkinson's disease: dopaminergic nerve cell model is consistent with experimental finding of increased extracellular transport of α-synuclein. BMC Neurosci 2013; 14:136. [PMID: 24195591 PMCID: PMC3871002 DOI: 10.1186/1471-2202-14-136] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 02/26/2013] [Accepted: 10/28/2013] [Indexed: 12/13/2022]
Abstract
Background: Parkinson’s disease is an age-related disease whose pathogenesis is not completely understood. Animal models exist for investigating the disease, but not all results can be easily transferred to humans. Therefore, mathematical or probabilistic models of the human disease need to be constructed in silico to predict specific processes within a cell, such as dopamine metabolism and transport processes in a neuron. Results: We present a Systems Biology Markup Language (SBML) model of a whole dopaminergic nerve cell consisting of 139 reactions and 111 metabolites, which includes, among others, dopamine metabolism and transport, oxidative stress, aggregation of α-synuclein (αSYN), lysosomal and proteasomal degradation, and mitophagy. The predictive power of the model was investigated using flux balance analysis for the identification of steady model states. To this end, we performed six experiments: (i) investigation of the normal cell behavior, (ii) increase of O2, (iii) increase of ATP, (iv) influence of neurotoxins, (v) increase of αSYN in the cell, and (vi) increase of dopamine synthesis. The SBML model is available in the BioModels database with identifier MODEL1302200000. Conclusion: It is possible to simulate the normal behavior of an in vivo nerve cell with the developed model. We show that the model is sensitive to neurotoxins and oxidative stress. Further, an increased level of αSYN induces apoptosis, and an increased flux of αSYN to the extracellular space was observed.
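Flux balance analysis, used here to identify steady model states, is conventionally written as the linear program below. S is the stoichiometric matrix (one row per metabolite, one column per reaction), v is the flux vector and c holds the objective weights. The abstract does not state which objective was optimized, so c is left generic; expressing perturbations such as increased O2 or ATP as changed flux bounds is an assumption about the setup, not a detail from the paper.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S\,v = 0, \qquad S \in \mathbb{R}^{m \times n}, \\
& v_{j}^{\min} \le v_{j} \le v_{j}^{\max}, \qquad j = 1, \dots, n.
\end{aligned}
```

For the model described above, n would be at most 139 (reactions) and m at most 111 (metabolites), depending on which species are treated as boundary species and excluded from S.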
8
Path2Models: large-scale generation of computational models from biochemical pathway maps. BMC SYSTEMS BIOLOGY 2013; 7:116. [PMID: 24180668 PMCID: PMC4228421 DOI: 10.1186/1752-0509-7-116] [Citation(s) in RCA: 126] [Impact Index Per Article: 11.5] [Received: 07/19/2013] [Accepted: 10/23/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Systems biology projects and omics technologies have led to a growing number of biochemical pathway models and reconstructions. However, the majority of these models are still created de novo, based on literature mining and the manual processing of pathway data. RESULTS To increase the efficiency of model creation, the Path2Models project has automatically generated mathematical models from pathway representations using a suite of freely available software. Data sources include KEGG, BioCarta, MetaCyc and SABIO-RK. Depending on the source data, three types of models are provided: kinetic, logical and constraint-based. Models from over 2 600 organisms are encoded consistently in SBML, and are made freely available through BioModels Database at http://www.ebi.ac.uk/biomodels-main/path2models. Each model contains the list of participants, their interactions, the relevant mathematical constructs, and initial parameter values. Most models are also available as easy-to-understand graphical SBGN maps. CONCLUSIONS To date, the project has resulted in more than 140 000 freely available models. Such a resource can tremendously accelerate the development of mathematical models by providing initial starting models for simulation and analysis, which can be subsequently curated and further parameterized.
9
Integrative pathway-based approach for genome-wide association studies: identification of new pathways for rheumatoid arthritis and type 1 diabetes. PLoS One 2013; 8:e78577. [PMID: 24205270 PMCID: PMC3808349 DOI: 10.1371/journal.pone.0078577] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/17/2013] [Accepted: 09/14/2013] [Indexed: 12/19/2022]
Abstract
Genome-wide association studies (GWAS) have led to the identification of numerous novel loci for a number of complex diseases. Pathway-based approaches using genotypic data provide tangible leads that cannot be identified by single-marker approaches as implemented in GWAS. The available pathway analysis approaches mainly differ in the databases they employ and in the statistics they apply to determine the significance of the associated disease markers. So far, pathway-based approaches using GWAS data have failed to consider the overlap of genes among different pathways or the influence of protein–protein interactions. We performed a multistage integrative pathway (MIP) analysis on three common diseases, Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D), incorporating genotypic, pathway, protein- and domain-interaction data to identify novel associations between these diseases and pathways. Additionally, we assessed the sensitivity of our method by removing the most significant SNPs and comparing the resulting pathway analysis results with the original ones. Apart from confirming many previously published associations between pathways and RA, CD and T1D, our MIP approach identified three new associations between disease phenotypes and pathways: a relation between the influenza A pathway and RA, and relations between T1D and the phagosome and toxoplasmosis pathways. These results provide new leads for understanding the molecular underpinnings of these diseases. The software developed here is available at http://www.cogsys.cs.uni-tuebingen.de/software/GWASPathwayIdentifier/index.htm.
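The MIP method additionally integrates protein- and domain-interaction data and handles gene overlap between pathways, which the abstract does not describe in enough detail to reproduce. For orientation only, the sketch below shows the generic baseline that pathway-based GWAS methods build on: collapse SNP p-values to genes (here by taking the minimum, a simplification that ignores gene-size bias) and test a pathway for over-representation of significant genes with a one-sided hypergeometric test. All identifiers and p-values are invented.

```python
from math import comb
from typing import Dict, Set


def gene_pvalues(snp_p: Dict[str, float], snp_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Assign each gene the smallest p-value among the SNPs mapped to it."""
    genes: Dict[str, float] = {}
    for snp, p in snp_p.items():
        gene = snp_to_gene.get(snp)
        if gene is not None:
            genes[gene] = min(p, genes.get(gene, 1.0))
    return genes


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


def pathway_enrichment(gene_p: Dict[str, float], pathway: Set[str], alpha: float = 0.05) -> float:
    """One-sided over-representation p-value of significant genes in a pathway."""
    universe = set(gene_p)
    significant = {g for g, p in gene_p.items() if p < alpha}
    in_pathway = pathway & universe  # only genes covered by SNPs count
    overlap = len(significant & in_pathway)
    return hypergeom_sf(overlap, len(universe), len(in_pathway), len(significant))


snp_p = {"snp1": 0.001, "snp2": 0.2, "snp3": 0.01, "snp4": 0.5, "snp5": 0.03, "snp6": 0.9}
snp_to_gene = {"snp1": "G1", "snp2": "G1", "snp3": "G2", "snp4": "G3", "snp5": "G4", "snp6": "G5"}
genes = gene_pvalues(snp_p, snp_to_gene)
print(pathway_enrichment(genes, {"G1", "G2", "G6"}))  # 0.3
```

When many pathways are tested, the resulting p-values still need a multiple-testing correction.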
10
Precise generation of systems biology models from KEGG pathways. BMC SYSTEMS BIOLOGY 2013; 7:15. [PMID: 23433509 PMCID: PMC3623889 DOI: 10.1186/1752-0509-7-15] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 06/27/2012] [Accepted: 01/25/2013] [Indexed: 11/10/2022]
Abstract
Background: The KEGG PATHWAY database provides a plethora of pathways for a diversity of organisms. All pathway components are directly linked to other KEGG databases, such as KEGG COMPOUND or KEGG REACTION. Therefore, the pathways can be extended with an enormous amount of information and provide a foundation for initial structural modeling approaches. As a drawback, KGML-formatted KEGG pathways are primarily designed for visualization purposes and often omit important details for the sake of a clear arrangement of their entries. Thus, a direct conversion into systems biology models would produce incomplete and erroneous models. Results: Here, we present a precise method for processing and converting KEGG pathways into initial metabolic and signaling models encoded in the standardized community pathway formats SBML (Levels 2 and 3) and BioPAX (Levels 2 and 3). This method involves correcting invalid or incomplete KGML content, creating complete and valid stoichiometric reactions, translating relations to signaling models and augmenting the pathway content with various information, such as cross-references to Entrez Gene, OMIM, UniProt, ChEBI, and many more. Finally, we compare several existing conversion tools for KEGG pathways and show that the conversion from KEGG to BioPAX does not involve a loss of information, whereas lossless translations to SBML can only be performed using SBML Level 3, including its recently proposed qualitative models and groups extension packages. Conclusions: Building correct BioPAX and SBML signaling models from the KEGG database is a unique characteristic of the proposed method. Further, no other approach is able to appropriately construct metabolic models from KEGG pathways, including correct reactions with stoichiometry. The resulting initial models, which contain valid and comprehensive SBML or BioPAX code and a multitude of cross-references, lay the foundation for further modeling steps.
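Complete stoichiometric reactions are what allow a model to be written as a stoichiometric matrix, with one column per reaction, negative coefficients for substrates and positive coefficients for products (the usual convention). The sketch below builds such a matrix from plain dictionaries; the tuple layout, the helper name and the abbreviated species names are assumptions for illustration, and the KGML correction steps described above are not reproduced.

```python
from typing import Dict, List, Tuple

# (reaction_id, substrates, products); values are stoichiometric coefficients.
Reaction = Tuple[str, Dict[str, float], Dict[str, float]]


def stoichiometric_matrix(reactions: List[Reaction]) -> Tuple[List[str], List[List[float]]]:
    """Return (species order, matrix) with one row per species, one column per reaction."""
    species = sorted({s for _, subs, prods in reactions for s in (*subs, *prods)})
    row = {s: i for i, s in enumerate(species)}
    matrix = [[0.0] * len(reactions) for _ in species]
    for j, (_, substrates, products) in enumerate(reactions):
        for s, coeff in substrates.items():
            matrix[row[s]][j] -= coeff  # consumed by reaction j
        for s, coeff in products.items():
            matrix[row[s]][j] += coeff  # produced by reaction j
    return species, matrix


reactions = [
    ("hexokinase", {"glc": 1, "atp": 1}, {"g6p": 1, "adp": 1}),
    ("isomerase", {"g6p": 1}, {"f6p": 1}),
]
species, S = stoichiometric_matrix(reactions)
for name, coeffs in zip(species, S):
    print(name, coeffs)
```

A matrix of this shape is the S used in constraint-based methods such as flux balance analysis.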
11
Abstract
SUMMARY: Microarrays are commonly used to detect changes in gene expression between different biological samples. For this purpose, many analysis tools have been developed that offer visualization, statistical analysis and more sophisticated analysis methods. Most of these tools are designed specifically for messenger RNA microarrays. However, more and more microarray platforms are available today. Changes in DNA methylation, microRNA expression or even protein phosphorylation states can be detected with specialized arrays. For these microarray technologies, the number of available tools is small compared with mRNA analysis tools. In particular, joint analysis of data from different microarray platforms applied to the same set of biological samples is poorly supported by most microarray analysis tools. Here, we present InCroMAP, a tool for the analysis and visualization of high-level microarray data from individual or multiple different platforms. Currently, InCroMAP supports mRNA, microRNA, DNA methylation and protein modification datasets. Several methods are offered that allow for an integrated analysis of data from those platforms. The available features of InCroMAP range from visualization of DNA methylation data, through annotation of microRNA targets and integrated gene set enrichment analysis, to joint visualization of data from all platforms in the context of metabolic or signalling pathways. AVAILABILITY: InCroMAP is freely available as a Java™ application at www.cogsys.cs.uni-tuebingen.de/software/InCroMAP, including a comprehensive user's guide and example files.
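One integrated analysis that microRNA target annotation enables is flagging miRNA/target pairs whose expression changes move in opposite directions, as expected if the miRNA represses its target. The sketch below is a generic illustration, not InCroMAP's implementation: the helper name, target map, log2 fold changes and the threshold of 1.0 are invented, and in practice targets would come from a curated or predicted target database.

```python
from typing import Dict, List, Tuple


def anticorrelated_pairs(
    mirna_fc: Dict[str, float],
    mrna_fc: Dict[str, float],
    targets: Dict[str, List[str]],
    threshold: float = 1.0,
) -> List[Tuple[str, str, float, float]]:
    """Return (miRNA, target gene, miRNA log2FC, mRNA log2FC) for pairs where both
    changes pass |log2FC| >= threshold and point in opposite directions."""
    pairs = []
    for mirna, m_fc in mirna_fc.items():
        if abs(m_fc) < threshold:
            continue
        for gene in targets.get(mirna, []):
            g_fc = mrna_fc.get(gene)
            if g_fc is None or abs(g_fc) < threshold:
                continue  # target not measured on the mRNA array, or unchanged
            if (m_fc > 0) != (g_fc > 0):
                pairs.append((mirna, gene, m_fc, g_fc))
    return sorted(pairs)


mirna_fc = {"miR-a": 2.0, "miR-b": 0.3}
mrna_fc = {"G1": -1.5, "G2": 1.2, "G3": -2.0}
targets = {"miR-a": ["G1", "G2", "G4"], "miR-b": ["G3"]}
print(anticorrelated_pairs(mirna_fc, mrna_fc, targets))  # [('miR-a', 'G1', 2.0, -1.5)]
```

The same gene-level join would work for protein or methylation data, provided each platform's probes are first mapped to common gene identifiers.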
12
Identification of Dlk1-Dio3 imprinted gene cluster noncoding RNAs as novel candidate biomarkers for liver tumor promotion. Toxicol Sci 2012; 131:375-86. [PMID: 23091169 DOI: 10.1093/toxsci/kfs303] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022]
Abstract
The molecular events during nongenotoxic carcinogenesis and their temporal order are poorly understood but thought to include long-lasting perturbations of gene expression. Here, we have investigated the temporal sequence of molecular and pathological perturbations at early stages of phenobarbital (PB) mediated liver tumor promotion in vivo. Molecular profiling (mRNA, microRNA [miRNA], DNA methylation, and proteins) of mouse liver during 13 weeks of PB treatment revealed progressive increases in hepatic expression of long noncoding RNAs and miRNAs originating from the Dlk1-Dio3 imprinted gene cluster, a locus that has recently been associated with stem cell pluripotency in mice and various neoplasms in humans. PB induction of the Dlk1-Dio3 cluster noncoding RNA (ncRNA) Meg3 was localized to glutamine synthetase-positive hypertrophic perivenous hepatocytes, suggesting a role for β-catenin signaling in the dysregulation of Dlk1-Dio3 ncRNAs. The carcinogenic relevance of Dlk1-Dio3 locus ncRNA induction was further supported by in vivo genetic dependence on constitutive androstane receptor and β-catenin pathways. Our data identify Dlk1-Dio3 ncRNAs as novel candidate early biomarkers for mouse liver tumor promotion and provide new opportunities for assessing the carcinogenic potential of novel compounds.
13
14
Abstract
Motivation: The biological pathway exchange language (BioPAX) and the systems biology markup language (SBML) are among the most popular modeling and data exchange languages in systems biology. The focus of SBML is quantitative modeling and dynamic simulation of models, whereas the BioPAX specification concentrates mainly on visualization and qualitative analysis of pathway maps. BioPAX describes reactions and relations. In contrast, SBML core exclusively describes quantitative processes such as reactions. With the SBML qualitative models extension (qual), it has recently also become possible to describe relations in SBML. Before the development of SBML qual, relations could not be properly translated into SBML. Until now, no BioPAX to SBML converter has been fully capable of translating both reactions and relations. Results: The entire Nature Pathway Interaction Database (PID) has been converted from BioPAX (Level 2 and Level 3) into SBML (Level 3 Version 1), including both reactions and relations, by using the new qual extension package. Additionally, we present the new web tool BioPAX2SBML for further BioPAX to SBML conversions. Compared with previous conversion tools, BioPAX2SBML is more comprehensive, more robust and more exact. Availability: BioPAX2SBML is freely available at http://webservices.cs.uni-tuebingen.de/ and the complete collection of the PID models is available at http://www.cogsys.cs.uni-tuebingen.de/downloads/Qualitative-Models/. Contact: finja.buechel@uni-tuebingen.de Supplementary Information: Supplementary data are available at Bioinformatics online.
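In SBML qual, a relation such as "A activates B" can be encoded as a transition with A as an input and B as an output. The sketch below builds such a fragment with Python's standard `xml.etree.ElementTree`. Element and attribute names follow the SBML Level 3 qual package (version 1) as recalled here and should be checked against the specification; the helper name and IDs are hypothetical, the surrounding model, qualitativeSpecies declarations and MathML function terms are omitted, and BioPAX2SBML's actual mapping may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of the SBML Level 3 Version 1 qual package (verify against the spec).
QUAL_NS = "http://www.sbml.org/sbml/level3/version1/qual/version1"
ET.register_namespace("qual", QUAL_NS)


def q(name: str) -> str:
    """Qualify a tag or attribute name with the qual namespace."""
    return f"{{{QUAL_NS}}}{name}"


def relation_to_transition(source: str, target: str, sign: str) -> ET.Element:
    """Build a qual:transition for 'source --sign--> target' (hypothetical helper).

    sign is one of "positive", "negative", "dual" or "unknown".
    """
    transition = ET.Element(q("transition"), {q("id"): f"tr_{source}_{target}"})
    inputs = ET.SubElement(transition, q("listOfInputs"))
    ET.SubElement(inputs, q("input"), {
        q("qualitativeSpecies"): source,
        q("transitionEffect"): "none",  # the input is read, not consumed
        q("sign"): sign,
    })
    outputs = ET.SubElement(transition, q("listOfOutputs"))
    ET.SubElement(outputs, q("output"), {
        q("qualitativeSpecies"): target,
        q("transitionEffect"): "assignmentLevel",
    })
    terms = ET.SubElement(transition, q("listOfFunctionTerms"))
    # Only the default term is written; real models add MathML functionTerms.
    ET.SubElement(terms, q("defaultTerm"), {q("resultLevel"): "0"})
    return transition


print(ET.tostring(relation_to_transition("A", "B", "positive"), encoding="unicode"))
```

Unlike a reaction, the transition does not consume A; the sign attribute carries the activating or inhibiting nature of the relation.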
15
Inferring statin-induced gene regulatory relationships in primary human hepatocytes. Bioinformatics 2011; 27:2473-7. [DOI: 10.1093/bioinformatics/btr416] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 12/31/2022]
16
Abstract
Summary: The KEGG PATHWAY database provides a widely used service for metabolic and nonmetabolic pathways. It contains manually drawn pathway maps with information about the genes, reactions and relations contained therein. To store these pathways, KEGG uses KGML, a proprietary XML format. Parsers and translators are needed to process the pathway maps for use in other applications and algorithms. We have developed KEGGtranslator, an easy-to-use stand-alone application that can visualize and convert KGML-formatted XML files into multiple output formats. Unlike other translators, KEGGtranslator supports a plethora of output formats, is able to augment the information in translated documents (e.g. MIRIAM annotations) beyond the scope of the KGML document, and adds missing components to fragmentary reactions within the pathway to allow simulations on them. Availability: KEGGtranslator is freely available as a Java™ Web Start application and for download at http://www.cogsys.cs.uni-tuebingen.de/software/KEGGtranslator/. KGML files can be downloaded from within the application. Contact: clemens.wrzodek@uni-tuebingen.de Supplementary information: Supplementary data are available at Bioinformatics online.
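KGML stores reactions as `reaction` elements whose `substrate` and `product` children reference compound entries. A minimal reader using Python's standard library is sketched below. The fragment is hand-written for illustration (not downloaded from KEGG), the helper name is hypothetical, and the sketch ignores relations, graphics and the repair steps a full translator such as KEGGtranslator performs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Hand-written toy KGML fragment; identifiers are for illustration only.
KGML_SAMPLE = """<pathway name="path:xxx00000" org="xxx" number="00000" title="Toy pathway">
  <entry id="1" name="cpd:C00031" type="compound"/>
  <entry id="2" name="cpd:C00092" type="compound"/>
  <entry id="3" name="xxx:gene1" type="gene" reaction="rn:R00299"/>
  <reaction id="3" name="rn:R00299" type="irreversible">
    <substrate id="1" name="cpd:C00031"/>
    <product id="2" name="cpd:C00092"/>
  </reaction>
</pathway>"""


def parse_reactions(kgml: str) -> List[Dict[str, Union[str, bool, List[str]]]]:
    """Extract reaction name, reversibility, substrates and products from KGML."""
    root = ET.fromstring(kgml)
    reactions = []
    for rxn in root.iter("reaction"):
        reactions.append({
            "name": rxn.get("name"),
            "reversible": rxn.get("type") == "reversible",
            "substrates": [s.get("name") for s in rxn.findall("substrate")],
            "products": [p.get("name") for p in rxn.findall("product")],
        })
    return reactions


print(parse_reactions(KGML_SAMPLE))
```

The output lists only what the map draws; any co-substrates missing from the drawing would have to be filled in from KEGG REACTION before simulation.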
17
Abstract
Summary: The specifications of the Systems Biology Markup Language (SBML) define standards for storing and exchanging computer models of biological processes in text files. To perform model simulations, graphical visualizations and other software manipulations, an in-memory representation of SBML is required. We developed JSBML for this purpose. In contrast to prior implementations of SBML APIs, JSBML has been designed from the ground up for the Java™ programming language and can therefore be used on all platforms supported by a Java Runtime Environment. This offers important benefits for Java users, including the ability to distribute software as Java Web Start applications. JSBML supports all SBML Levels and Versions through Level 3 Version 1, and we have strived to maintain the highest possible degree of compatibility with the popular library libSBML. JSBML also supports modules that can facilitate the development of plugins for end-user applications, as well as ease migration from a libSBML-based backend. Availability: Source code, binaries and documentation for JSBML can be freely obtained under the terms of the LGPL 2.1 from the website http://sbml.org/Software/JSBML. Contact: jsbml-team@sbml.org Supplementary information: Supplementary data are available at Bioinformatics online.