1
|
Glez-Peña D, Gómez-Blanco D, Reboiro-Jato M, Fdez-Riverola F, Posada D. ALTER: program-oriented conversion of DNA and protein alignments. Nucleic Acids Res 2010; 38:W14-8. [PMID: 20439312 PMCID: PMC2896128 DOI: 10.1093/nar/gkq321] [Citation(s) in RCA: 306] [Impact Index Per Article: 20.4] [Received: 02/03/2010] [Revised: 04/05/2010] [Accepted: 04/17/2010] [Indexed: 11/14/2022] Open
Abstract
ALTER is an open web-based tool to transform between different multiple sequence alignment formats. The originality of ALTER lies in the fact that it focuses on the specifications of mainstream alignment and analysis programs rather than on the conversion among more or less specific formats. In addition, ALTER is capable of identifying and removing identical sequences during the transformation process. Besides its user-friendly environment, ALTER allows access to its functionalities in a programmatic way through a Representational State Transfer web service. ALTER's front-end and its API are freely available at http://sing.ei.uvigo.es/ALTER/ and http://sing.ei.uvigo.es/ALTER/api/, respectively.
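The removal of identical sequences mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not ALTER's code: the FASTA input format, the function names `parse_fasta` and `collapse_identical`, and the choice to compare sequences case-insensitively are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def collapse_identical(records):
    """Group records by sequence (case-insensitive, an assumption).

    Returns one (representative_name, sequence, all_names) tuple per
    distinct sequence, in order of first appearance.
    """
    groups = {}
    for name, seq in records:
        groups.setdefault(seq.upper(), []).append(name)
    return [(names[0], seq, names) for seq, names in groups.items()]


alignment = ">a\nACGT\n>b\nAC-T\n>c\nacgt\n"
collapsed = collapse_identical(parse_fasta(alignment))
```

Here `collapsed` keeps one entry per distinct aligned sequence and records which names were merged into it, so the mapping can be reported alongside the converted alignment.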
|
research-article |
15 |
306 |
2
|
López-Fernández H, Santos HM, Capelo JL, Fdez-Riverola F, Glez-Peña D, Reboiro-Jato M. Mass-Up: an all-in-one open software application for MALDI-TOF mass spectrometry knowledge discovery. BMC Bioinformatics 2015; 16:318. [PMID: 26437641 PMCID: PMC4595311 DOI: 10.1186/s12859-015-0752-4] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 05/06/2015] [Accepted: 09/28/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Mass spectrometry is one of the most important techniques in the field of proteomics. MALDI-TOF mass spectrometry has become popular during the last decade due to its high speed and sensitivity for detecting proteins and peptides. MALDI-TOF-MS can also be used in combination with Machine Learning techniques and statistical methods for knowledge discovery. Although there are many software libraries and tools that can be combined for this kind of analysis, there is still a need for all-in-one solutions that offer user-friendly graphical interfaces and do not require programming skills. RESULTS Mass-Up, an open software multiplatform application for MALDI-TOF-MS knowledge discovery, is herein presented. Mass-Up software allows data preprocessing, as well as subsequent analysis including (i) biomarker discovery, (ii) clustering, (iii) biclustering, (iv) three-dimensional PCA visualization and (v) classification of large sets of spectra data. CONCLUSIONS Mass-Up brings knowledge discovery within reach of MALDI-TOF-MS researchers. Mass-Up is distributed under the GPLv3 license and it is open and free to all users at http://sing.ei.uvigo.es/mass-up.
|
Research Support, Non-U.S. Gov't |
10 |
69 |
3
|
Piñeiro-Yáñez E, Reboiro-Jato M, Gómez-López G, Perales-Patón J, Troulé K, Rodríguez JM, Tejero H, Shimamura T, López-Casas PP, Carretero J, Valencia A, Hidalgo M, Glez-Peña D, Al-Shahrour F. PanDrugs: a novel method to prioritize anticancer drug treatments according to individual genomic data. Genome Med 2018; 10:41. [PMID: 29848362 PMCID: PMC5977747 DOI: 10.1186/s13073-018-0546-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 12/28/2017] [Accepted: 05/04/2018] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Large-sequencing cancer genome projects have shown that tumors have thousands of molecular alterations and their frequency is highly heterogeneous. In such scenarios, physicians and oncologists routinely face lists of cancer genomic alterations where only a minority of them are relevant biomarkers to drive clinical decision-making. For this reason, the medical community agrees on the urgent need for methodologies to establish the relevance of tumor alterations, assisting in genomic profile interpretation, and, more importantly, to prioritize those that could be clinically actionable for cancer therapy. RESULTS We present PanDrugs, a new computational methodology to guide the selection of personalized treatments in cancer patients using the variant lists provided by genome-wide sequencing analyses. PanDrugs offers the largest database of drug-target associations available, from well-known targeted therapies to preclinical drugs. By scoring data-driven gene cancer relevance and drug feasibility, PanDrugs interprets genomic alterations and provides a prioritized evidence-based list of anticancer therapies. Our tool represents the first drug prescription strategy applying a rationale based on pathway context, multi-gene markers impact and information provided by functional experiments. Our approach has been systematically applied to TCGA patients and successfully validated in a cancer case study with a xenograft mouse model, demonstrating its utility. CONCLUSIONS PanDrugs is a feasible method to identify potentially druggable molecular alterations and prioritize drugs to facilitate the interpretation of genomic landscape and clinical decision-making in cancer patients. Our approach expands the search of druggable genomic alterations from the concept of cancer driver genes to the druggable pathway context, extending anticancer therapeutic options beyond already known cancer genes. The methodology is public and easily integratable with custom pipelines through its programmatic API or its docker image. The PanDrugs webtool is freely accessible at http://www.pandrugs.org.
|
research-article |
7 |
49 |
4
|
Graña O, Rubio-Camarillo M, Fdez-Riverola F, Pisano D, Glez-Peña D. Nextpresso: Next Generation Sequencing Expression Analysis Pipeline. Curr Bioinform 2018. [DOI: 10.2174/1574893612666170810153850] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Indexed: 11/22/2022]
|
|
7 |
39 |
5
|
Glez-Peña D, Díaz F, Hernández JM, Corchado JM, Fdez-Riverola F. geneCBR: a translational tool for multiple-microarray analysis and integrative information retrieval for aiding diagnosis in cancer research. BMC Bioinformatics 2009; 10:187. [PMID: 19538727 PMCID: PMC2703634 DOI: 10.1186/1471-2105-10-187] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 12/09/2008] [Accepted: 06/18/2009] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND Bioinformatics and medical informatics are two research fields that serve the needs of different but related communities. Both domains share the common goal of providing new algorithms, methods and technological solutions to biomedical research, and contributing to the treatment and cure of diseases. Although different microarray techniques have been successfully used to investigate useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice is still a long way off. Within this context, case-based reasoning emerges as a suitable paradigm especially intended for the development of biomedical informatics applications and decision support systems, given the support and collaboration involved in such a translational development. With the goals of removing barriers against multi-disciplinary collaboration and facilitating the dissemination and transfer of knowledge to real practice, case-based reasoning systems have the potential to be applied to translational research mainly because their computational reasoning paradigm is similar to the way clinicians gather, analyze and process information in their own practice of clinical medicine. RESULTS In addressing the issue of bridging the existing gap between biomedical researchers and clinicians who work in the domain of cancer diagnosis, prognosis and treatment, we have developed and made accessible a common interactive framework. Our geneCBR system implements a freely available software tool that allows the use of combined techniques that can be applied to gene selection, clustering, knowledge extraction and prediction for aiding diagnosis in cancer research. For biomedical researchers, geneCBR expert mode offers a core workbench for designing and testing new techniques and experiments. For pathologists or oncologists, geneCBR diagnostic mode implements an effective and reliable system that can diagnose cancer subtypes based on the analysis of microarray data using a CBR architecture. For programmers, geneCBR programming mode includes an advanced edition module for run-time modification of previously coded techniques. CONCLUSION geneCBR is a new translational tool that can effectively support the integrative work of programmers, biomedical researchers and clinicians working together in a common framework. The code is freely available under the GPL license and can be obtained at http://www.genecbr.org.
|
product-review |
16 |
27 |
6
|
Glez-Peña D, Lourenço A, López-Fernández H, Reboiro-Jato M, Fdez-Riverola F. Web scraping technologies in an API world. Brief Bioinform 2013; 15:788-97. [PMID: 23632294 DOI: 10.1093/bib/bbt026] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
Abstract
Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents, is still in a position to offer a valid and valuable service to a wide range of bioinformatics applications, ranging from simple extraction robots to online meta-servers. This article reviews existing scraping frameworks and tools, identifying their strengths and limitations in terms of extraction capabilities. The main focus is set on showing how straightforward it is today to set up a data scraping pipeline, with minimal programming effort, and answer a number of practical needs. For exemplification purposes, we introduce a biomedical data extraction scenario where the desired data sources, well-known in clinical microbiology and similar domains, do not offer programmatic interfaces yet. Moreover, we describe the operation of WhichGenes and PathJam, two bioinformatics meta-servers that use scraping as a means to cope with gene set enrichment analysis.
|
Research Support, Non-U.S. Gov't |
12 |
26 |
7
|
López-Cortés R, Oliveira E, Núñez C, Lodeiro C, Páez de la Cadena M, Fdez-Riverola F, López-Fernández H, Reboiro-Jato M, Glez-Peña D, Luis Capelo J, Santos HM. Fast human serum profiling through chemical depletion coupled to gold-nanoparticle-assisted protein separation. Talanta 2012; 100:239-45. [DOI: 10.1016/j.talanta.2012.08.020] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 05/27/2012] [Revised: 08/09/2012] [Accepted: 08/13/2012] [Indexed: 01/23/2023]
|
|
13 |
25 |
8
|
Glez-Peña D, Gómez-López G, Pisano DG, Fdez-Riverola F. WhichGenes: a web-based tool for gathering, building, storing and exporting gene sets with application in gene set enrichment analysis. Nucleic Acids Res 2009; 37:W329-34. [PMID: 19406925 PMCID: PMC2703947 DOI: 10.1093/nar/gkp263] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022] Open
Abstract
WhichGenes is a web-based interactive gene set building tool offering a very simple interface to extract always-updated gene lists from multiple databases and unstructured biological data sources. While the user can specify new gene sets of interest by following a simple four-step wizard, the tool is able to run several queries in parallel. Every time a new set is generated, it is automatically added to the private gene-set cart and the user is notified by an e-mail containing a direct link to the new set stored in the server. WhichGenes provides functionalities to edit, delete and rename existing sets as well as the capability of generating new ones by combining previously existing sets (intersection, union and difference operators). Users can export their sets, configuring the output format and selecting among multiple gene identifiers. In addition to the user-friendly environment, WhichGenes allows programmers to access its functionalities in a programmatic way through a Representational State Transfer web service. The WhichGenes front-end is freely available at http://www.whichgenes.org/ and the WhichGenes API is accessible at http://www.whichgenes.org/api/.
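The set-combination operators named in the abstract above (intersection, union and difference) map directly onto standard set algebra. The following Python sketch only illustrates that idea; the function name `combine_gene_sets`, the operator strings and the example gene symbols are hypothetical and do not reflect the WhichGenes API.

```python
def combine_gene_sets(first, second, operator):
    """Combine two collections of gene identifiers with a named set operator.

    `operator` is one of "intersection", "union" or "difference"
    (difference means: genes in `first` that are not in `second`).
    Returns a sorted list so the result is reproducible.
    """
    a, b = set(first), set(second)
    operations = {
        "intersection": a & b,
        "union": a | b,
        "difference": a - b,
    }
    if operator not in operations:
        raise ValueError(f"unknown operator: {operator!r}")
    return sorted(operations[operator])


# Hypothetical example sets of gene symbols.
set_one = ["TP53", "BRCA1", "EGFR"]
set_two = ["EGFR", "KRAS", "TP53"]
shared = combine_gene_sets(set_one, set_two, "intersection")
```

Returning a sorted list rather than a raw set is a design choice for stable exports; a real tool would also need to reconcile differing gene identifier schemes before combining.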
|
Research Support, Non-U.S. Gov't |
16 |
23 |
9
|
Glez-Peña D, Reboiro-Jato M, Maia P, Rocha M, Díaz F, Fdez-Riverola F. AIBench: a rapid application development framework for translational research in biomedicine. Computer Methods and Programs in Biomedicine 2010; 98:191-203. [PMID: 20047774 DOI: 10.1016/j.cmpb.2009.12.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 06/13/2009] [Revised: 11/11/2009] [Accepted: 12/10/2009] [Indexed: 05/28/2023]
Abstract
Applied research in both biomedical discovery and translational medicine today often requires the rapid development of fully featured applications containing both advanced and specific functionalities, for real use in practice. In this context, new tools are needed that allow for efficient generation, deployment and reutilization of such biomedical applications as well as their associated functionalities. This paper presents AIBench, an open-source Java desktop application framework for scientific software development with the goal of providing support to both fundamental and applied research in the domain of translational biomedicine. AIBench incorporates a powerful plug-in engine and a flexible scripting platform, and takes advantage of Java annotations, reflection and various design principles in order to make it easy to use, lightweight and non-intrusive. By following a basic input-processing-output life cycle, it is possible to fully develop multiplatform applications using only three types of concepts: operations, data-types and views. The framework automatically provides functionalities that are present in a typical scientific application including user parameter definition, logging facilities, multi-threading execution, experiment repeatability and user interface workflow management, among others. The proposed framework architecture defines a reusable component model which also allows assembling new applications by the reuse of libraries from past projects or third-party software.
|
|
15 |
21 |
10
|
López-Fernández H, de S Pessôa G, Arruda MAZ, Capelo-Martínez JL, Fdez-Riverola F, Glez-Peña D, Reboiro-Jato M. LA-iMageS: a software for elemental distribution bioimaging using LA-ICP-MS data. J Cheminform 2016; 8:65. [PMID: 27917244 PMCID: PMC5116144 DOI: 10.1186/s13321-016-0178-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 07/04/2016] [Accepted: 11/10/2016] [Indexed: 11/18/2022] Open
Abstract
The spatial distribution of chemical elements in different types of samples is an important field in several research areas such as biology, paleontology or biomedicine, among others. Elemental distribution imaging by laser ablation inductively coupled plasma mass spectrometry (LA–ICP–MS) is an effective technique for qualitative and quantitative imaging due to its high spatial resolution and sensitivity. By applying this technique, vast amounts of raw data are generated to obtain high-quality images, essentially making the use of specific LA–ICP–MS imaging software that can process such data absolutely mandatory. Since existing solutions are usually commercial or hard-to-use for average users, this work introduces LA-iMageS, an open-source, free-to-use multiplatform application for fast and automatic generation of high-quality elemental distribution bioimages from LA–ICP–MS data in the PerkinElmer Elan XL format, whose results can be directly exported to external applications for further analysis. A key strength of LA-iMageS is its substantial added value for users, with particular regard to the customization of the elemental distribution bioimages, which allows, among other features, the ability to change color maps, increase image resolution or toggle between 2D and 3D visualizations.
|
Journal Article |
9 |
21 |
11
|
Graña O, López-Fernández H, Fdez-Riverola F, González Pisano D, Glez-Peña D. Bicycle: a bioinformatics pipeline to analyze bisulfite sequencing data. Bioinformatics 2019; 34:1414-1415. [PMID: 29211825 DOI: 10.1093/bioinformatics/btx778] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/16/2017] [Accepted: 11/30/2017] [Indexed: 11/13/2022] Open
Abstract
Summary High-throughput sequencing of bisulfite-converted DNA is a technique used to measure DNA methylation levels. Although a considerable number of computational pipelines have been developed to analyze such data, none of them tackles all the peculiarities of the analysis together, revealing limitations that can force the user to manually perform additional steps needed for a complete processing of the data. This article presents bicycle, an integrated, flexible analysis pipeline for bisulfite sequencing data. Bicycle analyzes whole genome bisulfite sequencing data, targeted bisulfite sequencing data and hydroxymethylation data. To show how bicycle overtakes other available pipelines, we compared them on a defined number of features that are summarized in a table. We also tested bicycle with both simulated and real datasets, to show its level of performance, and compared it to different state-of-the-art methylation analysis pipelines. Availability and implementation Bicycle is publicly available under GNU LGPL v3.0 license at http://www.sing-group.org/bicycle. Users can also download a customized Ubuntu LiveCD including bicycle and other bisulfite sequencing data pipelines compared here. In addition, a docker image with bicycle and its dependencies, which allows a straightforward use of bicycle in any platform (e.g. Linux, OS X or Windows), is also available. Contact ograna@cnio.es or dgpena@uvigo.es. Supplementary information Supplementary data are available at Bioinformatics online.
|
Research Support, Non-U.S. Gov't |
6 |
19 |
12
|
Lourenço A, Carreira R, Carneiro S, Maia P, Glez-Peña D, Fdez-Riverola F, Ferreira EC, Rocha I, Rocha M. @Note: a workbench for biomedical text mining. J Biomed Inform 2009; 42:710-20. [PMID: 19393341 DOI: 10.1016/j.jbi.2009.04.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/06/2008] [Revised: 02/16/2009] [Accepted: 04/07/2009] [Indexed: 10/20/2022]
Abstract
Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full-texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows annotations to be corrected; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves the interoperability, modularity and flexibility when integrating in-house and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although development is still ongoing, it has already allowed the development of applications that are currently being used.
|
Research Support, Non-U.S. Gov't |
16 |
15 |
13
|
Estévez O, Anibarro L, Garet E, Pallares Á, Barcia L, Calviño L, Maueia C, Mussá T, Fdez-Riverola F, Glez-Peña D, Reboiro-Jato M, López-Fernández H, Fonseca NA, Reljic R, González-Fernández Á. An RNA-seq Based Machine Learning Approach Identifies Latent Tuberculosis Patients With an Active Tuberculosis Profile. Front Immunol 2020; 11:1470. [PMID: 32760401 PMCID: PMC7372107 DOI: 10.3389/fimmu.2020.01470] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/10/2019] [Accepted: 06/05/2020] [Indexed: 12/17/2022] Open
Abstract
A better understanding of the response against Tuberculosis (TB) infection is required to accurately identify the individuals with an active or a latent TB infection (LTBI) and also those LTBI patients at higher risk of developing active TB. In this work, we have used the information obtained from studying the gene expression profile of active TB patients and their infected (LTBI) or uninfected (NoTBI) contacts, recruited in Spain and Mozambique, to build a class-prediction model that identifies individuals with a TB infection profile. Following this approach, we have identified several genes and metabolic pathways that provide important information about the immune mechanisms triggered against TB infection. As a novelty of our work, a combination of this class-prediction model and the direct measurement of different immunological parameters was used to identify a subset of LTBI contacts (called TB-like) whose transcriptional and immunological profiles are suggestive of infection with a higher probability of developing active TB. Validation of this novel approach to identifying LTBI individuals with the highest risk of active TB disease merits further longitudinal studies on larger cohorts in TB endemic areas.
|
Research Support, Non-U.S. Gov't |
5 |
14 |
14
|
Pérez-Pérez M, Glez-Peña D, Fdez-Riverola F, Lourenço A. Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects. Computer Methods and Programs in Biomedicine 2015; 118:242-251. [PMID: 25480679 DOI: 10.1016/j.cmpb.2014.11.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/24/2014] [Revised: 10/24/2014] [Accepted: 11/18/2014] [Indexed: 06/04/2023]
Abstract
BACKGROUND AND OBJECTIVES Document annotation is a key task in the development of Text Mining methods and applications. High quality annotated corpora are invaluable, but their preparation requires a considerable amount of resources and time. Although the existing annotation tools offer good user interaction interfaces to domain experts, project management and quality control abilities are still limited. Therefore, the current work introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects, and to evaluate annotation quality throughout the project life cycle. METHODS At the core, Marky is a Web application based on the open source CakePHP framework. User interface relies on HTML5 and CSS3 technologies. Rangy library assists in browser-independent implementation of common DOM range and selection tasks, and Ajax and JQuery technologies are used to enhance user-system interaction. RESULTS Marky grants solid management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work over documents as usual, but all the annotations made are saved by the tracking system and may be further compared. So, the project administrator is able to evaluate annotation consistency among annotators and across rounds of annotation, while annotators are able to reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption. CONCLUSIONS Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similar visually intuitive annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky.
|
|
10 |
10 |
15
|
Glez-Peña D, Álvarez R, Díaz F, Fdez-Riverola F. DFP: a Bioconductor package for fuzzy profile identification and gene reduction of microarray data. BMC Bioinformatics 2009; 10:37. [PMID: 19178723 PMCID: PMC2637236 DOI: 10.1186/1471-2105-10-37] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/29/2008] [Accepted: 01/29/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Expression profiling assays done by using DNA microarray technology generate enormous data sets that are not amenable to simple analysis. The greatest challenge in maximizing the use of this huge amount of data is to develop algorithms to interpret and interconnect results from different genes under different conditions. In this context, fuzzy logic can provide a systematic and unbiased way to both (i) find biologically significant insights relating to meaningful genes, thereby removing the need for expert knowledge in preliminary steps of microarray data analyses and (ii) reduce the cost and complexity of later applied machine learning techniques being able to achieve interpretable models. RESULTS DFP is a new Bioconductor R package that implements a method for discretizing and selecting differentially expressed genes based on the application of fuzzy logic. DFP takes advantage of fuzzy membership functions to assign linguistic labels to gene expression levels. The technique builds a reduced set of relevant genes (FP, Fuzzy Pattern) able to summarize and represent each underlying class (pathology). A last step constructs a biased set of genes (DFP, Discriminant Fuzzy Pattern) by intersecting existing fuzzy patterns in order to detect discriminative elements. In addition, the software provides new functions and visualisation tools that summarize achieved results and aid in the interpretation of differentially expressed genes from multiple microarray experiments. CONCLUSION DFP integrates with other packages of the Bioconductor project, uses common data structures and is accompanied by ample documentation. It has the advantage that its parameters are highly configurable, facilitating the discovery of biologically relevant connections between sets of genes belonging to different pathologies. This information makes it possible to automatically filter irrelevant genes thereby reducing the large volume of data supplied by microarray experiments. 
Based on these contributions GENECBR, a successful tool for cancer diagnosis using microarray datasets, has recently been released.
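The labelling step described in the abstract above, where fuzzy membership functions assign linguistic labels to gene expression levels, can be illustrated with a minimal Python sketch. The piecewise-linear membership shapes, the three labels LOW, MEDIUM and HIGH, the anchor values and the function names are assumptions chosen for illustration; they are not taken from the DFP package.

```python
def fuzzy_memberships(x, low, mid, high):
    """Return membership degrees of x in LOW, MEDIUM and HIGH.

    Assumed shapes: LOW falls linearly from 1 at `low` to 0 at `mid`,
    HIGH rises linearly from 0 at `mid` to 1 at `high`, and MEDIUM is
    the remainder, so the three degrees always sum to 1.
    """
    if not low < mid < high:
        raise ValueError("anchors must satisfy low < mid < high")

    def clamp(value):
        return max(0.0, min(1.0, value))

    m_low = clamp((mid - x) / (mid - low))
    m_high = clamp((x - mid) / (high - mid))
    return {"LOW": m_low, "MEDIUM": 1.0 - m_low - m_high, "HIGH": m_high}


def linguistic_label(x, low, mid, high):
    """Assign the label with the largest membership degree."""
    degrees = fuzzy_memberships(x, low, mid, high)
    return max(degrees, key=degrees.get)


# Hypothetical anchors for one gene's expression range.
label = linguistic_label(9.5, low=2.0, mid=6.0, high=10.0)
```

In a fuller pipeline the anchors would be estimated per gene from the data, and the per-class patterns would then be built from the labels that dominate within each class; that part is omitted here.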
|
product-review |
16 |
10 |
16
|
Santos HM, Glez-Peña D, Reboiro-Jato M, Fdez-Riverola F, Diniz MS, Lodeiro C, Capelo-Martínez JL. A novel 18O inverse labeling-based workflow for accurate bottom-up mass spectrometry quantification of proteins separated by gel electrophoresis. Electrophoresis 2010; 31:3407-19. [DOI: 10.1002/elps.201000251] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
|
|
15 |
9 |
17
|
López-Fernández H, Reboiro-Jato M, Glez-Peña D, Aparicio F, Gachet D, Buenaga M, Fdez-Riverola F. BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. Computer Methods and Programs in Biomedicine 2013; 111:139-147. [PMID: 23562645 DOI: 10.1016/j.cmpb.2013.03.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/30/2012] [Revised: 12/29/2012] [Accepted: 03/12/2013] [Indexed: 06/02/2023]
Abstract
Automatic term annotation from biomedical documents and external information linking are becoming a necessary prerequisite in modern computer-aided medical learning systems. In this context, this paper presents BioAnnote, a flexible and extensible open-source platform for automatically annotating biomedical resources. Apart from other valuable features, the software platform includes (i) a rich client enabling users to annotate multiple documents in a user friendly environment, (ii) an extensible and embeddable annotation meta-server allowing for the annotation of documents with local or remote vocabularies and (iii) a simple client/server protocol which facilitates the use of our meta-server from any other third-party application. In addition, BioAnnote implements a powerful scripting engine able to perform advanced batch annotations.
|
|
12 |
8 |
18
|
Troulé K, López-Fernández H, García-Martín S, Reboiro-Jato M, Carretero-Puche C, Martorell-Marugán J, Martín-Serrano G, Carmona-Sáez P, Glez-Peña D, Al-Shahrour F, Gómez-López G. DREIMT: a drug repositioning database and prioritization tool for immunomodulation. Bioinformatics 2021; 37:578-579. [PMID: 32818254 DOI: 10.1093/bioinformatics/btaa727] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/02/2020] [Revised: 07/22/2020] [Accepted: 08/16/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Drug immunomodulation modifies the response of the immune system and can be therapeutically exploited in pathologies such as cancer and autoimmune diseases. RESULTS DREIMT is a new hypothesis-generation web tool, which performs drug prioritization analysis for immunomodulation. DREIMT provides significant immunomodulatory drugs targeting up to 70 immune cell subtypes through a curated database that integrates 4960 drug profiles and ∼2600 immune gene expression signatures. The tool also suggests potential immunomodulatory drugs targeting user-supplied gene expression signatures. Final output includes drug-signature association scores, FDRs and downloadable plots and results tables. AVAILABILITY AND IMPLEMENTATION http://www.dreimt.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
Research Support, Non-U.S. Gov't |
4 |
8 |
19
|
Glez-Peña D, Gómez-López G, Reboiro-Jato M, Fdez-Riverola F, Pisano DG. PileLine: a toolbox to handle genome position information in next-generation sequencing studies. BMC Bioinformatics 2011; 12:31. [PMID: 21261974 PMCID: PMC3037855 DOI: 10.1186/1471-2105-12-31] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/04/2010] [Accepted: 01/24/2011] [Indexed: 12/02/2022] Open
Abstract
Background Genomic position (GP) files currently used in next-generation sequencing (NGS) studies are difficult to manipulate due to their huge size and the lack of appropriate tools to manage them. These flat files contain one line per position covered by at least one aligned read, which imposes significant restrictions on computational performance. Results PileLine implements a flexible command-line toolkit providing specific support for the management, filtering, comparison and annotation of GP files produced by NGS experiments. PileLine tools are coded in Java and run on both UNIX (Linux, Mac OS) and Windows platforms. The tools comprising PileLine are designed to be memory efficient by performing fast on-disk seek operations over sorted GP files. Conclusions Our novel toolbox has been extensively tested taking into consideration performance issues. It is publicly available at http://sourceforge.net/projects/pilelinetools under the GNU LGPL license. Full documentation including common use cases and guided analysis workflows is available at http://sing.ei.uvigo.es/pileline.
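The idea of seeking directly within a sorted GP file, rather than loading it into memory, can be illustrated with a bisection over byte offsets. The following is a minimal Python sketch of that general technique, not PileLine's own (Java) implementation. It assumes a hypothetical tab-separated layout whose first two columns are chromosome and position, sorted by chromosome name (byte order) and then by numeric position; the names `seek_position` and `_key` are illustrative.

```python
import io


def _key(line: bytes):
    # Assumed layout: chromosome <TAB> position <TAB> other columns...
    chrom, pos = line.split(b"\t", 2)[:2]
    return chrom, int(pos)


def seek_position(handle, target, size):
    """Return the first line whose (chrom, pos) key is >= target.

    `handle` must be a seekable binary file object of `size` bytes.
    Returns b"" when every line sorts before `target`.
    """
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        handle.seek(mid)
        if mid > 0:
            handle.readline()  # discard the line that contains byte `mid`
        line = handle.readline()
        if not line or _key(line) >= target:
            hi = mid
        else:
            lo = mid + 1
    handle.seek(lo)
    if lo > 0:
        handle.readline()
    return handle.readline()


if __name__ == "__main__":
    data = b"chr1\t5\tA\nchr1\t10\tC\nchr1\t20\tG\nchr2\t3\tT\n"
    print(seek_position(io.BytesIO(data), (b"chr1", 11), len(data)))
```

Only a logarithmic number of lines is read per lookup, so memory use stays constant regardless of file size. A real file would be opened with `open(path, "rb")` and its size taken from `os.path.getsize(path)`.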
|
Research Support, Non-U.S. Gov't |
14 |
7 |
20
|
Nogueira-Rodríguez A, Reboiro-Jato M, Glez-Peña D, López-Fernández H. Performance of Convolutional Neural Networks for Polyp Localization on Public Colonoscopy Image Datasets. Diagnostics (Basel) 2022; 12:898. [PMID: 35453946 PMCID: PMC9027927 DOI: 10.3390/diagnostics12040898] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/23/2022] [Revised: 03/31/2022] [Accepted: 04/01/2022] [Indexed: 01/10/2023] Open
Abstract
Colorectal cancer is one of the most frequent malignancies. Colonoscopy is the de facto standard for precancerous lesion detection in the colon, i.e., polyps, during screening studies or after facultative recommendation. In recent years, artificial intelligence, and especially deep learning techniques such as convolutional neural networks, have been applied to polyp detection and localization in order to develop real-time CADe systems. However, the performance of machine learning models is very sensitive to changes in the nature of the testing instances, especially when trying to reproduce results on datasets entirely different from those used for model development, i.e., inter-dataset testing. Here, we report the results of testing our previously published polyp detection model using ten public colonoscopy image datasets and analyze them in the context of the results of 20 other state-of-the-art publications using the same datasets. The F1-score of our recently published model was 0.88 when evaluated on a private test partition, i.e., intra-dataset testing, but it decayed, on average, by 13.65% when tested on ten public datasets. In the published research, the average intra-dataset F1-score is 0.91, and we observed that it also decays in the inter-dataset setting to an average F1-score of 0.83.
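The figures above can be related with simple arithmetic. The sketch below is illustrative only: it assumes the 13.65% decay is relative to the intra-dataset score (the abstract does not state whether it is relative or absolute), and the function names are hypothetical. It also includes the standard definition of the F1-score from true positives, false positives and false negatives.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_decay(intra: float, inter: float) -> float:
    """Fraction of the intra-dataset score lost in inter-dataset testing."""
    return (intra - inter) / intra


def apply_relative_decay(intra: float, decay: float) -> float:
    """Inter-dataset score implied by a relative decay (assumed reading)."""
    return intra * (1.0 - decay)


if __name__ == "__main__":
    # Published-research averages quoted in the abstract: 0.91 -> 0.83.
    print(f"{relative_decay(0.91, 0.83):.2%}")
    # Authors' model: 0.88 with a 13.65% decay, read as relative.
    print(f"{apply_relative_decay(0.88, 0.1365):.3f}")
```

Under the relative reading, the authors' model would average an inter-dataset F1-score of about 0.76, while the literature averages correspond to a relative decay of roughly 8.8%.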
|
research-article |
3 |
7 |
21
|
Rubio-Camarillo M, López-Fernández H, Gómez-López G, Carro Á, Fernández JM, Torre CF, Fdez-Riverola F, Glez-Peña D. RUbioSeq+: A multiplatform application that executes parallelized pipelines to analyse next-generation sequencing data. Computer Methods and Programs in Biomedicine 2017; 138:73-81. [PMID: 27886717 DOI: 10.1016/j.cmpb.2016.10.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/20/2016] [Revised: 09/05/2016] [Accepted: 10/18/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVE To facilitate routine analysis and to improve the reproducibility of the results, next-generation sequencing (NGS) analysis requires intuitive, efficient and integrated data processing pipelines. METHODS We have selected well-established software to construct a suite of automated and parallelized workflows to analyse NGS data for DNA-seq (single-nucleotide variants (SNVs) and indels), CNA-seq, bisulfite-seq and ChIP-seq experiments. RESULTS Here, we present RUbioSeq+, an updated and extended version of RUbioSeq, a multiplatform application that incorporates a suite of automated and parallelized workflows to analyse NGS data. This new version includes: (i) an interactive graphical user interface (GUI) that facilitates its use by both biomedical researchers and bioinformaticians, (ii) a new pipeline for ChIP-seq experiments, (iii) pair-wise comparisons (case-control analyses) for DNA-seq experiments, and (iv) improvements in the parallelized and multithreaded execution options. Results generated by our software have been experimentally validated and accepted for publication. CONCLUSIONS RUbioSeq+ is free and open to all users at http://rubioseq.bioinfo.cnio.es/.
|
|
8 |
7 |
22
|
Galesio M, López-Fdez H, Reboiro-Jato M, Gómez-Meire S, Glez-Peña D, Fdez-Riverola F, Lodeiro C, Diniz ME, Capelo JL. Speeding up the screening of steroids in urine: development of a user-friendly library. Steroids 2013; 78:1226-32. [PMID: 24036418 DOI: 10.1016/j.steroids.2013.08.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/03/2013] [Revised: 08/14/2013] [Accepted: 08/23/2013] [Indexed: 12/27/2022]
Abstract
This work presents a novel database search engine - MLibrary - designed to assist the user in the detection and identification of androgenic anabolic steroids (AAS) and their metabolites by matrix-assisted laser desorption/ionization (MALDI) and mass spectrometry-based strategies. The detection of the AAS in the samples was accomplished by (i) searching the mass spectrometric (MS) spectra against the library developed to identify possible positives and (ii) comparing the tandem mass spectrometric (MS/MS) spectra produced after fragmentation of the possible positives with a complete set of spectra that have previously been assigned to the software. The urinary screening for anabolic agents plays a major role in anti-doping laboratories as they represent the most abused drug class in sports. With the help of the MLibrary software application, the use of MALDI techniques for doping control is simplified and the time for evaluation and interpretation of the results is reduced. To do so, the search engine takes as input several MALDI-TOF-MS and MALDI-TOF-MS/MS spectra. It aids the researcher in an automatic mode by identifying possible positives in a single MS analysis and then confirming their presence in tandem MS analysis by comparing the experimental tandem mass spectrometric data with the database. Furthermore, the search engine can potentially be expanded to other compounds in addition to AAS. The applicability of the MLibrary tool is shown through the analysis of spiked urine samples.
|
|
12 |
7 |
23
|
López-Fernández H, Glez-Peña D, Reboiro-Jato M, Gómez-López G, Pisano DG, Fdez-Riverola F. PileLineGUI: a desktop environment for handling genome position files in next-generation sequencing studies. Nucleic Acids Res 2011; 39:W562-6. [PMID: 21646339 PMCID: PMC3125801 DOI: 10.1093/nar/gkr439] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022] Open
Abstract
Next-generation sequencing (NGS) technologies are making sequence data available on an unprecedented scale. In this context, new catalogs of Single Nucleotide Polymorphisms and mutations generated by resequencing studies are usually stored in genome position files (e.g. Variant Call Format, SAMTools pileup, BED, GFF) comprising large lists of genomic positions, which are difficult for researchers to handle. Here, we present PileLineGUI, a novel desktop application primarily designed for manipulating, browsing and analysing genome position files (GPF), with specific support for somatic mutation finding studies. The developed tool also integrates a new genome browser module specially designed for inspecting GPFs. PileLineGUI is free, multiplatform and designed to be intuitively used by biomedical researchers. PileLineGUI is available at: http://sing.ei.uvigo.es/pileline/pilelinegui.html.
|
Research Support, Non-U.S. Gov't |
14 |
6 |
24
|
Santos HM, Reboiro-Jato M, Glez-Peña D, Nunes-Miranda JD, Fdez-Riverola F, Carvallo R, Capelo JL. Decision peptide-driven: a free software tool for accurate protein quantification using gel electrophoresis and matrix assisted laser desorption ionization time of flight mass spectrometry. Talanta 2010; 82:1412-20. [PMID: 20801349 DOI: 10.1016/j.talanta.2010.07.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/23/2010] [Revised: 06/30/2010] [Accepted: 07/03/2010] [Indexed: 10/19/2022]
Abstract
The decision peptide-driven (DPD) tool implements a software application for assisting the user in a protocol for accurate protein quantification based on the following steps: (1) protein separation through gel electrophoresis; (2) in-gel protein digestion; (3) direct and inverse (18)O-labeling and (4) matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI) analysis. The DPD software compares the MALDI results of the direct and inverse (18)O-labeling experiments and quickly identifies those peptides with paralleled losses in different sets of a typical proteomic workflow. Those peptides are used for subsequent accurate protein quantification. Interpreting the MALDI data from direct and inverse labeling experiments is time-consuming when all comparisons are done manually. The DPD software shortens and simplifies the search for the peptides that must be used for quantification from a week to just a few minutes. To do so, it takes as input several MALDI spectra and aids the researcher in an automatic mode (i) to compare data from direct and inverse (18)O-labeling experiments, calculating the corresponding ratios to determine those peptides with paralleled losses throughout different sets of experiments; and (ii) to use those peptides as internal standards for subsequent accurate protein quantification using (18)O-labeling. In this work the DPD software is presented and explained with the quantification of the protein carbonic anhydrase.
|
Research Support, Non-U.S. Gov't |
15 |
6 |
25
|
Calvo-Dmgz D, Gálvez JF, Glez-Peña D, Gómez-Meire S, Fdez-Riverola F. Using Variable Precision Rough Set for Selection and Classification of Biological Knowledge Integrated in DNA Gene Expression. J Integr Bioinform 2012. [DOI: 10.1515/jib-2012-199] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022] Open
Abstract
Summary DNA microarrays have contributed to the exponential growth of genomic and experimental data in the last decade. This large amount of gene expression data has been used by researchers seeking diagnosis of diseases like cancer using machine learning methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge, provided as gene sets, into the classification process by means of Variable Precision Rough Set Theory (VPRS). The proposed model is able to highlight which part of the provided biological knowledge has been important for classification. This paper presents a novel model for microarray data classification which is able to incorporate prior biological knowledge in the form of gene sets. Based on this knowledge, we transform the input microarray data into supergenes, and then we apply rough set theory to select the most promising supergenes and to derive a set of easily interpretable classification rules. The proposed model is evaluated over three breast cancer microarray datasets, obtaining successful results compared to classical classification techniques. The experimental results show that there are no significant differences between our model and classical techniques, but it is able to provide a biologically interpretable explanation of how it classifies new samples.
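The transformation of gene-level data into supergenes can be pictured as collapsing each gene set into a single feature per sample. The abstract does not specify the aggregation used, so the Python sketch below assumes a plain mean over the genes of each set that are present in the sample; the function name `to_supergenes` and the data layout are hypothetical, and the rough set selection step is not shown.

```python
from statistics import mean


def to_supergenes(expression, gene_sets):
    """Collapse gene-level values into one value per gene set.

    expression: {sample: {gene: value}}
    gene_sets:  {set_name: [gene, ...]}
    Aggregation by mean is an assumption made for illustration.
    """
    result = {}
    for sample, values in expression.items():
        row = {}
        for name, genes in gene_sets.items():
            present = [values[g] for g in genes if g in values]
            if present:  # skip gene sets with no measured genes
                row[name] = mean(present)
        result[sample] = row
    return result


if __name__ == "__main__":
    expression = {"s1": {"A": 1.0, "B": 3.0, "C": 5.0}}
    gene_sets = {"set1": ["A", "B"], "set2": ["C", "D"]}
    print(to_supergenes(expression, gene_sets))
```

Each resulting supergene column could then be passed to a feature selection or rule induction step in place of the original genes.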
|
|
13 |
5 |