1
Cabús L, Lagarde J, Curado J, Lizano E, Pérez-Boza J. Current challenges and best practices for cell-free long RNA biomarker discovery. Biomark Res 2022; 10:62. [PMID: 35978416 PMCID: PMC9385245 DOI: 10.1186/s40364-022-00409-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/25/2022] [Accepted: 08/04/2022] [Indexed: 11/24/2022]
Abstract
The analysis of biomarkers in biological fluids, also known as liquid biopsy, is seen as having great potential for diagnosing complex diseases such as cancer with high sensitivity and minimal invasiveness. Although liquid biopsies can target any biomolecule, most studies have focused on circulating nucleic acids. Historically, studies have aimed at detecting specific mutations in cell-free DNA (cfDNA), but the study of cell-free RNA (cfRNA) has recently gained traction. Since 2020, a handful of cfDNA tests have been approved by the FDA for therapy selection; however, no cfRNA tests have been approved to date. One of the main drawbacks in the field of RNA-based liquid biopsies is the low reproducibility of results, often caused by technical and biological variability, a lack of standardized protocols, and insufficiently sized cohorts. In this review, we identify the main challenges and biases introduced during the different stages of biomarker discovery in cfRNA-based liquid biopsies and propose solutions to minimize them.
Affiliation(s)
- Lluc Cabús
- Institut de Biologia Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
- Flomics Biotech, Barcelona, Spain
- Esther Lizano
- Institut de Biologia Evolutiva, Universitat Pompeu Fabra, Barcelona, Spain
2
Chen S, Ren C, Zhai J, Yu J, Zhao X, Li Z, Zhang T, Ma W, Han Z, Ma C. CAFU: a Galaxy framework for exploring unmapped RNA-Seq data. Brief Bioinform 2021; 21:676-686. [PMID: 30815667 PMCID: PMC7299299 DOI: 10.1093/bib/bbz018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/15/2018] [Revised: 01/23/2019] [Accepted: 01/27/2019] [Indexed: 12/13/2022]
Abstract
A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, owing to the lack of dedicated analytical systems, short reads that do not map to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that facilitates large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. Using machine learning techniques, CAFU addresses the problem of accurately identifying the species of origin of transcripts assembled from unmapped reads in mixed-species samples. CAFU is also novel in that it provides a comprehensive collection of the functions required for transcript confidence evaluation, coding potential calculation, sequence and expression characterization, and functional annotation. These functions and their dependencies are integrated into a Galaxy framework that exposes CAFU through a user-friendly interface, greatly simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and maize (Zea mays) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.
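A minimal sketch of the step any unmapped-read workflow starts from: collecting the reads an aligner could not place, assuming the alignments are available as SAM text. This is not CAFU's implementation; the function names are hypothetical, and only the FLAG bit 0x4 ("segment unmapped") and the SAM column layout come from the SAM specification. On real BAM files, a command such as `samtools view -f 4` performs the same filtering.

```python
from typing import Iterable, Iterator, Tuple

# FLAG bit defined by the SAM format specification.
FLAG_UNMAPPED = 0x4  # segment unmapped


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for every SAM record whose FLAG marks it unmapped."""
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines (those starting with @)
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & FLAG_UNMAPPED:
            # SEQ and QUAL are the 10th and 11th SAM columns.
            yield qname, fields[9], fields[10]


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Render (name, sequence, quality) tuples as FASTQ text for a downstream assembler."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)


if __name__ == "__main__":
    example_sam = [
        "@HD\tVN:1.6\tSO:unsorted",
        "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # mapped
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tHHHH",         # unmapped
    ]
    print(to_fastq(unmapped_reads(example_sam)), end="")
```

Run as a script, this prints a single FASTQ record for `read2`. Paired-end bookkeeping (keeping mates together) is deliberately left out to keep the sketch short.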
Affiliation(s)
- Siyuan Chen
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
- Chengzhi Ren
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
- Jingjing Zhai
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
- Jiantao Yu
- College of Information Engineering, Northwest Agriculture and Forestry University
- Xuyang Zhao
- College of Information Engineering, Northwest Agriculture and Forestry University
- Zelong Li
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
- Ting Zhang
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
- Wenlong Ma
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
- Zhaoxue Han
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
- Chuang Ma
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University
3
Shi X, Wu J, Lang X, Wang C, Bai Y, Riley DG, Liu L, Ma X. Comparative transcriptome and histological analyses provide insights into the skin pigmentation in Minxian black fur sheep (Ovis aries). PeerJ 2021; 9:e11122. [PMID: 33986980 PMCID: PMC8086576 DOI: 10.7717/peerj.11122] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/25/2020] [Accepted: 02/25/2021] [Indexed: 12/30/2022]
Abstract
Background Minxian black fur (MBF) sheep are found in northwestern China and have developed several distinctive traits. Skin color is a phenotype subject to strong natural selection, and diverse skin colors are likely a consequence of differences in gene regulation. Methods Skin structure, color differences, and gene expression (determined by RNA sequencing) were evaluated in Minxian black fur and Small-tail Han sheep (n = 3 per group), both native Chinese sheep breeds. Results Small-tail Han sheep have thicker skin and dermis than Minxian black fur sheep (P < 0.01); however, melanin granules are more abundant (P < 0.01) in Minxian black fur sheep and more extensively distributed in skin tissue and hair follicles. One hundred thirty-three differentially expressed genes were significantly associated with 37 ontology terms and two KEGG pathways critical for pigmentation (“tyrosine metabolism” and “melanogenesis”). Important genes from these pathways with known involvement in pigmentation included OCA2 melanosomal transmembrane protein (OCA2), dopachrome tautomerase (DCT), tyrosinase (TYR), tyrosinase-related protein 1 (TYRP1), melanocortin 1 receptor (MC1R), and premelanosome protein (PMEL). The results of our histological and transcriptome analyses provide a foundation for further investigation into the genetic basis and regulation of pigmentation in these sheep breeds.
Affiliation(s)
- Xiaolei Shi
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, Gansu Province, China
- Jianping Wu
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, Gansu Province, China
- Xia Lang
- Animal Husbandry, Pasture, and Green Agriculture Institute, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, China
- Cailian Wang
- Animal Husbandry, Pasture, and Green Agriculture Institute, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, China; Key Laboratory for Sheep, Goat, and Cattle Germplasm and Straw Feed in Gansu Province, Lanzhou, Gansu Province, China
- Yan Bai
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, Gansu Province, China
- David Greg Riley
- Department of Animal Science, Texas A&M University, College Station, TX, USA
- Lishan Liu
- Animal Husbandry, Pasture, and Green Agriculture Institute, Gansu Academy of Agricultural Sciences, Lanzhou, Gansu Province, China
- Xiaoming Ma
- College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, Gansu Province, China
4
Yang A, Troup M, Ho JWK. Scalability and Validation of Big Data Bioinformatics Software. Comput Struct Biotechnol J 2017; 15:379-386. [PMID: 28794828 PMCID: PMC5537105 DOI: 10.1016/j.csbj.2017.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/30/2017] [Revised: 06/30/2017] [Accepted: 07/17/2017] [Indexed: 12/20/2022]
Abstract
This review examines two important aspects central to modern big data bioinformatics analysis: software scalability and validity. We argue that scalability and validation are not only common to all big data bioinformatics analyses but can also be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability of a program to scale with its workload. It has always been an important consideration when developing bioinformatics algorithms and programs; nonetheless, the surge in the volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to implement divide-and-conquer effectively in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult because of the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques based on multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
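The two ideas in this abstract can be illustrated together in a minimal sketch. The task (k-mer counting over sequencing reads), the function names, and the two metamorphic relations are illustrative assumptions, not material from the review. The chunk loop runs sequentially here and only stands in for the map phase that a framework such as MapReduce or Spark would distribute across machines.

```python
import random
from collections import Counter
from functools import reduce
from typing import List, Sequence


def count_kmers(read: str, k: int) -> Counter:
    """Count overlapping k-mers in one read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def map_chunk(reads: Sequence[str], k: int) -> Counter:
    """'Map' step: count k-mers in one chunk, independently of every other chunk."""
    total: Counter = Counter()
    for read in reads:
        total.update(count_kmers(read, k))
    return total


def kmer_counts(reads: List[str], k: int, n_chunks: int) -> Counter:
    """Divide-and-conquer: split reads into chunks, map each chunk, then reduce by summing."""
    chunks = [reads[i::n_chunks] for i in range(n_chunks)]
    # A distributed framework would run these independent map calls in parallel.
    partials = [map_chunk(chunk, k) for chunk in chunks]
    return reduce(lambda a, b: a + b, partials, Counter())


def check_metamorphic_relations(reads: List[str], k: int, seed: int = 0) -> None:
    """Compare related executions with each other instead of against a known-correct output."""
    baseline = kmer_counts(reads, k, n_chunks=1)

    # MR1: permuting the input and changing the partitioning must not change the result.
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    assert kmer_counts(shuffled, k, n_chunks=4) == baseline, "MR1 violated"

    # MR2: duplicating every read must exactly double every count.
    doubled = kmer_counts(reads + reads, k, n_chunks=3)
    assert doubled == Counter({kmer: 2 * n for kmer, n in baseline.items()}), "MR2 violated"


if __name__ == "__main__":
    check_metamorphic_relations(["ACGTACGT", "GGCATTAC", "TTTT"], k=3)
    print("metamorphic relations hold")
```

Neither relation needs to know the true k-mer counts in advance, which is what makes metamorphic testing attractive when, as the abstract notes, the input space is too large for a test oracle.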
Affiliation(s)
- Andrian Yang
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW 2010, Australia
- Michael Troup
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia
- Joshua W K Ho
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW 2010, Australia; St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW 2010, Australia
5
Wolfien M, Rimmbach C, Schmitz U, Jung JJ, Krebs S, Steinhoff G, David R, Wolkenhauer O. TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation. BMC Bioinformatics 2016; 17:21. [PMID: 26738481 PMCID: PMC4702420 DOI: 10.1186/s12859-015-0873-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/05/2015] [Accepted: 12/22/2015] [Indexed: 11/23/2022]
Abstract
Background Technical advances in next-generation sequencing (NGS) provide a means to gain deeper insights into cellular functions. The lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best-performing data analysis, data evaluation, and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD and Solexa). Results Comparative transcriptomics analyses with TRAPLINE yield a set of differentially expressed genes, their corresponding protein-protein interactions, splice variants, promoter activity, predicted miRNA-target interactions, and files for single nucleotide polymorphism (SNP) calling. These results are combined into a single file for downstream analyses such as network construction. We demonstrate the value of the proposed pipeline by characterizing the transcriptome of our recently described stem cell-derived antibiotic-selected cardiac bodies ('aCaBs'). Conclusion TRAPLINE supports NGS-based research by providing a workflow that requires no bioinformatics skills, reduces analysis processing time, and runs in the cloud. The pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via www.sbi.uni-rostock.de/RNAseqTRAPLINE or the dedicated Galaxy manual page (https://usegalaxy.org/u/mwolfien/p/trapline---manual).
Affiliation(s)
- Markus Wolfien
- Department of Systems Biology and Bioinformatics, University of Rostock, 18057, Rostock, Germany.
- Christian Rimmbach
- Reference and Translation Center for Cardiac Stem Cell Therapy (RTC), University of Rostock, Rostock, 18057, Germany.
- Ulf Schmitz
- Gene & Stem Cell Therapy Program, Centenary Institute, 2050, Camperdown, Australia; Sydney Medical School, University of Sydney, Sydney, NSW, 2006, Australia.
- Julia Jeannine Jung
- Reference and Translation Center for Cardiac Stem Cell Therapy (RTC), University of Rostock, Rostock, 18057, Germany.
- Stefan Krebs
- Gene Center Munich, LMU Munich, 81377, Munich, Germany.
- Gustav Steinhoff
- Reference and Translation Center for Cardiac Stem Cell Therapy (RTC), University of Rostock, Rostock, 18057, Germany.
- Robert David
- Reference and Translation Center for Cardiac Stem Cell Therapy (RTC), University of Rostock, Rostock, 18057, Germany.
- Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, 18057, Rostock, Germany; Stellenbosch Institute of Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, 7602, Stellenbosch, South Africa.
6
D'Antonio M, D'Onorio De Meo P, Pallocca M, Picardi E, D'Erchia AM, Calogero RA, Castrignanò T, Pesole G. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application. BMC Genomics 2015; 16:S3. [PMID: 26046471 PMCID: PMC4461013 DOI: 10.1186/1471-2164-16-s6-s3] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 01/25/2023]
Abstract
Background The study of RNA has been dramatically improved by the introduction of next-generation sequencing platforms, which allow massive, inexpensive sequencing of selected RNA fractions while also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and their regulatory pathways makes RNA-Seq one of the most complex fields of NGS application, addressing several aspects of the expression process (e.g., identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, and post-transcriptional events). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges for efficiently analyzing and storing sequence data and results. Methods To provide researchers with an effective and user-friendly resource for analyzing RNA-Seq data, we present RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. The pipeline integrates state-of-the-art bioinformatics tools for RNA-Seq analysis with in-house scripts to offer users a comprehensive data analysis strategy. RAP can perform quality checks (using FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with TopHat, Cufflinks and HTSeq), and detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). It can also identify splice junctions and constitutive or alternative polyadenylation sites (via custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing patterns and polyadenylation site usage (using Cuffdiff2 and DESeq). Results Through a user-friendly web interface, the RAP workflow can be customized by the user and is automatically executed on our cloud computing environment.
This strategy provides access to bioinformatics tools and computational resources without requiring specific bioinformatics or IT skills. RAP provides a set of tabular and graphical results that help users browse, filter and export analyzed data according to their needs.
7
Kornobis E, Cabellos L, Aguilar F, Frías-López C, Rozas J, Marco J, Zardoya R. TRUFA: A User-Friendly Web Server for de novo RNA-seq Analysis Using Cluster Computing. Evol Bioinform Online 2015; 11:97-104. [PMID: 26056424 PMCID: PMC4444131 DOI: 10.4137/ebo.s23873] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 01/09/2015] [Revised: 03/09/2015] [Accepted: 03/16/2015] [Indexed: 01/08/2023]
Abstract
Application of next-generation sequencing (NGS) methods to transcriptome analysis (RNA-seq) has become increasingly accessible in recent years and is of great interest to many biological disciplines, including evolutionary biology, ecology, biomedicine, and computational biology. Although virtually any research group can now obtain RNA-seq data, only a few have the bioinformatics knowledge and computational facilities required for transcriptome analysis. Here, we present TRUFA (TRanscriptome User-Friendly Analysis), an open informatics platform offering a web-based interface that generates the outputs commonly used in de novo RNA-seq analysis and comparative transcriptomics. TRUFA provides a comprehensive service that allows users to dynamically perform raw read cleaning, transcript assembly, annotation, and expression quantification. Because such analyses are computationally intensive, TRUFA is highly parallelized and benefits from access to high-performance computing resources. The complete TRUFA pipeline was validated using four previously published transcriptomic data sets. TRUFA's results for these example data sets were broadly similar to those of the original studies, and TRUFA performed notably better on the green tea data set. The platform permits analyzing RNA-seq data in a fast, robust, and user-friendly manner. Accounts on TRUFA are provided free of charge upon request at https://trufa.ifca.es.
Affiliation(s)
- Etienne Kornobis
- Departamento de biodiversidad y biología evolutiva, Museo Nacional de Ciencias Naturales MNCN (CSIC), Madrid, Spain
- Luis Cabellos
- Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Fernando Aguilar
- Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Cristina Frías-López
- Departament de Genètica and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
- Departament de Genètica and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Jesús Marco
- Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Rafael Zardoya
- Departamento de biodiversidad y biología evolutiva, Museo Nacional de Ciencias Naturales MNCN (CSIC), Madrid, Spain