1
Tiberi S, Meili J, Cai P, Soneson C, He D, Sarkar H, Avalos-Pacheco A, Patro R, Robinson MD. DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes. Biostatistics 2024; 25:1079-1093. [PMID: 38887902 DOI: 10.1093/biostatistics/kxae017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 03/21/2024] [Accepted: 05/15/2024] [Indexed: 06/20/2024] [Open Access]
Abstract
Although transcriptomics data is typically used to analyze mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g. healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e. reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, vs. state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.
Affiliation(s)
- Simone Tiberi
- Department of Statistical Sciences, University of Bologna, Via delle Belle Arti 41, Bologna, 40126, Italy
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Joël Meili
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Peiying Cai
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
- Charlotte Soneson
- Computational Biology Platform, Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Fabrikstrasse 24, Basel, 4056, Switzerland
- Dongze He
- Department of Cell Biology and Molecular Genetics, University of Maryland, 4062 Campus Drive, College Park, MD 20742, United States
- Center for Bioinformatics and Computational Biology, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
- Hirak Sarkar
- Department of Computer Science, Princeton University, 35 Olden St, Princeton, NJ 08540, United States
- Alejandra Avalos-Pacheco
- Research Unit of Applied Statistics, TU Wien, Wiedner Hauptstraße 8-10/105, Wien 1040, Austria
- Harvard-MIT Center for Regulatory Science, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, United States
- Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
- Department of Computer Science, University of Maryland, 8125 Paint Branch Dr, College Park, MD 20742, United States
- Mark D Robinson
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland
2
Liyanaarachchi VC, Nishshanka GKSH, Nimarshana PHV, Chang JS, Ariyadasa TU, Nagarajan D. Modeling of astaxanthin biosynthesis via machine learning, mathematical and metabolic network modeling. Crit Rev Biotechnol 2024; 44:996-1017. [PMID: 37587012 DOI: 10.1080/07388551.2023.2237183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 05/04/2023] [Accepted: 06/17/2023] [Indexed: 08/18/2023]
Abstract
Natural astaxanthin is synthesized by diverse organisms, including bacteria, fungi, microalgae, and plants, through complex cellular processes that depend on numerous interrelated parameters. Nonetheless, existing knowledge regarding astaxanthin biosynthesis and the conditions influencing astaxanthin accumulation is fairly limited. Thus, manipulation of the growth conditions to achieve desired biomass and astaxanthin yields can be a complicated process requiring cost-intensive and time-consuming experiment-based research. As a potential solution, modeling and simulation of biological systems have recently emerged, allowing researchers to predict/estimate astaxanthin production dynamics in selected organisms. Moreover, mathematical modeling techniques would enable further optimization of astaxanthin synthesis in a shorter period of time, ultimately contributing to a notable reduction in production costs. Thus, the present review comprehensively discusses existing mathematical modeling techniques which simulate the bioaccumulation of astaxanthin in diverse organisms. Associated challenges, solutions, and future perspectives are critically analyzed and presented.
Affiliation(s)
- P H Viraj Nimarshana
- Department of Mechanical Engineering, Faculty of Engineering, University of Moratuwa, Moratuwa, Sri Lanka
- Jo-Shu Chang
- Department of Chemical Engineering, National Cheng Kung University, Tainan, Taiwan
- Department of Chemical and Materials Engineering, Tunghai University, Taichung, Taiwan
- Research Center for Smart Sustainable Circular Economy, Tunghai University, Taichung, Taiwan
- Department of Chemical Engineering and Materials Science, Yuan Ze University, Chung-Li, Taiwan
- Thilini U Ariyadasa
- Department of Chemical and Process Engineering, Faculty of Engineering, University of Moratuwa, Moratuwa, Sri Lanka
- Dillirani Nagarajan
- Department of Chemical Engineering, National Cheng Kung University, Tainan, Taiwan
3
Jackson DJ, Cerveau N, Posnien N. De novo assembly of transcriptomes and differential gene expression analysis using short-read data from emerging model organisms - a brief guide. Front Zool 2024; 21:17. [PMID: 38902827 PMCID: PMC11188175 DOI: 10.1186/s12983-024-00538-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 06/12/2024] [Indexed: 06/22/2024] [Open Access]
Abstract
Many questions in biology benefit greatly from the use of a variety of model systems. High-throughput sequencing methods have been a triumph in the democratization of diverse model systems. They allow for the economical sequencing of an entire genome or transcriptome of interest, and with technical variations can even provide insight into genome organization and the expression and regulation of genes. The analysis and biological interpretation of such large datasets can present significant challenges that depend on the 'scientific status' of the model system. While high-quality genome and transcriptome references are readily available for well-established model systems, the establishment of such references for an emerging model system often requires extensive resources such as finances, expertise and computation capabilities. The de novo assembly of a transcriptome represents an excellent entry point for genetic and molecular studies in emerging model systems as it can efficiently assess gene content while also serving as a reference for differential gene expression studies. However, the process of de novo transcriptome assembly is non-trivial, and as a rule must be empirically optimized for every dataset. For the researcher working with an emerging model system, and with little to no experience with assembling and quantifying short-read data from the Illumina platform, these processes can be daunting. In this guide we outline the major challenges faced when establishing a reference transcriptome de novo and we provide advice on how to approach such an endeavor. We describe the major experimental and bioinformatic steps, provide some broad recommendations and cautions for the newcomer to de novo transcriptome assembly and differential gene expression analyses. Moreover, we provide an initial selection of tools that can assist in the journey from raw short-read data to assembled transcriptome and lists of differentially expressed genes.
Affiliation(s)
- Daniel J Jackson
- University of Göttingen, Department of Geobiology, Goldschmidtstr.3, Göttingen, 37077, Germany.
- Nicolas Cerveau
- University of Göttingen, Department of Geobiology, Goldschmidtstr.3, Göttingen, 37077, Germany
- Nico Posnien
- University of Göttingen, Department of Developmental Biology, GZMB, Justus-Von-Liebig-Weg 11, Göttingen, 37077, Germany.
4
Tiberi S, Meili J, Cai P, Soneson C, He D, Sarkar H, Avalos-Pacheco A, Patro R, Robinson MD. DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes. bioRxiv: the preprint server for biology 2024:2023.08.17.553679. [PMID: 37645841 PMCID: PMC10462127 DOI: 10.1101/2023.08.17.553679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Motivation Although transcriptomics data is typically used to analyse mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Results Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, versus state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. Availability and implementation DifferentialRegulation is distributed as a Bioconductor R package.
Affiliation(s)
- Simone Tiberi
- Department of Statistical Sciences, University of Bologna, Bologna, Italy
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Joël Meili
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Peiying Cai
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Charlotte Soneson
- Computational Biology Platform, Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Dongze He
- Department of Cell Biology and Molecular Genetics, University of Maryland, MD, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
- Hirak Sarkar
- Department of Computer Science, Princeton University, NJ, USA
- Alejandra Avalos-Pacheco
- Research Unit of Applied Statistics, TU Wien, Vienna, Austria
- Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, MA, USA
- Rob Patro
- Department of Computer Science, University of Maryland, MD, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
- Mark D Robinson
- Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
5
Almeida da Paz M, Warger S, Taher L. Disregarding multimappers leads to biases in the functional assessment of NGS data. BMC Genomics 2024; 25:455. [PMID: 38720252 PMCID: PMC11078754 DOI: 10.1186/s12864-024-10344-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 04/24/2024] [Indexed: 05/12/2024] [Open Access]
Abstract
BACKGROUND Standard ChIP-seq and RNA-seq processing pipelines typically disregard sequencing reads whose origin is ambiguous ("multimappers"). This usual practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters composed of highly similar members are left unexplored. RESULTS In particular, disregarding multimappers leads to the underrepresentation in epigenetic studies of recently active transposable elements, such as AluYa5, L1HS and SVAs. Furthermore, this common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such as the ones including major histocompatibility complex (MHC) class I and II genes, are under-quantified. CONCLUSION Revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines (currently restricted to specific contexts or communities) to ensure the reliability of genomic and transcriptomic studies.
Affiliation(s)
- Sarah Warger
- Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
- Leila Taher
- Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria.
6
Prada-Luengo I, Schuster V, Liang Y, Terkelsen T, Sora V, Krogh A. N-of-one differential gene expression without control samples using a deep generative model. Genome Biol 2023; 24:263. [PMID: 37974217 PMCID: PMC10655485 DOI: 10.1186/s13059-023-03104-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023] [Open Access]
Abstract
Differential analysis of bulk RNA-seq data often suffers from lack of good controls. Here, we present a generative model that replaces controls, trained solely on healthy tissues. The unsupervised model learns a low-dimensional representation and can identify the closest normal representation for a given disease sample. This enables control-free, single-sample differential expression analysis. In breast cancer, we demonstrate how our approach selects marker genes and outperforms a state-of-the-art method. Furthermore, significant genes identified by the model are enriched in driver genes across cancers. Our results show that the in silico closest normal provides a more favorable comparison than control samples.
Affiliation(s)
- Iñigo Prada-Luengo
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Viktoria Schuster
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
- Yuhu Liang
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Thilde Terkelsen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
- Valentina Sora
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Anders Krogh
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark.
7
Pandey D, Onkara Perumal P. A scoping review on deep learning for next-generation RNA-Seq. data analysis. Funct Integr Genomics 2023; 23:134. [PMID: 37084004 DOI: 10.1007/s10142-023-01064-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 03/24/2023] [Accepted: 04/17/2023] [Indexed: 04/22/2023]
Abstract
In the last decade, transcriptome research adopting next-generation sequencing (NGS) technologies has gathered incredible momentum amongst functional genomics scientists, particularly amongst clinical/biomedical research groups. The progressive adoption of NGS technologies has produced an abundance of next-generation transcriptomic data in public databases, harbouring a wealth of new knowledge. Nevertheless, knowledge discovery from next-generation RNA-Seq. data necessitates extensive bioinformatics know-how as well as elaborate data analysis software packages suited to the type and context of the analysis. Several reliability and reproducibility concerns continue to impede RNA-Seq. data analysis. Characteristic challenges comprise data quality, hardware and networking provisions, the selection and prioritisation of data analysis tools and, most significantly, the implementation of robust machine learning algorithms for maximised exploitation of these experimental transcriptomic data. Over the years, numerous machine learning algorithms have been implemented for improved transcriptomic data analysis, predominantly using shallow learning approaches. More recently, deep learning algorithms have become more mainstream, and their application to next-generation RNA-Seq. data analysis could be revolutionary in the biomedical domain in the coming years. In this scoping review, we attempt to determine the size and nature of the existing literature on deep learning and NGS RNA-Seq. data analysis. Contemporary topics in next-generation RNA-Seq. data analysis based on deep learning algorithms are critically reviewed, emphasising open-source resources.
Affiliation(s)
- Diksha Pandey
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India
- P Onkara Perumal
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India.
8
Analyzing RNA-Seq Gene Expression Data Using Deep Learning Approaches for Cancer Classification. Applied Sciences-Basel 2022. [DOI: 10.3390/app12041850] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Ribonucleic acid Sequencing (RNA-Seq) analysis is particularly useful for obtaining insights into differentially expressed genes. However, it is challenging because of its high-dimensional data. Such analysis is a tool with which to find underlying patterns in data, e.g., for cancer specific biomarkers. In the past, analyses were performed on RNA-Seq data pertaining to the same cancer class as positive and negative samples, i.e., without samples of other cancer types. To perform multiple cancer type classification and to find differentially expressed genes, data for multiple cancer types need to be analyzed. Several repositories offer RNA-Seq data for various cancer types. In this paper, data from the Mendeley data repository for five cancer types are analyzed. As a first step, RNA-Seq values are converted to 2D images using normalization and zero padding. In the next step, relevant features are extracted and selected using Deep Learning (DL). In the last phase, classification is performed, and eight DL algorithms are used. Results and discussion are based on four different splitting strategies and k-fold cross validation for each DL classifier. Furthermore, a comparative analysis is performed with state of the art techniques discussed in literature. The results demonstrated that classifiers performed best at 70–30 split, and that Convolutional Neural Network (CNN) achieved the best overall results. Hence, CNN is the best DL model for classification among the eight studied DL models, and is easy to implement and simple to understand.
9
Hita A, Brocart G, Fernandez A, Rehmsmeier M, Alemany A, Schvartzman S. MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts. BMC Bioinformatics 2022; 23:39. [PMID: 35030988 PMCID: PMC8760670 DOI: 10.1186/s12859-021-04544-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/30/2021] [Accepted: 12/20/2021] [Indexed: 01/08/2023] [Open Access]
Abstract
BACKGROUND Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches. RESULTS Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. CONCLUSIONS MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount .
Affiliation(s)
- Andrea Hita
- Epigenetics unit, Diagenode s.a., Liège, Belgium
- Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Ana Fernandez
- Epigenetics unit, Diagenode s.a., Liège, Belgium
- Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Marc Rehmsmeier
- Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Anna Alemany
- Department of Anatomy and Embryology, Leiden University Medical Centre, Leiden, The Netherlands
10
Potemkin N, Cawood SMF, Treece J, Guévremont D, Rand CJ, McLean C, Stanton JAL, Williams JM. A method for simultaneous detection of small and long RNA biotypes by ribodepleted RNA-Seq. Sci Rep 2022; 12:621. [PMID: 35022475 PMCID: PMC8755727 DOI: 10.1038/s41598-021-04209-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Accepted: 11/24/2021] [Indexed: 11/09/2022] [Open Access]
Abstract
RNA sequencing offers unprecedented access to the transcriptome. Key to this is the identification and quantification of many different species of RNA from the same sample at the same time. In this study we describe a novel protocol for simultaneous detection of coding and non-coding transcripts using modifications to the Ion Total RNA-Seq kit v2 protocol, with integration of QIASeq FastSelect rRNA removal kit. We report highly consistent sequencing libraries can be produced from both frozen high integrity mouse hippocampal tissue and the more challenging post-mortem human tissue. Removal of rRNA using FastSelect was extremely efficient, resulting in less than 1.5% rRNA content in the final library. We identified > 30,000 unique transcripts from all samples, including protein-coding genes and many species of non-coding RNA, in biologically-relevant proportions. Furthermore, the normalized sequencing read count for select genes significantly negatively correlated with Ct values from qRT-PCR analysis from the same samples. These results indicate that this protocol accurately and consistently identifies and quantifies a wide variety of transcripts simultaneously. The highly efficient rRNA depletion, coupled with minimized sample handling and without complicated and high-loss size selection protocols, makes this protocol useful to researchers wishing to investigate whole transcriptomes.
Affiliation(s)
- Nikita Potemkin
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand
- Sophie M F Cawood
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand
- Jackson Treece
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Diane Guévremont
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand
- Christy J Rand
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Catriona McLean
- Victorian Brain Bank, The Florey Institute of Neuroscience and Mental Health, Melbourne, VIC, Australia
- Anatomical Pathology, The Alfred Hospital, Melbourne, VIC, Australia
- Jo-Ann L Stanton
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand
- Joanna M Williams
- Department of Anatomy, School of Biomedical Sciences, University of Otago, P.O. Box 56, Dunedin, New Zealand.
- Brain Health Research Centre, Brain Research New Zealand-Rangahau Roro Aotearoa, University of Otago, Dunedin, New Zealand.
11
Wang TF, Chen DS, Zhu JW, Zhu B, Wang ZL, Cao JG, Feng CH, Zhao JW. Unsupervised Machine Learning-Based Analysis of Clinical Features, Bone Mineral Density Features and Medical Care Costs of Rotator Cuff Tears. Risk Manag Healthc Policy 2021; 14:3977-3986. [PMID: 34588829 PMCID: PMC8472212 DOI: 10.2147/rmhp.s330555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 09/16/2021] [Indexed: 11/30/2022] [Open Access]
Abstract
Purpose We aim to present unsupervised machine learning-based analysis of clinical features, bone mineral density (BMD) features, and medical care costs of rotator cuff tears (RCT). Patients and Methods Fifty-three patients with RCT were reviewed; the clinical features, BMD features, and medical care costs were collected and analyzed by descriptive statistics. Furthermore, an unsupervised machine learning (UML) algorithm was used for dimensionality reduction and cluster analysis of the RCT data. Results There were 26 males and 27 females. The patients were divided into four subgroups using the UML algorithm. There were significant differences among the four subgroups regarding trauma exposure, full-thickness supraspinatus tendon tears, infraspinatus tendon tear, subscapularis tendon tear, BMD distribution, medial row anchors, lateral row anchors, total medical care costs, and consumables costs. We observed the highest frequency of trauma exposure, infraspinatus tendon tear, subscapularis tendon tear, osteoporosis, the highest number of medial row anchors, lateral row anchors, total medical care costs, and consumables costs in subgroup II. Conclusion The unsupervised machine learning-based analysis of RCT can provide clinically meaningful classification, which shows good interpretability and contributes to a better understanding of RCT. The significance of the results is limited by the small number of samples; a larger follow-up study is needed to confirm the encouraging results.
Affiliation(s)
- Tong-Fu Wang
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
- De-Sheng Chen
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
- Jia-Wang Zhu
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
- Bo Zhu
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
- Zeng-Liang Wang
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
- Jian-Gang Cao
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
- Cai-Hong Feng
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
- Jun-Wei Zhao
- Department of Sports Medicine and Arthroscopy, Tianjin Hospital of Tianjin University, Tianjin, People's Republic of China
12
Krappinger JC, Bonstingl L, Pansy K, Sallinger K, Wreglesworth NI, Grinninger L, Deutsch A, El-Heliebi A, Kroneis T, Mcfarlane RJ, Sensen CW, Feichtinger J. Non-coding Natural Antisense Transcripts: Analysis and Application. J Biotechnol 2021; 340:75-101. [PMID: 34371054 DOI: 10.1016/j.jbiotec.2021.08.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/29/2021] [Revised: 06/30/2021] [Accepted: 08/04/2021] [Indexed: 12/12/2022]
Abstract
Non-coding natural antisense transcripts (ncNATs) are regulatory RNA sequences that are transcribed in the opposite direction to protein-coding or non-coding transcripts. These transcripts are implicated in a broad variety of biological and pathological processes, including tumorigenesis and oncogenic progression. With this complex field still in its infancy, annotations, expression profiling and functional characterisations of ncNATs are far less comprehensive than those for protein-coding genes, pointing out substantial gaps in the analysis and characterisation of these regulatory transcripts. In this review, we discuss ncNATs from an analysis perspective, in particular regarding the use of high-throughput sequencing strategies, such as RNA-sequencing, and summarize the unique challenges of investigating the antisense transcriptome. Finally, we elaborate on their potential as biomarkers and future targets for treatment, focusing on cancer.
Affiliation(s)
- Julian C Krappinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria
| | - Lilli Bonstingl
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
| | - Katrin Pansy
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria
| | - Katja Sallinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
| | - Nick I Wreglesworth
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
| | - Lukas Grinninger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Austrian Biotech University of Applied Sciences, Konrad Lorenz-Straße 10, 3430 Tulln an der Donau, Austria
| | - Alexander Deutsch
- Division of Haematology, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria
| | - Amin El-Heliebi
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
| | - Thomas Kroneis
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Center for Biomarker Research in Medicine, Stiftingtalstraße 5, 8010 Graz, Austria
| | - Ramsay J Mcfarlane
- North West Cancer Research Institute, School of Medical Sciences, Bangor University, LL57 2UW Bangor, United Kingdom
| | - Christoph W Sensen
- BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria; Institute of Computational Biotechnology, Graz University of Technology, Petersgasse 14/V, 8010 Graz, Austria; HCEMM Kft., Római blvd. 21, 6723 Szeged, Hungary
| | - Julia Feichtinger
- Division of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center for Cell Signalling, Metabolism and Aging, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; Christian Doppler Laboratory for innovative Pichia pastoris host and vector systems, Division of Cell Biology, Histology and Embryology, Medical University of Graz, Neue Stiftingtalstraße 6/II, 8010 Graz, Austria; BioTechMed-Graz, Mozartgasse 12/II, 8010 Graz, Austria.
|
13
|
Ma A, McDermaid A, Xu J, Chang Y, Ma Q. Integrative Methods and Practical Challenges for Single-Cell Multi-omics. Trends Biotechnol 2020; 38:1007-1022. [PMID: 32818441 PMCID: PMC7442857 DOI: 10.1016/j.tibtech.2020.02.013] [Citation(s) in RCA: 118] [Impact Index Per Article: 29.5] [Received: 12/19/2019] [Revised: 02/27/2020] [Accepted: 02/28/2020] [Indexed: 12/19/2022]
Abstract
Fast-developing single-cell multimodal omics (scMulti-omics) technologies enable the measurement of multiple modalities, such as DNA methylation, chromatin accessibility, RNA expression, protein abundance, gene perturbation, and spatial information, from the same cell. scMulti-omics can comprehensively explore and identify cell characteristics, while also presenting challenges to the development of computational methods and tools for integrative analyses. Here, we review these integrative methods and summarize the existing tools for studying a variety of scMulti-omics data. The various functionalities and practical challenges in using the available tools in the public domain are explored through several case studies. Finally, we identify remaining challenges and future trends in scMulti-omics modeling and analyses.
Affiliation(s)
- Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
| | - Adam McDermaid
- Imagenetics, Sanford Health, Sioux Falls, SD 57104, USA; Department of Internal Medicine, University of South Dakota, Vermillion, SD 57069, USA
| | - Jennifer Xu
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
| | - Yuzhou Chang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA.
|
14
|
Manavalan B, Basith S, Shin TH, Wei L, Lee G. mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation. Bioinformatics 2020; 35:2757-2765. [PMID: 30590410 DOI: 10.1093/bioinformatics/bty1047] [Citation(s) in RCA: 165] [Impact Index Per Article: 41.3] [Received: 11/02/2018] [Revised: 12/05/2018] [Accepted: 12/20/2018] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Cardiovascular disease is the primary cause of death globally accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction.
RESULTS: In this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6-7% in both benchmarking and independent datasets.
AVAILABILITY AND IMPLEMENTATION: The user-friendly online prediction tool, mAHTPred is freely accessible at http://thegleelab.org/mAHTPred.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
| | - Tae Hwan Shin
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
| | - Leyi Wei
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
|
15
|
Deschamps-Francoeur G, Simoneau J, Scott MS. Handling multi-mapped reads in RNA-seq. Comput Struct Biotechnol J 2020; 18:1569-1576. [PMID: 32637053 PMCID: PMC7330433 DOI: 10.1016/j.csbj.2020.06.014] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 02/29/2020] [Revised: 06/06/2020] [Accepted: 06/07/2020] [Indexed: 11/07/2022] Open
Abstract
Many eukaryotic genomes harbour large numbers of duplicated sequences, of diverse biotypes, resulting from several mechanisms including recombination, whole genome duplication and retro-transposition. Such repeated sequences complicate gene/transcript quantification during RNA-seq analysis due to reads mapping to more than one locus, sometimes involving genes embedded in other genes. Genes of different biotypes have dissimilar levels of sequence duplication, with long-noncoding RNAs and messenger RNAs sharing less sequence similarity to other genes than biotypes encoding shorter RNAs. Many strategies have been elaborated to handle these multi-mapped reads, resulting in increased accuracy in gene/transcript quantification, although separate tools are typically used to estimate the abundance of short and long genes due to their dissimilar characteristics. This review discusses the mechanisms leading to sequence duplication, the biotypes affected, the computational strategies employed to deal with multi-mapped reads and the challenges that still remain to be overcome.
Affiliation(s)
- Gabrielle Deschamps-Francoeur
- Département de Biochimie et Génomique Fonctionnelle, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
| | - Joël Simoneau
- Département de Biochimie et Génomique Fonctionnelle, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
| | - Michelle S. Scott
- Département de Biochimie et Génomique Fonctionnelle, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
|
16
|
Yang S, Wang Y, Zhang S, Hu X, Ma Q, Tian Y. NCResNet: Noncoding Ribonucleic Acid Prediction Based on a Deep Resident Network of Ribonucleic Acid Sequences. Front Genet 2020; 11:90. [PMID: 32180792 PMCID: PMC7059790 DOI: 10.3389/fgene.2020.00090] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/11/2019] [Accepted: 01/27/2020] [Indexed: 01/15/2023] Open
Abstract
Noncoding RNA (ncRNA) is a kind of RNA that plays an important role in many biological processes, diseases, and cancers, although it cannot be translated into proteins. With the development of next-generation sequencing technology, thousands of novel RNAs with long open reading frames (ORFs, longest ORF length > 303 nt) and short ORFs (longest ORF length ≤ 303 nt) have been discovered in a short time. How to identify ncRNAs more precisely from novel unannotated RNAs is an important step for RNA functional analysis, RNA regulation, etc. However, most previous methods only utilize the information of sequence features. Meanwhile, most of them have focused on long-ORF RNA sequences and are not adapted to short-ORF RNA sequences. In this paper, we propose a new reliable method called NCResNet. NCResNet employs 57 hybrid features of four categories as inputs, including sequence, protein, RNA structure, and RNA physicochemical properties, and introduces feature enhancement and deep feature learning policies in a neural net model to adapt to this problem. The experiments on benchmark datasets of 8 species show that NCResNet has higher accuracy and a higher Matthews correlation coefficient (MCC) compared with other state-of-the-art methods. Particularly, on four short-ORF RNA sequence datasets, specifically mouse, Saccharomyces cerevisiae, zebrafish, and cow, NCResNet achieves greater than 10 and 15% improvements over other state-of-the-art methods in terms of accuracy and MCC. Meanwhile, for long-ORF RNA sequence datasets, NCResNet also has better accuracy and MCC than other state-of-the-art methods on most test datasets. Codes and data are available at https://github.com/abcair/NCResNet.
Affiliation(s)
- Sen Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, China
| | - Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, China; School of Artificial Intelligence, Jilin University, Changchun, China
| | - Shuangquan Zhang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, China
| | - Xuemei Hu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, and College of Computer Science and Technology, Jilin University, Changchun, China
| | - Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
| | - Yuan Tian
- School of Artificial Intelligence, Jilin University, Changchun, China
|
17
|
Ma Q, Bücking H, Gonzalez Hernandez JL, Subramanian S. Single-Cell RNA Sequencing of Plant-Associated Bacterial Communities. Front Microbiol 2019; 10:2452. [PMID: 31736899 PMCID: PMC6828647 DOI: 10.3389/fmicb.2019.02452] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/05/2019] [Accepted: 10/11/2019] [Indexed: 11/29/2022] Open
Abstract
Plants in soil are not solitary, hence continually interact with and obtain benefits from a community of microbes ("microbiome"). The meta-functional output from the microbiome results from complex interactions among the different community members with distinct taxonomic identities and metabolic capacities. Particularly, the bacterial communities of the root surface are spatially organized structures composed of root-attached biofilms and planktonic cells arranged in complex layers. With the distinct but coordinated roles among the different member cells, bacterial communities resemble properties of a multicellular organism. High throughput sequencing technologies have allowed rapid and large-scale analysis of taxonomic composition and metabolic capacities of bacterial communities. However, these methods are generally unable to reconstruct the assembly of these communities, or how the gene expression patterns in individual cells/species are coordinated within these communities. Single-cell transcriptomes of community members can identify how gene expression patterns vary among members of the community, including differences among different cells of the same species. This information can be used to classify cells based on functional gene expression patterns, and predict the spatial organization of the community. Here we discuss strategies for the isolation of single bacterial cells, mRNA enrichment, library construction, and analysis and interpretation of the resulting single-cell RNA-Seq datasets. Unraveling regulatory and metabolic processes at the single cell level is expected to yield an unprecedented discovery of mechanisms involved in bacterial recruitment, attachment, assembly, organization of the community, or in the specific interactions among the different members of these communities.
Affiliation(s)
- Qin Ma
- Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
| | - Heike Bücking
- Biology and Microbiology Department, South Dakota State University, Brookings, SD, United States
| | - Jose L. Gonzalez Hernandez
- Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
- Biology and Microbiology Department, South Dakota State University, Brookings, SD, United States
| | - Senthil Subramanian
- Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
- Biology and Microbiology Department, South Dakota State University, Brookings, SD, United States
|
18
|
Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet 2019; 20:631-656. [DOI: 10.1038/s41576-019-0150-2] [Citation(s) in RCA: 679] [Impact Index Per Article: 135.8] [Accepted: 06/18/2019] [Indexed: 12/12/2022]
|