1
Abrar M, Hussain D, Khan IA, Ullah F, Haq MA, Aleisa MA, Alenizi A, Bhushan S, Martha S. DeepSplice: a deep learning approach for accurate prediction of alternative splicing events in the human genome. Front Genet 2024; 15:1349546. [PMID: 38974384; PMCID: PMC11224287; DOI: 10.3389/fgene.2024.1349546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 05/21/2024] [Indexed: 07/09/2024] [Open Access]
Abstract
Alternative splicing (AS) is a crucial process in genetic information processing that generates multiple mRNA molecules from a single gene, producing diverse proteins. Accurate prediction of AS events is essential for understanding various physiological aspects, including disease progression and prognosis. Machine learning (ML) techniques have been widely employed in bioinformatics to address this challenge. However, existing models have limitations in capturing AS events in the presence of mutations and achieving high prediction performance. To overcome these limitations, this research presents deep splicing code (DSC), a deep learning (DL)-based model for AS prediction. The proposed model aims to improve predictive ability by investigating state-of-the-art techniques in AS and developing a DL model specifically designed to predict AS events accurately. The performance of the DSC model is evaluated against existing techniques, revealing its potential to enhance the understanding and predictive power of DL algorithms in AS. It outperforms other models by achieving an average AUC score of 92%. The significance of this research lies in its contribution to identifying functional implications and potential therapeutic targets associated with AS, with applications in genomics, bioinformatics, and biomedical research. The findings of this study have the potential to advance the field and pave the way for more precise and reliable predictions of AS events, ultimately leading to a deeper understanding of genetic information processing and its impact on human physiology and disease.
Affiliation(s)
- Mohammad Abrar
  - Faculty of Computer Studies, Arab Open University, Muscat, Oman
- Didar Hussain
  - Department of Computer Science, Bacha Khan University Charsadda, Charsadda, Pakistan
- Izaz Ahmad Khan
  - Department of Computer Science, Bacha Khan University Charsadda, Charsadda, Pakistan
- Fasee Ullah
  - Computer and Information Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar, Malaysia
- Mohd Anul Haq
  - Department of Computer Science, College of Computer and Information Sciences, Majmaah University, Al-Majmaah, Saudi Arabia
- Mohammed A. Aleisa
  - Department of Computer Science, College of Computer and Information Sciences, Majmaah University, Al-Majmaah, Saudi Arabia
- Abdullah Alenizi
  - Department of Information Technology, College of Computer and Information Sciences, Majmaah University, Al-Majmaah, Saudi Arabia
- Shashi Bhushan
  - Computer and Information Sciences Department, Universiti Teknologi PETRONAS, Seri Iskandar, Malaysia
- Sheshikala Martha
  - School of Computer Science and Artificial Intelligence, SR University, Warangal, India
2
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:1958. [PMID: 37046619; PMCID: PMC10093138; DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] [Open Access]
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
  - The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
  - The Wistar Institute, Philadelphia, PA 19104, USA
  - Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
3
Rivera-Chávez J, Ceapă CD, Figueroa M. Biological Dark Matter Exploration using Data Mining for the Discovery of Antimicrobial Natural Products. Planta Med 2022; 88:702-720. [PMID: 35697058; DOI: 10.1055/a-1795-0562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The discovery of novel antimicrobials has significantly slowed down over the last three decades. At the same time, humans rely increasingly on antimicrobials because of the progressive antimicrobial resistance in medical practices, human communities, and the environment. Data mining is currently considered a promising option in the discovery of new antibiotics. Some of the advantages of data mining are the ability to predict chemical structures from sequence data, anticipation of the presence of novel metabolites, the understanding of gene evolution, and the corroboration of data from multiple omics technologies. This review analyzes the state-of-the-art for data mining in the fields of bacteria, fungi, and plant genomic data, as well as metabologenomics. It also summarizes some of the most recent research accomplishments in the field, all pinpointing to innovation through uncovering and implementing the next generation of antimicrobials.
Affiliation(s)
- José Rivera-Chávez
  - Instituto de Química, Universidad Nacional Autónoma de México, Ciudad de México, México
- Corina-Diana Ceapă
  - Instituto de Química, Universidad Nacional Autónoma de México, Ciudad de México, México
- Mario Figueroa
  - Facultad de Química, Universidad Nacional Autónoma de México, Ciudad de México, México
4
Li J, Ho DJ, Henault M, Yang C, Neri M, Ge R, Renner S, Mansur L, Lindeman A, Kelly B, Tumkaya T, Ke X, Soler-Llavina G, Shanker G, Russ C, Hild M, Gubser Keller C, Jenkins JL, Worringer KA, Sigoillot FD, Ihry RJ. DRUG-seq Provides Unbiased Biological Activity Readouts for Neuroscience Drug Discovery. ACS Chem Biol 2022; 17:1401-1414. [PMID: 35508359; PMCID: PMC9207813; DOI: 10.1021/acschembio.1c00920] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/04/2023]
Abstract
Unbiased transcriptomic RNA-seq data has provided deep insights into biological processes. However, its impact in drug discovery has been narrow given high costs and low throughput. Proof-of-concept studies with Digital RNA with pertUrbation of Genes (DRUG)-seq demonstrated the potential to address this gap. We extended the DRUG-seq platform by subjecting it to rigorous testing and by adding an open-source analysis pipeline. The results demonstrate high reproducibility and ability to resolve the mechanism(s) of action for a diverse set of compounds. Furthermore, we demonstrate how this data can be incorporated into a drug discovery project aiming to develop therapeutics for schizophrenia using human stem cell-derived neurons. We identified both an on-target activation signature, induced by a set of chemically distinct positive allosteric modulators of the N-methyl-d-aspartate (NMDA) receptor, and independent off-target effects. Overall, the protocol and open-source analysis pipeline are a step toward industrializing RNA-seq for high-complexity transcriptomics studies performed at a saturating scale.
Affiliation(s)
- Marilisa Neri
  - Chemical and Biological Therapeutics, Novartis Institutes for BioMedical Research, Basel, 4056, Switzerland
- Steffen Renner
  - Chemical and Biological Therapeutics, Novartis Institutes for BioMedical Research, Basel, 4056, Switzerland
- Caroline Gubser Keller
  - Chemical and Biological Therapeutics, Novartis Institutes for BioMedical Research, Basel, 4056, Switzerland
5
Liu S, Cheng H, Ashraf J, Zhang Y, Wang Q, Lv L, He M, Song G, Zuo D. Interpretation of convolutional neural networks reveals crucial sequence features involving in transcription during fiber development. BMC Bioinformatics 2022; 23:91. [PMID: 35291940; PMCID: PMC8922751; DOI: 10.1186/s12859-022-04619-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Accepted: 02/22/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Upland cotton provides the most natural fiber in the world. During fiber development, the quality and yield of fiber were influenced by gene transcription. Revealing sequence features related to transcription has a profound impact on cotton molecular breeding. We applied convolutional neural networks to predict gene expression status based on the sequences of gene transcription start regions. After that, a gradient-based interpretation and an N-adjusted kernel transformation were implemented to extract sequence features contributing to transcription.
RESULTS: Our models had approximate 80% accuracies, and the area under the receiver operating characteristic curve reached over 0.85. Gradient-based interpretation revealed 5' untranslated region contributed to gene transcription. Furthermore, 6 DOF binding motifs and 4 transcription activator binding motifs were obtained by N-adjusted kernel-motif transformation from models in three developmental stages. Apart from 10 general motifs, 3 DOF5.1 genes were also detected. In silico analysis about these motifs' binding proteins implied their potential functions in fiber formation. Besides, we also found some novel motifs in plants as important sequence features for transcription.
CONCLUSIONS: In conclusion, the N-adjusted kernel transformation method could interpret convolutional neural networks and reveal important sequence features related to transcription during fiber development. Potential functions of motifs interpreted from convolutional neural networks could be validated by further wet-lab experiments and applied in cotton molecular breeding.
Affiliation(s)
- Shang Liu
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Hailiang Cheng
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Javaria Ashraf
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Department of Plant Breeding and Genetics, University College of Agriculture and Environmental Sciences, The Islamia University of Bahawalpur, Punjab, 63100, Pakistan
- Youping Zhang
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Qiaolian Wang
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Limin Lv
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Man He
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
- Guoli Song
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China
- Dongyun Zuo
  - Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, China
  - Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, 450001, China