1
|
Cheng H, Xu L, Jia C. Characterization of double-stranded RNA and its silencing efficiency for insects using hybrid deep-learning framework. Brief Funct Genomics 2024; 23:858-865. [PMID: 38912767 DOI: 10.1093/bfgp/elae027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Revised: 05/24/2024] [Accepted: 06/06/2024] [Indexed: 06/25/2024] Open
Abstract
RNA interference (RNAi) technology is widely used in the biological prevention and control of terrestrial insects. One of the main factors with the application of RNAi in insects is the difference in RNAi efficiency, which may vary not only in different insects, but also in different genes of the same insect, and even in different double-stranded RNAs (dsRNAs) of the same gene. This work focuses on the last question and establishes a bioinformatics software that can help researchers screen for the most efficient dsRNA targeting target genes. Among insects, the red flour beetle (Tribolium castaneum) is known to be one of the most sensitive to RNAi. From iBeetle-Base, we extracted 12 027 efficient dsRNA sequences with a lethality rate of ≥20% or with experimentation-induced phenotypic changes and processed these data to correspond to specific silence efficiency. Based on the first complied novel benchmark dataset, we specifically designed a deep neural network to identify and characterize efficient dsRNA for RNAi in insects. The dna2vec word embedding model was trained to extract distributed feature representations, and three powerful modules, namely convolutional neural network, bidirectional long short-term memory network, and self-attention mechanism, were integrated to form our predictor model to characterize the extracted dsRNAs and their silencing efficiencies for T. castaneum. Our model dsRNAPredictor showed reliable performance in multiple independent tests based on different species, including both T. castaneum and Aedes aegypti. This indicates that dsRNAPredictor can facilitate prescreening for designing high-efficiency dsRNA targeting target genes of insects in advance.
Collapse
Affiliation(s)
- Han Cheng
- Mathematics Department of the School of Science, Dalian Maritime University, Dalian 116026, China
| | - Liping Xu
- Mathematics Department of the School of Science, Dalian Maritime University, Dalian 116026, China
| | - Cangzhi Jia
- Mathematics Department of the School of Science, Dalian Maritime University, Dalian 116026, China
| |
Collapse
|
2
|
Du C, Fan W, Zhou Y. Integrated Biochemical and Computational Methods for Deciphering RNA-Processing Codes. WILEY INTERDISCIPLINARY REVIEWS. RNA 2024; 15:e1875. [PMID: 39523464 DOI: 10.1002/wrna.1875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/02/2024] [Revised: 09/23/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
RNA processing involves steps such as capping, splicing, polyadenylation, modification, and nuclear export. These steps are essential for transforming genetic information in DNA into proteins and contribute to RNA diversity and complexity. Many biochemical methods have been developed to profile and quantify RNAs, as well as to identify the interactions between RNAs and RNA-binding proteins (RBPs), especially when coupled with high-throughput sequencing technologies. With the rapid accumulation of diverse data, it is crucial to develop computational methods to convert the big data into biological knowledge. In particular, machine learning and deep learning models are commonly utilized to learn the rules or codes governing the transformation from DNA sequences to intriguing RNAs based on manually designed or automatically extracted features. When precise enough, the RNA codes can be incredibly useful for predicting RNA products, decoding the molecular mechanisms, forecasting the impact of disease variants on RNA processing events, and identifying driver mutations. In this review, we systematically summarize the biochemical and computational methods for deciphering five important RNA codes related to alternative splicing, alternative polyadenylation, RNA localization, RNA modifications, and RBP binding. For each code, we review the main types of experimental methods used to generate training data, as well as the key features, strategic model structures, and advantages of representative tools. We also discuss the challenges encountered in developing predictive models using large language models and extensive domain knowledge. Additionally, we highlight useful resources and propose ways to improve computational tools for studying RNA codes.
Collapse
Affiliation(s)
- Chen Du
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
| | - Weiliang Fan
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
| | - Yu Zhou
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Wuhan University, Wuhan, China
- Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, China
| |
Collapse
|
3
|
Chen Y, Sheng G, Wang G. CapsNet-TIS: Predicting translation initiation site based on multi-feature fusion and improved capsule network. Gene 2024; 924:148598. [PMID: 38782224 DOI: 10.1016/j.gene.2024.148598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2024] [Revised: 04/22/2024] [Accepted: 05/20/2024] [Indexed: 05/25/2024]
Abstract
Genes are the basic units of protein synthesis in organisms, and accurately identifying the translation initiation site (TIS) of genes is crucial for understanding the regulation, transcription, and translation processes of genes. However, the existing models cannot adequately extract the feature information in TIS sequences, and they also inadequately capture the complex hierarchical relationships among features. Therefore, a novel predictor named CapsNet-TIS is proposed in this paper. CapsNet-TIS first fully extracts the TIS sequence information using four encoding methods, including One-hot encoding, physical structure property (PSP) encoding, nucleotide chemical property (NCP) encoding, and nucleotide density (ND) encoding. Next, multi-scale convolutional neural networks are used to perform feature fusion of the encoded features to enhance the comprehensiveness of the feature representation. Finally, the fused features are classified using capsule network as the main network of the classification model to capture the complex hierarchical relationships among the features. Moreover, we improve the capsule network by introducing residual block, channel attention, and BiLSTM to enhance the model's feature extraction and sequence data modeling capabilities. In this paper, the performance of CapsNet-TIS is evaluated using TIS datasets from four species: human, mouse, bovine, and fruit fly, and the effectiveness of each part is demonstrated by performing ablation experiments. By comparing the experimental results with models proposed by other researchers, the results demonstrate the superior performance of CapsNet-TIS.
Collapse
Affiliation(s)
- Yu Chen
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.
| | - Guojun Sheng
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| | - Gang Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
| |
Collapse
|
4
|
Alagarswamy K, Shi W, Boini A, Messaoudi N, Grasso V, Cattabiani T, Turner B, Croner R, Kahlert UD, Gumbs A. Should AI-Powered Whole-Genome Sequencing Be Used Routinely for Personalized Decision Support in Surgical Oncology—A Scoping Review. BIOMEDINFORMATICS 2024; 4:1757-1772. [DOI: 10.3390/biomedinformatics4030096] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2025]
Abstract
In this scoping review, we delve into the transformative potential of artificial intelligence (AI) in addressing challenges inherent in whole-genome sequencing (WGS) analysis, with a specific focus on its implications in oncology. Unveiling the limitations of existing sequencing technologies, the review illuminates how AI-powered methods emerge as innovative solutions to surmount these obstacles. The evolution of DNA sequencing technologies, progressing from Sanger sequencing to next-generation sequencing, sets the backdrop for AI’s emergence as a potent ally in processing and analyzing the voluminous genomic data generated. Particularly, deep learning methods play a pivotal role in extracting knowledge and discerning patterns from the vast landscape of genomic information. In the context of oncology, AI-powered methods exhibit considerable potential across diverse facets of WGS analysis, including variant calling, structural variation identification, and pharmacogenomic analysis. This review underscores the significance of multimodal approaches in diagnoses and therapies, highlighting the importance of ongoing research and development in AI-powered WGS techniques. Integrating AI into the analytical framework empowers scientists and clinicians to unravel the intricate interplay of genomics within the realm of multi-omics research, paving the way for more successful personalized and targeted treatments.
Collapse
Affiliation(s)
| | - Wenjie Shi
- Department of General-, Visceral-, Vascular and Transplantation Surgery, University of Magdeburg, Haus 60a, Leipziger Str. 44, 39120 Magdeburg, Germany
| | - Aishwarya Boini
- Davao Medical School Foundation, Davao City 8000, Philippines
| | - Nouredin Messaoudi
- Department of Hepatopancreatobiliary Surgery, Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Europe Hospitals, 1090 Brussels, Belgium
| | - Vincent Grasso
- Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA
| | | | | | - Roland Croner
- Department of General-, Visceral-, Vascular and Transplantation Surgery, University of Magdeburg, Haus 60a, Leipziger Str. 44, 39120 Magdeburg, Germany
| | - Ulf D. Kahlert
- Department of General-, Visceral-, Vascular and Transplantation Surgery, University of Magdeburg, Haus 60a, Leipziger Str. 44, 39120 Magdeburg, Germany
| | - Andrew Gumbs
- Department of General-, Visceral-, Vascular and Transplantation Surgery, University of Magdeburg, Haus 60a, Leipziger Str. 44, 39120 Magdeburg, Germany
- Talos Surgical, Inc., New Castle, DE 19720, USA
- Department of Surgery, American Hospital of Tbilisi, 0102 Tbilisi, Georgia
| |
Collapse
|
5
|
Fan X, Chang T, Chen C, Hafner M, Wang Z. Analysis of RNA translation with a deep learning architecture provides new insight into translation control. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.08.548206. [PMID: 39005319 PMCID: PMC11244891 DOI: 10.1101/2023.07.08.548206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/16/2024]
Abstract
Accurate annotation of coding regions in RNAs is essential for understanding gene translation. We developed a deep neural network to directly predict and analyze translation initiation and termination sites from RNA sequences. Trained with human transcripts, our model learned hidden rules of translation control and achieved a near perfect prediction of canonical translation sites across entire human transcriptome. Surprisingly, this model revealed a new role of codon usage in regulating translation termination, which was experimentally validated. We also identified thousands of new open reading frames in mRNAs or lncRNAs, some of which were confirmed experimentally. The model trained with human mRNAs achieved high prediction accuracy of canonical translation sites in all eukaryotes and good prediction in polycistronic transcripts from prokaryotes or RNA viruses, suggesting a high degree of conservation in translation control. Collectively, we present a general and efficient deep learning model for RNA translation, generating new insights into the complexity of translation regulation.
Collapse
Affiliation(s)
- Xiaojuan Fan
- Bio-med Big Data Center, CAS Key Laboratory of Computational Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health
- RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD, USA
| | - Tiangen Chang
- Laboratory of Cancer Data Science, National Cancer Institute, Bethesda, MD, USA
| | - Chuyun Chen
- Bio-med Big Data Center, CAS Key Laboratory of Computational Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health
- University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Markus Hafner
- RNA Molecular Biology Laboratory, National Institute of Arthritis and Musculoskeletal and Skin Disease, Bethesda, MD, USA
| | - Zefeng Wang
- Bio-med Big Data Center, CAS Key Laboratory of Computational Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Nutrition and Health
- University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| |
Collapse
|
6
|
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Collapse
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| | - Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Genome4me Inc., Seoul, Republic of Korea.
| |
Collapse
|
7
|
Guo Y, Zhou D, Li P, Li C, Cao J. Context-Aware Poly(A) Signal Prediction Model via Deep Spatial-Temporal Neural Networks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:8241-8253. [PMID: 37015693 DOI: 10.1109/tnnls.2022.3226301] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Polyadenylation [Poly(A)] is an essential process during messenger RNA (mRNA) maturation in biological eukaryote systems. Identifying Poly(A) signals (PASs) from the genome level is the key to understanding the mechanism of translation regulation and mRNA metabolism. In this work, we propose a deep dual-dynamic context-aware Poly(A) signal prediction model, called multiscale convolution with self-attention networks (MCANet), to adaptively uncover the spatial-temporal contextual dependence information. Specifically, the model automatically learns and strengthens informative features from the temporalwise and the spatialwise dimension. The identity connectivity performs contextual feature maps of Poly(A) data by direct connections from previous layers to subsequent layers. Then, a fully parametric rectified linear unit (FP-RELU) with dual-dynamic coefficients is devised to make the training of the model easier and enhance the generalization ability. A cross-entropy loss (CL) function is designed to make the model focus on samples that are easy to misclassify. Experiments on different Poly(A) signals demonstrate the superior performance of the proposed MCANet, and an ablation study shows the effectiveness of the network design for the feature learning and prediction of Poly(A) signals.
Collapse
|
8
|
Liu X, Zhang H, Zeng Y, Zhu X, Zhu L, Fu J. DRANetSplicer: A Splice Site Prediction Model Based on Deep Residual Attention Networks. Genes (Basel) 2024; 15:404. [PMID: 38674339 PMCID: PMC11048956 DOI: 10.3390/genes15040404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2024] [Revised: 03/20/2024] [Accepted: 03/23/2024] [Indexed: 04/28/2024] Open
Abstract
The precise identification of splice sites is essential for unraveling the structure and function of genes, constituting a pivotal step in the gene annotation process. In this study, we developed a novel deep learning model, DRANetSplicer, that integrates residual learning and attention mechanisms for enhanced accuracy in capturing the intricate features of splice sites. We constructed multiple datasets using the most recent versions of genomic data from three different organisms, Oryza sativa japonica, Arabidopsis thaliana and Homo sapiens. This approach allows us to train models with a richer set of high-quality data. DRANetSplicer outperformed benchmark methods on donor and acceptor splice site datasets, achieving an average accuracy of (96.57%, 95.82%) across the three organisms. Comparative analyses with benchmark methods, including SpliceFinder, Splice2Deep, Deep Splicer, EnsembleSplice, and DNABERT, revealed DRANetSplicer's superior predictive performance, resulting in at least a (4.2%, 11.6%) relative reduction in average error rate. We utilized the DRANetSplicer model trained on O. sativa japonica data to predict splice sites in A. thaliana, achieving accuracies for donor and acceptor sites of (94.89%, 94.25%). These results indicate that DRANetSplicer possesses excellent cross-organism predictive capabilities, with its performance in cross-organism predictions even surpassing that of benchmark methods in non-cross-organism predictions. Cross-organism validation showcased DRANetSplicer's excellence in predicting splice sites across similar organisms, supporting its applicability in gene annotation for understudied organisms. We employed multiple methods to visualize the decision-making process of the model. The visualization results indicate that DRANetSplicer can learn and interpret well-known biological features, further validating its overall performance. Our study systematically examined and confirmed the predictive ability of DRANetSplicer from various levels and perspectives, indicating that its practical application in gene annotation is justified.
Collapse
Affiliation(s)
- Xueyan Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
| | - Hongyan Zhang
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
| | - Ying Zeng
- School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411104, China;
| | - Xinghui Zhu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
| | - Lei Zhu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
| | - Jiahui Fu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China; (X.L.); (X.Z.); (L.Z.); (J.F.)
| |
Collapse
|
9
|
Clauwaert J, McVey Z, Gupta R, Menschaert G. TIS Transformer: remapping the human proteome using deep learning. NAR Genom Bioinform 2023; 5:lqad021. [PMID: 36879896 PMCID: PMC9985340 DOI: 10.1093/nargab/lqad021] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 01/20/2023] [Accepted: 02/14/2023] [Indexed: 03/07/2023] Open
Abstract
The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by in vivo experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated against. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome.
Collapse
Affiliation(s)
- Jim Clauwaert
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Oost-Vlaanderen 9000, Belgium
| | - Zahra McVey
- Novo Nordisk Research Centre Oxford, Novo Nordisk Ltd., Crawley, South East England, RH6 0PA, UK
| | - Ramneek Gupta
- Novo Nordisk Research Centre Oxford, Novo Nordisk Ltd., Crawley, South East England, RH6 0PA, UK
| | - Gerben Menschaert
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Oost-Vlaanderen 9000, Belgium
| |
Collapse
|
10
|
Beknazarov N, Poptsova M. DeepZ: A Deep Learning Approach for Z-DNA Prediction. Methods Mol Biol 2023; 2651:217-226. [PMID: 36892770 DOI: 10.1007/978-1-0716-3084-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/10/2023]
Abstract
Here we describe an approach that uses deep learning neural networks such as CNN and RNN to aggregate information from DNA sequence; physical, chemical, and structural properties of nucleotides; and omics data on histone modifications, methylation, chromatin accessibility, and transcription factor binding sites and data from other available NGS experiments. We explain how with the trained model one can perform whole-genome annotation of Z-DNA regions and feature importance analysis in order to define key determinants for functional Z-DNA regions.
Collapse
Affiliation(s)
- Nazar Beknazarov
- Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
| | - Maria Poptsova
- Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia.
| |
Collapse
|
11
|
Guo Y, Shen H, Li W, Li C, Jin C. Deep Effective k-mer representation learning for polyadenylation signal prediction via co-occurrence embedding. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
12
|
Ye W, Lian Q, Ye C, Wu X. A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022:S1672-0229(22)00121-8. [PMID: 36167284 PMCID: PMC10372920 DOI: 10.1016/j.gpb.2022.09.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 08/17/2022] [Accepted: 09/19/2022] [Indexed: 05/08/2023]
Abstract
Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to shaping eukaryotic transcriptome complexity and proteome diversity. Identification of poly(A) sites (pAs) on a genome-wide scale is a critical step toward understanding the underlying mechanism of APA-mediated gene regulation. A number of established computational tools have been proposed to predict pAs from diverse genomic data. Here we provided an exhaustive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. Particularly, we examined several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and put forward operable suggestions on how to assess the reliability of pAs predicted by different tools. We also proposed practical guidelines on choosing appropriate methods applicable to diverse scenarios. Moreover, we discussed in depth the challenges in improving the performance of pA prediction and benchmarking different methods. Additionally, we highlighted outstanding challenges and opportunities using new machine learning and integrative multi-omics techniques, and provided our perspective on how computational methodologies might evolve in the future for non-3' untranslated region, tissue-specific, cross-species, and single-cell pA prediction.
Collapse
Affiliation(s)
- Wenbin Ye
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China
| | - Qiwei Lian
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China; Department of Automation, Xiamen University, Xiamen 361005, China
| | - Congting Ye
- Key Laboratory of the Coastal and Wetland Ecosystems, Ministry of Education, College of the Environment and Ecology, Xiamen University, Xiamen 361005, China
| | - Xiaohui Wu
- Pasteurien College, Suzhou Medical College of Soochow University, Soochow University, Suzhou 215000, China.
| |
Collapse
|
13
|
Liu Q, Fang H, Wang X, Wang M, Li S, Coin LJM, Li F, Song J. DeepGenGrep: a general deep learning-based predictor for multiple genomic signals and regions. Bioinformatics 2022; 38:4053-4061. [PMID: 35799358 DOI: 10.1093/bioinformatics/btac454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 04/11/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identify GSRs as the performance of most existing approaches is unsatisfactory. As such, it is highly desirable to develop more accurate computational methods for GSRs prediction. RESULTS In this study, we propose a general deep learning framework termed DeepGenGrep, a general predictor for the systematic identification of multiple different GSRs from genomic DNA sequences. DeepGenGrep leverages the power of hybrid neural networks comprising a three-layer convolutional neural network and a two-layer long short-term memory to effectively learn useful feature representations from sequences. Benchmarking experiments demonstrate that DeepGenGrep outperforms several state-of-the-art approaches on identifying polyadenylation signals, translation initiation sites and splice sites across four eukaryotic species including Homo sapiens, Mus musculus, Bos taurus and Drosophila melanogaster. Overall, DeepGenGrep represents a useful tool for the high-throughput and cost-effective identification of potential GSRs in eukaryotic genomes. AVAILABILITY AND IMPLEMENTATION The webserver and source code are freely available at http://bigdata.biocie.cn/deepgengrep/home and Github (https://github.com/wx-cie/DeepGenGrep/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Quanzhong Liu
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Honglin Fang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Xiao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Miao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Shuqin Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Lachlan J M Coin
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Fuyi Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China.,Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia.,Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| |
Collapse
|
14
|
Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022; 16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022] Open
Abstract
Genomics is advancing towards data-driven science. Through the advent of high-throughput data generating technologies in human genomics, we are overwhelmed with the heap of genomic data. To extract knowledge and pattern out of this genomic data, artificial intelligence especially deep learning methods has been instrumental. In the current review, we address development and application of deep learning methods/models in different subarea of human genomics. We assessed over- and under-charted area of genomics by deep learning techniques. Deep learning algorithms underlying the genomic tools have been discussed briefly in later part of this review. Finally, we discussed briefly about the late application of deep learning tools in genomic. Conclusively, this review is timely for biotechnology or genomic scientists in order to guide them why, when and how to use deep learning methods to analyse human genomic data.
Collapse
Affiliation(s)
- Wardah S Alharbi
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
| | - Mamoon Rashid
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia.
| |
Collapse
|
15
|
Abstract
The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.
Collapse
|
16
|
Context-aware dynamic neural computational models for accurate Poly(A) signal prediction. Neural Netw 2022; 152:287-299. [DOI: 10.1016/j.neunet.2022.04.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2021] [Revised: 03/03/2022] [Accepted: 04/22/2022] [Indexed: 11/21/2022]
|
17
|
Prokaryotic and eukaryotic promoters identification based on residual network transfer learning. Bioprocess Biosyst Eng 2022; 45:955-967. [DOI: 10.1007/s00449-022-02716-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Accepted: 02/27/2022] [Indexed: 11/26/2022]
|
18
|
Jankovic B, Gojobori T. From shallow to deep: some lessons learned from application of machine learning for recognition of functional genomic elements in human genome. Hum Genomics 2022; 16:7. [PMID: 35180894 PMCID: PMC8855580 DOI: 10.1186/s40246-022-00376-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2021] [Accepted: 01/02/2022] [Indexed: 11/25/2022] Open
Abstract
Identification of genomic signals as indicators for functional genomic elements is one of the areas that received early and widespread application of machine learning methods. With time, the methods applied grew in variety and generally exhibited a tendency to improve their ability to identify some major genomic and transcriptomics signals. The evolution of machine learning in genomics followed a similar path to applications of machine learning in other fields. These were impacted in a major way by three dominant developments, namely an enormous increase in availability and quality of data, a significant increase in computational power available to machine learning applications, and finally, new machine learning paradigms, of which deep learning is the most well-known example. It is not easy in general to distinguish factors leading to improvements in results of applications of machine learning. This is even more so in the field of genomics, where the advent of next-generation sequencing and the increased ability to perform functional analysis of raw data have had a major effect on the applicability of machine learning in OMICS fields. In this paper, we survey the results from a subset of published work in application of machine learning in the recognition of genomic signals and regions in human genome and summarize some lessons learnt from this endeavor. There is no doubt that a significant progress has been made both in terms of accuracy and reliability of models. Questions remain however whether the progress has been sufficient and what these developments bring to the field of genomics in general and human genomics in particular. Improving usability, interpretability and accuracy of models remains an important open challenge for current and future research in application of machine learning and more generally of artificial intelligence methods in genomics.
Collapse
Affiliation(s)
- Boris Jankovic
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Takashi Gojobori
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. .,Division of Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.
| |
Collapse
|
19
|
Artificial Intelligence and Cardiovascular Genetics. Life (Basel) 2022; 12:life12020279. [PMID: 35207566 PMCID: PMC8875522 DOI: 10.3390/life12020279] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2021] [Revised: 01/26/2022] [Accepted: 02/09/2022] [Indexed: 12/13/2022] Open
Abstract
Polygenic diseases, which are genetic disorders caused by the combined action of multiple genes, pose unique and significant challenges for the diagnosis and management of affected patients. A major goal of cardiovascular medicine has been to understand how genetic variation leads to the clinical heterogeneity seen in polygenic cardiovascular diseases (CVDs). Recent advances and emerging technologies in artificial intelligence (AI), coupled with the ever-increasing availability of next generation sequencing (NGS) technologies, now provide researchers with unprecedented possibilities for dynamic and complex biological genomic analyses. Combining these technologies may lead to a deeper understanding of heterogeneous polygenic CVDs, better prognostic guidance, and, ultimately, greater personalized medicine. Advances will likely be achieved through increasingly frequent and robust genomic characterization of patients, as well the integration of genomic data with other clinical data, such as cardiac imaging, coronary angiography, and clinical biomarkers. This review discusses the current opportunities and limitations of genomics; provides a brief overview of AI; and identifies the current applications, limitations, and future directions of AI in genomics.
Collapse
|
20
|
Guo Y, Zhou D, Cao J, Nie R, Ruan X, Liu Y. Gated residual neural networks with self-normalization for translation initiation site recognition. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107783] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
|
21
|
Li F, Dong S, Leier A, Han M, Guo X, Xu J, Wang X, Pan S, Jia C, Zhang Y, Webb GI, Coin LJM, Li C, Song J. Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Brief Bioinform 2021; 23:6415313. [PMID: 34729589 DOI: 10.1093/bib/bbab461] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 09/27/2021] [Accepted: 10/07/2021] [Indexed: 12/14/2022] Open
Abstract
Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be potentially mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive or negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning bioinformatic applications to address various biological questions. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work serves as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and further developing next-generation PU learning frameworks for critical biological applications.
Collapse
Affiliation(s)
- Fuyi Li
- Monash University, Australia
| | | | - André Leier
- Department of Genetics, UAB School of Medicine, USA
| | - Meiya Han
- Department of Biochemistry and Molecular Biology, Monash University, Australia
| | | | - Jing Xu
- Computer Science and Technology from Nankai University, China
| | - Xiaoyu Wang
- Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Australia
| | - Shirui Pan
- University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Cangzhi Jia
- College of Science, Dalian Maritime University, Australia
| | - Yang Zhang
- Northwestern Polytechnical University, China
| | - Geoffrey I Webb
- Faculty of Information Technology at Monash University, Australia
| | - Lachlan J M Coin
- Department of Clinical Pathology, University of Melbourne, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry of Molecular Biology, Monash University, Australia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute, Monash University, Melbourne, Australia
| |
Collapse
|
22
|
Umarov R, Li Y, Arakawa T, Takizawa S, Gao X, Arner E. ReFeaFi: Genome-wide prediction of regulatory elements driving transcription initiation. PLoS Comput Biol 2021; 17:e1009376. [PMID: 34491989 PMCID: PMC8448322 DOI: 10.1371/journal.pcbi.1009376] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2021] [Revised: 09/17/2021] [Accepted: 08/23/2021] [Indexed: 11/19/2022] Open
Abstract
Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. While there are many attempts to develop computational promoter and enhancer identification methods, reliable tools to analyze long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than that in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other one for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species, without re-training. Moreover, the unannotated predicted regulatory regions made on the human genome are enriched for disease-associated variants, suggesting them to be potentially true regulatory elements rather than false positives. We validated high scoring "false positive" predictions using reporter assay and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.
Collapse
Affiliation(s)
- Ramzan Umarov
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- * E-mail: (RU); (XG); (EA)
| | - Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong, People’s Republic of China
| | - Takahiro Arakawa
- Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Satoshi Takizawa
- Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Xin Gao
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, Thuwal, Saudi Arabia
- * E-mail: (RU); (XG); (EA)
| | - Erik Arner
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Laboratory for Applied Regulatory Genomics Network Analysis, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- * E-mail: (RU); (XG); (EA)
| |
Collapse
|
23
|
Arabic Sentiment Analysis Using Deep Learning and Ensemble Methods. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2021. [DOI: 10.1007/s13369-021-05475-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
24
|
Guo Y, Zhou D, Li W, Cao J, Nie R, Xiong L, Ruan X. Identifying polyadenylation signals with biological embedding via self-attentive gated convolutional highway networks. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107133] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
|
25
|
Charoenkwan P, Nantasenamat C, Hasan MM, Manavalan B, Shoombuatong W. BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides. Bioinformatics 2021; 37:2556-2562. [PMID: 33638635 DOI: 10.1093/bioinformatics/btab133] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2020] [Revised: 02/08/2021] [Accepted: 02/24/2021] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION The identification of bitter peptides through experimental approaches is an expensive and time-consuming endeavor. Due to the huge number of newly available peptide sequences in the post-genomic era, the development of automated computational models for the identification of novel bitter peptides is highly desira-ble. RESULTS In this work, we present BERT4Bitter, a bidirectional encoder representation from transformers (BERT)-based model for predicting bitter peptides directly from their amino acid sequence without using any structural information. To the best of our knowledge, this is the first time a BERT-based model has been employed to identify bitter peptides. Compared to widely used machine learning models, BERT4Bitter achieved the best performance with accuracy of 0.861 and 0.922 for cross-validation and independent tests, respectively. Furthermore, extensive empirical benchmarking experiments on the independent dataset demonstrated that BERT4Bitter clearly outperformed the existing method with improvements of > 8% accuracy and >16% Matthews coefficient correlation, highlighting the effectiveness and robustness of BERT4Bitter. We believe that the BERT4Bitter method proposed herein will be a useful tool for rapidly screening and identifying novel bitter peptides for drug development and nutritional research. AVAILABILITY The user-friendly web server of the proposed BERT4Bitter is freely accessible at: http://pmlab.pythonanywhere.com/BERT4Bitter. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112 USA
| | | | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| |
Collapse
|
26
|
Bhandari N, Khare S, Walambe R, Kotecha K. Comparison of machine learning and deep learning techniques in promoter prediction across diverse species. PeerJ Comput Sci 2021; 7:e365. [PMID: 33817015 PMCID: PMC7959599 DOI: 10.7717/peerj-cs.365] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2020] [Accepted: 12/30/2020] [Indexed: 06/12/2023]
Abstract
Gene promoters are the key DNA regulatory elements positioned around the transcription start sites and are responsible for regulating gene transcription process. Various alignment-based, signal-based and content-based approaches are reported for the prediction of promoters. However, since all promoter sequences do not show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct higher eukaryotes viz. yeast (Saccharomyces cerevisiae), A. thaliana (plant) and human (Homo sapiens). We compared one-hot vector encoding method with frequency-based tokenization (FBT) for data pre-processing on 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension reducing the training time without affecting the sensitivity and specificity of classification. We employed the deep learning techniques, mainly CNN and recurrent neural network with Long Short Term Memory (LSTM) and random forest (RF) classifier for promoter classification at k-mer sizes of 2, 4 and 8. We found CNN to be superior in classification of promoters from non-promoter sequences (binary classification) as well as species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
Collapse
Affiliation(s)
- Nikita Bhandari
- Computer Science, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH, India
| | - Satyajeet Khare
- Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, MH, India
| | - Rahee Walambe
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, Maharashtra, India
- Electronics and Telecommunication Dept, Symbiosis Institute of Technology, Pune, Maharashtra, India
| | - Ketan Kotecha
- Computer Science, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, MH, India
- Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, Maharashtra, India
| |
Collapse
|
27
|
Chen C, Hou J, Shi X, Yang H, Birchler JA, Cheng J. DeepGRN: prediction of transcription factor binding site across cell-types using attention-based deep neural networks. BMC Bioinformatics 2021; 22:38. [PMID: 33522898 PMCID: PMC7852092 DOI: 10.1186/s12859-020-03952-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Accepted: 12/29/2020] [Indexed: 12/21/2022] Open
Abstract
Background Due to the complexity of the biological systems, the prediction of the potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide available information about the affinity and accessibility of genome and are commonly used features in binding sites prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach in binding site inference from massively parallel sequencing data. The successful applications of attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors. Results In this study, we propose a novel tool (named DeepGRN) for transcription factors binding site prediction based on the combination of two components: single attention module and pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potential informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN. Conclusions DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.
Collapse
Affiliation(s)
- Chen Chen
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
| | - Xiaowen Shi
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Hua Yang
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - James A Birchler
- Division of Biological Sciences, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA.
| |
Collapse
|
28
|
Kong L, Chen Y, Xu F, Xu M, Li Z, Fang J, Zhang L, Pian C. Mining influential genes based on deep learning. BMC Bioinformatics 2021; 22:27. [PMID: 33482718 PMCID: PMC7821411 DOI: 10.1186/s12859-021-03972-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2020] [Accepted: 01/15/2021] [Indexed: 11/17/2022] Open
Abstract
Background Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of an ever-expanding gene expression profile, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~ 1000 carefully chosen landmark genes that can capture ~ 80% of information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to deep mine the influential genes in the genome. Results Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationship between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The result shows that our landmark genes can predict target genes more accurately and robustly than that of L1000 based on two metrics [mean absolute error (MAE) and Pearson correlation coefficient (PCC)]. This reveals that the landmark genes detected by our method contain more genomic information. Conclusions We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.
Collapse
Affiliation(s)
- Lingpeng Kong
- College of Agriculture, Nanjing Agricultural University, Jiangsu, 210095, Nanjing, China
| | - Yuanyuan Chen
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Fengjiao Xu
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China
| | - Mingmin Xu
- College of Agriculture, Nanjing Agricultural University, Jiangsu, 210095, Nanjing, China
| | - Zutan Li
- College of Agriculture, Nanjing Agricultural University, Jiangsu, 210095, Nanjing, China
| | - Jingya Fang
- College of Agriculture, Nanjing Agricultural University, Jiangsu, 210095, Nanjing, China
| | - Liangyun Zhang
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China.
| | - Cong Pian
- Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China.
| |
Collapse
|
29
|
Alam T, Al-Absi HRH, Schmeier S. Deep Learning in LncRNAome: Contribution, Challenges, and Perspectives. Noncoding RNA 2020; 6:E47. [PMID: 33266128 PMCID: PMC7711891 DOI: 10.3390/ncrna6040047] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2020] [Revised: 10/27/2020] [Accepted: 11/06/2020] [Indexed: 12/11/2022] Open
Abstract
Long non-coding RNAs (lncRNA), the pervasively transcribed part of the mammalian genome, have played a significant role in changing our protein-centric view of genomes. The abundance of lncRNAs and their diverse roles across cell types have opened numerous avenues for the research community regarding lncRNAome. To discover and understand lncRNAome, many sophisticated computational techniques have been leveraged. Recently, deep learning (DL)-based modeling techniques have been successfully used in genomics due to their capacity to handle large amounts of data and produce relatively better results than traditional machine learning (ML) models. DL-based modeling techniques have now become a choice for many modeling tasks in the field of lncRNAome as well. In this review article, we summarized the contribution of DL-based methods in nine different lncRNAome research areas. We also outlined DL-based techniques leveraged in lncRNAome, highlighting the challenges computational scientists face while developing DL-based models for lncRNAome. To the best of our knowledge, this is the first review article that summarizes the role of DL-based techniques in multiple areas of lncRNAome.
Collapse
Affiliation(s)
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar;
| | - Hamada R. H. Al-Absi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar;
| | - Sebastian Schmeier
- School of Natural and Computational Sciences, Massey University, Auckland 0632, New Zealand;
| |
Collapse
|
30
|
Zheng Y, Wang H, Zhang Y, Gao X, Xing EP, Xu M. Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species. PLoS Comput Biol 2020; 16:e1008297. [PMID: 33151940 PMCID: PMC7671507 DOI: 10.1371/journal.pcbi.1008297] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2020] [Revised: 11/17/2020] [Accepted: 08/30/2020] [Indexed: 11/19/2022] Open
Abstract
In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of poly(A) signal (PAS) on the DNA sequence is the key to understand the mechanism of translation regulation and mRNA metabolism. Although machine learning methods were widely used in computationally identifying PAS, the need for tremendous amounts of annotation data hinder applications of existing methods in species without experimental data on PAS. Therefore, cross-species PAS identification, which enables the possibility to predict PAS from untrained species, naturally becomes a promising direction. In our works, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolution Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species and build cross-species training sets with two of them and evaluate the performance of the remaining ones. Moreover, we test our method against insufficient data and imbalanced data issues and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy when it comes to a smaller or imbalanced training set.
Collapse
Affiliation(s)
- Yumin Zheng
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
| | - Haohan Wang
- Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Yang Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Eric P. Xing
- Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Min Xu
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| |
Collapse
|
31
|
Yu H, Dai Z. SANPolyA: a deep learning method for identifying Poly(A) signals. Bioinformatics 2020; 36:2393-2400. [PMID: 31904817 DOI: 10.1093/bioinformatics/btz970] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2019] [Revised: 12/05/2019] [Accepted: 01/01/2020] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Polyadenylation plays a regulatory role in transcription. The recognition of polyadenylation signal (PAS) motif sequence is an important step in polyadenylation. In the past few years, some statistical machine learning-based and deep learning-based methods have been proposed for PAS identification. Although these methods predict PAS with success, there is room for their improvement on PAS identification. RESULTS In this study, we proposed a deep neural network-based computational method, called SANPolyA, for identifying PAS in human and mouse genomes. SANPolyA requires no manually crafted sequence features. We compared our method SANPolyA with several previous PAS identification methods on several PAS benchmark datasets. Our results showed that SANPolyA outperforms the state-of-art methods. SANPolyA also showed good performance on leave-one-motif-out evaluation. AVAILABILITY AND IMPLEMENTATION https://github.com/yuht4/SANPolyA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | - Zhiming Dai
- School of Data and Computer Science.,Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-Sen University, Guangzhou 510006, China
| |
Collapse
|
32
|
Kwon MS, Lee BT, Lee SY, Kim HU. Modeling regulatory networks using machine learning for systems metabolic engineering. Curr Opin Biotechnol 2020; 65:163-170. [DOI: 10.1016/j.copbio.2020.02.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2020] [Revised: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 12/18/2022]
|
33
|
Albaradei S, Magana-Mora A, Thafar M, Uludag M, Bajic VB, Gojobori T, Essack M, Jankovic BR. Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA. Gene 2020; 763S:100035. [PMID: 32550561 PMCID: PMC7285987 DOI: 10.1016/j.gene.2020.100035] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Accepted: 05/06/2020] [Indexed: 12/21/2022]
Abstract
Background The accurate identification of the exon/intron boundaries is critical for the correct annotation of genes with multiple exons. Donor and acceptor splice sites (SS) demarcate these boundaries. Therefore, deriving accurate computational models to predict the SS are useful for functional annotation of genes and genomes, and for finding alternative SS associated with different diseases. Although various models have been proposed for the in silico prediction of SS, improving their accuracy is required for reliable annotation. Moreover, models are often derived and tested using the same genome, providing no evidence of broad application, i.e. to other poorly studied genomes. Results With this in mind, we developed the Splice2Deep models for SS detection. Each model is an ensemble of deep convolutional neural networks. We evaluated the performance of the models based on the ability to detect SS in Homo sapiens, Oryza sativa japonica, Arabidopsis thaliana, Drosophila melanogaster, and Caenorhabditis elegans. Results demonstrate that the models efficiently detect SS in other organisms not considered during the training of the models. Compared to the state-of-the-art tools, Splice2Deep models achieved significantly reduced average error rates of 41.97% and 28.51% for acceptor and donor SS, respectively. Moreover, the Splice2Deep cross-organism validation demonstrates that models correctly identify conserved genomic elements enabling annotation of SS in new genomes by choosing the taxonomically closest model. Conclusions The results of our study demonstrated that Splice2Deep both achieved a considerably reduced error rate compared to other state-of-the-art models and the ability to accurately recognize SS in other organisms for which the model was not trained, enabling annotation of poorly studied or newly sequenced genomes. Splice2Deep models are implemented in Python using Keras API; the models and the data are available at https://github.com/SomayahAlbaradei/Splice_Deep.git.
Collapse
Key Words
- AUC, area under curve
- AcSS, acceptor splice site
- Acc, accuracy
- Bioinformatics
- CNN, convolutional neural network
- CONV, convolutional layers
- DL, deep learning
- DNA, deoxyribonucleic acid
- DT, decision trees
- Deep-learning
- DoSS, donor splice site
- FC, fully connected layer
- ML, machine learning
- NB, naive Bayes
- NN, neural network
- POOL, pooling layer
- Prediction
- RF, random forest
- RNA, ribonucleic acid
- ReLU, rectified linear unit layer
- SS, splice site
- SVM, support vector machine
- Sn, sensitivity
- Sp, specificity
- Splice sites
- Splicing
Collapse
Affiliation(s)
- Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.,Faculty of Computing and Information Technology, King Abdulaziz University, Saudi Arabia
| | - Arturo Magana-Mora
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.,Saudi Aramco, EXPEC-ARC, Drilling Technology Team, Dhahran 31311, Saudi Arabia
| | - Maha Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.,Faculty of Computers and Information Systems, Taif University, Saudi Arabia
| | - Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Vladimir B Bajic
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.,Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| | - Boris R Jankovic
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center, Computer (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
| |
Collapse
|
34
|
Thafar M, Raies AB, Albaradei S, Essack M, Bajic VB. Comparison Study of Computational Prediction Tools for Drug-Target Binding Affinities. Front Chem 2019; 7:782. [PMID: 31824921 PMCID: PMC6879652 DOI: 10.3389/fchem.2019.00782] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2019] [Accepted: 10/30/2019] [Indexed: 12/30/2022] Open
Abstract
The drug development is generally arduous, costly, and success rates are low. Thus, the identification of drug-target interactions (DTIs) has become a crucial step in early stages of drug discovery. Consequently, developing computational approaches capable of identifying potential DTIs with minimum error rate are increasingly being pursued. These computational approaches aim to narrow down the search space for novel DTIs and shed light on drug functioning context. Most methods developed to date use binary classification to predict if the interaction between a drug and its target exists or not. However, it is more informative but also more challenging to predict the strength of the binding between a drug and its target. If that strength is not sufficiently strong, such DTI may not be useful. Therefore, the methods developed to predict drug-target binding affinities (DTBA) are of great value. In this study, we provide a comprehensive overview of the existing methods that predict DTBA. We focus on the methods developed using artificial intelligence (AI), machine learning (ML), and deep learning (DL) approaches, as well as related benchmark datasets and databases. Furthermore, guidance and recommendations are provided that cover the gaps and directions of the upcoming work in this research area. To the best of our knowledge, this is the first comprehensive comparison analysis of tools focused on DTBA with reference to AI/ML/DL.
Collapse
Affiliation(s)
- Maha Thafar
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
| | - Arwa Bin Raies
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Somayah Albaradei
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Vladimir B. Bajic
- Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| |
Collapse
|
35
|
Albalawi F, Chahid A, Guo X, Albaradei S, Magana-Mora A, Jankovic BR, Uludag M, Van Neste C, Essack M, Laleg-Kirati TM, Bajic VB. Hybrid model for efficient prediction of poly(A) signals in human genomic DNA. Methods 2019; 166:31-39. [PMID: 30991099 DOI: 10.1016/j.ymeth.2019.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2018] [Revised: 03/12/2019] [Accepted: 04/01/2019] [Indexed: 12/15/2022] Open
Abstract
Polyadenylation signals (PAS) are found in most protein-coding and some non-coding genes in eukaryotes. Their accurate recognition improves understanding gene regulation mechanisms and recognition of the 3'-end of transcribed gene regions where premature or alternate transcription ends may lead to various diseases. Although different methods and tools for in-silico prediction of genomic signals have been proposed, the correct identification of PAS in genomic DNA remains challenging due to a vast number of non-relevant hexamers identical to PAS hexamers. In this study, we developed a novel method for PAS recognition. The method is implemented in a hybrid PAS recognition model (HybPAS), which is based on deep neural networks (DNNs) and logistic regression models (LRMs). One of such models is developed for each of the 12 most frequent human PAS hexamers. DNN models appeared the best for eight PAS types (including the two most frequent PAS hexamers), while LRM appeared best for the remaining four PAS types. The new models use different combinations of signal processing-based, statistical, and sequence-based features as input. The results obtained on human genomic data show that HybPAS outperforms the well-tuned state-of-the-art Omni-PolyA models, reducing the classification error for different PAS hexamers by up to 57.35% for 10 out of 12 PAS types, with Omni-PolyA models being better for two PAS types. For the most frequent PAS types, 'AATAAA' and 'ATTAAA', HybPAS reduced the error rate by 35.14% and 34.48%, respectively. On average, HybPAS reduces the error by 30.29%. HybPAS is implemented partly in Python and in MATLAB available at https://github.com/EMANG-KAUST/PolyA_Prediction_LRM_DNN.
Collapse
Affiliation(s)
- Fahad Albalawi
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Taif University, Electrical Engineering, Taif 21944, Saudi Arabia
| | - Abderrazak Chahid
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Xingang Guo
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Somayah Albaradei
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Arturo Magana-Mora
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Saudi Aramco, EXPEC-ARC, Drilling Technology Team, Dhahran 31311, Saudi Arabia
| | - Boris R Jankovic
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Mahmut Uludag
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Christophe Van Neste
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia; Ghent University, Center for Medical Genetics Ghent (CMGG), B-9000 Ghent, Belgium
| | - Magbubah Essack
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia
| | - Taous-Meriem Laleg-Kirati
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia.
| | - Vladimir B Bajic
- King Abdullah University of Science and Technology, Computational Bioscience Research Center, Thuwal 23955-6900, Saudi Arabia.
| |
Collapse
|