1
Patrício A, Costa RS, Henriques R. Pattern-centric transformation of omics data grounded on discriminative gene associations aids predictive tasks in TCGA while ensuring interpretability. Biotechnol Bioeng 2024. [PMID: 38859573 DOI: 10.1002/bit.28758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 02/07/2024] [Accepted: 05/18/2024] [Indexed: 06/12/2024]
Abstract
The increasing prevalence of omics data sources is pushing the study of regulatory mechanisms underlying complex diseases such as cancer. However, the vast quantities of molecular features produced and the inherent interplay between them lead to a level of complexity that hampers both descriptive and predictive tasks, requiring custom-built algorithms that can extract relevant information from these sources of data. We propose a transformation that moves data centered on molecules (e.g., transcripts and proteins) to a new data space focused on putative regulatory modules given by statistically relevant co-expression patterns. To this end, the proposed transformation extracts patterns from the data through biclustering and uses them to create new variables with guarantees of interpretability and discriminative power. The transformation is shown to achieve dimensionality reductions of up to 99% and to increase the predictive performance of various classifiers across multiple omics layers. Results suggest that omics data transformations from gene-centric to pattern-centric data support both prediction tasks and human interpretation, notably contributing to precision medicine applications.
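To make the idea of moving from a gene-centric to a pattern-centric data space concrete, the following is a minimal sketch. It assumes the biclusters have already been found and are given as hypothetical gene-to-expected-value mappings. The feature used here (mean absolute deviation from the pattern) is an illustrative stand-in, not the variable construction used by the authors, which the abstract does not specify.

```python
from statistics import mean


def pattern_features(samples, biclusters):
    """Map gene-centric samples to one feature per bicluster pattern.

    samples: list of dicts {gene: expression value}
    biclusters: list of dicts {gene: expected value under the pattern}
    Each feature is the mean absolute deviation of the sample from the
    pattern on the pattern's genes (0.0 means a perfect match).
    NOTE: this distance is an assumption made for illustration only.
    """
    rows = []
    for sample in samples:
        row = [
            mean(abs(sample[gene] - expected) for gene, expected in pattern.items())
            for pattern in biclusters
        ]
        rows.append(row)
    return rows


# Hypothetical toy data: 2 samples over 4 genes, 2 precomputed patterns.
samples = [
    {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 1.0},
    {"g1": 0.0, "g2": 0.0, "g3": 3.0, "g4": 3.0},
]
biclusters = [
    {"g1": 2.0, "g2": 2.0},
    {"g3": 3.0, "g4": 3.0},
]

features = pattern_features(samples, biclusters)
print(features)
```

In this toy case four gene-level variables are replaced by two pattern-level variables, and each new variable can be read directly as "how closely does this sample follow this co-expression pattern".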
Affiliation(s)
- André Patrício
  - INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
- Rafael S Costa
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, NOVA University Lisbon, Caparica, Portugal
- Rui Henriques
  - INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
2
Li K, Wang Z, Zhou Y, Li S. Lung adenocarcinoma identification based on hybrid feature selections and attentional convolutional neural networks. Mathematical Biosciences and Engineering: MBE 2024; 21:2991-3015. [PMID: 38454716 DOI: 10.3934/mbe.2024133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2024]
Abstract
Lung adenocarcinoma, a chronic non-small cell lung cancer, needs to be detected early. Tumor gene expression data analysis is effective for early detection, yet its challenges lie in a small sample size, high dimensionality, and multi-noise characteristics. In this study, we propose a lung adenocarcinoma convolutional neural network (LATCNN), a deep learning model tailored for accurate lung adenocarcinoma prediction and identification of key genes. During the feature selection stage, we introduce a hybrid algorithm. Initially, the fast correlation-based filter (FCBF) algorithm swiftly filters out irrelevant features, followed by the k-means-synthetic minority over-sampling technique (k-means-SMOTE) method to address category imbalance. Subsequently, we enhance the particle swarm optimization (PSO) algorithm by incorporating fast-decay dynamic inertia weights and utilizing the classification and regression tree (CART) as the fitness function for the second stage of feature selection, aiming to further eliminate redundant features. In the classifier construction stage, we present an attention convolutional neural network (atCNN) that incorporates an attention mechanism. The improved model performs classification and prediction on the lung adenocarcinoma gene expression data after feature selection. The results show that LATCNN effectively reduces the feature dimensions and accurately identifies 12 key genes with accuracy, recall, F1 score, and MCC of 99.70%, 99.33%, 99.98%, and 98.67%, respectively. These performance metrics surpass those of other comparative models, highlighting the significance of this research for advancing lung adenocarcinoma treatment.
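For readers less familiar with the four reported metrics, the sketch below shows how accuracy, recall, F1 score, and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The counts are hypothetical and are not taken from the paper; the function name is likewise an illustrative choice, and the code assumes every denominator is non-zero.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes all denominators are non-zero (no guard against empty classes).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy on imbalanced data such as the class distribution that k-means-SMOTE is meant to correct.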
Affiliation(s)
- Kunpeng Li
  - School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Zepeng Wang
  - School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Yu Zhou
  - School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Sihai Li
  - School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
3
Qumsiyeh E, Salah Z, Yousef M. miRGediNET: A comprehensive examination of common genes in miRNA-Target interactions and disease associations: Insights from a grouping-scoring-modeling approach. Heliyon 2023; 9:e22666. [PMID: 38090011 PMCID: PMC10711121 DOI: 10.1016/j.heliyon.2023.e22666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 11/15/2023] [Accepted: 11/16/2023] [Indexed: 06/15/2024] [Open Access]
Abstract
In the broad and complex field of biological data analysis, researchers frequently gather information from a single source or database. Despite being a widespread practice, this has disadvantages. Relying exclusively on a single source can limit our comprehension as it may omit various perspectives that could be obtained by combining multiple knowledge bases. Acknowledging this shortcoming, we report on miRGediNET, a novel approach combining information from three biological databases. Our investigation focuses on microRNAs (miRNAs), small non-coding RNA molecules that regulate gene expression post-transcriptionally. We delve deeply into the knowledge of these miRNAs' interactions with genes and the possible effects these interactions may have on different diseases. The scientific community has long recognized a direct correlation between the progression of specific diseases and miRNAs, as well as the genes they target. By using miRGediNET, we go beyond simply acknowledging this relationship. Rather, we actively look for the critical genes that could act as links between the actions of miRNAs and the mechanisms underlying disease. Our methodology, which carefully identifies and investigates these important genes, is supported by a strategic framework that may open up new possibilities for comprehending diseases and creating treatments. We have developed a tool on the KNIME platform as a concrete application of our research. This tool serves as both a validation of our study and an invitation to the larger community to interact with, investigate, and build upon our findings. miRGediNET is publicly accessible on GitHub at https://github.com/malikyousef/miRGediNET, providing a collaborative environment for additional research and innovation for enthusiasts and fellow researchers.
Affiliation(s)
- Emma Qumsiyeh
  - Department of Computer Science and Information Technology, Al-Quds University, Palestine
- Zaidoun Salah
  - Molecular Genetics and Genetic Toxicology, Arab American University, Ramallah, Palestine
- Malik Yousef
  - Information Technology Engineering, Al-Quds University, Abu Dis, Palestine
4
Ersoz NS, Bakir-Gungor B, Yousef M. GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning. Front Genet 2023; 14:1139082. [PMID: 37671046 PMCID: PMC10476493 DOI: 10.3389/fgene.2023.1139082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 07/05/2023] [Indexed: 09/07/2023] [Open Access]
Abstract
Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product.
Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype.
Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model.
Discussion: GeNetOntology will assist geneticists and scientists to identify a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
Affiliation(s)
- Nur Sebnem Ersoz
  - Department of Bioengineering, Graduate School of Engineering and Science, Abdullah Gul University, Kayseri, Türkiye
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Türkiye
  - Department of Bioengineering, Faculty of Life and Natural Sciences, Abdullah Gul University, Kayseri, Türkiye
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
5
Kuzudisli C, Bakir-Gungor B, Bulut N, Qaqish B, Yousef M. Review of feature selection approaches based on grouping of features. PeerJ 2023; 11:e15666. [PMID: 37483989 PMCID: PMC10358338 DOI: 10.7717/peerj.15666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 06/08/2023] [Indexed: 07/25/2023] [Open Access]
Abstract
With the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality, together with redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi-supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.
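The group-then-score idea described above can be sketched as follows. This is a minimal illustration, assuming the feature groups are already given (for example from clustering or from an external biological source) and using a deliberately simple score: the absolute difference between the two classes' means of the group-averaged signal. The function names and the scoring rule are hypothetical choices, not a method taken from the review.

```python
from statistics import mean


def score_groups(X, y, groups):
    """Score each feature group instead of each individual feature.

    X: list of samples, each a list of feature values
    y: list of binary class labels (0 or 1)
    groups: dict {group name: list of column indices}
    Score = |mean of group average in class 1 - mean in class 0|
    (an illustrative scoring rule, chosen for simplicity).
    """
    scores = {}
    for name, cols in groups.items():
        group_avg = [mean(row[c] for c in cols) for row in X]
        pos = [v for v, label in zip(group_avg, y) if label == 1]
        neg = [v for v, label in zip(group_avg, y) if label == 0]
        scores[name] = abs(mean(pos) - mean(neg))
    return scores


def select_top_groups(scores, k):
    """Return the names of the k highest-scoring groups."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 4 samples, 4 features, 2 predefined groups.
X = [
    [5.0, 6.0, 1.0, 1.0],
    [6.0, 5.0, 1.0, 2.0],
    [1.0, 2.0, 1.0, 1.0],
    [2.0, 1.0, 2.0, 1.0],
]
y = [1, 1, 0, 0]
groups = {"A": [0, 1], "B": [2, 3]}

scores = score_groups(X, y, groups)
top = select_top_groups(scores, k=1)
selected_columns = [c for name in top for c in groups[name]]
print(scores, top, selected_columns)
```

A downstream model would then be trained only on `selected_columns`, which is the modeling step that follows grouping and scoring in this family of approaches.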
Affiliation(s)
- Cihan Kuzudisli
  - Department of Computer Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
  - Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Nurten Bulut
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Bahjat Qaqish
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
6
Yousef M, Ozdemir F, Jaber A, Allmer J, Bakir-Gungor B. PriPath: identifying dysregulated pathways from differential gene expression via grouping, scoring, and modeling with an embedded feature selection approach. BMC Bioinformatics 2023; 24:60. [PMID: 36823571 PMCID: PMC9947447 DOI: 10.1186/s12859-023-05187-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 02/14/2023] [Indexed: 02/25/2023] [Open Access]
Abstract
BACKGROUND: Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto encyclopedia of genes and genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in the disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches may only employ gene expression data and need to pay more attention to some of the existing knowledge stored in KEGG pathways for detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases.
RESULTS: PriPath is a novel approach that transfers the generic process of grouping and scoring, followed by modeling to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways are utilized as the grouping function as part of a machine learning algorithm for selecting the most significant KEGG pathways. A machine learning model is trained to differentiate between diseases and controls using those groups. We have tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our proposed approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We have comparatively evaluated the performance of PriPath against other tools, which are similar in their merit. For each dataset, we manually confirmed the top results of PriPath in the literature and found that most predictions can be supported by previous experimental research.
CONCLUSIONS: PriPath can thus aid in determining dysregulated pathways, which applies to medical diagnostics. In the future, we aim to advance this approach so that it can perform patient stratification based on gene expression and identify druggable targets. Thereby, we cover two aspects of precision medicine.
Affiliation(s)
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, 13206, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
- Fatma Ozdemir
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
  - University Institute of Digital Communication Systems, Ruhr-University, Bochum, Germany
- Amhar Jaber
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Jens Allmer
  - Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
7
Chavan AR, Singh AK, Gupta RK, Nakhate SP, Poddar BJ, Gujar VV, Purohit HJ, Khardenavis AA. Recent trends in the biotechnology of functional non-digestible oligosaccharides with prebiotic potential. Biotechnol Genet Eng Rev 2023:1-46. [PMID: 36714949 DOI: 10.1080/02648725.2022.2152627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 11/13/2022] [Indexed: 01/31/2023]
Abstract
Prebiotics as a part of dietary nutrition can play a crucial role in structuring the composition and metabolic function of intestinal microbiota and can thus help in managing a clinical scenario by preventing diseases and/or improving health. Among the different prebiotics, non-digestible carbohydrates are molecules that selectively enrich a typical class of bacteria with probiotic potential. This review summarizes the current knowledge about the different aspects of prebiotics, such as their production, characterization and purification by various techniques, and their link to novel product development at an industrial scale for wide-scale use in a diverse range of health management applications. Furthermore, the path to effective valorization of agricultural residues in prebiotic production has been elucidated. This review also discusses the recent developments in the application of genomic tools in the area of prebiotics for providing new insights into the taxonomic characterization of gut microorganisms, and exploring their functional metabolic pathways for enzyme synthesis. However, the information regarding the cumulative effect of prebiotics with beneficial bacteria, their colonization and its direct influence through an altered metabolic profile is still being established. The future of this area lies in the design of clinical condition-specific functional foods that take the host genotypes into consideration, thus facilitating the creation of a balanced and required metabolome and helping to maintain the healthy status of the host.
Affiliation(s)
- Atul Rajkumar Chavan
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Ashish Kumar Singh
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Rakesh Kumar Gupta
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Suraj Prabhakarrao Nakhate
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Bhagyashri Jagdishprasad Poddar
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Vaibhav Vilasrao Gujar
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
  - JoVE, Mumbai, India
- Hemant J Purohit
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
- Anshuman Arun Khardenavis
  - Environmental Biotechnology and Genomics Division, CSIR-National Environmental Engineering Research Institute, Nagpur, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
8
Jabeer A, Temiz M, Bakir-Gungor B, Yousef M. miRdisNET: Discovering microRNA biomarkers that are associated with diseases utilizing biological knowledge-based machine learning. Front Genet 2023; 13:1076554. [PMID: 36712859 PMCID: PMC9877296 DOI: 10.3389/fgene.2022.1076554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 12/30/2022] [Indexed: 01/14/2023] [Open Access]
Abstract
In recent years, biological experiments and increasing evidence have shown that microRNAs play an important role in the diagnosis and treatment of human complex diseases. Therefore, to diagnose and treat human complex diseases, it is necessary to reveal the associations between a specific disease and related miRNAs. Although current computational models based on machine learning attempt to determine miRNA-disease associations, the accuracy of these models needs to be improved, and candidate miRNA-disease relations need to be evaluated from a biological perspective. In this paper, we propose a computational model named miRdisNET to predict potential miRNA-disease associations. Specifically, miRdisNET requires two types of data, i.e., miRNA expression profiles and known disease-miRNA associations, as input files. First, we generate subsets of specific diseases by applying the grouping component. These subsets contain miRNA expressions with class labels associated with each specific disease. Then, we assign an importance score to each group by using a machine learning method for classification. Finally, we apply a modeling component and obtain outputs. One of the most important outputs of miRdisNET is the performance of miRNA-disease prediction. Compared with the existing methods, miRdisNET obtained the highest AUC value of 0.9998. Another output of miRdisNET is a list of significant miRNAs for the disease under study. The miRNAs identified by miRdisNET are validated by referring to the gold-standard databases which hold information on experimentally verified microRNA-disease associations. miRdisNET has been developed to predict candidate miRNAs for new diseases, where the miRNA-disease relation is not yet known. In addition, miRdisNET presents candidate disease-disease associations based on shared miRNA knowledge. The miRdisNET tool and other supplementary files are publicly available at: https://github.com/malikyousef/miRdisNET.
Affiliation(s)
- Amhar Jabeer
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Mustafa Temiz
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
9
Qumsiyeh E, Showe L, Yousef M. GediNET for discovering gene associations across diseases using knowledge based machine learning approach. Sci Rep 2022; 12:19955. [PMID: 36402891 PMCID: PMC9675776 DOI: 10.1038/s41598-022-24421-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022] [Open Access]
Abstract
The most common approaches to discovering genes associated with specific diseases are based on machine learning and use a variety of feature selection techniques to identify significant genes that can serve as biomarkers for a given disease. More recently, the integration of prior knowledge-based approaches into this process has shown significant promise in the discovery of new biomarkers with potential translational applications. In this study, we developed a novel approach, GediNET, that integrates prior biological knowledge into gene Groups that are shown to be associated with a specific disease such as a cancer. The novelty of GediNET is that it then also allows the discovery of significant associations between that specific disease and other diseases. The initial step in this process involves the identification of gene Groups. The Groups are then subjected to a Scoring component to identify the top performing classification Groups. The top-ranked gene Groups are then used to train a Machine Learning Model. The process of Grouping, Scoring and Modelling (G-S-M) is used by GediNET to identify other diseases that are similarly associated with this signature. GediNET identifies these relationships through Disease-Disease Association (DDA) based machine learning. DDA explores novel associations between diseases and identifies relationships which could be used to further improve approaches to diagnosis, prognosis, and treatment. The GediNET KNIME workflow can be downloaded from: https://github.com/malikyousef/GediNET.git or https://kni.me/w/3kH1SQV_mMUsMTS.
Affiliation(s)
- Emma Qumsiyeh
  - Information Technology Engineering, Al-Quds University, Abu Dis, Palestine
- Louise Showe
  - The Wistar Institute, Philadelphia, PA, 19104, USA
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, 13206, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
10
Kranz A, Polen T, Kotulla C, Arndt A, Bosco G, Bussmann M, Chattopadhyay A, Cramer A, Davoudi CF, Degner U, Diesveld R, Freiherr von Boeselager R, Gärtner K, Gätgens C, Georgi T, Geraths C, Haas S, Heyer A, Hünnefeld M, Ishige T, Kabus A, Kallscheuer N, Kever L, Klaffl S, Kleine B, Kočan M, Koch-Koerfges A, Kraxner KJ, Krug A, Krüger A, Küberl A, Labib M, Lange C, Mack C, Maeda T, Mahr R, Majda S, Michel A, Morosov X, Müller O, Nanda AM, Nickel J, Pahlke J, Pfeifer E, Platzen L, Ramp P, Rittmann D, Schaffer S, Scheele S, Spelberg S, Schulte J, Schweitzer JE, Sindelar G, Sorger-Herrmann U, Spelberg M, Stansen C, Tharmasothirajan A, Ooyen JV, van Summeren-Wesenhagen P, Vogt M, Witthoff S, Zhu L, Eikmanns BJ, Oldiges M, Schaumann G, Baumgart M, Brocker M, Eggeling L, Freudl R, Frunzke J, Marienhagen J, Wendisch VF, Bott M. A manually curated compendium of expression profiles for the microbial cell factory Corynebacterium glutamicum. Sci Data 2022; 9:594. [PMID: 36182956 PMCID: PMC9526701 DOI: 10.1038/s41597-022-01706-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Accepted: 08/18/2022] [Indexed: 11/12/2022] [Open Access]
Abstract
Corynebacterium glutamicum is the major host for the industrial production of amino acids and has become one of the best studied model organisms in microbial biotechnology. Rational strain construction has led to an improvement of producer strains and to a variety of novel producer strains with a broad substrate and product spectrum. A key factor for the success of these approaches is detailed knowledge of transcriptional regulation in C. glutamicum. Here, we present a large compendium of 927 manually curated microarray-based transcriptional profiles for wild-type and engineered strains detecting genome-wide expression changes of the 3,047 annotated genes in response to various environmental conditions or in response to genetic modifications. The replicates within the 927 experiments were combined to 304 microarray sets ordered into six categories that were used for differential gene expression analysis. Hierarchical clustering confirmed that no outliers were present in the sets. The compendium provides a valuable resource for future fundamental and applied research with C. glutamicum and contributes to a systemic understanding of this microbial cell factory.
- Measurement(s): Gene Expression Analysis
- Technology Type(s): Two Color Microarray
- Factor Type(s): WT condition A vs. WT condition B • Plasmid-based gene overexpression in parental strain vs. parental strain with empty vector control • Deletion mutant vs. parental strain
- Sample Characteristic - Organism: Corynebacterium glutamicum
- Sample Characteristic - Environment: laboratory environment
- Sample Characteristic - Location: Germany
Affiliation(s)
- Angela Kranz
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
  - IBG-4: Bioinformatics, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Tino Polen
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Christian Kotulla
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Annette Arndt
  - Institute of Microbiology and Biotechnology, University of Ulm, D-89069, Ulm, Germany
- Graziella Bosco
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Michael Bussmann
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Ava Chattopadhyay
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Annette Cramer
  - Institute of Microbiology and Biotechnology, University of Ulm, D-89069, Ulm, Germany
- Cedric-Farhad Davoudi
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Ursula Degner
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Ramon Diesveld
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Kim Gärtner
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Cornelia Gätgens
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Tobias Georgi
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Christian Geraths
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Sabine Haas
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Antonia Heyer
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Max Hünnefeld
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Takeru Ishige
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Armin Kabus
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Nicolai Kallscheuer
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Larissa Kever
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Simon Klaffl
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Britta Kleine
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Martina Kočan
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Abigail Koch-Koerfges
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Kim J Kraxner
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Andreas Krug
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Aileen Krüger
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
- Andreas Küberl
  - IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Mohamed Labib
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Christian Lange
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Christina Mack
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Tomoya Maeda
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Regina Mahr
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Stephan Majda
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Andrea Michel
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Xenia Morosov
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Olga Müller
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Arun M Nanda
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Jens Nickel
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Jennifer Pahlke
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Eugen Pfeifer
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Laura Platzen
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Paul Ramp
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Doris Rittmann
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Steffen Schaffer
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Sandra Scheele
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Stephanie Spelberg
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Julia Schulte
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Jens-Eric Schweitzer
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Georg Sindelar
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Ulrike Sorger-Herrmann
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Markus Spelberg
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Corinna Stansen
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Apilaasha Tharmasothirajan
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Jan van Ooyen
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Michael Vogt
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Sabrina Witthoff
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Lingfeng Zhu
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Bernhard J Eikmanns
- Institute of Microbiology and Biotechnology, University of Ulm, D-89069, Ulm, Germany
| | - Marco Oldiges
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Georg Schaumann
- SenseUp GmbH, c/o Campus Forschungszentrum, Wilhelm-Johnen-Strasse, D-52425, Jülich, Germany
| | - Meike Baumgart
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Melanie Brocker
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Lothar Eggeling
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Roland Freudl
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Julia Frunzke
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Jan Marienhagen
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany
| | - Volker F Wendisch
- Genetics of Prokaryotes, Biology & CeBiTec, Bielefeld University, Universitaetsstr. 25, D-33615, Bielefeld, Germany
| | - Michael Bott
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich, D-52425, Jülich, Germany.
|
11
|
Lee C, Lee S, Park E, Hong J, Shin DY, Byun JM, Yun H, Koh Y, Yoon SS. Transcriptional signatures of the BCL2 family for individualized acute myeloid leukaemia treatment. Genome Med 2022; 14:111. [PMID: 36171613 PMCID: PMC9520894 DOI: 10.1186/s13073-022-01115-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2022] [Accepted: 09/20/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Although anti-apoptotic proteins of the B-cell lymphoma-2 (BCL2) family have been utilized as therapeutic targets in acute myeloid leukaemia (AML), their complicated regulatory networks make individualized therapy difficult. This study aimed to discover the transcriptional signatures of BCL2 family genes that reflect regulatory dynamics, which can guide individualized therapeutic strategies.
Methods: From three AML RNA-seq cohorts (BeatAML, LeuceGene, and TCGA; n = 451, 437, and 179, respectively), we constructed the BCL2 family signatures (BFSigs) by applying an innovative gene-set selection method reflecting biological knowledge followed by non-negative matrix factorization (NMF). To demonstrate the significance of the BFSigs, we conducted modelling to predict response to BCL2 family inhibitors, clustering, and functional enrichment analysis. Cross-platform validity of BFSigs was also confirmed using NanoString technology in a separate cohort of 47 patients.
Results: We established BFSigs labeled as the BCL2, MCL1/BCL2, and BFL1/MCL1 signatures that identify key anti-apoptotic proteins. Unsupervised clustering based on BFSig information consistently classified AML patients into three robust subtypes across different AML cohorts, implying the existence of biological entities revealed by the BFSig approach. Interestingly, each subtype has distinct enrichment patterns of major cancer pathways, including MAPK and mTORC1, which suggest subtype-specific combination treatment with apoptosis-modulating drugs. The BFSig-based classifier also predicted response to venetoclax with remarkable performance (area under the ROC curve, AUROC = 0.874), which was well-validated in an independent cohort (AUROC = 0.950). Lastly, we successfully confirmed the validity of BFSigs using NanoString technology.
Conclusions: This study proposes BFSigs as a biomarker for the effective selection of apoptosis-targeting treatments and cancer pathways to co-target in AML.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13073-022-01115-w.
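The AUROC values quoted in this abstract can be read as the probability that a randomly chosen responder receives a higher classifier score than a randomly chosen non-responder. The following is a minimal Python sketch of that pairwise calculation; the function name `auroc` and the toy scores are illustrative assumptions and are not code or data from the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: predicted scores, higher means "more likely positive".
    labels: 1 for positive (e.g. responder), 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for six patients (not from the paper).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
```

The double loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred patients; a rank-based formulation would be preferable for larger data.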
Affiliation(s)
- Chansub Lee
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Medical Innovation, Seoul National University Hospital, Seoul, Republic of Korea
| | - Sungyoung Lee
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea; Center for Precision Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Eunchae Park
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Medical Innovation, Seoul National University Hospital, Seoul, Republic of Korea
| | - Junshik Hong
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Medical Innovation, Seoul National University Hospital, Seoul, Republic of Korea; Division of Hematology and Medical Oncology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Dong-Yeop Shin
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Medical Innovation, Seoul National University Hospital, Seoul, Republic of Korea; Division of Hematology and Medical Oncology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Ja Min Byun
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Medical Innovation, Seoul National University Hospital, Seoul, Republic of Korea; Division of Hematology and Medical Oncology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Hongseok Yun
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, Republic of Korea; Center for Precision Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Youngil Koh
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Medical Innovation, Seoul National University Hospital, Seoul, Republic of Korea; Division of Hematology and Medical Oncology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
| | - Sung-Soo Yoon
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, Republic of Korea; Center for Medical Innovation, Seoul National University Hospital, Seoul, Republic of Korea; Division of Hematology and Medical Oncology, Department of Internal Medicine, Seoul National University Hospital, Seoul, Republic of Korea
|
12
|
Ensemble feature selection for multi‐label text classification: An intelligent order statistics approach. INT J INTELL SYST 2022. [DOI: 10.1002/int.23044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
|
14
|
EGFAFS: A Novel Feature Selection Algorithm Based on Explosion Gravitation Field Algorithm. ENTROPY 2022; 24:e24070873. [PMID: 35885095 PMCID: PMC9322764 DOI: 10.3390/e24070873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/24/2022] [Revised: 06/15/2022] [Accepted: 06/22/2022] [Indexed: 02/04/2023]
Abstract
Feature selection (FS) is a vital step in data mining and machine learning, especially for analyzing data in a high-dimensional feature space. Gene expression data usually consist of a few samples characterized by a high-dimensional feature space. As a result, they are not suitable to be processed by simple methods, such as the filter-based method. In this study, we propose a novel feature selection algorithm based on the Explosion Gravitation Field Algorithm, called EGFAFS. To reduce the dimensions of the feature space to acceptable dimensions, we constructed a recommended feature pool by a series of Random Forests based on the Gini index. Furthermore, by paying more attention to the features in the recommended feature pool, we can find the best subset more efficiently. To verify the performance of EGFAFS for FS, we tested EGFAFS on eight gene expression datasets compared with four heuristic-based FS methods (GA, PSO, SA, and DE) and four other FS methods (Boruta, HSICLasso, DNN-FS, and EGSG). The results show that EGFAFS has better performance for FS on gene expression data in terms of evaluation metrics, outperforming the other eight FS algorithms. The genes selected by EGFAFS play an essential role in the differential co-expression network, and some biological functions further demonstrate the success of EGFAFS for solving FS problems on gene expression data.
|
15
|
Yousef M, Voskergian D. TextNetTopics: Text Classification Based Word Grouping as Topics and Topics’ Scoring. Front Genet 2022; 13:893378. [PMID: 35795215 PMCID: PMC9251539 DOI: 10.3389/fgene.2022.893378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Accepted: 05/25/2022] [Indexed: 11/28/2022] Open
Abstract
Medical document classification is an active research problem and among the most challenging within the text classification domain. Medical datasets often contain massive feature sets in which many features are irrelevant or redundant and add noise, thus reducing classification performance. Therefore, to obtain better accuracy from a classification model, it is crucial to choose a set of features (terms) that best discriminate between the classes of medical documents. This study proposes TextNetTopics, a novel approach that applies feature selection by considering Bag-of-topics (BOT) rather than the traditional approach, Bag-of-words (BOW). Thus, our approach performs topic selection rather than word selection. TextNetTopics is based on the generic approach entitled G-S-M (Grouping, Scoring, and Modeling), developed by Yousef and his colleagues and used mainly in biological data. The proposed approach suggests scoring topics to select the top topics for training the classifier. This study applied TextNetTopics to textual data to respond to the CAMDA challenge. TextNetTopics outperforms various feature selection approaches and performs well when the model is applied to the validation data provided by CAMDA. Additionally, we have applied our algorithm to different textual datasets.
Affiliation(s)
- Malik Yousef
- Zefat Academic College, Zefat, Israel
- *Correspondence: Malik Yousef; Daniel Voskergian
| | - Daniel Voskergian
- Computer Engineering Department, Al-Quds University, Jerusalem, Palestine
|
16
|
Mate Analysis of Hepatocellular Carcinoma Immune Subtypes and Their Functional Effects Based on Fuzzy Logic and Evolutionary Algorithms. CONTRAST MEDIA & MOLECULAR IMAGING 2022; 2022:5787981. [PMID: 35601568 PMCID: PMC9098361 DOI: 10.1155/2022/5787981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/06/2022] [Revised: 03/23/2022] [Accepted: 04/05/2022] [Indexed: 11/17/2022]
Abstract
Functional analysis of immune subtypes in hepatocellular carcinoma has attracted much attention due to its advantages in solving some optimization problems. At present, the research on the immune subtype of hepatocellular carcinoma is still in its infancy, and the high stability of its system still has problems. Based on fuzzy logic and evolutionary algorithms, this paper constructs a Mate analysis of the optimization problem of immune subtypes and dynamic optimization problems of hepatocellular carcinoma. The model conducts in-depth analysis and research on the biological immune subtype system, solving the problems of reliable information processing and body defense. Tested with existing test functions, very competitive results were achieved. The simulation results show that the improved algorithm based on data statistics has global search ability, the solution accuracy reaches 0.931, and the stability reaches 88.1%.
|
17
|
Integrated Bioinformatics Analysis and Verification of Gene Targets for Myocardial Ischemia-Reperfusion Injury. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE 2022; 2022:2056630. [PMID: 35463067 PMCID: PMC9033367 DOI: 10.1155/2022/2056630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/16/2021] [Revised: 03/11/2022] [Accepted: 03/28/2022] [Indexed: 11/18/2022]
Abstract
Background: Myocardial ischemia-reperfusion injury (MIRI) has become a thorny and unsolved clinical problem. The pathological mechanisms of MIRI are intricate and unclear, so it is of great significance to explore potential hub genes and search for natural products that exhibit potential therapeutic efficacy on MIRI via targeting the hub genes.
Methods: First, the differentially expressed genes (DEGs) from GSE58486, GSE108940, and GSE115568 were screened and integrated via a robust rank aggregation algorithm. Then, the hub genes were identified and verified by functional experiments in MIRI mice. Finally, natural products with protective effects against MIRI were retrieved, and molecular docking simulations between hub genes and natural products were performed.
Results: 230 integrated DEGs and 9 hub genes were identified. After verification, Emr1, Tyrobp, Itgb2, Fcgr2b, Cybb, and Fcer1g might be the most significant genes during MIRI. A total of 75 natural products were discovered. Most of them (especially araloside C, glycyrrhizic acid, ophiopogonin D, polyphyllin I, and punicalagin) showed good ability to bind the hub genes.
Conclusions: Emr1, Tyrobp, Itgb2, Fcgr2b, Cybb, and Fcer1g might be critical in the pathological process of MIRI, and the natural products (araloside C, glycyrrhizic acid, ophiopogonin D, polyphyllin I, and punicalagin) targeting these hub genes exhibited potential therapeutic efficacy on MIRI. Our findings provided new insights to explore the mechanism and treatments for MIRI and revealed new therapeutic targets for natural products with protective properties against MIRI.
|
18
|
Bakir-Gungor B, Hacılar H, Jabeer A, Nalbantoglu OU, Aran O, Yousef M. Inflammatory bowel disease biomarkers of human gut microbiota selected via different feature selection methods. PeerJ 2022; 10:e13205. [PMID: 35497193 PMCID: PMC9048649 DOI: 10.7717/peerj.13205] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Accepted: 03/10/2022] [Indexed: 01/12/2023] Open
Abstract
The tremendous boost in next generation sequencing and in the "omics" technologies makes it possible to characterize the human gut microbiome, the collective genomes of the microbial community that resides in our gastrointestinal tract. Although some of these microorganisms are considered to be essential regulators of our immune system, the alteration of the complexity and eubiotic state of microbiota might promote autoimmune and inflammatory disorders such as diabetes, rheumatoid arthritis, inflammatory bowel diseases (IBD), obesity, and carcinogenesis. IBD, comprising Crohn's disease and ulcerative colitis, is a gut-related, multifactorial disease with an unknown etiology. IBD presents defects in the detection and control of the gut microbiota, associated with unbalanced immune reactions, genetic mutations that confer susceptibility to the disease, and complex environmental conditions such as westernized lifestyle. Although some existing studies attempt to unveil the composition and functional capacity of the gut microbiome in relation to IBD, a comprehensive picture of the gut microbiome in IBD patients is far from being complete. Due to the complexity of metagenomic studies, state-of-the-art machine learning techniques have become popular for addressing a wide range of questions in the field of metagenomic data analysis. In this regard, using an IBD-associated metagenomics dataset, this study utilizes both supervised and unsupervised machine learning algorithms (i) to generate a classification model that aids IBD diagnosis, (ii) to discover IBD-associated biomarkers, and (iii) to discover subgroups of IBD patients using k-means and hierarchical clustering approaches.
To deal with the high dimensionality of features, we applied robust feature selection algorithms such as Conditional Mutual Information Maximization (CMIM), Fast Correlation Based Filter (FCBF), min redundancy max relevance (mRMR), Select K Best (SKB), Information Gain (IG) and Extreme Gradient Boosting (XGBoost). In our experiments with 100-fold Monte Carlo cross-validation (MCCV), XGBoost, IG, and SKB methods showed a considerable effect in terms of minimizing the microbiota used for the diagnosis of IBD and thus reducing the cost and time. We observed that compared to Decision Tree, Support Vector Machine, Logitboost, Adaboost, and stacking ensemble classifiers, our Random Forest classifier resulted in better performance measures for the classification of IBD. Our findings revealed potential microbiome-mediated mechanisms of IBD and these findings might be useful for the development of microbiome-based diagnostics.
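The 100-fold Monte Carlo cross-validation (MCCV) mentioned in this abstract amounts to repeating a random train/test split many times and averaging the test score. The sketch below illustrates only that resampling loop in plain Python. As loudly stated assumptions: the nearest-centroid classifier is a simple stand-in for the study's Random Forest, no feature selection step is included, and the function names, the 20% test fraction, and the toy data are all hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Return one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def monte_carlo_cv(X, y, n_iter=100, test_fraction=0.2, seed=0):
    """Average test accuracy over n_iter independent random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(round(test_fraction * len(X))))
    accuracies = []
    for _ in range(n_iter):
        rng.shuffle(indices)  # a fresh random split on every iteration
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return mean(accuracies)


# Hypothetical, well-separated two-class toy data (not from the study).
X_toy = ([[0.1 * i, 0.2 * i] for i in range(10)]
         + [[10.0 + 0.1 * i, 10.0 - 0.2 * i] for i in range(10)])
y_toy = [0] * 10 + [1] * 10
print(monte_carlo_cv(X_toy, y_toy, n_iter=100))
```

Unlike k-fold cross-validation, the test sets of different iterations may overlap, so the number of iterations can be chosen independently of the split ratio.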
Affiliation(s)
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Hilal Hacılar
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Amhar Jabeer
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | - Oya Aran
- TETAM, Bogazici University, Istanbul, Turkey
| | - Malik Yousef
- Zefat Academic College, Zefat, Israel; Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
|
19
|
Yousef M, Goy G, Bakir-Gungor B. miRModuleNet: Detecting miRNA-mRNA Regulatory Modules. Front Genet 2022; 13:767455. [PMID: 35495139 PMCID: PMC9039401 DOI: 10.3389/fgene.2022.767455] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Accepted: 03/24/2022] [Indexed: 12/13/2022] Open
Abstract
Increasing evidence that microRNAs (miRNAs) play a key role in carcinogenesis has revealed the need for elucidating the mechanisms of miRNA regulation and the roles of miRNAs in gene-regulatory networks. A better understanding of the interactions between miRNAs and their mRNA targets will provide a better understanding of the complex biological processes that occur during carcinogenesis. Increased efforts to reveal these interactions have led to the development of a variety of tools to detect and understand these interactions. We have recently described a machine learning approach miRcorrNet, based on grouping and scoring (ranking) groups of genes, where each group is associated with a miRNA and the group members are genes with expression patterns that are correlated with this specific miRNA. The miRcorrNet tool requires two types of -omics data, miRNA and mRNA expression profiles, as an input file. In this study we describe miRModuleNet, which groups mRNA (genes) that are correlated with each miRNA to form a star shape, which we identify as a miRNA-mRNA regulatory module. A scoring procedure is then applied to each module to further assess their contribution in terms of classification. An important output of miRModuleNet is that it provides a hierarchical list of significant miRNA-mRNA regulatory modules. miRModuleNet was further validated on external datasets for their disease associations, and functional enrichment analysis was also performed. The application of miRModuleNet aids the identification of functional relationships between significant biomarkers and reveals essential pathways involved in cancer pathogenesis. The miRModuleNet tool and all other supplementary files are available at https://github.com/malikyousef/miRModuleNet/
Affiliation(s)
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- *Correspondence: Malik Yousef
| | - Gokhan Goy
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
- The Scientific and Technological Research Council of Turkey, Ankara, Turkey
| | - Burcu Bakir-Gungor
- Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Turkey
|
20
|
Prediction of Linear Cationic Antimicrobial Peptides Active against Gram-Negative and Gram-Positive Bacteria Based on Machine Learning Models. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12073631] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Antimicrobial peptides (AMPs) are considered promising alternatives to conventional antibiotics for overcoming the growing problem of antibiotic resistance. Computational prediction approaches are receiving increasing interest as a way to identify and design the best candidate AMPs prior to in vitro tests. In this study, we focused on linear cationic peptides with non-hemolytic activity, which were downloaded from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). Referring to the minimum inhibitory concentration (MIC) values, we assigned a positive label to a peptide if it shows antimicrobial activity; otherwise, the peptide is labeled as negative. Here, we focused on the peptides showing antimicrobial activity against Gram-negative and against Gram-positive bacteria separately, and we created two datasets accordingly. Ten different physico-chemical properties of the peptides are calculated and used as features in our study. Following data exploration and data preprocessing steps, a variety of classification algorithms are used with 100-fold Monte Carlo Cross-Validation to build models and to predict the antimicrobial activity of the peptides. Among the generated models, Random Forest resulted in the best performance metrics for both the Gram-negative dataset (Accuracy: 0.98, Recall: 0.99, Specificity: 0.97, Precision: 0.97, AUC: 0.99, F1: 0.98) and the Gram-positive dataset (Accuracy: 0.95, Recall: 0.95, Specificity: 0.95, Precision: 0.90, AUC: 0.97, F1: 0.92) after outlier elimination is applied. This prediction approach might be useful to evaluate the antibacterial potential of a candidate peptide sequence before moving to the experimental studies.
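Most of the metrics reported in this abstract (accuracy, recall, specificity, precision, F1) follow directly from the four cells of a binary confusion matrix. A minimal Python sketch of those definitions is given below; the function name `binary_metrics` and the toy labels are illustrative assumptions, not values from the study, and AUC is omitted because it requires continuous scores rather than hard predictions.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": ratio(tp, tp + fn),        # sensitivity
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical labels for ten peptides (1 = active, 0 = inactive).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(binary_metrics(toy_true, toy_pred))
```

Reporting specificity alongside recall, as the abstract does, matters when the positive and negative classes are imbalanced, since accuracy alone can hide poor performance on the smaller class.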
|
21
|
Long Non-Coding RNAs Might Regulate Phenotypic Switch of Vascular Smooth Muscle Cells Acting as ceRNA: Implications for In-Stent Restenosis. Int J Mol Sci 2022; 23:ijms23063074. [PMID: 35328496 PMCID: PMC8952224 DOI: 10.3390/ijms23063074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2022] [Revised: 03/07/2022] [Accepted: 03/09/2022] [Indexed: 02/01/2023] Open
Abstract
Coronary in-stent restenosis is a late complication of angioplasty. It is a multifactorial process that involves vascular smooth muscle cells (VSMCs), endothelial cells, and inflammatory and genetic factors. In this study, the transcriptomic landscape of VSMCs’ phenotypic switch process was assessed under stimuli resembling stent injury. Co-cultured contractile VSMCs and endothelial cells were exposed to a bare metal stent and platelet-derived growth factor (PDGF-BB) 20 ng/mL. Migratory capacity (wound healing assay), proliferative capacity, and cell cycle analysis of the VSMCs were performed. RNAseq analysis of contractile vs. proliferative VSMCs was performed. Gene differential expression (DE), identification of new long non-coding RNA candidates (lncRNAs), gene ontology (GO), and pathway enrichment (KEGG) were analyzed. A competing endogenous RNA network was constructed, and significant lncRNA–miRNA–mRNA axes were selected. VSMCs exposed to “stent injury” conditions showed morphologic changes, with proliferative and migratory capacities progressing from G0-G1 cell cycle phase to S and G2-M. RNAseq analysis showed DE of 1099, 509 and 64 differentially expressed mRNAs, lncRNAs, and miRNAs, respectively. GO analysis of DE genes showed significant enrichment in collagen and extracellular matrix organization, regulation of smooth muscle cell proliferation, and collagen biosynthetic process. The main upregulated nodes in the lncRNA-mediated ceRNA network were PVT1 and HIF1-AS2, with downregulation of ACTA2-AS1 and MIR663AHG. The PVT1 ceRNA axis appears to be an attractive target for in-stent restenosis diagnosis and treatment.
|
22
|
Galbraith E, Convertino M. The Eco-Evo Mandala: Simplifying Bacterioplankton Complexity into Ecohealth Signatures. ENTROPY (BASEL, SWITZERLAND) 2021; 23:1471. [PMID: 34828169 PMCID: PMC8625105 DOI: 10.3390/e23111471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Revised: 10/30/2021] [Accepted: 11/05/2021] [Indexed: 12/24/2022]
Abstract
The microbiome emits informative signals of biological organization and environmental pressure that aid ecosystem monitoring and prediction. Are the many signals reducible to a habitat-specific portfolio that characterizes ecosystem health? Does an optimally structured microbiome imply a resilient microbiome? To answer these questions, we applied our novel Eco-Evo Mandala to bacterioplankton data from four habitats within the Great Barrier Reef, to explore how patterns in community structure, function and genetics signal habitat-specific organization and departures from theoretical optimality. The Mandala revealed communities departing from optimality in habitat-specific ways, mostly along structural and functional traits related to bacterioplankton abundance and interaction distributions (reflected by ϵ and λ as power law and exponential distribution parameters), which are not linearly associated with each other. River and reef communities were similar in their relatively low abundance and interaction disorganization (low ϵ and λ) due to their protective structured habitats. On the contrary, lagoon and estuarine inshore reefs appeared the most disorganized due to the ocean temperature and biogeochemical stress. Phylogenetic distances (D) were minimally informative in characterizing bacterioplankton organization. However, dominant populations, such as Proteobacteria, Bacteroidetes, and Cyanobacteria, were largely responsible for community patterns, being generalists with a large functional gene repertoire (high D) that increases resilience. The relative balance of these populations was found to be habitat-specific and likely related to systemic environmental stress. The position on the Mandala along the three fundamental traits, as well as fluctuations in this ecological state, conveys information about the microbiome's health (and likely ecosystem health considering bacteria-based multitrophic dependencies) as divergence from the expected relative optimality. 
The Eco-Evo Mandala emphasizes how habitat and the microbiome's interaction network topology are first- and second-order factors for ecosystem health evaluation over taxonomic species richness. Unhealthy microbiome communities and unbalanced microbes are identified not by macroecological indicators but by mapping their impact on the collective proportion and distribution of interactions, which regulates the microbiome's ecosystem function.
Affiliation(s)
- Elroy Galbraith
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
- Matteo Convertino
- bluEco Lab, Institute of Environment and Ecology, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
|
23
|
Bhosale H, Ramakrishnan V, Jayaraman VK. Support vector machine-based prediction of pore-forming toxins (PFT) using distributed representation of reduced alphabets. J Bioinform Comput Biol 2021; 19:2150028. [PMID: 34693886 DOI: 10.1142/s0219720021500281] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
Bacterial virulence can be attributed to a wide variety of factors, including toxins that harm the host. Pore-forming toxins are one class of toxins that confer virulence on bacteria and are promising targets for therapeutic intervention. In this work, we develop a sequence-based machine learning framework for the prediction of pore-forming toxins. For this, we use a distributed representation of the protein sequence, encoded by reduced alphabet schemes based on conformational similarity and hydropathy index, as input features to Support Vector Machines (SVMs). The choice of conformational similarity and hydropathy indices is based on the functional mechanism of pore-forming toxins. Our methodology achieves about 81% accuracy, indicating that conformational similarity, an indicator of the flexibility of amino acids, along with the hydropathy index can capture the intrinsic features of pore-forming toxins that distinguish them from other types of transporter proteins. Increased understanding of the mechanisms of pore-forming toxins can contribute to the use of such "mechanism-informed" features, which may increase the prediction accuracy further.
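To make the reduced-alphabet idea in this abstract concrete, the following is a minimal sketch of the featurization step only. The three-group hydropathy mapping, the names `HYDROPATHY_GROUPS`, `reduce_sequence`, and `kmer_features`, and the use of plain k-mer frequencies in place of a learned distributed representation are all assumptions made for illustration; the abstract does not specify the authors' grouping or embedding. The resulting fixed-length vectors are the kind of input that could be passed to an SVM.

```python
from collections import Counter
from itertools import product

# Hypothetical three-group hydropathy alphabet (an assumption, not the paper's scheme).
HYDROPATHY_GROUPS = {
    "H": "AVLIMFWC",  # hydrophobic residues
    "N": "GSTPYH",    # roughly neutral residues
    "P": "DENQKR",    # polar or charged residues
}
# Invert the grouping into a residue -> group-letter lookup table.
REDUCE = {aa: code for code, members in HYDROPATHY_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group letter; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def kmer_features(seq, k=2):
    """Return k-mer frequencies over the reduced alphabet in a fixed order.

    This stands in for a learned embedding: it is a simple count-based
    representation, not the distributed representation used in the paper.
    k-mers containing 'X' count toward the total but have no feature slot.
    """
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    vocab = ["".join(p) for p in product(sorted(HYDROPATHY_GROUPS), repeat=k)]
    return [counts[kmer] / total for kmer in vocab]
```

With `k=2` and three groups, every sequence maps to a 9-dimensional vector regardless of its length, which is what makes the representation usable by a fixed-input classifier.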
Affiliation(s)
- Hrushikesh Bhosale
- Department of Computer Science, FLAME University, Pune, Maharashtra, India
- Vigneshwar Ramakrishnan
- School of Chemical & Biotechnology, SASTRA Deemed-to-be University, Thanjavur, Tamilnadu, India
- Valadi K Jayaraman
- Department of Computer Science, FLAME University, Pune, Maharashtra, India
|
24
|
A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data. Processes (Basel) 2021. [DOI: 10.3390/pr9081466] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/14/2022] Open
Abstract
Data-driven models with predictive ability are important in medicine and healthcare. However, the most challenging task in predictive modeling is constructing the prediction model itself, which can be addressed using machine learning (ML) methods. These methods learn and train a model from a gene expression dataset without being explicitly programmed. Due to the vast amount of gene expression data, this task becomes complex and time consuming. This paper reviews recent progress in ML and deep learning (DL) for cancer classification, which has received increasing attention in bioinformatics and computational biology. The review focuses mainly on the development of cancer classification methods based on ML and DL. Although many methods have been applied to the cancer classification problem, recent progress shows that most of the successful techniques are those based on supervised and DL methods. In addition, the sources of the healthcare datasets are described. The development of many machine learning methods for insight analysis in cancer classification has brought considerable improvement in healthcare. Currently, there appears to be high demand for further development of efficient classification methods to address the expansion of healthcare applications.
|
25
|
Arora G, Joshi J, Mandal RS, Shrivastava N, Virmani R, Sethi T. Artificial Intelligence in Surveillance, Diagnosis, Drug Discovery and Vaccine Development against COVID-19. Pathogens 2021; 10:1048. [PMID: 34451513 PMCID: PMC8399076 DOI: 10.3390/pathogens10081048] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/09/2021] [Revised: 08/11/2021] [Accepted: 08/11/2021] [Indexed: 12/15/2022] Open
Abstract
As of August 6th, 2021, the World Health Organization has notified 200.8 million laboratory-confirmed infections and 4.26 million deaths from COVID-19, making it the worst pandemic since the 1918 flu. The main challenges in mitigating COVID-19 are effective vaccination, treatment, and agile containment strategies. In this review, we focus on the potential of Artificial Intelligence (AI) in COVID-19 surveillance, diagnosis, outcome prediction, drug discovery and vaccine development. With the help of big data, AI tries to mimic the cognitive capabilities of a human brain, such as problem-solving and learning abilities. Machine Learning (ML), a subset of AI, holds special promise for solving problems based on experiences gained from the curated data. Advances in AI methods have created an unprecedented opportunity for building agile surveillance systems using the deluge of real-time data generated within a short span of time. During the COVID-19 pandemic, many reports have discussed the utility of AI approaches in prioritization, delivery, surveillance, and supply chain of drugs, vaccines, and non-pharmaceutical interventions. This review will discuss the clinical utility of AI-based models and will also discuss limitations and challenges faced by AI systems, such as model generalizability, explainability, and trust as pillars for real-life deployment in healthcare.
Affiliation(s)
- Gunjan Arora
- Department of Internal Medicine, Yale University School of Medicine, New Haven, CT 06520, USA
- Jayadev Joshi
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44106, USA
- Rahul Shubhra Mandal
- Department of Cancer Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Nitisha Shrivastava
- Department of Pathology, Albert Einstein College of Medicine/Montefiore Medical Center, Bronx, NY 10461, USA
- Richa Virmani
- Confo Therapeutics, Technologiepark 94, 9052 Ghent, Belgium
- Tavpritesh Sethi
- Indraprastha Institute of Information Technology, New Delhi 110020, India
|
27
|
Yousef M, Goy G, Mitra R, Eischen CM, Jabeer A, Bakir-Gungor B. miRcorrNet: machine learning-based integration of miRNA and mRNA expression profiles, combined with feature grouping and ranking. PeerJ 2021; 9:e11458. [PMID: 34055490 PMCID: PMC8140596 DOI: 10.7717/peerj.11458] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/12/2020] [Accepted: 04/25/2021] [Indexed: 11/20/2022] Open
Abstract
A better understanding of disease development and progression mechanisms at the molecular level is critical both for the diagnosis of a disease and for the development of therapeutic approaches. Advances in high-throughput technologies have made it possible to generate mRNA and microRNA (miRNA) expression profiles, and the integrative analysis of these profiles has helped uncover the functional effects of RNA expression in complex diseases, such as cancer. Several studies attempt to integrate miRNA and mRNA expression profiles using statistical methods such as Pearson correlation, and then combine them with enrichment analysis. In this study, we developed a novel tool called miRcorrNet, which performs machine learning-based integration to analyze miRNA and mRNA gene expression profiles. miRcorrNet groups mRNAs based on their correlation to miRNA expression levels and hence generates groups of target genes associated with each miRNA. These groups are then subjected to a rank function for classification. We have evaluated our tool using miRNA and mRNA expression profiling data downloaded from The Cancer Genome Atlas (TCGA), and performed a comparative evaluation with existing tools. In our experiments we show that miRcorrNet performs as well as other tools in terms of accuracy (reaching more than 95% AUC value). Additionally, miRcorrNet includes ranking steps to separate two classes, namely case and control, which are not available in other tools. We have also evaluated the performance of miRcorrNet using a completely independent dataset. Moreover, we conducted a comprehensive literature search to explore the biological functions of the identified miRNAs. We have validated our significantly identified miRNA groups against known databases, which yielded about 90% accuracy. Our results suggest that miRcorrNet is able to accurately prioritize pan-cancer regulating high-confidence miRNAs.
The miRcorrNet tool and all other supplementary files are available at https://github.com/malikyousef/miRcorrNet.
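The grouping step described in this abstract (collecting, for each miRNA, the mRNAs whose expression correlates with it) can be sketched as follows. This is an illustrative toy under stated assumptions, not the miRcorrNet implementation: the function names `pearson` and `group_targets`, the dictionary-based input layout, and the choice of a negative-correlation cutoff of -0.5 are hypothetical, and the ranking and classification stages are omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / (norm_x * norm_y)


def group_targets(mirna_expr, mrna_expr, threshold=-0.5):
    """Group mRNAs under each miRNA when their correlation is at or below threshold.

    The negative cutoff is an assumption for illustration; the abstract
    only states that grouping is based on correlation.
    """
    groups = {}
    for mirna, mirna_values in mirna_expr.items():
        groups[mirna] = sorted(
            gene
            for gene, gene_values in mrna_expr.items()
            if pearson(mirna_values, gene_values) <= threshold
        )
    return groups


# Toy expression profiles across four samples (made-up values).
mirna_expr = {"miR-a": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {
    "GENE1": [4.0, 3.0, 2.0, 1.0],  # falls as miR-a rises
    "GENE2": [1.0, 2.0, 3.0, 4.0],  # rises with miR-a
    "GENE3": [2.0, 2.0, 2.0, 2.0],  # constant profile
}
groups = group_targets(mirna_expr, mrna_expr)
```

In this toy input only `GENE1` is anti-correlated with `miR-a`, so it is the only gene placed in that miRNA's group; each such group would then be scored as a unit in a downstream ranking step.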
Affiliation(s)
- Malik Yousef
- Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel; Department of Information Systems, Zefat Academic College, Zefat, Israel
- Gokhan Goy
- Department of Computer Engineering, Abdullah Gül University, Kayseri, Turkey
- Ramkrishna Mitra
- Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Christine M Eischen
- Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
- Amhar Jabeer
- Department of Computer Engineering, Abdullah Gül University, Kayseri, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gül University, Kayseri, Turkey
|