1
Zhang B, Liu H, Wu F, Ding Y, Wu J, Lu L, Bajpai AK, Sang M, Wang X. Identification of hub genes and potential molecular mechanisms related to drug sensitivity in acute myeloid leukemia based on machine learning. Front Pharmacol 2024; 15:1359832. [PMID: 38650628 PMCID: PMC11033397 DOI: 10.3389/fphar.2024.1359832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 03/21/2024] [Indexed: 04/25/2024]
Abstract
Background: Acute myeloid leukemia (AML) is the most common form of leukemia among adults and is characterized by uncontrolled proliferation and clonal expansion of hematopoietic cells. Although treatment of younger patients has improved significantly, the prognosis of elderly AML patients remains poor. Methods: We used computational methods and machine learning (ML) techniques to identify and explore differential high-risk genes (DHRGs) in AML. The DHRGs were examined through multiple in silico approaches, including genomic and functional analysis, survival analysis, immune infiltration, miRNA co-expression and stemness feature analyses, to reveal their prognostic importance in AML. Furthermore, prognostic models were constructed from the DHRGs using different ML algorithms and then validated. Finally, molecular docking studies were performed to identify potential drug candidates targeting the selected DHRGs. Results: We identified a total of 80 DHRGs by comparing the differentially expressed genes between AML patients and normal controls with the high-risk AML genes identified by Cox regression. Genetic and epigenetic alteration analyses of the DHRGs revealed a significant association of their copy number variations and methylation status with overall survival (OS) of AML patients. Of the 137 models constructed using different ML algorithms, the combination of Ridge and plsRcox achieved the highest mean C-index and was used to build the final model. When AML patients were classified into low- and high-risk groups based on the DHRGs, the low-risk group had significantly longer OS in both the training and validation cohorts. Furthermore, immune infiltration, miRNA co-expression, stemness feature and hallmark pathway analyses revealed significant differences between the low- and high-risk AML groups. Drug sensitivity and molecular docking studies identified the top 5 drugs, including carboplatin and austocystin-D, that may significantly affect the DHRGs in AML. Conclusion: The current study identified a set of high-risk genes that may serve as prognostic and therapeutic markers for AML patients. In addition, it demonstrated the value of ML algorithms in constructing and validating prognostic models in AML. Although our study used extensive bioinformatics and machine learning methods to identify the hub genes in AML, experimental validation using knock-out/knock-in methods would strengthen our findings.
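The model comparison above ranks 137 models (including the Ridge plus plsRcox combination) by mean C-index. As a minimal sketch of what that metric measures, the Python function below computes Harrell's concordance index for right-censored survival data. The function name and the simplified handling of tied survival times (such pairs are skipped) are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when the subject with the shorter follow-up
    time actually experienced the event (events[...] == 1). The pair is
    concordant when that subject also has the higher predicted risk; tied
    risk scores count as half-concordant. Pairs with tied times are skipped
    (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier subject was censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-index of 1.0 means every comparable pair is ranked correctly, 0.5 is no better than chance, and 0.0 means the ranking is perfectly reversed.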
Affiliation(s)
- Boyu Zhang
- Department of Hematology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, Jiangsu, China
- Haiyan Liu
- Department of Hematology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, Jiangsu, China
- Fengxia Wu
- Department of Hematology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, Jiangsu, China
- Yuhong Ding
- Department of Hematology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, Jiangsu, China
- Jiarun Wu
- Department of Hematology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, Jiangsu, China
- Lu Lu
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Akhilesh K. Bajpai
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, United States
- Mengmeng Sang
- Department of Hematology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, Jiangsu, China
- Xinfeng Wang
- Department of Hematology, Affiliated Hospital of Nantong University, Medical School of Nantong University, Nantong, Jiangsu, China
2
Prediction of Phage Virion Proteins Using Machine Learning Methods. Molecules 2023; 28:molecules28052238. [PMID: 36903484 PMCID: PMC10004995 DOI: 10.3390/molecules28052238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Revised: 01/27/2023] [Accepted: 02/20/2023] [Indexed: 03/04/2023]
Abstract
Antimicrobial resistance (AMR) is a major problem, and an immediate alternative to antibiotics is urgently needed. Research on possible alternatives for tackling bacterial infections is ongoing worldwide. One of the most promising alternatives to antibiotics is the use of bacteriophages (phages) or phage-derived antibacterial drugs to treat infections caused by AMR bacteria. Phage-derived proteins, including holins, endolysins, and exopolysaccharides, have shown great potential in the development of antibacterial drugs. Likewise, phage virion proteins (PVPs) might also play an important role in the development of antibacterial drugs. Here, we have developed a machine learning-based method to predict PVPs from phage protein sequences. We employed well-known basic and ensemble machine learning methods with protein sequence composition features for the prediction of PVPs. We found that the gradient boosting classifier (GBC) achieved the best accuracy: 80% on the training dataset and 83% on the independent dataset. Its performance on the independent dataset is better than that of other existing methods. We have also developed a user-friendly web server, freely available to all users, for predicting PVPs from phage protein sequences. The web server might facilitate large-scale prediction of PVPs and the design of hypothesis-driven experimental studies.
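The abstract states only that protein sequence composition features were used. One common such feature is amino acid composition (AAC): the fraction of each of the 20 standard residues in a sequence. The sketch below, with a hypothetical function name, shows how a fixed-length AAC vector can be derived from sequences of any length; the exact feature set used by the authors may differ.

```python
# The 20 standard amino acids in a fixed order, so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional amino acid composition (AAC) vector.

    Each entry is the fraction of standard residues in `sequence` equal to
    that amino acid; non-standard letters (e.g. X, B, Z) are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector instead of dividing by 0.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Vectors like these can then be fed to an ensemble learner such as scikit-learn's `GradientBoostingClassifier`; that pairing is an assumption for illustration, not necessarily the authors' setup.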
3
Ortiz-Vilchis P, De-la-Cruz-García JS, Ramirez-Arellano A. Identification of Relevant Protein Interactions with Partial Knowledge: A Complex Network and Deep Learning Approach. BIOLOGY 2023; 12:140. [PMID: 36671832 PMCID: PMC9856098 DOI: 10.3390/biology12010140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 01/11/2023] [Accepted: 01/12/2023] [Indexed: 01/18/2023]
Abstract
Protein-protein interactions (PPIs) are the basis for understanding most cellular events in biological systems. Several experimental methods, e.g., biochemical, molecular, and genetic methods, have been used to identify protein-protein associations. However, some of them, such as mass spectrometry, are time-consuming and expensive. Machine learning (ML) techniques have been widely used to characterize PPIs, increasing the number of proteins analyzed simultaneously and optimizing the time and resources needed to identify and predict protein-protein functional linkages. Previous ML approaches have focused on well-known networks or specific targets, but not on identifying relevant proteins with partial or no knowledge of the interaction networks. The proposed approach aims to generate a relevant protein sequence using a bidirectional Long Short-Term Memory (LSTM) network with partial knowledge of interactions. The general framework first conducts scale-free and fractal complex network analyses. The outcome of these analyses is then used to fine-tune the fractal method for extracting vital proteins from PPI networks. The results show that several PPI networks are self-similar or fractal, but that both features cannot coexist. The protein sequences generated by the bidirectional LSTM contain, on average, 39.5% of the proteins in the original sequence, and their average length is 17% of the original length. Finally, 95% of the generated sequences were true.
Affiliation(s)
- Pilar Ortiz-Vilchis
- Sección de Estudios de Posgrado e Investigación, Escuela Superior de Medicina, Instituto Politécnico Nacional, Mexico City 11340, Mexico
- Jazmin-Susana De-la-Cruz-García
- Sección de Estudios de Posgrado e Investigación, Unidad Profesional Interdisciplinaria de Ingeniería y Ciencias Sociales y Administrativas, Instituto Politécnico Nacional, Mexico City 08400, Mexico
- Aldo Ramirez-Arellano
- Sección de Estudios de Posgrado e Investigación, Unidad Profesional Interdisciplinaria de Ingeniería y Ciencias Sociales y Administrativas, Instituto Politécnico Nacional, Mexico City 08400, Mexico
4
Next Generation Infectious Diseases Monitoring Gages via Incremental Federated Learning: Current Trends and Future Possibilities. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:1102715. [PMID: 36909972 PMCID: PMC9995206 DOI: 10.1155/2023/1102715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 07/29/2022] [Accepted: 09/27/2022] [Indexed: 03/05/2023]
Abstract
Infectious diseases have always threatened human life and are a key concern in the public health domain. Therefore, early diagnosis of infectious diseases is in high demand in modern healthcare systems. Novel infectious diseases such as coronavirus disease 2019 (COVID-19) caused millions of deaths across the globe in 2020. Therefore, early and robust recognition of general infectious diseases is a key requirement of modern intelligent healthcare systems. This systematic study follows the Kitchenham guidelines and sets out several research questions (RQs) on the robust recognition of general infectious diseases. Research works published from 2018 to 2021 were extracted from four electronic databases: IEEE, ACM, Springer, and ScienceDirect. The extracted studies proposed different schemes for the accurate recognition of general infectious diseases using machine learning techniques, including deep learning and federated learning models. A framework is also introduced to outline the process of detecting infectious diseases using machine learning models. After the filtration process, 21 studies were extracted and mapped to the defined RQs. In the future, early diagnosis of infectious diseases will be possible through wearable health monitoring gages. Moreover, these gages will help reduce diagnosis time and the death rate by detecting severe diseases at an early stage.
5
Sahu M, Gupta R, Ambasta RK, Kumar P. Artificial intelligence and machine learning in precision medicine: A paradigm shift in big data analysis. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2022; 190:57-100. [PMID: 36008002 DOI: 10.1016/bs.pmbts.2022.03.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 10/18/2022]
Abstract
The integration of artificial intelligence into precision medicine has revolutionized healthcare delivery. Precision medicine identifies the phenotype of particular patients with less-common responses to treatment. Recent studies have demonstrated that translational research exploring the convergence between artificial intelligence and precision medicine will help solve the most difficult challenges facing precision medicine. Here, we discuss different aspects of artificial intelligence in precision medicine that improve healthcare delivery. First, we discuss how artificial intelligence changes the landscape of precision medicine and how artificial intelligence has evolved within it. Second, we highlight the synergies between artificial intelligence and precision medicine and their promise for healthcare delivery. Third, we briefly explain the promise of big data analytics and the integration of nanomaterials in precision medicine. Last, we highlight the challenges and opportunities of artificial intelligence in precision medicine.
Affiliation(s)
- Mehar Sahu
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
- Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Delhi, India
6
Le TD, Nguyen PD, Korkin D, Thieu T. PHILM2Web: A high-throughput database of macromolecular host–pathogen interactions on the Web. Database (Oxford) 2022; 2022:6625823. [PMID: 35776535 PMCID: PMC9248916 DOI: 10.1093/database/baac042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Revised: 04/27/2022] [Accepted: 05/31/2022] [Indexed: 12/02/2022]
Abstract
During infection, the pathogen's entry into the host organism, breaching of the host immune defense, and spread and multiplication are frequently mediated by multiple interactions between host and pathogen proteins. Systematic study of host–pathogen interactions (HPIs) is a challenging task for both experimental and computational approaches and critically depends on previously obtained knowledge about these interactions reported in the biomedical literature. While several HPI databases exist that manually filter HPI protein–protein interactions from generic databases and curated experimental interactomic studies, no comprehensive database of HPIs obtained from the biomedical literature is currently available. Here, we introduce a high-throughput literature-mining platform for extracting HPI data that includes the most comprehensive collection to date of HPIs obtained from PubMed abstracts. Our HPI data portal, PHILM2Web (Pathogen–Host Interactions by Literature Mining on the Web), integrates an automatically generated database of interactions extracted by PHILM, our high-precision HPI literature-mining algorithm. Currently, the database contains 23 581 generic HPIs between 157 host and 403 pathogen organisms from 11 609 abstracts. The interactions were obtained by processing 608 972 PubMed abstracts, each mentioning at least one host and one pathogen organism. In response to the coronavirus disease 2019 (COVID-19) pandemic, we also used PHILM to process 25 796 PubMed abstracts obtained by the same query as the COVID-19 Open Research Dataset. This COVID-19 processing batch yielded 257 HPIs between 19 host and 31 pathogen organisms from 167 abstracts. The entire HPI dataset is accessible via a searchable PHILM2Web interface; scientists can also download the entire database in bulk for offline processing. Database URL: http://philm2web.live
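The corpus described above was limited to abstracts mentioning at least one host and one pathogen organism. The abstract does not say how PHILM recognizes organism names, so the sketch below assumes simple whole-word dictionary matching against hypothetical term lists; it only illustrates that pre-filtering criterion, not PHILM's interaction extraction.

```python
import re


def mentions_any(text, terms):
    """True if any term occurs in `text` as a whole word (case-insensitive)."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        for term in terms
    )


def has_host_and_pathogen(abstract, host_terms, pathogen_terms):
    """Keep an abstract only if it names at least one host and one pathogen.

    `host_terms` and `pathogen_terms` are hypothetical organism lexicons.
    """
    return mentions_any(abstract, host_terms) and mentions_any(abstract, pathogen_terms)
```

Whole-word matching means that, for example, "humanized" does not count as a mention of "human".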
Affiliation(s)
- Tuan-Dung Le
- Department of Computer Science, Oklahoma State University, Stillwater, OK, USA
- Phuong D Nguyen
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, USA
- Dmitry Korkin
- Department of Computer Science and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Thanh Thieu
- Machine Learning Department, Moffitt Cancer Center and Research Institute, Tampa, FL, USA
7
Das B, Mitra P. Protein Interaction Network-based Deep Learning Framework for Identifying Disease-Associated Human Proteins. J Mol Biol 2021; 433:167149. [PMID: 34271012 DOI: 10.1016/j.jmb.2021.167149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2021] [Revised: 06/11/2021] [Accepted: 07/06/2021] [Indexed: 10/20/2022]
Abstract
Infectious diseases are among the foremost public health issues in humans. Identification of novel disease-associated proteins facilitates the efficient recognition of novel therapeutic targets. Here, we develop a Graph Convolutional Network (GCN)-based model called PINDeL to identify disease-associated host proteins by integrating the human Protein Locality Graph and its corresponding topological features. Owing to the combination of GCN with the protein interaction network, PINDeL achieves the highest accuracy of 83.45%, with AUROC and AUPRC values of 0.90 and 0.88, respectively. With high accuracy, recall, F1-score, specificity, AUROC, and AUPRC, PINDeL outperforms other existing machine-learning and deep-learning techniques for disease gene/protein identification in humans. Applied to an independent dataset of 24320 proteins that were not used for training, validation, or testing, PINDeL predicts 6448 new disease-protein associations, of which we verify 3196 disease proteins through experimental evidence such as Disease Ontology, Gene Ontology, and KEGG pathway enrichment analyses. Our investigation shows that 748 experimentally verified proteins are indeed responsible for pathogen-host protein interactions, of which 22 disease proteins are associated with multiple diseases such as cancer, aging, chem-dependency, pharmacogenomics, normal variation, infection, and immune-related diseases. This GCN-based prediction model is well suited to large-scale disease-protein association prediction and may therefore provide crucial insights into disease pathogenesis and aid in developing novel therapeutics.
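The abstract does not give PINDeL's layer definitions. As a reference point, the sketch below implements one standard graph convolution layer in the Kipf and Welling form, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python on a toy graph. The function name and matrices are hypothetical, and PINDeL's actual architecture and node features may differ.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n symmetric 0/1 matrix without self-loops (list of lists)
    features:  n x f node feature matrix H
    weights:   f x k weight matrix W
    """
    n = len(adjacency)
    # Add self-loops so each node keeps its own features during aggregation.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation: A_norm[i][j] = A_hat[i][j] / sqrt(d_i * d_j).
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features: A_norm @ H.
    f = len(features[0])
    agg = [[sum(a_norm[i][m] * features[m][c] for m in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU: ReLU(agg @ W).
    k = len(weights[0])
    return [[max(0.0, sum(agg[i][c] * weights[c][o] for c in range(f))) for o in range(k)]
            for i in range(n)]
```

Stacking such layers lets each protein's representation mix in information from progressively larger network neighbourhoods before a final classification step.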
Affiliation(s)
- Barnali Das
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur 721302, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur 721302, India
8
Pan J, Chao NX, Zhang YY, Huang TM, Chen CX, Qin QH, Guo JH, Huang RS, Luo GR. Upregulating KTN1 promotes Hepatocellular Carcinoma progression. J Cancer 2021; 12:4791-4809. [PMID: 34234850 PMCID: PMC8247380 DOI: 10.7150/jca.55570] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/06/2020] [Accepted: 05/23/2021] [Indexed: 12/24/2022]
Abstract
Background: Hepatocellular carcinoma (HCC) is a common malignant tumor worldwide. Although kinectin 1 (KTN1) is the most frequently identified antigen in HCC tissues, its detailed roles in HCC remain unknown. This study seeks to clarify the expression status and clinical value of KTN1 in HCC and to explore its complex biological functions and underlying mechanisms. Methods: In-house reverse transcription quantitative polymerase chain reaction (RT-qPCR) was used to detect the expression of KTN1 in HCC tissues. External gene microarray and RNA-sequencing datasets were downloaded to confirm the expression patterns of KTN1. The prognostic ability of KTN1 in HCC was assessed by a Kaplan-Meier curve and a hazard ratio forest plot. The CRISPR/Cas9 gene-editing system was used to knock out KTN1 in Huh7 cells, which was verified by PCR-Sanger sequencing and western blotting. Assays of cell migration, invasion, viability, cell cycle, and apoptosis were conducted to explore its biological functions. RNA sequencing was performed to quantitatively analyze functional deregulation in KTN1-knockout cells compared with wild-type Huh7 cells. Upregulated genes co-expressed with KTN1 were identified from HCC tissues and functionally annotated. Results: KTN1 expression was increased in HCC tissues (standardized mean difference [SMD] = 0.20 [0.04, 0.37]). High KTN1 expression was significantly correlated with poorer prognosis in HCC patients, and KTN1 may be an independent risk factor for HCC (pooled HR = 1.31 [1.05, 1.64]). After KTN1 knockout, the viability, migration, and invasion of HCC cells were inhibited. The proportion of HCC cells in the G0-G1 phase increased after KTN1 knockout, which also elevated the apoptosis rate of HCC cells. Several cascades, including innate immune response, chemical carcinogenesis, and positive regulation of transcription by RNA polymerase II, were dramatically altered after KTN1 knockout. In HCC tissues, KTN1 primarily participated in the cell cycle, DNA replication, and microRNAs in cancer pathways. Conclusion: Upregulated KTN1 serves as a promising prognostic marker in HCC patients. KTN1 promotes the occurrence and deterioration of HCC by mediating cell survival, migration, invasion, cell cycle activation, and apoptotic inhibition. KTN1 may be a therapeutic target in HCC patients.
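The prognostic assessment above relied on Kaplan-Meier curves. As a generic illustration of the estimator (not the study's own code), the sketch below computes the product-limit survival curve S(t) as the product, over distinct event times t_i <= t, of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of subjects still at risk. The function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    Returns a list of (time, survival_probability) pairs at each distinct
    time with at least one observed event. events[i] is 1 if the event
    (e.g. death) was observed at times[i] and 0 if the subject was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        at_risk -= removed
    return curve
```

For toy data with times [1, 2, 2, 3, 4] and events [1, 1, 0, 1, 0], the curve steps down to about 0.8, 0.6 and 0.3 at times 1, 2 and 3; the censored subject at time 4 produces no step.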
Affiliation(s)
- Jian Pan
- Department of Human Anatomy, Guangxi Medical University; Guangxi Colleges and Universities Key Laboratory of Human Development and Disease Research, Guangxi Medical University, Nanning, China
- Nai-Xia Chao
- Department of Biochemistry and Molecular Biology, Guangxi Medical University
- Yao-Yao Zhang
- Department of Histology and Embryology, Guangxi Medical University
- Tian-Ming Huang
- Department of Histology and Embryology, Guangxi Medical University
- Cheng-Xiao Chen
- The Ninth Affiliated Hospital of Guangxi Medical University, Guangxi Medical University
- Qiu-Hong Qin
- Jiangbin Hospital of Guangxi Zhuang Autonomous Region
- Rong-Shi Huang
- Department of Histology and Embryology, Guangxi Traditional Chinese Medical University
- Guo-Rong Luo
- Department of Histology and Embryology, Guangxi Medical University; Guangxi Colleges and Universities Key Laboratory of Human Development and Disease Research, Guangxi Medical University, Nanning, China
9
Patrício A, Costa RS, Henriques R. Predictability of COVID-19 Hospitalizations, Intensive Care Unit Admissions, and Respiratory Assistance in Portugal: Longitudinal Cohort Study. J Med Internet Res 2021; 23:e26075. [PMID: 33835931 PMCID: PMC8080965 DOI: 10.2196/26075] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/26/2020] [Revised: 02/14/2021] [Accepted: 03/18/2021] [Indexed: 01/17/2023]
Abstract
BACKGROUND In the face of the current COVID-19 pandemic, timely prediction of the upcoming medical needs of infected individuals enables better and quicker care provision when necessary, as well as better management decisions within health care systems. OBJECTIVE This work aims to predict the medical needs (hospitalizations, intensive care unit admissions, and respiratory assistance) and survivability of individuals testing positive for SARS-CoV-2 infection in Portugal. METHODS A retrospective cohort of 38,545 individuals infected during 2020 was used. Medical needs were predicted using state-of-the-art machine learning approaches at various stages of a patient's cycle, namely at testing (prehospitalization), after hospitalization, and after intensive care. A thorough optimization of state-of-the-art predictors was undertaken to assess the ability to anticipate medical needs and infection outcomes using demographic and comorbidity variables, as well as dates associated with symptom onset, testing, and hospitalization. RESULTS For the target cohort, 75% of hospitalization needs could be identified at the time of testing for SARS-CoV-2 infection. Over 60% of respiratory needs could be identified at the time of hospitalization. Both predictions had >50% precision. CONCLUSIONS This study highlights the relevance of the proposed predictive models as good candidates to support medical decisions in the Portuguese population, including both monitoring and in-hospital care decisions. A clinical decision support system is also provided to this end.
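The results above pair a recall-style figure (the share of actual hospitalization needs that were identified) with a precision figure (the share of flagged individuals who actually needed care). The sketch below shows how the two differ; the labels are toy values chosen only to illustrate the arithmetic, not data from the Portuguese cohort.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for a binary outcome (1 = needs hospitalization)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among flagged
    recall = tp / (tp + fn) if tp + fn else 0.0     # found among actual needs
    return precision, recall


# Toy labels: 4 individuals truly need hospitalization; the model flags 5.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```

Here 3 of the 4 true cases are caught (recall 75%), while 3 of the 5 flagged individuals are true cases (precision 60%).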
Affiliation(s)
- André Patrício
- Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Rafael S Costa
- LAQV-REQUIMTE, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Caparica, Portugal
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Rui Henriques
- Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
- Instituto de Engenharia de Sistemas e Computadores-Investigação e Desenvolvimento, Lisboa, Portugal
10
Understanding current states of machine learning approaches in medical informatics: a systematic literature review. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00538-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
11
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022]
Abstract
The exponential growth of biomedical data in recent years has driven the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling automatic feature extraction, feature selection, and generation of predictive models, these methods can be used to study complex biological systems efficiently. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the most interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework as techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and in particular deep learning, in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
12
Ma J, Wang P, Huang L, Qiao J, Li J. Bioinformatic analysis reveals an exosomal miRNA-mRNA network in colorectal cancer. BMC Med Genomics 2021; 14:60. [PMID: 33639954 PMCID: PMC7913431 DOI: 10.1186/s12920-021-00905-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/14/2020] [Accepted: 02/16/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND Exosomes play important roles in angiogenesis, drug resistance, and metastasis of colorectal cancer (CRC), but the underlying mechanisms have seldom been reported. Herein, our study aimed to reveal an exosomal miRNA-mRNA network involved in CRC by performing bioinformatic analysis. METHODS The mRNA and miRNA data of colon adenocarcinoma and rectal adenocarcinoma were downloaded from The Cancer Genome Atlas (TCGA) database, and exosomal miRNA data were downloaded from the GEO dataset GSE39833. Differential expression analysis was performed using "limma" and "edgeR". Target mRNAs of miRNAs were predicted using FunRich 3.1.3, miRNAtap and multiMiR. Candidate exosomal miRNAs were obtained by intersecting the two groups of differentially expressed miRNAs, and candidate mRNAs by intersecting the differentially expressed mRNAs with the predicted target mRNAs. Key mRNAs and exosomal miRNAs were identified by least absolute shrinkage and selection operator (LASSO) regression analysis and used to construct the exosomal miRNA-mRNA network. The network was verified by receiver operating characteristic curve analysis, GEPIA and LinkedOmics. Functional enrichment analysis was also performed for the studied miRNAs and mRNAs. RESULTS A total of 6568 differentially expressed mRNAs and 531 differentially expressed miRNAs from TCGA data, and 166 differentially expressed exosomal miRNAs in the GSE39833 dataset, were identified. Next, 16 key mRNAs and five key exosomal miRNAs were identified from the 5284 candidate mRNAs and 61 candidate exosomal miRNAs, respectively. The exosomal miRNA-mRNA network with high connectivity contained 13 hub mRNAs (CBFB, CDH3, ETV4, FOXQ1, FUT1, GCNT2, GRIN2D, KIAA1549, KRT80, LZTS1, SLC39A10, SPTBN2, and ZSWIM4) and five hub exosomal miRNAs (hsa-miR-126, hsa-miR-139, hsa-miR-141, hsa-miR-29c, and hsa-miR-423). The functional annotation revealed that these hub mRNAs were mainly involved in the regulation of the B cell receptor signaling pathway and glycosphingolipid biosynthesis-related pathways. All hub mRNAs and hub exosomal miRNAs exhibited high diagnostic value for CRC. Furthermore, the association of the hub mRNAs with overall survival, stage, and MSI phenotype of CRC revealed their important roles in CRC progression. CONCLUSION This study constructed an exosomal miRNA-mRNA network that may play crucial roles in the carcinogenesis and progression of CRC, thus providing potential diagnostic biomarkers and therapeutic targets for CRC.
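The candidate-selection step described in the methods amounts to a pair of set intersections. The sketch below spells it out with a hypothetical function name and toy identifiers; the downstream LASSO selection and network construction are not shown.

```python
def select_candidates(tcga_de_mirnas, exosomal_de_mirnas, de_mrnas, predicted_targets):
    """Return (candidate exosomal miRNAs, candidate mRNAs).

    Candidate exosomal miRNAs: differentially expressed both in the tissue
    (TCGA) comparison and in the exosome (GSE39833) comparison.
    Candidate mRNAs: differentially expressed mRNAs that are also predicted
    targets of the miRNAs.
    """
    candidate_mirnas = set(tcga_de_mirnas) & set(exosomal_de_mirnas)
    candidate_mrnas = set(de_mrnas) & set(predicted_targets)
    return candidate_mirnas, candidate_mrnas


# Toy identifiers for illustration only.
mirnas, mrnas = select_candidates(
    {"miR-a", "miR-b", "miR-c"},
    {"miR-b", "miR-c", "miR-d"},
    {"GENE1", "GENE2", "GENE3"},
    {"GENE2", "GENE3", "GENE4"},
)
print(sorted(mirnas), sorted(mrnas))  # ['miR-b', 'miR-c'] ['GENE2', 'GENE3']
```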
Affiliation(s)
- Jun Ma
- Department of Thoracic Surgery, Heji Hospital Affiliated To Changzhi Medical College, Changzhi, 046011, Shanxi, China
- Peilong Wang
- Department of Endoscopy, Heji Hospital Affiliated To Changzhi Medical College, Changzhi, 046011, Shanxi, China
- Lei Huang
- Department of Endoscopy, Heji Hospital Affiliated To Changzhi Medical College, Changzhi, 046011, Shanxi, China
- Jianxia Qiao
- Department of Endoscopy, Heji Hospital Affiliated To Changzhi Medical College, Changzhi, 046011, Shanxi, China
- Jianhong Li
- Department of Pathology, Heping Hospital Affiliated To Changzhi Medical College, 160 East Jiefang Street, Changzhi, 046000, Shanxi, China
13
Agany DD, Pietri JE, Gnimpieba EZ. Assessment of vector-host-pathogen relationships using data mining and machine learning. Comput Struct Biotechnol J 2020; 18:1704-1721. [PMID: 32670510 PMCID: PMC7340972 DOI: 10.1016/j.csbj.2020.06.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2020] [Revised: 06/19/2020] [Accepted: 06/19/2020] [Indexed: 12/15/2022]
Abstract
Infectious diseases, including vector-borne diseases transmitted by arthropods, are a leading cause of morbidity and mortality worldwide. In the era of big data, addressing broad-scale, fundamental questions regarding the complex dynamics of these diseases will increasingly require the integration of diverse datasets to produce new biological knowledge. This review provides a current snapshot of the systematic assessment of the relationships between microbial pathogens, arthropod vectors and mammalian hosts using data mining and machine learning. We employ PRISMA to identify 32 key papers relevant to this topic. Our analysis shows an increasing use of data mining and machine learning tasks and techniques, including prediction, classification, clustering, association rules mining, and deep learning, over the last decade. However, it also reveals a number of critical challenges in applying these to the study of vector-host-pathogen interactions at various systems biology levels. Here, relevant studies, current limitations and future directions are discussed. Furthermore, the quality of data in relevant papers was assessed using the FAIR (Findable, Accessible, Interoperable, Reusable) compliance criteria to evaluate and encourage reproducibility and shareability of research outcomes. Although shortcomings in their application remain, data mining and machine learning have significant potential to break new ground in understanding fundamental aspects of vector-host-pathogen relationships and their application in this field should be encouraged. In particular, while predictive modeling, feature engineering and supervised machine learning are already being used in the field, other data mining and machine learning methods such as deep learning and association rules analysis lag behind and should be implemented in combination with established methods to accelerate hypothesis and knowledge generation in the domain.
Affiliation(s)
- Diing D.M. Agany
- University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
- 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
- Jose E. Pietri
- University of South Dakota, Sanford School of Medicine, Division of Basic Biomedical Sciences, Vermillion, SD, United States
- Etienne Z. Gnimpieba
- University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
- 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
14
Le DH. Machine learning-based approaches for disease gene prediction. Brief Funct Genomics 2020; 19:350-363. [PMID: 32567652 DOI: 10.1093/bfgp/elaa013] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/25/2020] [Revised: 04/30/2020] [Accepted: 05/09/2020] [Indexed: 12/20/2022]
Abstract
Disease gene prediction is an essential issue in biomedical research. In the early days, annotation-based approaches were proposed for this problem. With the development of high-throughput technologies, interaction data between genes/proteins have grown quickly and now cover almost the entire genome and proteome; thus, network-based methods for the problem have become prominent. In parallel, machine learning techniques, which formulate the problem as a classification task, have also been proposed. Here, we first present a roadmap of machine learning-based methods for disease gene prediction. Initially, the problem was usually approached as binary classification, where the positive and negative training sets consist of disease genes and non-disease genes, respectively. Disease genes are those known to be associated with diseases, whereas non-disease genes were randomly selected from those not yet known to be associated with diseases. However, the latter may contain unknown disease genes. To overcome this uncertainty in defining non-disease genes, more realistic approaches have been proposed, such as unary and semi-supervised classification. Recently, more advanced methods, including ensemble learning, matrix factorization and deep learning, have been proposed. Secondly, 12 representative machine learning-based methods for disease gene prediction were examined and compared in terms of prediction performance and running time. Finally, their advantages, disadvantages, interpretability and trust were analyzed and discussed.
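The binary-classification setup described in this abstract (known disease genes as positives, randomly drawn "non-disease" genes as negatives) can be sketched as follows. This is a minimal illustration, not code from the review: genes are assumed to be plain string identifiers, the function names `build_training_set` and `mislabeled_negatives` are hypothetical, and balancing negatives to the number of positives is an assumption.

```python
import random


def build_training_set(all_genes, known_disease_genes, n_negatives=None, seed=0):
    """Build (gene, label) pairs for binary disease-gene classification.

    Known disease genes are labelled 1; negatives are drawn at random from
    genes not yet known to be disease-associated and labelled 0. Those
    "negatives" may still contain undiscovered disease genes.
    """
    positives = sorted(set(known_disease_genes) & set(all_genes))
    unlabeled = sorted(set(all_genes) - set(positives))
    if n_negatives is None:
        # Assumption: sample as many negatives as positives (balanced classes).
        n_negatives = len(positives)
    rng = random.Random(seed)  # seeded for reproducible sampling
    negatives = rng.sample(unlabeled, min(n_negatives, len(unlabeled)))
    return [(g, 1) for g in positives] + [(g, 0) for g in negatives]


def mislabeled_negatives(training_set, true_disease_genes):
    """Return sampled negatives that are actually disease genes (label noise)."""
    truth = set(true_disease_genes)
    return [g for g, label in training_set if label == 0 and g in truth]


if __name__ == "__main__":
    # Hypothetical toy data: G0 and G1 are known disease genes; G2 is an
    # undiscovered disease gene that may be sampled as a "negative".
    genes = [f"G{i}" for i in range(10)]
    train = build_training_set(genes, ["G0", "G1"], n_negatives=4)
    print(train)
    print("mislabeled:", mislabeled_negatives(train, ["G0", "G1", "G2"]))
```

Whenever an undiscovered disease gene such as `G2` is drawn, the classifier is trained on a wrong label. Unary (one-class) and semi-supervised methods avoid committing to that label by treating the non-positive genes as unlabeled rather than negative.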
Affiliation(s)
- Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam