1
Labani M, Beheshti A, Argha A, Alinejad-Rokny H. A Comprehensive Investigation of Genomic Variants in Prostate Cancer Reveals 30 Putative Regulatory Variants. Int J Mol Sci 2023; 24:ijms24032472. [PMID: 36768794 PMCID: PMC9916892 DOI: 10.3390/ijms24032472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 01/18/2023] [Accepted: 01/23/2023] [Indexed: 01/31/2023] Open
Abstract
Prostate cancer (PC) is the most frequently diagnosed non-skin cancer in the world. Previous studies have shown that genomic alterations represent the most common mechanism for molecular alterations responsible for the development and progression of PC. This highlights the importance of identifying functional genomic variants for early detection in high-risk PC individuals. Great efforts have been made to identify common protein-coding genetic variations; however, the impact of non-coding variations, including regulatory genetic variants, is not well understood. Identification of these variants and the underlying target genes will be a key step in improving the detection and treatment of PC. To gain an understanding of the functional impact of genetic variants, and in particular, regulatory variants in PC, we developed an integrative pipeline (AGV) that uses whole genome/exome sequences, GWAS SNPs, chromosome conformation capture data, and ChIP-Seq signals to investigate the potential impact of genomic variants on the underlying target genes in PC. We identified 646 putative regulatory variants, of which 30 significantly altered the expression of at least one protein-coding gene. Our analysis of chromatin interactions data (Hi-C) revealed that the 30 putative regulatory variants could affect 131 coding and non-coding genes. Interestingly, our study identified the 131 protein-coding genes that are involved in disease-related pathways, including Reactome and MSigDB, for most of which targeted treatment options are currently available. Notably, our analysis revealed several non-coding RNAs, including RP11-136K7.2 and RAMP2-AS1, as potential enhancer elements of the protein-coding genes CDH12 and EZH1, respectively. Our results provide a comprehensive map of genomic variants in PC and reveal their potential contribution to prostate cancer progression and development.
Affiliation(s)
- Mahdieh Labani
  - BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
  - Data Analytic Lab, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
- Amin Beheshti
  - Data Analytic Lab, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
- Ahmadreza Argha
  - The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
  - UNSW Data Science Hub, The University of New South Wales, Sydney, NSW 2052, Australia
  - Health Data Analytics Program, Centre for Applied AI, Macquarie University, Sydney, NSW 2109, Australia
2
Wang L, Luo J, Wang H, Li T. Markov clustering ensemble. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
3
Band SS, Ardabili S, Yarahmadi A, Pahlevanzadeh B, Kiani AK, Beheshti A, Alinejad-Rokny H, Dehzangi I, Chang A, Mosavi A, Moslehpour M. A Survey on Machine Learning and Internet of Medical Things-Based Approaches for Handling COVID-19: Meta-Analysis. Front Public Health 2022; 10:869238. [PMID: 35812486 PMCID: PMC9260273 DOI: 10.3389/fpubh.2022.869238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 04/20/2022] [Indexed: 11/13/2022] Open
Abstract
Early diagnosis, prioritization, screening, clustering, and tracking of patients with COVID-19, and production of drugs and vaccines are some of the applications that have made it necessary to use a new style of technology to involve, manage, and deal with this epidemic. Strategies backed by artificial intelligence (A.I.) and the Internet of Things (IoT) have been undeniably effective to understand how the virus works and prevent it from spreading. Accordingly, the main aim of this survey is to critically review the ML, IoT, and the integration of IoT and ML-based techniques in the applications related to COVID-19, from the diagnosis of the disease to the prediction of its outbreak. According to the main findings, IoT provided a prompt and efficient approach to tracking the disease spread. On the other hand, most of the studies developed by ML-based techniques aimed at the detection and handling of challenges associated with the COVID-19 pandemic. Among different approaches, Convolutional Neural Network (CNN), Support Vector Machine, Genetic CNN, and pre-trained CNN, followed by ResNet have demonstrated the best performances compared to other methods.
Affiliation(s)
- Shahab S. Band
  - Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, Douliou, Taiwan
- Sina Ardabili
  - Department of Informatics, J. Selye University, Komárom, Slovakia
- Atefeh Yarahmadi
  - Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, Douliou, Taiwan
- Bahareh Pahlevanzadeh
  - Department of Design and System Operations, Regional Information Center for Science and Technology (R.I.C.E.S.T.), Shiraz, Iran
- Adiqa Kausar Kiani
  - Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, Douliou, Taiwan
- Amin Beheshti
  - Department of Computing, Macquarie University, Sydney, NSW, Australia
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, U.N.S.W. Sydney, Sydney, NSW, Australia
  - U.N.S.W. Data Science Hub, The University of New South Wales (U.N.S.W. Sydney), Sydney, NSW, Australia
  - Health Data Analytics Program, AI-enabled Processes (A.I.P.) Research Centre, Macquarie University, Sydney, NSW, Australia
- Iman Dehzangi
  - Department of Computer Science, Rutgers University, Camden, NJ, United States
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, United States
- Arthur Chang
  - Bachelor Program in Interdisciplinary Studies, National Yunlin University of Science and Technology, Douliu, Taiwan
- Amir Mosavi
  - John von Neumann Faculty of Informatics, Obuda University, Budapest, Hungary
  - Institute of Information Engineering, Automation and Mathematics, Slovak University of Technology in Bratislava, Bratislava, Slovakia
- Massoud Moslehpour
  - Department of Business Administration, College of Management, Asia University, Taichung, Taiwan
  - Department of Management, California State University, San Bernardino, CA, United States
4
Sharifonnasabi F, Jhanjhi NZ, John J, Obeidy P, Band SS, Alinejad-Rokny H, Baz M. Hybrid HCNN-KNN Model Enhances Age Estimation Accuracy in Orthopantomography. Front Public Health 2022; 10:879418. [PMID: 35712286 PMCID: PMC9197238 DOI: 10.3389/fpubh.2022.879418] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/19/2022] [Accepted: 04/22/2022] [Indexed: 11/17/2022] Open
Abstract
Age estimation from dental radiographs (orthopantomography, OPG) is a medical imaging technique that physicians and pathologists use for disease identification and legal matters, for example, estimating post-mortem interval, detecting child abuse or drug trafficking, and identifying an unknown body. Recent developments in automated image processing models have improved the limited precision of age estimation to an approximate range of +/- 1 year. While this estimate is often accepted as an accurate measurement, age estimation should be as precise as possible in the most serious matters, such as homicide. Current age estimation techniques are highly dependent on manual and time-consuming image processing, yet age estimation is often a time-sensitive matter in which the image processing time is vital. Recent developments in machine learning-based data processing methods have decreased the image processing time; however, the accuracy of these techniques remains to be improved. We proposed an ensemble method of image classifiers to enhance the accuracy of age estimation using OPGs from 1 year to a couple of months (1-3-6). This hybrid model is based on convolutional neural networks (CNN) and K nearest neighbors (KNN). The hybrid (HCNN-KNN) model was used to investigate 1,922 panoramic dental radiographs of patients aged 15 to 23. These OPGs were obtained from various teaching institutes and private dental clinics in Malaysia. To minimize the chance of overfitting in our model, we used the principal component analysis (PCA) algorithm and eliminated the features with high correlation. To further enhance the performance of our hybrid model, we performed systematic image pre-processing and applied a series of classifications to train the model. We have demonstrated that combining these approaches improved the classification and segmentation and thus the age-estimation outcome of the model. Our findings suggest that our model, for the first time to the best of our knowledge, successfully estimated age in classified studies of 1-year, 6-month, 3-month, and 1-month cases with accuracies of 99.98, 99.96, 99.87, and 98.78, respectively.
Affiliation(s)
- Fatemeh Sharifonnasabi
  - Department of Computer Science & Engineering, School of Computing & IT (SoCIT), Taylor's University, Subang Jaya, Malaysia
- Noor Zaman Jhanjhi
  - Department of Computer Science & Engineering, School of Computing & IT (SoCIT), Taylor's University, Subang Jaya, Malaysia
- Jacob John
  - Department of Restorative Dentistry, Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia
- Peyman Obeidy
  - Charles Perkins Centre, Faculty of Medicine and Health, University of Sydney, Darlington, NSW, Australia
- Shahab S Band
  - Future Technology Research Centre, College of Future, National Yunlin University of Science and Technology, Yunlin, Taiwan
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, University of New South Wales (UNSW) Sydney, Kensington, NSW, Australia
  - UNSW Data Science Hub, The University of New South Wales, UNSW Sydney, Kensington, NSW, Australia
  - Health Data Analytics Program, AI-enabled Processes (AIP) Research Centre, Macquarie University, Macquarie Park, NSW, Australia
- Mohammed Baz
  - Department of Computer Engineering, College of Computer and Information Technology, Taif University, Taif, Saudi Arabia
5
Dashti H, Dehzangi I, Bayati M, Breen J, Beheshti A, Lovell N, Rabiee HR, Alinejad-Rokny H. Integrative analysis of mutated genes and mutational processes reveals novel mutational biomarkers in colorectal cancer. BMC Bioinformatics 2022; 23:138. [PMID: 35439935 PMCID: PMC9017053 DOI: 10.1186/s12859-022-04652-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Accepted: 03/24/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths worldwide. Recent studies have observed causative mutations in susceptible genes related to colorectal cancer in 10 to 15% of the patients. This highlights the importance of identifying mutations for early detection of this cancer for more effective treatments among high risk individuals. Mutation is considered as the key point in cancer research. Many studies have performed cancer subtyping based on the type of frequently mutated genes, or the proportion of mutational processes. However, to the best of our knowledge, combination of these features has never been used together for this task. This highlights the potential to introduce better and more inclusive subtype classification approaches using wider range of related features to enable biomarker discovery and thus inform drug development for CRC. RESULTS In this study, we develop a new pipeline based on a novel concept called 'gene-motif', which merges mutated gene information with tri-nucleotide motif of mutated sites, for colorectal cancer subtype identification. We apply our pipeline to the International Cancer Genome Consortium (ICGC) CRC samples and identify, for the first time, 3131 gene-motif combinations that are significantly mutated in 536 ICGC colorectal cancer samples. Using these features, we identify seven CRC subtypes with distinguishable phenotypes and biomarkers, including unique cancer related signaling pathways, in which for most of them targeted treatment options are currently available. Interestingly, we also identify several genes that are mutated in multiple subtypes but with unique sequence contexts. CONCLUSION Our results highlight the importance of considering both the mutation type and mutated genes in identification of cancer subtypes and cancer biomarkers. 
The new CRC subtypes presented in this study demonstrate distinct phenotypic properties that can be used to develop new treatments. By knowing the genes and phenotypes associated with the subtypes, a personalized treatment plan can be developed that considers the specific phenotypes associated with a patient's genomic lesions.
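The 'gene-motif' idea described in this abstract, merging the mutated gene with the tri-nucleotide context of the mutated site, can be illustrated with a minimal sketch. This is not the authors' pipeline: the key format, the function names `gene_motif_key` and `count_gene_motifs`, and the toy input tuples are all assumptions made for illustration, and no significance testing is attempted.

```python
from collections import Counter


def gene_motif_key(gene, context, alt):
    """Combine a gene name with the tri-nucleotide context of a substitution.

    `context` is the reference tri-nucleotide centred on the mutated base and
    `alt` is the substituted base. The key format is illustrative only.
    """
    if len(context) != 3:
        raise ValueError("context must be exactly three bases")
    ref = context[1]
    if alt == ref:
        raise ValueError("alt must differ from the reference base")
    return f"{gene}:{context[0]}[{ref}>{alt}]{context[2]}"


def count_gene_motifs(mutations):
    """Count the number of samples in which each gene-motif key occurs.

    Each key is counted at most once per sample.
    """
    seen = set()
    for sample, gene, context, alt in mutations:
        seen.add((sample, gene_motif_key(gene, context, alt)))
    return Counter(key for _, key in seen)


# Hypothetical toy input: (sample id, gene, tri-nucleotide context, alt base)
toy = [
    ("S1", "APC", "TCG", "T"),
    ("S1", "APC", "TCG", "T"),
    ("S2", "APC", "TCG", "T"),
    ("S2", "KRAS", "ACC", "A"),
]
counts = count_gene_motifs(toy)
```

Counting each key once per sample is a design choice of this sketch; a per-mutation count would be an equally plausible feature definition.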
Affiliation(s)
- Hamed Dashti
  - Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- Iman Dehzangi
  - Center for Computational and Integrative Biology (CCIB), Rutgers University, Camden, NJ, 08102, USA
- Masroor Bayati
  - Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- James Breen
  - South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
  - Robinson Research Institute, University of Adelaide, Adelaide, SA, 5006, Australia
  - Bioinformatics Hub, University of Adelaide, Adelaide, SA, 5006, Australia
- Amin Beheshti
  - Department of Computing, Macquarie University, Sydney, NSW, 2109, Australia
- Nigel Lovell
  - Tyree Institute of Health Engineering and The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
- Hamid R Rabiee
  - Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology, 11365, Tehran, Iran
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab, The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW, 2052, Australia
  - UNSW Data Science Hub, The University of New South Wales, Sydney, NSW, 2052, Australia
  - Health Data Analytics Program, AI-Enabled Processes (AIP) Research Centre, Macquarie University, Sydney, 2109, Australia
6
Hybrid Reptile Search Algorithm and Remora Optimization Algorithm for Optimization Tasks and Data Clustering. Symmetry (Basel) 2022. [DOI: 10.3390/sym14030458] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 02/01/2023] Open
Abstract
Data clustering is a complex data mining problem that clusters a massive amount of data objects into a predefined number of clusters; in other words, it finds symmetric and asymmetric objects. Various optimization methods have been used to solve different machine learning problems. They usually suffer from local optimal problems and unbalance between the search mechanisms. This paper proposes a novel hybrid optimization method for solving various optimization problems. The proposed method is called HRSA, which combines the original Reptile Search Algorithm (RSA) and Remora Optimization Algorithm (ROA) and handles these mechanisms’ search processes by a novel transition method. The proposed HRSA method aims to avoid the main weaknesses raised by the original methods and find better solutions. The proposed HRSA is tested on solving various complicated optimization problems—twenty-three benchmark test functions and eight data clustering problems. The obtained results illustrate that the proposed HRSA method performs significantly better than the original and comparative state-of-the-art methods. The proposed method overwhelmed all the comparative methods according to the mathematical problems. It obtained promising results in solving the clustering problems. Thus, HRSA has a remarkable efficacy when employed for various clustering problems.
7
Duan Y, Liu C, Li S, Guo X, Yang C. Gradient-based elephant herding optimization for cluster analysis. APPL INTELL 2022; 52:11606-11637. [PMID: 35106027 PMCID: PMC8795968 DOI: 10.1007/s10489-021-03020-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/15/2021] [Indexed: 11/17/2022]
Abstract
Clustering analysis is essential for obtaining valuable information from a predetermined dataset. However, traditional clustering methods suffer from falling into local optima and an overdependence on the quality of the initial solution. Given these defects, a novel clustering method called gradient-based elephant herding optimization for cluster analysis (GBEHO) is proposed. A well-defined set of heuristics is introduced to select the initial centroids instead of selecting random initial points. Specifically, the elephant optimization algorithm (EHO) is combined with the gradient-based algorithm GBO for assigning initial cluster centers across the search space. Second, to overcome the imbalance between the original EHO exploration and exploitation, the initialized population is improved by introducing Gaussian chaos mapping. In addition, two operators, i.e., random wandering and variation operators, are set to adjust the location update strategy of the agents. Nine datasets from synthetic and real-world datasets are adopted to evaluate the effectiveness of the proposed algorithm and the other metaheuristic algorithms. The results show that the proposed algorithm ranks first among the 10 algorithms. It is also extensively compared with state-of-the-art techniques, and four evaluation criteria of accuracy rate, specificity, detection rate, and F-measure are used. The obtained results clearly indicate the excellent performance of GBEHO, while the stability is also more prominent.
8
Wang T, Sun B, Jiang C, Weng H, Chu X. Kernel alignment-based three-way clustering on attribute space and its application in stroke risk identification. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01478-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
9
New approach to determine the healthy immune variations by combining clustering methods. Sci Rep 2021; 11:8917. [PMID: 33903641 PMCID: PMC8076194 DOI: 10.1038/s41598-021-88272-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/03/2021] [Accepted: 04/06/2021] [Indexed: 01/09/2023] Open
Abstract
Immune-mediated inflammatory diseases are characterized by variability in disease presentation and severity but studying it is a challenging task. Defining the limits of a healthy immune system is therefore a prior step to capture variability in disease conditions. The goal of this study is to characterize the global immune cell composition along with their influencing factors. Blood samples were collected from 2 independent cohorts of respectively 389 (exploratory) and 208 (replication) healthy subjects. Twelve immune cells were measured in blood together with biological parameters. Three complementary clustering approaches were used to evaluate if variability related to the immune cells could be characterized as clusters or as a continuum. Large coefficients of variation confirmed the inter-individual variability of immune cells. Considering all subset variations in an overall analysis, it appeared that the immune makeup was organized as a continuum through the two cohorts. Some intrinsic and environmental factors affected the inter-individual variability of cells but without unveiling separable groups with similar features. This study provides a framework based on complementary clustering approach for analyzing inter-individual variability of immune cells. Our analyses support the absence of clusters in our two healthy cohorts. Also, our study reports some influence of age, gender, BMI, cortisol, season and CMV infection on immune variability.
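The coefficient of variation mentioned in this abstract is the standard deviation divided by the mean. The sketch below shows that calculation on made-up numbers; the function name and the sample values are assumptions for illustration and do not come from the study, and the use of the sample (n - 1) standard deviation is likewise a choice made here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean.

    Uses the sample (n - 1) standard deviation; the mean must be non-zero.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


# Hypothetical cell counts for one immune cell subset across eight subjects
example_counts = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(example_counts)
```

A larger value indicates more spread relative to the average level, which is why the measure is convenient for comparing cell subsets with very different typical counts.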
10
VIRMOTIF: A User-Friendly Tool for Viral Sequence Analysis. Genes (Basel) 2021; 12:genes12020186. [PMID: 33514039 PMCID: PMC7911170 DOI: 10.3390/genes12020186] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/28/2020] [Revised: 01/10/2021] [Accepted: 01/19/2021] [Indexed: 12/16/2022] Open
Abstract
Bioinformatics and computational biology have significantly contributed to the generation of vast and important knowledge that can lead to great improvements and advancements in biology and its related fields. Over the past three decades, a wide range of tools and methods have been developed and proposed to enhance performance, diagnosis, and throughput while maintaining feasibility and convenience for users. Here, we propose a new user-friendly comprehensive tool called VIRMOTIF to analyze DNA sequences. VIRMOTIF brings different tools together as one package so that users can perform their analysis as a whole and in one place. VIRMOTIF is able to complete different tasks, including computing the number or probability of motifs appearing in DNA sequences, visualizing data using the matplotlib and heatmap libraries, and clustering data using four different methods, namely K-means, PCA, Mean Shift, and ClusterMap. VIRMOTIF is the only tool with the ability to analyze genomic motifs based on their frequency and representation (D-ratio) in a virus genome.
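Counting how often a motif appears in a DNA sequence, one of the tasks listed in this abstract, can be sketched in a few lines. This is not VIRMOTIF's code or interface: the function names `count_motif` and `motif_frequency` are hypothetical, overlapping matches are counted by assumption, and the D-ratio is not computed because the abstract does not define it.

```python
def count_motif(sequence, motif):
    """Count overlapping occurrences of `motif` in `sequence` (case-insensitive)."""
    seq, mot = sequence.upper(), motif.upper()
    if not mot:
        raise ValueError("motif must be non-empty")
    # Test every start position so that overlapping matches are included.
    return sum(1 for i in range(len(seq) - len(mot) + 1) if seq.startswith(mot, i))


def motif_frequency(sequence, motif):
    """Return occurrences divided by the number of windows of the motif's length."""
    windows = len(sequence) - len(motif) + 1
    return count_motif(sequence, motif) / windows if windows > 0 else 0.0


# Toy sequences, chosen only to show overlapping matches
n_ata = count_motif("ATATAT", "ATA")
f_ata = motif_frequency("ATATAT", "ATA")
```

Python's built-in `str.count` skips overlapping matches, which is why the sketch scans start positions explicitly.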