1
Santorsola M, Lescai F. The promise of explainable deep learning for omics data analysis: Adding new discovery tools to AI. N Biotechnol 2023; 77:1-11. [PMID: 37329982 DOI: 10.1016/j.nbt.2023.06.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/27/2023] [Revised: 06/01/2023] [Accepted: 06/14/2023] [Indexed: 06/19/2023]
Abstract
Deep learning has already revolutionised the way a wide range of data is processed in many areas of daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly big datasets. This has a significant impact on the growing wealth of omics datasets, with the unprecedented opportunity for a better understanding of the complexity of living organisms. While this revolution is transforming the way these data are analyzed, explainable deep learning is emerging as an additional tool with the potential to change the way biological data is interpreted. Explainability addresses critical issues such as transparency, so important when computational tools are introduced especially in clinical environments. Moreover, it empowers artificial intelligence with the capability to provide new insights into the input data, thus adding an element of discovery to these already powerful resources. In this review, we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics, from radiomics to drug design and clinical trials. We offer a perspective to life scientists, to better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to move their first steps in this field.
Affiliation(s)
- Francesco Lescai
  - Department of Biology and Biotechnology, University of Pavia, Pavia, Italy.
2
Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. Sensors (Basel, Switzerland) 2023; 23:s23094439. [PMID: 37177642 PMCID: PMC10181706 DOI: 10.3390/s23094439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Genome-wide association studies have proven their ability to improve human health outcomes by identifying genotypes associated with phenotypes. Various works have attempted to predict the risk of diseases for individuals based on genotype data. This prediction can either be considered as an analysis model that can lead to a better understanding of gene functions that underlie human disease or as a black box in order to be used in decision support systems and in early disease detection. Deep learning techniques have gained more popularity recently. In this work, we propose a deep-learning framework for disease risk prediction. The proposed framework employs a multilayer perceptron (MLP) in order to predict individuals' disease status. The proposed framework was applied to the Wellcome Trust Case-Control Consortium (WTCCC), the UK National Blood Service (NBS) Control Group, and the 1958 British Birth Cohort (58C) datasets. The performance comparison of the proposed framework showed that the proposed approach outperformed the other methods in predicting disease risk, achieving an area under the curve (AUC) up to 0.94.
Affiliation(s)
- Hadeel Alzoubi
  - Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Raid Alzubi
  - Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Naeem Ramzan
  - School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley PA1 2BE, UK
3
Wang X, Cao X, Feng Y, Guo M, Yu G, Wang J. ELSSI: parallel SNP-SNP interactions detection by ensemble multi-type detectors. Brief Bioinform 2022; 23:6607749. [PMID: 35696639 DOI: 10.1093/bib/bbac213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2022] [Revised: 04/18/2022] [Accepted: 05/07/2022] [Indexed: 12/11/2022]
Abstract
With the development of high-throughput genotyping technology, single nucleotide polymorphism (SNP)-SNP interactions (SSIs) detection has become an essential way for understanding disease susceptibility. Various methods have been proposed to detect SSIs. However, given the disease complexity and bias of individual SSI detectors, these single-detector-based methods are generally unscalable for real genome-wide data and with unfavorable results. We propose a novel ensemble learning-based approach (ELSSI) that can significantly reduce the bias of individual detectors and their computational load. ELSSI randomly divides SNPs into different subsets and evaluates them by multi-type detectors in parallel. Particularly, ELSSI introduces a four-stage pipeline (generate, score, switch and filter) to iteratively generate new SNP combination subsets from SNP subsets, score the combination subset by individual detectors, switch high-score combinations to other detectors for re-scoring, then filter out combinations with low scores. This pipeline makes ELSSI able to detect high-order SSIs from large genome-wide datasets. Experimental results on various simulated and real genome-wide datasets show the superior efficacy of ELSSI to state-of-the-art methods in detecting SSIs, especially for high-order ones. ELSSI is applicable with moderate PCs on the Internet and flexible to assemble new detectors. The code of ELSSI is available at https://www.sdu-idea.cn/codes.php?name=ELSSI.
Affiliation(s)
- Xin Wang
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Xia Cao
  - College of Computer and Information Sciences, Southwest University, Chongqing 400715, China
- Yuantao Feng
  - College of Computer and Information Sciences, Southwest University, Chongqing 400715, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan 250101, China
- Jun Wang
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
4
Abd El Hamid MM, Shaheen M, Mabrouk MS, Omar YMK. Machine learning for detecting epistasis interactions and its relevance to personalized medicine in Alzheimer’s disease: systematic review. Biomedical Engineering: Applications, Basis and Communications 2021; 33. [DOI: 10.4015/s1016237221500472] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
Abstract
Alzheimer’s disease (AD) is a progressive disease that attacks the brain’s neurons and causes problems in memory, thinking, and reasoning skills. Personalized Medicine (PM) needs a better and more accurate understanding of the relationship between human genetic data and complex diseases like AD. The goal of PM is to tailor the treatment of a case person to his individual properties. PM requires the prediction of a person’s disease from genetic data, and its success depends on the accurate detection of genetic biomarkers. Single Nucleotide polymorphisms (SNPs) are considered the most prevalent type of variation in the human genome. Epistasis has a biological relevance to complex diseases and has an important impact on PM. Detection of the most significant epistasis interactions associated with complex diseases is a big challenge. This paper reviews several machine learning techniques and algorithms to detect the most significant epistasis interactions in Alzheimer’s disease. We discuss many machine learning techniques that can be used for detecting SNPs’ combinations like Random Forests, Support Vector Machines, Multifactor Dimensionality Reduction, Neural Network, and Deep Learning. This review paper highlights the pros and cons of these techniques and explains how they can be applied in an efficient framework to apply knowledge discovery and data mining in AD disease.
Affiliation(s)
- Marwa M. Abd El Hamid
  - The Higher Institute of Computer Science & Information Technology, El-Shorouk Academy, El Shorouk City, Cairo, Egypt
  - College of Computing and Information Technology AASTMT, Egypt
- Mohamed Shaheen
  - College of Computing and Information Technology AASTMT, Egypt
- Mai S. Mabrouk
  - Biomedical Engineering Department, Misr University for Science and Technology, 6th of October City, Egypt
5
Soumare H, Rezgui S, Gmati N, Benkahla A. New neural network classification method for individuals ancestry prediction from SNPs data. BioData Min 2021; 14:30. [PMID: 34183066 PMCID: PMC8240223 DOI: 10.1186/s13040-021-00258-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2020] [Accepted: 03/29/2021] [Indexed: 11/18/2022]
Abstract
Artificial Neural Network (ANN) algorithms have been widely used to analyse genomic data. Single Nucleotide Polymorphisms (SNPs) are the most common genetic variations in the human genome; they have been shown to be involved in many genetic diseases and can be used to predict their development. Developing ANNs to handle this type of data can be considered a great success in the medical world. However, the high dimensionality of genomic data and the limited number of available samples can make the learning task very complicated. In this work, we propose a New Neural Network classification method based on input perturbation. The idea is first to use SVD to reduce the dimensionality of the input data and to train a classification network, whose prediction errors are then reduced by perturbing the SVD projection matrix. The proposed method has been evaluated on data from individuals with different ancestral origins, and the experimental results have shown its effectiveness. Achieving up to 96.23% classification accuracy, this approach surpasses previous deep learning approaches evaluated on the same dataset.
Affiliation(s)
- H. Soumare
  - The Laboratory of Mathematical Modelling and Numeric in Engineering Sciences, National Engineering School of Tunis, Rue Béchir Salem Belkhiria Campus universitaire, B.P. 37, 1002 Tunis Belvédère, University of Tunis El Manar, Tunis, Tunisia
  - Laboratory of BioInformatics, bioMathematics, and bioStatistics, 13 place Pasteur, B.P. 74 1002 Tunis, Belvédère, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- S. Rezgui
  - ADAGOS. Le Belvédère centre, 61 rue El Khartoum, El Menzah, Tunis, Tunisia
- N. Gmati
  - College of sciences & Basic and Applied Scientific Research Center, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, 31441, Dammam, Kingdom of Saudi Arabia, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
- A. Benkahla
  - Laboratory of BioInformatics, bioMathematics, and bioStatistics, 13 place Pasteur, B.P. 74 1002 Tunis, Belvédère, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
6
Xia J, Xu T, Qing J, Wang L, Tang J. Detection of Single Nucleotide Polymorphisms by Fluorescence Embedded Dye SYBR Green I Based on Graphene Oxide. Front Chem 2021; 9:631959. [PMID: 33869140 PMCID: PMC8044317 DOI: 10.3389/fchem.2021.631959] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/21/2020] [Accepted: 02/11/2021] [Indexed: 11/28/2022]
Abstract
The detection of single nucleotide polymorphisms (SNPs) is of great significance in the early diagnosis of diseases and the rational use of drugs. Thus, a novel biosensor based on the quenching effect of fluorescence-embedded SYBR Green I (SG) dye and graphene oxide (GO) was introduced in this study. The probe DNA forms a double helix structure with perfectly complementary DNA (pcDNA) and 15 single-base mismatch DNA (smDNA) respectively. SG is highly intercalated with perfectly complementary dsDNA (pc-dsDNA) and exhibits strong fluorescence emission. Single-base mismatch dsDNA (SNPs) has a loose double-stranded structure and exhibits poor SG intercalation and low fluorescence sensing. At this time, the sensor still showed poor SNP discrimination. GO has a strong effect on single-stranded DNA (ssDNA), which can reduce the fluorescence response of probe DNA and eliminate background interference. And competitively combined with ssDNA in SNPs, quenching the fluorescence of SG/SNP, while the fluorescence value of pc-dsDNA was retained, increasing the signal-to-noise ratio. At this time, the sensor has obtained excellent SNP resolution. Different SNPs detect different intensities of fluorescence in the near-infrared region to evaluate the sensor's identification of SNPs. The experimental parameters such as incubation time, incubation temperature and salt concentration were optimized. Under optimal conditions, 1 nM DNA with 0–10 nM linear range and differentiate 5% SNP were achieved. The detection method does not require labeling, is low cost, simple in operation, exhibits high SNP discrimination and can be distinguished by SNP at room temperature.
Affiliation(s)
- Jiaoyun Xia
  - School of Chemistry and Food Engineering, Changsha University of Science and Technology, Changsha, China
- Tong Xu
  - School of Chemistry and Food Engineering, Changsha University of Science and Technology, Changsha, China
- Jing Qing
  - School of Chemistry and Food Engineering, Changsha University of Science and Technology, Changsha, China
- Lihua Wang
  - Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai, China
- Junlong Tang
  - School of Physics and Electronic Science, Changsha University of Science and Technology, Changsha, China
7
Manavalan R, Priya S. Genetic interactions effects for cancer disease identification using computational models: a review. Med Biol Eng Comput 2021; 59:733-758. [PMID: 33839998 DOI: 10.1007/s11517-021-02343-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/20/2020] [Accepted: 03/10/2021] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies (GWAS) provide clear insight into understanding genetic variations and environmental influences responsible for various human diseases. Cancer identification through genetic interactions (epistasis) is one of the significant ongoing researches in GWAS. The growth of the cancer cell emerges from multi-locus as well as complex genetic interaction. It is impractical for the physician to detect cancer via manual examination of SNPs interaction. Due to its importance, several computational approaches have been modeled to infer epistasis effects. This article includes a comprehensive and multifaceted review of all relevant genetic studies published between 2001 and 2020. In this contemporary review, various computational methods are as follows: multifactor dimensionality reduction-based approaches, statistical strategies, machine learning, and optimization-based techniques are carefully reviewed and presented with their evaluation results. Moreover, these computational approaches' strengths and limitations are described. The issues behind the computational methods for identifying the cancer disease through genetic interactions and the various evaluation parameters used by researchers have been analyzed. This review is highly beneficial for researchers and medical professionals to learn techniques adapted to discover the epistasis and aids to design novel automatic epistasis detection systems with strong robustness and maximum efficiency to address the different research problems in finding practical solutions effectively.
Affiliation(s)
- R Manavalan
  - Department of Computer Science, Arignar Anna Government Arts College, Villupuram, Tamil Nadu, 605602, India.
- S Priya
  - Computer Science, Arignar Anna Government Arts College, Villupuram, Tamil Nadu, India
8
A Self-organizing Deep Auto-Encoder approach for Classification of Complex Diseases using SNP Genomics Data. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106718] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
9
Sun Y, Wang X, Shang J, Liu JX, Zheng CH, Lei X. Introducing Heuristic Information Into Ant Colony Optimization Algorithm for Identifying Epistasis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1253-1261. [PMID: 30403637 DOI: 10.1109/tcbb.2018.2879673] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/08/2023]
Abstract
Epistasis learning, which is aimed at detecting associations between multiple Single Nucleotide Polymorphisms (SNPs) and complex diseases, has gained increasing attention in genome wide association studies. Although much work has been done on mapping the SNPs underlying complex diseases, there is still difficulty in detecting epistatic interactions due to the lack of heuristic information to expedite the search process. In this study, a method EACO is proposed to detect epistatic interactions based on the ant colony optimization (ACO) algorithm, the highlights of which are the introduced heuristic information, fitness function, and a candidate solutions filtration strategy. The heuristic information multi-SURF* is introduced into EACO for identifying epistasis, which is incorporated into ant-decision rules to guide the search with linear time. Two functionally complementary fitness functions, mutual information and the Gini index, are combined to effectively evaluate the associations between SNP combinations and the phenotype. Furthermore, a strategy for candidate solutions filtration is provided to adaptively retain all optimal solutions which yields a more accurate way for epistasis searching. Experiments of EACO, as well as three ACO based methods (AntEpiSeeker, MACOED, and epiACO) and four commonly used methods (BOOST, SNPRuler, TEAM, and epiMODE) are performed on both simulation data sets and a real data set of age-related macular degeneration. Results indicate that EACO is promising in identifying epistasis.
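The two fitness functions named in this abstract, mutual information and the Gini index, can each be computed for a candidate SNP combination as in the minimal Python sketch below. It is illustrative only: the function names and toy data are hypothetical, the abstract does not say how EACO combines the two scores, and this is not the authors' implementation.

```python
from collections import Counter
from math import log2


def mutual_information(genotypes, phenotypes):
    """Mutual information I(G; Y) in bits.

    genotypes: one tuple of SNP calls per individual (the SNP combination).
    phenotypes: one class label per individual (e.g. 1 = case, 0 = control).
    """
    n = len(phenotypes)
    joint = Counter(zip(genotypes, phenotypes))
    g_counts = Counter(genotypes)
    y_counts = Counter(phenotypes)
    mi = 0.0
    for (g, y), c in joint.items():
        # p(g, y) * log2( p(g, y) / (p(g) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (g_counts[g] * y_counts[y]))
    return mi


def gini_after_split(genotypes, phenotypes):
    """Weighted Gini impurity of the phenotype inside each genotype cell.

    Lower values mean the genotype combination separates the classes better.
    """
    n = len(phenotypes)
    cells = {}
    for g, y in zip(genotypes, phenotypes):
        cells.setdefault(g, Counter())[y] += 1
    total = 0.0
    for counts in cells.values():
        size = sum(counts.values())
        impurity = 1.0 - sum((c / size) ** 2 for c in counts.values())
        total += (size / n) * impurity
    return total


# Hypothetical toy data: the phenotype is the XOR of two binary SNPs, so
# neither SNP is informative alone but the pair is fully informative.
pair = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
single = [(a,) for a, _ in pair]
status = [0, 1, 1, 0] * 2

pair_scores = (mutual_information(pair, status), gini_after_split(pair, status))
single_scores = (mutual_information(single, status), gini_after_split(single, status))
```

On purely epistatic data of this kind, a single-SNP score misses the signal that the pairwise score captures, which is why such methods score SNP combinations and not individual markers.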
10
Cao X, Yu G, Ren W, Guo M, Wang J. DualWMDR: Detecting epistatic interaction with dual screening and multifactor dimensionality reduction. Hum Mutat 2019; 41:719-734. [DOI: 10.1002/humu.23951] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/19/2019] [Revised: 09/10/2019] [Accepted: 11/07/2019] [Indexed: 12/14/2022]
Affiliation(s)
- Xia Cao
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Guoxian Yu
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Wei Ren
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Jun Wang
  - College of Computer and Information Science, Southwest University, Chongqing, China
11
Liu Y, Wang D, He F, Wang J, Joshi T, Xu D. Phenotype Prediction and Genome-Wide Association Study Using Deep Convolutional Neural Network of Soybean. Front Genet 2019; 10:1091. [PMID: 31824557 PMCID: PMC6883005 DOI: 10.3389/fgene.2019.01091] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 07/22/2019] [Accepted: 10/09/2019] [Indexed: 12/21/2022]
Abstract
Genomic selection uses single-nucleotide polymorphisms (SNPs) to predict quantitative phenotypes for enhancing traits in breeding populations and has been widely used to increase breeding efficiency for plants and animals. Existing statistical methods rely on a prior distribution assumption of imputed genotype effects, which may not fit experimental datasets. Emerging deep learning technology could serve as a powerful machine learning tool to predict quantitative phenotypes without imputation and also to discover potential associated genotype markers efficiently. We propose a deep-learning framework using convolutional neural networks (CNNs) to predict the quantitative traits from SNPs and also to investigate genotype contributions to the trait using saliency maps. The missing values of SNPs are treated as a new genotype for the input of the deep learning model. We tested our framework on both simulation data and experimental datasets of soybean. The results show that the deep learning model can bypass the imputation of missing values and achieve more accurate results for predicting quantitative phenotypes than currently available other well-known statistical methods. It can also effectively and efficiently identify significant markers of SNPs and SNP combinations associated in genome-wide association study.
Affiliation(s)
- Yang Liu
  - Institute of Data Science and Informatics, University of Missouri, Columbia, MO, United States
  - Department of Electrical Engineer and Computer Science, University of Missouri, Columbia, MO, United States
- Duolin Wang
  - Department of Electrical Engineer and Computer Science, University of Missouri, Columbia, MO, United States
  - Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, United States
- Fei He
  - Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, United States
  - Department of Computer Science and Information Technology, Northeast Normal University, Changchun, China
- Juexin Wang
  - Department of Electrical Engineer and Computer Science, University of Missouri, Columbia, MO, United States
  - Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, United States
- Trupti Joshi
  - Institute of Data Science and Informatics, University of Missouri, Columbia, MO, United States
  - Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, United States
  - Department of Health Management and Informatics, School of Medicine, University of Missouri, Columbia, MO, United States
- Dong Xu
  - Institute of Data Science and Informatics, University of Missouri, Columbia, MO, United States
  - Department of Electrical Engineer and Computer Science, University of Missouri, Columbia, MO, United States
  - Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, United States
12
Romagnoni A, Jégou S, Van Steen K, Wainrib G, Hugot JP. Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Sci Rep 2019; 9:10351. [PMID: 31316157 PMCID: PMC6637191 DOI: 10.1038/s41598-019-46649-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 03/11/2019] [Accepted: 07/03/2019] [Indexed: 02/08/2023]
Abstract
Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches prompted us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the international Inflammatory Bowel Disease genetic consortium (IIBDGC) has been re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC) statistic. The impact of quality control (QC), imputing and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. Conversely, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results with a maximum AUC of 0.80. GBT methods like XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods are also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
Affiliation(s)
- Alberto Romagnoni
  - Centre de recherche sur l'inflammation UMR 1149, Inserm - Université Paris Diderot, 75018, Paris, France
  - Data Team, Département d'informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, 75005, Paris, France
- Kristel Van Steen
  - WELBIO, GIGA-R Medical Genomics - BIO3, University of Liège, Liège, Belgium
  - Department of Human Genetics, University of Leuven, Leuven, Belgium
- Gilles Wainrib
  - Data Team, Département d'informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, 75005, Paris, France
  - Owkin, 75011, Paris, France
- Jean-Pierre Hugot
  - Centre de recherche sur l'inflammation UMR 1149, Inserm - Université Paris Diderot, 75018, Paris, France
  - Hôpital Robert Debré, Assistance Publique-Hôpitaux de Paris, 75019, Paris, France
13
Jafarpisheh N, Teshnehlab M. Cancers classification based on deep neural networks and emotional learning approach. IET Syst Biol 2018; 12:258-263. [PMID: 30472689 PMCID: PMC8687421 DOI: 10.1049/iet-syb.2018.5002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
Abstract
In the present era, enormous factors contribute to causing cancer. So cancer classification cannot rely only on doctor's thoughts. As a result, intelligent algorithms concerning doctor's help are inevitable. Therefore, the authors are motivated to suggest a novel algorithm to classify three cancer datasets; colon, ALL‐AML, and leukaemia cancers. Their proposed algorithm is based on the deep neural network and emotional learning process. First of all, by applying the principal component analysis, they had a feature reduction. Then, they used deep neural as a feature extraction. Then, they implemented different classifiers; multi‐layer perceptron, support vector machine (SVM), decision tree, and Gaussian mixture model. In the end, because in the real world, especially when working on systems biology, unpredictable events, and uncertainties are undeniable, the robustness of their model against uncertainties is important. So they added Gaussian noise to the input features of the first encoder in each dataset, then, they applied the stacked denoising method. Experimental results disclosed that, generally, using emotional learning increased the accuracy. In addition, the highest accuracy was gained by SVM, 91.66, 92.27, and 96.56% for colon, ALL‐AML, and leukaemia, respectively. However, GMM led to the lowest accuracy. The best accuracy gained by GMM was 60%.
Affiliation(s)
- Noushin Jafarpisheh
  - Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
- Mohammad Teshnehlab
  - Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
14
Uppu S, Krishna A. A deep hybrid model to detect multi-locus interacting SNPs in the presence of noise. Int J Med Inform 2018; 119:134-151. [DOI: 10.1016/j.ijmedinf.2018.09.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/26/2017] [Revised: 04/13/2018] [Accepted: 09/03/2018] [Indexed: 01/17/2023]
15
Yang CH, Kao YK, Chuang LY, Lin YD. Catfish Taguchi-Based Binary Differential Evolution Algorithm for Analyzing Single Nucleotide Polymorphism Interactions in Chronic Dialysis. IEEE Trans Nanobioscience 2018; 17:291-299. [PMID: 29994217 DOI: 10.1109/tnb.2018.2844342] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
Single-nucleotide polymorphism (SNP)-SNP interactions are crucial for understanding the association between disease-related multifactorials for disease analysis. Existing statistical methods for determining such interactions are limited by the considerable computation required for evaluating all potential associations between disease-related multifactorials. Identifying SNP-SNP interactions is thus a major challenge in genetic association studies. This paper proposes a catfish Taguchi-based binary differential evolution (CT-BDE) algorithm for identifying SNP-SNP interactions. In the search space, the catfish effect prevents the premature convergence of the population, and the Taguchi method improves the search ability of the BDE algorithm. Hence, the proposed algorithm enables obtaining a favorable solution regarding the identification of high-order SNP-SNP interactions. Additionally, the proposed algorithm applies an effective fitness function derived from a multifactor dimensionality reduction (MDR) operation to evaluate the solutions from BDE-based algorithms. Simulated and real data sets were used to evaluate the ability of several BDE-based algorithms in identifying specific SNP-SNP interactions. We compared the fitness function derived from the MDR operation with that derived according to the difference between cases and controls, by using the different BDE-based algorithms. The results showed that the proposed CT-BDE algorithm applying the fitness function derived from the MDR operation exhibited a superior ability in identifying SNP-SNP interactions compared with the other BDE-based algorithms.
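As a rough illustration of a fitness function "derived from an MDR operation", the Python sketch below scores one SNP combination with a simplified multifactor dimensionality reduction step: each multi-locus genotype cell is labelled high-risk or low-risk by comparing its case:control ratio with the overall ratio, and the resulting two-class rule is scored by balanced accuracy. The function name, the strict-inequality tie rule, the choice of balanced accuracy and the toy data are all assumptions; cross-validation is omitted, and the paper's actual fitness function may differ.

```python
from collections import defaultdict


def mdr_balanced_accuracy(genotypes, labels):
    """Simplified MDR-style score for one SNP combination.

    genotypes: one tuple of SNP calls per individual.
    labels: 1 for a case, 0 for a control (both classes must be present).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count controls and cases in every multi-locus genotype cell.
    cells = defaultdict(lambda: [0, 0])  # genotype -> [controls, cases]
    for g, y in zip(genotypes, labels):
        cells[g][y] += 1

    true_pos = true_neg = 0
    for controls, cases in cells.values():
        # A cell is high-risk when its case:control ratio exceeds the overall
        # ratio; the comparison is cross-multiplied to avoid dividing by zero.
        if cases > threshold * controls:
            true_pos += cases      # cases in high-risk cells are called correctly
        else:
            true_neg += controls   # controls in low-risk cells are called correctly

    sensitivity = true_pos / n_cases
    specificity = true_neg / n_controls
    return (sensitivity + specificity) / 2


# Hypothetical toy data: disease status is the XOR of two binary SNPs.
pair = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
single = [(a,) for a, _ in pair]
status = [0, 1, 1, 0] * 2

pair_fitness = mdr_balanced_accuracy(pair, status)
single_fitness = mdr_balanced_accuracy(single, status)
```

A search algorithm such as a differential evolution variant would call a scoring function like this once for every candidate SNP combination it proposes, keeping the combinations with the highest score.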
16
Kalinin AA, Higgins GA, Reamaroon N, Soroushmehr S, Allyn-Feuer A, Dinov ID, Najarian K, Athey BD. Deep learning in pharmacogenomics: from gene regulation to patient stratification. Pharmacogenomics 2018; 19:629-650. [PMID: 29697304 PMCID: PMC6022084 DOI: 10.2217/pgs-2018-0008] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Received: 01/24/2018] [Accepted: 03/09/2018] [Indexed: 01/02/2023]
Abstract
This Perspective provides examples of current and future applications of deep learning in pharmacogenomics, including: identification of novel regulatory variants located in noncoding domains of the genome and their function as applied to pharmacoepigenomics; patient stratification from medical records; and the mechanistic prediction of drug response, targets and their interactions. Deep learning encapsulates a family of machine learning algorithms that has transformed many important subfields of artificial intelligence over the last decade, and has demonstrated breakthrough performance improvements on a wide range of tasks in biomedicine. We anticipate that in the future, deep learning will be widely used to predict personalized drug response and optimize medication selection and dosing, using knowledge extracted from large and complex molecular, epidemiological, clinical and demographic datasets.
Affiliation(s)
- Alexandr A Kalinin
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Statistics Online Computational Resource (SOCR), University of Michigan School of Nursing, Ann Arbor, MI 48109, USA
- Gerald A Higgins
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Narathip Reamaroon
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Sayedmohammadreza Soroushmehr
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Ari Allyn-Feuer
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Ivo D Dinov
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Statistics Online Computational Resource (SOCR), University of Michigan School of Nursing, Ann Arbor, MI 48109, USA
  - Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
- Kayvan Najarian
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Brian D Athey
  - Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Internal Medicine, University of Michigan Health System, Ann Arbor, MI 48109, USA
  - Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
17
|
Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:599-612. [PMID: 28060710 DOI: 10.1109/tcbb.2016.2635125] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]
Abstract
In this era of genome-wide association studies (GWAS), efforts to understand the genetic architecture of complex diseases are growing faster than ever before. The development of high-throughput genotyping and next-generation sequencing technologies enables genetic epidemiological analysis of large-scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. The interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex. These challenges have been addressed by a number of data mining and machine learning approaches. This paper reviews the current methods and the related software packages for detecting the SNP interactions that contribute to diseases. It addresses the issues that need to be considered when developing these models and reviews the achievements in data simulation for evaluating their performance. Finally, it discusses the future of SNP interaction analysis.
|