1
Huang X, Liu H, Li X, Guan L, Li J, Tellier LCAM, Yang H, Wang J, Zhang J. Revealing Alzheimer's disease genes spectrum in the whole-genome by machine learning. BMC Neurol 2018; 18:5. [PMID: 29320986 PMCID: PMC5763548 DOI: 10.1186/s12883-017-1010-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 07/20/2017] [Accepted: 12/21/2017] [Indexed: 11/23/2022] Open
Abstract
Background: Alzheimer's disease (AD) is an important, progressive neurodegenerative disease with a complex genetic architecture. A key goal of biomedical research is to identify disease risk genes and to elucidate their function in the development of disease. For this purpose, expanding the AD-associated gene set is necessary. In past research, prediction methods for AD-related genes have been limited in the genome regions they explore. Here we present a genome-wide method for predicting AD candidate genes.
Methods: We present a machine learning approach (a support vector machine, SVM) that integrates gene expression data with human brain-specific gene network data to discover the full spectrum of AD genes across the whole genome.
Results: We classified AD candidate genes with an accuracy of 84.56% and an area under the receiver operating characteristic (ROC) curve of 94%. Our approach supplements the spectrum of AD-associated genes with candidates extracted from more than 20,000 genes on a genome-wide scale.
Conclusions: In this study, we have elucidated the whole-genome spectrum of AD using a machine learning approach. We expect the resulting candidate gene catalogue to provide researchers with a more comprehensive annotation of AD.
Electronic supplementary material: The online version of this article (10.1186/s12883-017-1010-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaoyan Huang
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China; BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Hankui Liu
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Xinming Li
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Liping Guan
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Jiankang Li
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China
- Laurent Christian Asker M Tellier
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China; Department of Biology, Bioinformatics, University of Copenhagen, Copenhagen, Denmark
- Huanming Yang
- BGI-Shenzhen, Shenzhen, 518083, China; James D. Watson Institute of Genome Sciences, Hangzhou, 310058, China
- Jian Wang
- BGI-Shenzhen, Shenzhen, 518083, China; James D. Watson Institute of Genome Sciences, Hangzhou, 310058, China
- Jianguo Zhang
- BGI-Shenzhen, Shenzhen, 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen, 518120, China; Shenzhen Key Lab of Neurogenomics, BGI-Shenzhen, Shenzhen, 518120, China
2
Hossain A, Willan AR, Beyene J. A flexible nonparametric approach to find candidate genes associated with disease in microarray experiments. J Bioinform Comput Biol 2013; 11:1250021. [PMID: 23600812 DOI: 10.1142/s0219720012500217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
Biologists are often interested in the biological function of a particular gene, which may depend on other genes. Finding other genes in the same biological pathway may further the understanding of that function. We are therefore interested in finding candidate genes whose expression values are highly correlated with that of a "seed" gene. The seed gene, which is known to be associated with a disease, is used as a reference to extract candidate genes from microarray experiments and enriched pathways. We propose a nonparametric procedure for selecting these candidate genes. The proposed test statistic compares two areas under the receiver operating characteristic curve (AUCs) for gene pairs, taking the implicit correlation between the two AUCs into account. The performance of our method is compared with that of other well-known methods using simulation and real data analysis.
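As a rough illustration of the quantity this abstract refers to, the sketch below computes an empirical AUC for each of two genes and takes their difference. It is not the authors' test statistic: the abstract does not give its form, and this sketch ignores the correlation between the two AUCs that the paper accounts for. The function names, the binary disease/control labels, and the expression values are all hypothetical.

```python
def auc(scores, labels):
    """Empirical AUC: the fraction of (positive, negative) pairs in which
    the positive sample has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_difference(gene_a, gene_b, labels):
    """Naive difference of two AUCs measured on the same samples.
    Assumption: no variance or correlation correction is applied."""
    return auc(gene_a, labels) - auc(gene_b, labels)


# Hypothetical expression values for six samples (1 = disease, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
seed_gene = [5.1, 4.8, 3.9, 3.5, 4.0, 2.7]
candidate_gene = [4.9, 4.1, 3.2, 3.0, 4.2, 3.1]

seed_auc = auc(seed_gene, labels)
diff = auc_difference(seed_gene, candidate_gene, labels)
```

A formal comparison would also need an estimate of the variance of this difference, which depends on how strongly the two genes' AUCs are correlated across the shared samples.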
Affiliation(s)
- Ahmed Hossain
- Dalla Lana School of Public Health, University of Toronto, 155 College Street, Toronto, ON M5T 3M7, Canada.
4
Scheubert L, Luštrek M, Schmidt R, Repsilber D, Fuellen G. Tissue-based Alzheimer gene expression markers-comparison of multiple machine learning approaches and investigation of redundancy in small biomarker sets. BMC Bioinformatics 2012; 13:266. [PMID: 23066814 PMCID: PMC3574043 DOI: 10.1186/1471-2105-13-266] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 04/05/2012] [Accepted: 09/12/2012] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND: Alzheimer's disease has been known for more than 100 years, yet the underlying molecular mechanisms are not completely understood. Identifying the genes involved in the processes in Alzheimer-affected brains is an important step towards such an understanding. Genes differentially expressed in diseased and healthy brains are promising candidates.
RESULTS: Based on microarray data, we identify potential biomarkers as well as biomarker combinations using three feature selection methods: information gain, mean decrease in accuracy of random forest, and a wrapper combining a genetic algorithm with a support vector machine (GA/SVM). Information gain and random forest are two commonly used methods. We compare their output to the results obtained from GA/SVM. GA/SVM is rarely used for the analysis of microarray data, but it is able to identify genes capable of classifying tissues into different classes at least as well as the two reference methods.
CONCLUSION: Compared to the other methods, GA/SVM has the advantage of finding small, less redundant sets of genes that, in combination, show superior classification characteristics. The biological significance of the genes and gene pairs is discussed.
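Of the three feature selection methods named in this abstract, information gain is the simplest to show in a few lines. The following is a minimal sketch, assuming each gene's expression is split at a single hypothetical threshold; the abstract does not say how the authors discretized expression values, and the function names and data here are illustrative only.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(expression, labels, threshold):
    """Reduction in class entropy after splitting samples into
    expression <= threshold and expression > threshold."""
    low = [y for x, y in zip(expression, labels) if x <= threshold]
    high = [y for x, y in zip(expression, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (low, high) if part)
    return entropy(labels) - remainder


# Hypothetical tissue labels and expression values for two genes.
labels = ["AD", "AD", "control", "control"]
informative_gene = [1.0, 2.0, 8.0, 9.0]    # separates the classes at 5.0
uninformative_gene = [1.0, 8.0, 2.0, 9.0]  # each side is an even class mix

gain_informative = information_gain(informative_gene, labels, 5.0)
gain_uninformative = information_gain(uninformative_gene, labels, 5.0)
```

Ranking genes by a per-gene score like this evaluates each gene in isolation, which is one reason a wrapper such as GA/SVM can find smaller, less redundant gene combinations.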
Affiliation(s)
- Lena Scheubert
- Institute of Computer Science, University of Osnabrück, Albrechtstr. 28, 49076 Osnabrück, Germany