1
Shigemizu D, Akiyama S, Asanomi Y, Boroevich KA, Sharma A, Tsunoda T, Sakurai T, Ozaki K, Ochiya T, Niida S. A comparison of machine learning classifiers for dementia with Lewy bodies using miRNA expression data. BMC Med Genomics 2019; 12:150. [PMID: 31666070 PMCID: PMC6822471 DOI: 10.1186/s12920-019-0607-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/06/2019] [Accepted: 10/18/2019] [Indexed: 12/21/2022]
Abstract
Background: Dementia with Lewy bodies (DLB) is the second most common subtype of neurodegenerative dementia in humans, after Alzheimer's disease (AD). Current clinical diagnosis of DLB has high specificity but low sensitivity, and finding potential biomarkers of prodromal DLB remains challenging. MicroRNAs (miRNAs) have recently received considerable attention as a source of novel biomarkers.
Methods: Using serum miRNA expression data from 478 Japanese individuals, we investigated potential miRNA biomarkers and constructed an optimal risk prediction model with several machine learning methods: penalized regression, random forest, support vector machine, and gradient boosting decision tree.
Results: The final risk prediction model, a gradient boosting decision tree built on 180 miRNAs and two clinical features, achieved an accuracy of 0.829 on an independent test set. We further predicted candidate target genes of the miRNAs. Gene set enrichment analysis of these target genes revealed six functional genes included in the DHA signaling pathway associated with DLB pathology. Two of them were further supported by gene-based association studies using a large number of single nucleotide polymorphism markers (BCL2L1: P = 0.012; PIK3R2: P = 0.021).
Conclusions: Our proposed prediction model provides an effective tool for DLB classification. In addition, a gene-based association test of rare variants revealed that BCL2L1 and PIK3R2 were statistically significantly associated with DLB.
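To make the gradient boosting decision tree concrete, here is a minimal sketch of binary gradient boosting with depth-1 trees (decision stumps) and logistic loss, scored by accuracy on a held-out set. It is not the authors' model: the study used 180 miRNAs plus two clinical features, whereas the stump-only trees, the helper names, the parameters `n_rounds` and `lr`, and the toy data below are our own illustrative assumptions.

```python
import math
from typing import List, Tuple

# (feature index, threshold, value added when x[feature] <= threshold, value added otherwise)
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def best_split(X: List[List[float]], r: List[float]) -> Tuple[int, float]:
    """Return the (feature, threshold) split that minimizes the squared error of residuals r."""
    best, best_sse = (0, X[0][0]), math.inf
    for j in range(len(X[0])):
        # Skip the largest value: splitting there would leave the right side empty.
        for t in sorted({x[j] for x in X})[:-1]:
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            sse = sum((v - sum(s) / len(s)) ** 2 for s in (left, right) for v in s)
            if sse < best_sse:
                best, best_sse = (j, t), sse
    return best


def newton_leaf(r: List[float], p: List[float]) -> float:
    """One Newton step for the log-loss inside a leaf: sum(y - p) / sum(p * (1 - p))."""
    den = sum(pi * (1.0 - pi) for pi in p)
    return sum(r) / den if den > 0 else 0.0


def fit_gbdt(X, y, n_rounds=20, lr=0.1):
    """Gradient boosting with stumps and logistic loss; labels are 0/1."""
    pos = sum(y) / len(y)  # assumes both classes are present
    f0 = math.log(pos / (1.0 - pos))  # start from the prior log-odds
    F = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        p = [sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of the log-loss
        j, t = best_split(X, r)
        left = [x[j] <= t for x in X]
        lv = newton_leaf([ri for ri, g in zip(r, left) if g], [pi for pi, g in zip(p, left) if g])
        rv = newton_leaf([ri for ri, g in zip(r, left) if not g], [pi for pi, g in zip(p, left) if not g])
        stumps.append((j, t, lv, rv))
        F = [f + lr * (lv if g else rv) for f, g in zip(F, left)]
    return f0, lr, stumps


def predict(model, x: List[float]) -> int:
    f0, lr, stumps = model
    f = f0 + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)
    return int(sigmoid(f) > 0.5)


def accuracy(model, X, y) -> float:
    return sum(predict(model, x) == yi for x, yi in zip(X, y)) / len(y)


# Made-up toy data: two "expression" features; the class depends on the first one.
X_train = [[0.1, 1.0], [0.3, 0.8], [0.2, 0.5], [0.9, 0.4], [0.8, 0.9], [0.7, 0.1]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[0.15, 0.3], [0.85, 0.6]]
y_test = [0, 1]

model = fit_gbdt(X_train, y_train)
print("held-out accuracy:", accuracy(model, X_test, y_test))
```

Keeping the test set out of `fit_gbdt` entirely mirrors the idea of reporting accuracy on an independent test set; a real comparison of several classifiers would tune each one on the training data only.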
Affiliation(s)
- Daichi Shigemizu
- Laboratory Chief, Division of Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, 7-430 Morioka-cho, Obu, Aichi, 474-8511, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 113-8510, Japan
- Shintaro Akiyama
- Laboratory Chief, Division of Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, 7-430 Morioka-cho, Obu, Aichi, 474-8511, Japan
- Yuya Asanomi
- Laboratory Chief, Division of Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, 7-430 Morioka-cho, Obu, Aichi, 474-8511, Japan
- Keith A Boroevich
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Alok Sharma
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 113-8510, Japan; School of Engineering & Physics, University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems, Griffith University, QLD, Brisbane, 4111, Australia
- Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 113-8510, Japan
- Takashi Sakurai
- The Center for Comprehensive Care and Research on Memory Disorders, National Center for Geriatrics and Gerontology, Obu, Aichi, 474-8511, Japan; Department of Cognitive and Behavioral Science, Nagoya University Graduate School of Medicine, Nagoya, Aichi, 466-8550, Japan
- Kouichi Ozaki
- Laboratory Chief, Division of Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, 7-430 Morioka-cho, Obu, Aichi, 474-8511, Japan; RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Takahiro Ochiya
- Division of Molecular and Cellular Medicine, Fundamental Innovative Oncology Core Center, National Cancer Center Research Institute, Tokyo, 104-0045, Japan; Institute of Medical Science, Tokyo Medical University, Tokyo, 160-8402, Japan
- Shumpei Niida
- Laboratory Chief, Division of Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, 7-430 Morioka-cho, Obu, Aichi, 474-8511, Japan

2
Mandal A, Maji P. FaRoC: Fast and Robust Supervised Canonical Correlation Analysis for Multimodal Omics Data. IEEE Trans Cybern 2018; 48:1229-1241. [PMID: 28391216 DOI: 10.1109/tcyb.2017.2685625] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 06/07/2023]
Abstract
One of the main problems associated with high-dimensional multimodal real-life data sets is how to extract relevant and significant features. To address this, a fast and robust feature extraction algorithm, termed FaRoC, is proposed that judiciously integrates the merits of canonical correlation analysis (CCA) and rough sets. The proposed method extracts new features sequentially from two multidimensional data sets by maximizing their relevance with respect to the class label and their significance with respect to already-extracted features. To generate canonical variables sequentially, an analytical formulation is introduced that establishes the relation between the regularization parameters and CCA. This formulation enables the proposed method to extract the required number of correlated features sequentially at a lower computational cost than existing methods. Both the relevance and the significance of a feature are computed using the hypercuboid equivalence partition matrix of the rough hypercuboid approach, which also provides an efficient way to find the optimal regularization parameters employed in CCA. The efficacy of the proposed FaRoC algorithm, along with a comparison with other existing methods, is extensively established on several real-life data sets.
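The relevance and significance measures mentioned above can be sketched as follows, assuming the rough hypercuboid definitions as we understand them: on each feature, every class spans an interval [min, max]; the partition matrix marks which class intervals contain each object; relevance (dependency) is the fraction of objects covered by exactly one class; and significance is the gain in dependency when a feature is added to an already-selected set. FaRoC applies such measures to extracted canonical variables, so its exact formulas may differ, and the names `hcpm`, `dependency`, and `significance` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def hcpm(values: Sequence[float], labels: Sequence[int]) -> Matrix:
    """Hypercuboid equivalence partition matrix for one feature.

    Row i belongs to the i-th class (sorted); h[i][j] = 1 when object j's value
    lies inside [min, max] of the values observed for that class.
    """
    classes = sorted(set(labels))
    bounds = {}
    for c in classes:
        vals = [v for v, lab in zip(values, labels) if lab == c]
        bounds[c] = (min(vals), max(vals))
    return [[int(bounds[c][0] <= v <= bounds[c][1]) for v in values] for c in classes]


def intersect(h1: Matrix, h2: Matrix) -> Matrix:
    """Matrix for a feature set: an object stays in a class hypercuboid only if it does so for every feature."""
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(h1, h2)]


def dependency(h: Matrix) -> float:
    """1 - fraction of objects lying inside more than one class hypercuboid."""
    n = len(h[0])
    # Every object lies in its own class interval, so each column sum is at least 1.
    confused = sum(min(1, sum(row[j] for row in h) - 1) for j in range(n))
    return 1.0 - confused / n


def significance(h_new: Matrix, h_selected: Matrix) -> float:
    """Gain in dependency obtained by adding a new feature to the selected set."""
    return dependency(intersect(h_new, h_selected)) - dependency(h_selected)


f1 = [1, 2, 3, 4, 5, 6]  # class ranges [1, 3] and [4, 6] do not overlap
f2 = [1, 2, 4, 3, 5, 6]  # class ranges [1, 4] and [3, 6] overlap
labels = [0, 0, 0, 1, 1, 1]
print(dependency(hcpm(f1, labels)), dependency(hcpm(f2, labels)))
print(significance(hcpm(f1, labels), hcpm(f2, labels)))
```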

3
Paul S, Lakatos P, Hartmann A, Schneider-Stock R, Vera J. Identification of miRNA-mRNA Modules in Colorectal Cancer Using Rough Hypercuboid Based Supervised Clustering. Sci Rep 2017; 7:42809. [PMID: 28220871 PMCID: PMC5318911 DOI: 10.1038/srep42809] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/24/2016] [Accepted: 01/13/2017] [Indexed: 02/06/2023]
Abstract
Differences in the expression profiles of miRNAs and mRNAs have been reported in colorectal cancer (CRC). Nevertheless, information on important miRNA-mRNA regulatory modules in CRC is still lacking. This study therefore applies the RH-SAC algorithm to miRNA and mRNA expression data to identify potential miRNA-mRNA modules. First, a set of miRNA rules was generated using the RH-SAC algorithm. The mRNA targets of the selected miRNAs were then identified using the miRTarBase database. Next, the expression values of the target mRNAs were used to generate mRNA rules with RH-SAC. Finally, all miRNA-mRNA rules were integrated to generate networks. Unlike other existing methods, the RH-SAC algorithm selects a group of co-expressed miRNAs and mRNAs that are also differentially expressed. In total, 17 miRNAs and 141 mRNAs were selected. Enrichment analysis of the selected mRNAs showed that they are significantly associated with colorectal cancer. We identified novel miRNA/mRNA interactions in colorectal cancer. Experimentally, we confirmed that one of the discovered miRNAs, hsa-miR-93-5p, was significantly up-regulated in 75.8% of CRC samples compared with their corresponding non-tumor samples. The approach may also have the potential to examine subtype-specific miRNA/mRNA interactions in colorectal cancer.
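The final integration step can be pictured as a filter over a target table: a miRNA-mRNA edge survives only if both the miRNA and its target were selected by the rule-generation steps. In this sketch the `targets` dictionary stands in for a miRTarBase lookup, and every identifier is a hypothetical placeholder rather than real database content. `fraction_upregulated` only counts matched tumor/non-tumor pairs with higher tumor expression; it is not the statistical test used in the study.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def integrate_modules(selected_mirnas: Set[str], selected_mrnas: Set[str],
                      targets: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Keep a miRNA -> mRNA edge only when the miRNA and its target were both selected."""
    return sorted((m, g) for m in selected_mirnas
                  for g in targets.get(m, set()) if g in selected_mrnas)


def modules(edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group edges into one module per miRNA (the miRNA plus its retained targets)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for mirna, gene in edges:
        grouped[mirna].append(gene)
    return dict(grouped)


def fraction_upregulated(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """Share of matched tumor/non-tumor pairs in which expression is higher in the tumor."""
    pairs = list(zip(tumor, normal))
    return sum(t > n for t, n in pairs) / len(pairs)


# Hypothetical identifiers and targets (placeholders, not miRTarBase records).
targets = {"miR-A": {"GENE1", "GENE2", "GENE9"}, "miR-B": {"GENE2"}, "miR-C": {"GENE3"}}
edges = integrate_modules({"miR-A", "miR-B"}, {"GENE1", "GENE2", "GENE3"}, targets)
print(edges)
print(modules(edges))
```

GENE9 is dropped because it was not selected, and GENE3 is dropped because its only regulator, miR-C, was not selected; this is the sense in which the network keeps only co-selected pairs.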
Affiliation(s)
- Sushmita Paul
- Department of Bioscience & Bioengineering, Indian Institute of Technology Jodhpur, India
- Petra Lakatos
- Experimental Tumorpathology, Institute of Pathology, University Hospital of Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Arndt Hartmann
- Institute of Pathology, University Hospital of Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Regine Schneider-Stock
- Experimental Tumorpathology, Institute of Pathology, University Hospital of Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Julio Vera
- Laboratory of Systems Tumor Immunology, Department of Dermatology, Erlangen University Hospital and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

5
Paul S, Vera J. Rough hypercuboid based supervised clustering of miRNAs. Mol Biosyst 2015; 11:2068-2081. [PMID: 25996345 DOI: 10.1039/c5mb00213c] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
MicroRNAs (miRNAs) are small, endogenous non-coding RNAs found in plants, animals, and some viruses that function in RNA silencing and post-transcriptional regulation of gene expression. Various genome-wide studies suggest that a substantial fraction of miRNA genes is likely to form clusters. The coherent expression of such miRNA clusters can then be used to classify samples according to clinical outcome. To find such groups of miRNAs, a new clustering algorithm, termed rough hypercuboid based supervised attribute clustering (RH-SAC), is proposed. The algorithm is based on rough set theory and directly incorporates information about sample categories into the miRNA clustering process, yielding a supervised clustering algorithm for miRNAs. The effectiveness of the new approach is demonstrated on several publicly available miRNA expression data sets using a support vector machine. The B.632+ bootstrap error estimate is used to minimize the variability and bias of the derived results. The association of the miRNA clusters with various biological pathways is also shown through pathway enrichment analysis.
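A minimal sketch of the B.632+ bootstrap estimate, written in one common form of Efron and Tibshirani's 0.632+ rule: the apparent (resubstitution) error and the leave-one-out bootstrap error Err(1) are combined with an adaptive weight w = 0.632 / (1 - 0.368 R), where R is the relative overfitting rate computed from the no-information error rate gamma. The nearest-centroid classifier is a hypothetical stand-in for the support vector machine used in the paper, and skipping bootstrap samples that lose a class is our own choice.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Predictor = Callable[[Point], int]


def nearest_centroid(X: Sequence[Point], y: Sequence[int]) -> Predictor:
    """Stand-in classifier (not the paper's SVM): predict the class with the closest mean."""
    centroids = {}
    for c in set(y):
        pts = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(x: Point) -> int:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def no_information_rate(y: List[int], y_pred: List[int]) -> float:
    """gamma = sum_l p_l * (1 - q_l): expected error if predictions were independent of labels."""
    n = len(y)
    return sum((y.count(c) / n) * (1 - y_pred.count(c) / n) for c in set(y) | set(y_pred))


def combine_632plus(err_bar: float, err1: float, gamma: float) -> float:
    """(1 - w) * err_bar + w * Err1', with Err1' = min(Err1, gamma) and w = 0.632 / (1 - 0.368 R)."""
    err1 = min(err1, gamma)
    r = (err1 - err_bar) / (gamma - err_bar) if err1 > err_bar and gamma > err_bar else 0.0
    w = 0.632 / (1 - 0.368 * r)
    return (1 - w) * err_bar + w * err1


def b632plus(X, y, fit: Callable[..., Predictor] = nearest_centroid,
             n_boot: int = 100, seed: int = 0) -> float:
    rng = random.Random(seed)
    n = len(y)
    full = fit(X, y)
    preds = [full(x) for x in X]
    err_bar = sum(p != t for p, t in zip(preds, y)) / n  # resubstitution (apparent) error
    gamma = no_information_rate(list(y), preds)
    oob_losses: List[List[int]] = [[] for _ in range(n)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        if len({y[i] for i in idx}) < len(set(y)):
            continue  # skip bootstrap samples that lost a class entirely
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                oob_losses[i].append(int(model(X[i]) != y[i]))
    per_point = [sum(losses) / len(losses) for losses in oob_losses if losses]
    if not per_point:
        raise ValueError("no out-of-bag predictions; increase n_boot")
    err1 = sum(per_point) / len(per_point)  # leave-one-out bootstrap error Err(1)
    return combine_632plus(err_bar, err1, gamma)


X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (2.0, 2.1), (2.2, 1.9), (1.9, 2.3), (2.1, 2.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print("B.632+ error estimate:", b632plus(X, y))
```

When the classifier barely overfits (Err(1) close to the apparent error), R is near 0 and the weight falls back to the fixed 0.632; heavy overfitting pushes the weight toward 1.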
Affiliation(s)
- Sushmita Paul
- Laboratory of Systems Tumor Immunology, Department of Dermatology, University of Erlangen-Nürnberg, Hartmannstr. 14, 91052 Erlangen, Germany.

6
Vu T, Sima C, Braga-Neto UM, Dougherty ER. Unbiased bootstrap error estimation for linear discriminant analysis. EURASIP J Bioinform Syst Biol 2014; 2014:15. [PMID: 28194165 PMCID: PMC5270504 DOI: 10.1186/s13637-014-0015-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/17/2014] [Accepted: 08/18/2014] [Indexed: 11/26/2022]
Abstract
Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination of the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite-sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantees unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and the Bayes error of the problem. The methodology is illustrated by application to data from a well-known cancer classification study.
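The unbiasedness condition can be checked numerically. Writing the convex estimator as (1 - w)·ε_resub + w·ε_boot, the weight that makes it unbiased satisfies w* = (E[ε] - E[ε_resub]) / (E[ε_boot] - E[ε_resub]). The sketch below estimates the three expectations by Monte Carlo for univariate LDA with two equal-variance Gaussian classes and equal priors, instead of the exact expressions derived in the paper. The equal class sizes, the stratified resampling, the pooled out-of-bag definition of the basic bootstrap error, and all parameter values are assumptions chosen for simplicity.

```python
import math
import random
from typing import List, Tuple


def phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def avg(v: List[float]) -> float:
    return sum(v) / len(v)


def lda_rule(x0: List[float], x1: List[float]) -> Tuple[float, bool]:
    """Univariate LDA with equal class sizes: threshold at the midpoint of the sample means."""
    m0, m1 = avg(x0), avg(x1)
    return (m0 + m1) / 2.0, m1 >= m0  # (threshold, True if class 1 lies above it)


def classify(x: float, t: float, up: bool) -> int:
    return int((x > t) == up)


def true_error(t, up, mu0, mu1, sigma) -> float:
    """Exact error of a fixed rule under N(mu0, sigma^2) vs N(mu1, sigma^2), equal priors."""
    p0_above = 1.0 - phi((t - mu0) / sigma)
    p1_above = 1.0 - phi((t - mu1) / sigma)
    if up:
        return 0.5 * p0_above + 0.5 * (1.0 - p1_above)
    return 0.5 * (1.0 - p0_above) + 0.5 * p1_above


def unbiased_weight(e_true: float, e_resub: float, e_boot: float) -> float:
    """Solve (1 - w) * E[resub] + w * E[boot] = E[true error] for w."""
    return (e_true - e_resub) / (e_boot - e_resub)


def convex_estimate(e_resub: float, e_boot: float, w: float) -> float:
    return (1.0 - w) * e_resub + w * e_boot


def simulate(n=10, mu0=-1.0, mu1=1.0, sigma=1.0, trials=300, n_boot=50, seed=1):
    """Monte Carlo averages of the true, resubstitution, and basic bootstrap errors."""
    rng = random.Random(seed)
    tru, res, boo = [], [], []
    for _ in range(trials):
        x0 = [rng.gauss(mu0, sigma) for _ in range(n)]
        x1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        t, up = lda_rule(x0, x1)
        tru.append(true_error(t, up, mu0, mu1, sigma))
        wrong = sum(classify(v, t, up) != 0 for v in x0) + sum(classify(v, t, up) != 1 for v in x1)
        res.append(wrong / (2 * n))
        wrong = total = 0
        for _ in range(n_boot):
            # Stratified resampling keeps the class sizes equal (our simplification).
            i0 = [rng.randrange(n) for _ in range(n)]
            i1 = [rng.randrange(n) for _ in range(n)]
            tb, upb = lda_rule([x0[i] for i in i0], [x1[i] for i in i1])
            for label, xs, idx in ((0, x0, i0), (1, x1, i1)):
                oob = set(range(n)) - set(idx)  # points left out of this bootstrap sample
                wrong += sum(classify(xs[i], tb, upb) != label for i in oob)
                total += len(oob)
        boo.append(wrong / total)
    return avg(tru), avg(res), avg(boo)


e_true, e_resub, e_boot = simulate()
w_star = unbiased_weight(e_true, e_resub, e_boot)
print(f"true={e_true:.3f} resub={e_resub:.3f} boot={e_boot:.3f} w*={w_star:.3f} (vs 0.632)")
```

With mu0 = -1, mu1 = 1, and sigma = 1, the Bayes error is phi(-1), about 0.159; re-running `simulate` with a different `n` or class separation is a quick way to see w* move relative to 0.632, although the Monte Carlo noise that the exact expressions avoid remains.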
Affiliation(s)
- Thang Vu
- Department of Electrical and Computer Engineering, Texas A&M University, 3128 TAMU, College Station, 77843 TX USA
- Chao Sima
- Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, 101 Gateway, Suite A, College Station, 77845 TX USA
- Ulisses M Braga-Neto
- Department of Electrical and Computer Engineering, Texas A&M University, 3128 TAMU, College Station, 77843 TX USA; Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, 101 Gateway, Suite A, College Station, 77845 TX USA
- Edward R Dougherty
- Department of Electrical and Computer Engineering, Texas A&M University, 3128 TAMU, College Station, 77843 TX USA; Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, 101 Gateway, Suite A, College Station, 77845 TX USA