1
Shi L, Hai B, Kuang Z, Wang H, Zhao J. ResnetAge: A Resnet-Based DNA Methylation Age Prediction Method. Bioengineering (Basel) 2023; 11:34. [PMID: 38247911 PMCID: PMC10813502 DOI: 10.3390/bioengineering11010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 12/13/2023] [Accepted: 12/26/2023] [Indexed: 01/23/2024]
Abstract
Aging is a significant contributing factor to degenerative diseases such as cancer. The extent of DNA methylation in human cells reflects the aging process, and age-related methylation sites can be screened to construct epigenetic clocks, which may serve as a new aging marker for clinical diagnosis and treatment. Predicting the biological age of human individuals is conducive to the study of physical aging. Although many researchers have developed epigenetic clock prediction methods based on traditional machine learning and even deep learning, higher prediction accuracy is still required for clinical applications. Here, we proposed ResnetAge, an epigenetic clock prediction method based on a ResNet (residual neural network) model. The model accepts 22,278 CpG sites as the input for each sample, supporting both the Illumina 27K and 450K array platforms. It was trained on 32 public datasets covering multiple tissues, such as whole blood, saliva, and mouth. The Mean Absolute Error (MAE) on the training set is 1.29 years, and the Median Absolute Deviation (MAD) is 0.98 years; on the validation set, the MAE is 3.24 years and the MAD is 2.3 years. Our method achieves higher age-prediction accuracy than other methylation-based age prediction methods.
Affiliation(s)
- Lijuan Shi
- Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
- Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
- Boquan Hai
- Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
- Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
- Zhejun Kuang
- Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
- Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
- Han Wang
- The Institution of Computational Biology of Northeast Normal University, Changchun 130000, China
- Jian Zhao
- Key Laboratory of Intelligent Rehabilitation and Barrier-Free for the Disabled (Changchun University), Ministry of Education, Changchun University, Changchun 130012, China
- Jilin Provincial Key Laboratory of Human Health Status Identification & Function Enhancement, Changchun 130022, China
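The ResnetAge abstract above reports both a mean absolute error (MAE) and a median absolute deviation (MAD). A minimal Python sketch of the two metrics follows, assuming (as is common in the epigenetic-clock literature, but not confirmed by the abstract) that MAD here means the median of the absolute differences between predicted and chronological age. The function names and ages are hypothetical illustrations, not data from the paper.

```python
from statistics import mean, median
from typing import Sequence


def _abs_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> list[float]:
    """Absolute differences between chronological and predicted ages."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def mean_absolute_error(y_true, y_pred) -> float:
    """MAE: average absolute age error, in the unit of the ages (years)."""
    return mean(_abs_errors(y_true, y_pred))


def median_absolute_error(y_true, y_pred) -> float:
    """Assumed meaning of MAD: median absolute age error (robust to outliers)."""
    return median(_abs_errors(y_true, y_pred))


# Hypothetical ages in years; one sample is badly mispredicted (60 vs 66).
chronological = [30, 45, 60, 72]
predicted = [31, 43, 66, 72]
print(mean_absolute_error(chronological, predicted))    # 2.25
print(median_absolute_error(chronological, predicted))  # 1.5
```

Because the median ignores the single 6-year miss, MAD comes out below MAE, the same ordering seen in the reported training and validation figures.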

2
Daneshvar NHN, Masoudi-Sobhanzadeh Y, Omidi Y. A voting-based machine learning approach for classifying biological and clinical datasets. BMC Bioinformatics 2023; 24:140. [PMID: 37041456 PMCID: PMC10088226 DOI: 10.1186/s12859-023-05274-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 04/05/2023] [Indexed: 04/13/2023]
Abstract
BACKGROUND Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicality of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations, such as overfitting to a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study, we introduced a machine learning framework consisting of two main steps. First, our previously proposed optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with those of prior methods. RESULTS The results demonstrated that the Trader algorithm could select a near-optimal subset of features at a significance level of p < 0.01 relative to the compared algorithms. Additionally, on the large datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean values of fivefold cross-validated accuracy, precision, recall, specificity, and F-measure. CONCLUSION Based on the obtained results, a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers design practical diagnostic health care systems and offer effective treatment plans.
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
- Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Florida, 33328, USA.
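The voting step in the abstract above combines the predictions of several classifiers into one label per sample. The sketch below shows generic hard (majority) voting in Python; it does not reproduce the Trader feature selector or the authors' exact voting rules, and the smallest-label tie-break is an arbitrary assumption made for determinism.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard voting: predictions[m][i] is classifier m's label for sample i.

    Each sample receives the label predicted by the most classifiers;
    ties are broken by choosing the smallest tied label (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label every sample")
    voted = []
    for i in range(n_samples):
        counts = Counter(p[i] for p in predictions)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted


# Three hypothetical classifiers labelling four samples (0 = control, 1 = case).
preds = [
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
print(majority_vote(preds))  # [0, 1, 0, 0]
```

With an odd number of binary classifiers no tie can occur; with an even number, the tie-break rule matters and should follow whatever the original framework specifies.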

3
Xu C, Zhang R, Duan M, Zhou Y, Bao J, Lu H, Wang J, Hu M, Hu Z, Zhou F, Zhu W. A polygenic stacking classifier revealed the complicated platelet transcriptomic landscape of adult immune thrombocytopenia. MOLECULAR THERAPY - NUCLEIC ACIDS 2022; 28:477-487. [PMID: 35505964 PMCID: PMC9046129 DOI: 10.1016/j.omtn.2022.04.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/23/2021] [Accepted: 04/01/2022] [Indexed: 01/19/2023]
Abstract
Immune thrombocytopenia (ITP) is an autoimmune disease whose typical symptom is a low platelet count in the blood. ITP shows age and sex biases in both occurrence and prognosis, and adult ITP is mainly induced by living environments. The current diagnosis guideline does not integrate molecular heterogeneity. This study recruited the largest cohort of platelet transcriptome samples. A comprehensive procedure of feature selection, feature engineering, and stacking classification was carried out to detect ITP biomarkers from RNA sequencing (RNA-seq) transcriptomes. The 40 detected biomarkers were used to train the final ITP detection model, which achieved an overall accuracy of 0.974. The biomarkers suggested that ITP onset may be associated with various transcribed components, including protein-coding genes, long intergenic non-coding RNA (lincRNA) genes, and pseudogenes with apparent transcription. The delivered ITP detection model may also be used as a complementary ITP diagnosis tool. The code and the example dataset are freely available at http://www.healthinformaticslab.org/supp/resources.php
Affiliation(s)
- Chengfeng Xu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Ruochi Zhang
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Meiyu Duan
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yongming Zhou
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Jizhang Bao
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Hao Lu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Jie Wang
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Minghui Hu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Zhaoyang Hu
- Fun-Med Pharmaceutical Technology (Shanghai) Co., Ltd., RM. A310, 115 Xinjunhuan Road, Minhang District, Shanghai 201100, China
- Corresponding author Zhaoyang Hu, PhD, Fengneng Pharmaceutical Technology (Shanghai) Co., Ltd., RM. A310, 115 Xinjunhuan Road, Minhang District, Shanghai 201100, China.
- Fengfeng Zhou
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Corresponding author Fengfeng Zhou, PhD, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China.
- Wenwei Zhu
- Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China
- Corresponding author Wenwei Zhu, PhD, Department of Hematology, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, 110 Ganhe Road, Hongkou District, Shanghai 200437, China.

4
Bai T, Xu J, Zhang Z, Guo S, Luo X. Context-aware learning for cancer cell nucleus recognition in pathology images. Bioinformatics 2022; 38:2892-2898. [PMID: 35561198 DOI: 10.1093/bioinformatics/btac167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Revised: 02/28/2022] [Accepted: 03/17/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Nucleus identification supports many quantitative analysis studies that rely on nuclei positions or categories. Contextual information in pathology images refers to information near the cell to be recognized, which can be very helpful for nucleus subtyping. Current CNN-based methods do not explicitly encode contextual information within the input images and point annotations. RESULTS In this article, we propose a novel context-aware framework to locate and classify nuclei in microscopy image data. Specifically, we first use state-of-the-art network architectures to extract multi-scale feature representations from multi-field-of-view, multi-resolution input images and then aggregate the features on the fly with stacked convolutional operations. Then, two auxiliary tasks are added to the model to make effective use of the contextual information: one predicts the frequencies of nuclei, and the other extracts the regional distribution information of nuclei of the same kind. The entire framework is trained in an end-to-end, pixel-to-pixel fashion. We evaluate our method on two histopathological image datasets with different tissue and stain preparations, and the experimental results demonstrate that our method outperforms other recent state-of-the-art models in nucleus identification. AVAILABILITY AND IMPLEMENTATION The source code of our method is freely available at https://github.com/qjxjy123/DonRabbit. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tian Bai
- College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, 130012 Changchun, China
- Jiayu Xu
- College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, 130012 Changchun, China
- Zhenting Zhang
- College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, 130012 Changchun, China
- Shuyu Guo
- College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, 130012 Changchun, China
- Xiao Luo
- Department of Breast Surgery, China-Japan Union Hospital of Jilin University, 130033 Changchun, China

5
Zoo: Selecting Transcriptomic and Methylomic Biomarkers by Ensembling Animal-Inspired Swarm Intelligence Feature Selection Algorithms. Genes (Basel) 2021; 12:genes12111814. [PMID: 34828418 PMCID: PMC8621246 DOI: 10.3390/genes12111814] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/06/2021] [Revised: 11/12/2021] [Accepted: 11/15/2021] [Indexed: 02/03/2023]
Abstract
Biological omics data such as transcriptomes and methylomes have the inherent “large p small n” paradigm, i.e., the number of features is much larger than the number of samples. A feature selection (FS) algorithm selects a subset of the transcriptomic or methylomic biomarkers in order to build a better prediction model. The hidden patterns in the FS solution space make it challenging to find a feature subset with satisfactory prediction performance. Swarm intelligence (SI) algorithms mimic the target-searching behaviors of various animals and have demonstrated promising capabilities in selecting features with good machine learning performance. Our study revealed that different SI-based feature selection algorithms contributed complementary searching capabilities in the FS solution space, and their collaboration generated a better feature subset than the individual SI feature selection algorithms did. Nine SI-based feature selection algorithms were integrated to vote for the selected features, which were further refined by the dynamic recursive feature elimination framework. In most cases, the proposed Zoo algorithm outperformed the existing feature selection algorithms on transcriptomics and methylomics datasets.
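The Zoo abstract above describes several swarm-intelligence selectors voting for features before refinement. A minimal Python sketch of the voting step alone follows, assuming each selector returns a set of feature names and that a feature is kept when at least `min_votes` selectors chose it. The threshold, the feature names, and the function name are hypothetical, and the dynamic recursive feature elimination step is not shown.

```python
from collections import Counter
from typing import Iterable


def vote_features(selections: Iterable[Iterable[str]], min_votes: int) -> list[str]:
    """Keep features chosen by at least `min_votes` selectors.

    Each selector counts once per feature, even if it lists a feature twice.
    The result is ordered by vote count (descending), then by name.
    """
    counts = Counter(f for chosen in selections for f in set(chosen))
    kept = [f for f, c in counts.items() if c >= min_votes]
    return sorted(kept, key=lambda f: (-counts[f], f))


# Hypothetical outputs of three selectors on a gene-expression dataset.
selected = [
    {"gene1", "gene2", "gene3"},
    {"gene2", "gene3", "gene4"},
    {"gene2", "gene5"},
]
print(vote_features(selected, min_votes=2))  # ['gene2', 'gene3']
```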

6
Duan M, Zhang L, Wang Y, Fan Y, Liu S, Yu Q, Huang L, Zhou F. Computational pan-cancer characterization of model-based quantitative transcription regulations dysregulated in regional lymph node metastasis. Comput Biol Med 2021; 135:104571. [PMID: 34166881 DOI: 10.1016/j.compbiomed.2021.104571] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/23/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 10/21/2022]
Abstract
Cancer is one of the major causes of mortality worldwide. Regional lymph node metastasis is an important mechanism in the spread of human cancers, and transcription regulation plays an essential role in it. This study formulated a regression-model-based quantitative transcription regulation (mqTrans) between one mRNA gene and multiple transcription factors (TFs). Computational pan-cancer screening was carried out to detect the quantitative dysregulation of transcription regulation in the regional lymph node metastasis of 18 cancer types. Only a few metastasis-dysregulated mqTrans models were shared among the cancer types. The mRNA genes of the metastasis-dysregulated mqTrans models were not differentially expressed in regional lymph node metastasis. The experimental data suggested that mqTrans technology provides a complementary approach to evaluating transcription regulation mechanisms and may facilitate the quantitative investigation of transcription regulation in other phenotypes.
Affiliation(s)
- Meiyu Duan
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Lei Zhang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Yueying Wang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China; Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, Jilin Province, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Yusi Fan
- College of Software, Jilin University, Changchun, Jilin, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Shuai Liu
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Qiong Yu
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, Jilin Province, China
- Lan Huang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China
- Fengfeng Zhou
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China.
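The mqTrans abstract above models one mRNA gene as a regression on several transcription factors (TFs). One plausible reading, sketched below in plain Python, fits an ordinary least-squares model on reference samples (e.g., one phenotype group) and scores other samples by how far their observed mRNA level departs from the model's prediction. The choice of OLS, the residual score, and all function names and numbers are assumptions for illustration, not the authors' implementation.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def fit_ols(X: Matrix, y: Sequence[float]) -> list[float]:
    """Least-squares coefficients [intercept, b_1, ..., b_k] for y ~ X.

    Solves the normal equations (D^T D) beta = D^T y, where D is X with a
    leading column of ones, by Gauss-Jordan elimination with partial pivoting.
    """
    design = [[1.0, *map(float, row)] for row in X]
    p = len(design[0])
    # Augmented matrix [D^T D | D^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(design, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular; add samples or drop TFs")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for i in range(p):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][p] for i in range(p)]


def predict(coef: Sequence[float], tfs: Sequence[float]) -> float:
    """Model-predicted mRNA expression from TF expression values."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], tfs))


def regulation_residuals(coef: Sequence[float], X: Matrix, y: Sequence[float]) -> list[float]:
    """Observed minus predicted mRNA level; large |values| flag samples whose
    TF-to-mRNA relationship departs from the reference model (hypothetical score)."""
    return [t - predict(coef, row) for row, t in zip(X, y)]


# Hypothetical reference samples in which mRNA = 1 + 2*TF1 - TF2 exactly.
X_ref = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_ref = [1, 3, 0, 2, 4]
coef = fit_ols(X_ref, y_ref)
print([round(c, 6) for c in coef])  # [1.0, 2.0, -1.0]
print([round(r, 6) for r in regulation_residuals(coef, [[1, 1]], [5])])  # [3.0]
```

A query sample whose mRNA level (5) far exceeds what its TFs predict (2) gets a large residual, which is the kind of quantitative dysregulation signal the abstract describes screening for.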

7
Deng D, Chen X, Zhang R, Lei Z, Wang X, Zhou F. XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties. J Chem Inf Model 2021; 61:2697-2705. [PMID: 34009965 DOI: 10.1021/acs.jcim.0c01489] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/02/2023]
Abstract
Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are further evaluated for their target binding affinities, side effects, target missing probabilities, etc. Conventional machine learning algorithms have demonstrated satisfactory accuracy in predicting molecular properties. However, a molecule cannot be directly loaded into a machine learning model; a set of engineered features must first be designed and calculated from it. Such hand-crafted features rely heavily on the experience of the investigating researchers. Graph neural networks (GNNs) were recently introduced to describe chemical molecules, so that features may be automatically and objectively extracted from molecules through various types of GNNs, e.g., GCN (graph convolutional network), GGNN (gated graph neural network), and DMPNN (directed message passing neural network). However, compared with conventional machine learning strategies, training a stable GNN model requires a huge number of training samples and a large amount of computing power. This study proposed the integrated framework XGraphBoost, which extracts features using a GNN and builds an accurate prediction model of molecular properties using XGBoost. XGraphBoost fully inherits the merits of GNN-based automatic molecular feature extraction and the accurate prediction performance of XGBoost. Both classification and regression problems were evaluated using XGraphBoost. The experimental results strongly suggest that XGraphBoost may facilitate efficient and accurate prediction of various molecular properties. The source code is freely available to academic users at https://github.com/chenxiaowei-vincent/XGraphBoost.git.
Affiliation(s)
- Daiguo Deng
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
- Xiaowei Chen
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
- Ruochi Zhang
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China
- Zengrong Lei
- Fermion Technology Co., Ltd., Guangzhou, Guangdong 510000, P.R. China
- Xiaojian Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, P.R. China
- Fengfeng Zhou
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, P.R. China

8
RIFS2D: A two-dimensional version of a randomly restarted incremental feature selection algorithm with an application for detecting low-ranked biomarkers. Comput Biol Med 2021; 133:104405. [PMID: 33930763 DOI: 10.1016/j.compbiomed.2021.104405] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/03/2021] [Revised: 04/13/2021] [Accepted: 04/13/2021] [Indexed: 12/20/2022]
Abstract
The era of big data introduces both opportunities and challenges for biomedical researchers. One of the inherent difficulties in the biomedical research field is recruiting large cohorts of samples, while high-throughput biotechnologies may produce thousands or even millions of features for each sample. Researchers tend to evaluate the individual correlation of each feature with the class label and use the incremental feature selection (IFS) strategy to select the top-ranked features with the best prediction performance. Recent experimental data showed that a subset of continuously ranked features randomly restarted from a low-ranked feature (an RIFS block) may outperform the subset of top-ranked features. This study proposed a feature selection algorithm, RIFS2D, that integrates multiple RIFS blocks. A comprehensive comparative experiment with IFS, RIFS, and existing feature selection algorithms demonstrated that a subset of low-ranked features may also achieve promising prediction performance. This study suggested that a prediction model with promising performance may be trained on low-ranked features, even when top-ranked features do not achieve satisfactory prediction performance. Further comparative experiments were conducted between RIFS2D and t-tests for the detection of early-stage breast cancer. The data showed that the RIFS2D-recommended features achieved better prediction accuracy and were targeted by more drugs than the t-test top-ranked features.
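The RIFS2D abstract above contrasts incremental feature selection (IFS), which grows the top-ranked prefix of a ranked feature list, with an RIFS block, which grows a run of consecutively ranked features from a lower-ranked starting point. A Python sketch of both follows. The `evaluate` callback (standing in for, e.g., cross-validated accuracy), the early-stopping `patience` rule, and the toy scoring function are assumptions, and the way RIFS2D combines multiple blocks is not shown.

```python
from typing import Callable, Sequence

Evaluate = Callable[[Sequence[int]], float]


def ifs(ranked: Sequence[int], evaluate: Evaluate) -> tuple[list[int], float]:
    """Incremental feature selection: score every top-k prefix, keep the best."""
    best: list[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best, best_score = list(ranked[:k]), score
    return best, best_score


def rifs_block(ranked: Sequence[int], start: int, evaluate: Evaluate,
               patience: int = 1) -> tuple[list[int], float]:
    """Grow consecutively ranked features from position `start`; stop after
    `patience` consecutive additions fail to improve the best score."""
    block: list[int] = []
    best: list[int] = []
    best_score, misses = float("-inf"), 0
    for feature in ranked[start:]:
        block.append(feature)
        score = evaluate(block)
        if score > best_score:
            best, best_score, misses = list(block), score, 0
        else:
            misses += 1
            if misses >= patience:
                break
    return best, best_score


def toy_score(subset: Sequence[int]) -> float:
    """Hypothetical scorer: only features 5 and 6 help; each feature costs 0.1."""
    return len(set(subset) & {5, 6}) - 0.1 * len(subset)


ranked = list(range(10))  # feature 0 is top-ranked, feature 9 lowest
print(ifs(ranked, toy_score)[0])            # [0, 1, 2, 3, 4, 5, 6]
print(rifs_block(ranked, 5, toy_score)[0])  # [5, 6]
```

In this toy case the two-feature block starting at rank 5 scores 1.8, above the best IFS prefix (about 1.3), illustrating the abstract's point that low-ranked features can outperform a top-ranked subset.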