1. Combrisson E, Di Rienzo F, Saive AL, Perrone-Bertolotti M, Soto JLP, Kahane P, Lachaux JP, Guillot A, Jerbi K. Human local field potentials in motor and non-motor brain areas encode upcoming movement direction. Commun Biol 2024; 7:506. [PMID: 38678058 PMCID: PMC11055917 DOI: 10.1038/s42003-024-06151-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 04/05/2024] [Indexed: 04/29/2024] [Open Access]
Abstract
Limb movement direction can be inferred from local field potentials in motor cortex during movement execution. Yet, it remains unclear to what extent intended hand movements can be predicted from brain activity recorded during movement planning. Here, we set out to probe the directional-tuning of oscillatory features during motor planning and execution, using a machine learning framework on multi-site local field potentials (LFPs) in humans. We recorded intracranial EEG data from implanted epilepsy patients as they performed a four-direction delayed center-out motor task. Fronto-parietal LFP low-frequency power predicted hand-movement direction during planning while execution was largely mediated by higher frequency power and low-frequency phase in motor areas. By contrast, Phase-Amplitude Coupling showed uniform modulations across directions. Finally, multivariate classification led to an increase in overall decoding accuracy (>80%). The novel insights revealed here extend our understanding of the role of neural oscillations in encoding motor plans.
Affiliation(s)
- Etienne Combrisson
  - Psychology Department, University of Montreal, Montreal, QC, Canada
  - University of Lyon, UCBL-Lyon 1, Laboratoire Interuniversitaire de Biologie de la Motricité UR 7424, F-69622, Villeurbanne, France
  - Institut de Neurosciences de la Timone, Aix Marseille Université, UMR 7289 CNRS, 13005, Marseille, France
- Franck Di Rienzo
  - University of Lyon, UCBL-Lyon 1, Laboratoire Interuniversitaire de Biologie de la Motricité UR 7424, F-69622, Villeurbanne, France
- Anne-Lise Saive
  - Psychology Department, University of Montreal, Montreal, QC, Canada
  - Cognitive Science Department, Lyfe Research and Innovation Center, Ecully, France
- Juan L P Soto
  - Telecommunications and Control Engineering Department, University of Sao Paulo, Sao Paulo, Brazil
- Philippe Kahane
  - Université Grenoble Alpes, Inserm, U1216, CHU Grenoble Alpes, Grenoble Institut Neurosciences, GIN, Grenoble, France
- Jean-Philippe Lachaux
  - Lyon Neuroscience Research Center, EDUWELL team, INSERM UMRS 1028, CNRS UMR 5292, Université Claude Bernard Lyon 1, Université de Lyon, F-69000, Lyon, France
- Aymeric Guillot
  - University of Lyon, UCBL-Lyon 1, Laboratoire Interuniversitaire de Biologie de la Motricité UR 7424, F-69622, Villeurbanne, France
- Karim Jerbi
  - Psychology Department, University of Montreal, Montreal, QC, Canada
  - Mila (Quebec AI Institute), Montreal, QC, Canada
  - UNIQUE Centre (Quebec Neuro-AI research Center), Montreal, QC, Canada
2. Hu J, Yang X, Ren J, Zhong S, Fan Q, Li W. Identification and verification of characteristic differentially expressed ferroptosis-related genes in osteosarcoma using bioinformatics analysis. Toxicol Mech Methods 2023; 33:781-795. [PMID: 37488941 DOI: 10.1080/15376516.2023.2240879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 07/17/2023] [Accepted: 07/20/2023] [Indexed: 07/26/2023]
Abstract
BACKGROUND: This study identified and verified the characteristic differentially expressed ferroptosis-related genes (CDEFRGs) in osteosarcoma (OS). METHODS: We extracted ferroptosis-related genes (FRGs), identified differentially expressed FRGs (DEFRGs) in OS, and conducted correlation analysis between DEFRGs. Next, we conducted GO and KEGG analyses to explore the biological functions and pathways of DEFRGs. Furthermore, we used LASSO and SVM-RFE algorithms to screen CDEFRGs, and evaluated their accuracy in diagnosing OS through ROC curves. Then, we demonstrated the molecular function and pathway enrichment of CDEFRGs through GSEA analysis. In addition, we evaluated the differences in immune cell infiltration between OS and NC groups, as well as the correlation between CDEFRG expression and immune cell infiltration. Finally, the expression of CDEFRGs was verified through qRT-PCR, western blotting, and immunohistochemistry experiments. RESULTS: We identified 51 DEFRGs and the expression relationship between them. GO and KEGG analysis revealed their key functions and important pathways. Based on four CDEFRGs (PEX3, CPEB1, NOX1, and ALOX5), we built the OS diagnostic model and verified its accuracy. GSEA analysis further revealed the important functions and pathways of CDEFRGs. In addition, there were differences in immune cell infiltration between the OS and NC groups, and CDEFRGs showed significant correlation with certain infiltrating immune cells. Finally, we validated the differential expression levels of the four CDEFRGs through external experiments. CONCLUSIONS: This study has shed light on the molecular pathological mechanism of OS and has offered novel perspectives for the early diagnosis and immune-targeted therapy of OS patients.
Affiliation(s)
- Jianhua Hu
  - Department of Orthopedic Surgery, The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, P. R. China
  - Faculty of Medical Science, Kunming University of Science and Technology, Kunming, P. R. China
- Xi Yang
  - Department of Orthopedic Surgery, The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, P. R. China
  - Yunnan Key Laboratory of Digital Orthopaedics, Kunming, P. R. China
- Jing Ren
  - Department of Spinal Surgery, Qujing No. 1 Hospital, Affiliated Qujing Hospital of Kunming Medical University, Qujing, P. R. China
- Shixiao Zhong
  - Faculty of Medical Science, Kunming University of Science and Technology, Kunming, P. R. China
  - Yunnan Key Laboratory of Digital Orthopaedics, Kunming, P. R. China
- Qianbo Fan
  - Faculty of Medical Science, Kunming University of Science and Technology, Kunming, P. R. China
  - Yunnan Key Laboratory of Digital Orthopaedics, Kunming, P. R. China
- Weichao Li
  - Department of Orthopedic Surgery, The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, P. R. China
  - Faculty of Medical Science, Kunming University of Science and Technology, Kunming, P. R. China
  - Yunnan Key Laboratory of Digital Orthopaedics, Kunming, P. R. China
3. Chromosomal abnormality, laboratory techniques, tools and databases in molecular Cytogenetics. Mol Biol Rep 2020; 47:9055-9073. [DOI: 10.1007/s11033-020-05895-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/10/2020] [Accepted: 10/03/2020] [Indexed: 11/30/2022]
4. He B, Zhang Y, Zhou Z, Wang B, Liang Y, Lang J, Lin H, Bing P, Yu L, Sun D, Luo H, Yang J, Tian G. A Neural Network Framework for Predicting the Tissue-of-Origin of 15 Common Cancer Types Based on RNA-Seq Data. Front Bioeng Biotechnol 2020; 8:737. [PMID: 32850691 PMCID: PMC7419649 DOI: 10.3389/fbioe.2020.00737] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 03/05/2020] [Accepted: 06/10/2020] [Indexed: 12/19/2022] [Open Access]
Abstract
Sequencing-based identification of tumor tissue-of-origin (TOO) is critical for patients with cancer of unknown primary lesions. Even if the TOO of a tumor can be diagnosed by clinicopathological observation, reevaluations by computational methods can help avoid misdiagnosis. In this study, we developed a neural network (NN) framework using the expression of a 150-gene panel to infer the tumor TOO for 15 common solid tumor cancer types, including lung, breast, liver, colorectal, gastroesophageal, ovarian, cervical, endometrial, pancreatic, bladder, head and neck, thyroid, prostate, kidney, and brain cancers. To begin with, we downloaded the RNA-Seq data of 7,460 primary tumor samples across the above-mentioned 15 cancer types, with each type of cancer having between 142 and 1,052 samples, from The Cancer Genome Atlas. Then, we performed feature selection by the Pearson correlation method and performed a 150-gene panel analysis; the genes were significantly enriched in the GO:2001242 Regulation of intrinsic apoptotic signaling pathway and the GO:0009755 Hormone-mediated signaling pathway and other similar functions. Next, we developed a novel NN model using the 150 genes to predict tumor TOO for the 15 cancer types. The average prediction sensitivity and precision of the framework are 93.36% and 94.07%, respectively, for the 7,460 tumor samples based on the 10-fold cross-validation; however, the prediction sensitivity and precision for a few specific cancers, like prostate cancer, reached 100%. We also tested the trained model on a 20-sample independent dataset with metastatic tumor, and achieved an 80% accuracy. In summary, we present here a highly accurate method to infer tumor TOO, which has potential clinical implementation.
Affiliation(s)
- Binsheng He
  - Academician Workstation, Changsha Medical University, Changsha, China
- Zhen Zhou
  - Department of Radiology, Beijing Chest Hospital, Capital Medical University, Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, China
- Bo Wang
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Huixin Lin
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Pingping Bing
  - Academician Workstation, Changsha Medical University, Changsha, China
- Lan Yu
  - Inner Mongolia People's Hospital, Huhhot, China
- Dejun Sun
  - Inner Mongolia People's Hospital, Huhhot, China
- Huaiqing Luo
  - Academician Workstation, Changsha Medical University, Changsha, China
- Jialiang Yang
  - Academician Workstation, Changsha Medical University, Changsha, China
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Geng Tian
  - Geneis (Beijing) Co., Ltd., Beijing, China
5. Liver Cancer Classification Model Using Hybrid Feature Selection Based on Class-Dependent Technique for the Central Region of Thailand. INFORMATION 2019. [DOI: 10.3390/info10060187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] [Open Access]
Abstract
Liver cancer data always consist of a large number of multidimensional datasets. A dataset that has huge features and multiple classes may be irrelevant to the pattern classification in machine learning. Hence, feature selection improves the performance of the classification model to achieve maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposed a hybrid feature selection approach by combining information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for the liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve the performance of classification. The IGSFS-CD method provided good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1. In addition, LC_dataset II delivered the best performance with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance compared to the class-independent method. Furthermore, the best feature subset selection could help reduce the complexity of the predictive model.
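As a rough illustration of the information-gain filter step named in this abstract, the sketch below ranks discrete features by how much they reduce label entropy. It is not the IGSFS-CD method itself: the class-dependent scheme and the sequential forward selection stage are not reproduced, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical, not from the paper): one feature separates the
# classes perfectly, the other carries no information about them.
labels = ["pos", "pos", "neg", "neg"]
features = {
    "informative": ["a", "a", "b", "b"],
    "uninformative": ["a", "b", "a", "b"],
}

# Rank feature names from highest to lowest information gain.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
```

In a hybrid scheme, a ranking like this would typically be used to shortlist candidates before a wrapper search evaluates subsets with a classifier.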
6. Bhowmick SS, Bhattacharjee D, Rato L. Identification of tissue-specific tumor biomarker using different optimization algorithms. Genes Genomics 2018; 41:431-443. [PMID: 30535858 DOI: 10.1007/s13258-018-0773-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/16/2018] [Accepted: 12/03/2018] [Indexed: 11/25/2022]
Abstract
BACKGROUND: Identification of differentially expressed genes, i.e., genes whose transcript abundance level differs across different biological or physiological conditions, was indeed a challenging task. However, the inception of transcriptome sequencing (RNA-seq) technology revolutionized the simultaneous measurement of the transcript abundance levels for thousands of genes. OBJECTIVE: In this paper, such next-generation sequencing (NGS) data are used to identify biomarker signatures for several of the most common cancer types (bladder, colon, kidney, brain, liver, lung, prostate, skin, and thyroid). METHODS: Here, the problem is mapped into the comparison of optimization algorithms for selecting a set of genes that leads to the highest classification accuracy of a two-class classification task between healthy and tumor samples. The optimization algorithms chosen for this experiment are Artificial Bee Colony (ABC), Ant Colony Optimization, Differential Evolution, and Particle Swarm Optimization. A standard statistical method called DESeq2 is used to select differentially expressed genes before they are fed to the optimization algorithms. Classification of healthy and tumor samples is done by a support vector machine. RESULTS: Cancer-specific validation yields remarkably good results in terms of accuracy. The highest classification accuracy, 99.10%, is achieved by the ABC algorithm on the brain lower grade glioma data. This validation is well supported by a statistical test, gene ontology enrichment analysis, and KEGG pathway enrichment analysis for each cancer biomarker signature. CONCLUSION: The current study identified robust genes as biomarker signatures, and these identified biomarkers might be helpful to accurately identify tumors of unknown origin.
Affiliation(s)
- Shib Sankar Bhowmick
  - Department of Electronics and Communication Engineering, Heritage Institute of Technology, Kolkata, 700107, India
- Debotosh Bhattacharjee
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
- Luis Rato
  - Department of Informatics, University of Evora, 7004-516, Evora, Portugal
7. Zhu X, Wang Y, Li Y, Tan Y, Wang G, Song Q. A new unsupervised feature selection algorithm using similarity-based feature clustering. Comput Intell 2018. [DOI: 10.1111/coin.12192] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 11/30/2022]
Affiliation(s)
- Xiaoyan Zhu
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yu Wang
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yingbin Li
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Yonghui Tan
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Qinbao Song
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
8. Crabtree NM, Moore JH, Bowyer JF, George NI. Multi-class computational evolution: development, benchmark evaluation and application to RNA-Seq biomarker discovery. BioData Min 2017; 10:13. [PMID: 28450890 PMCID: PMC5404302 DOI: 10.1186/s13040-017-0134-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/14/2016] [Accepted: 04/18/2017] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. Results: The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage since the accuracy of most classification methods suffers when sample size is small. Conclusion: The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features. Electronic supplementary material: The online version of this article (doi:10.1186/s13040-017-0134-8) contains supplementary material, which is available to authorized users.
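The stability criterion mentioned in this abstract can be sketched as follows, assuming the common set-based definition of the Tanimoto distance (one minus the ratio of shared features to total distinct features). The abstract does not state the exact variant the authors used; the helper names and gene labels here are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Set-based Tanimoto distance: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty selections are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(feature_sets):
    """Average Tanimoto distance over all pairs of selected-feature sets."""
    pairs = list(combinations(feature_sets, 2))
    return sum(tanimoto_distance(x, y) for x, y in pairs) / len(pairs)


# Hypothetical feature sets selected in three repeated runs.
runs = [{"g1", "g2"}, {"g1", "g2"}, {"g1", "g3"}]
instability = mean_pairwise_distance(runs)
```

Under this definition a lower mean pairwise distance indicates a more stable selection across runs.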
Affiliation(s)
- Nathaniel M Crabtree
  - Bioinformatics, Department of Information Science, University of Arkansas at Little Rock and University of Arkansas for Medical Sciences Joint Bioinformatics Graduate Program, Little Rock, AR, USA
- Jason H Moore
  - Division of Informatics, Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6021, USA
- John F Bowyer
  - Division of Neurotoxicology, National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Nysia I George
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR, USA
9. Zhang F, Bohlen P, Lewek MD, Huang H. Prediction of Intrinsically Caused Tripping Events in Individuals With Stroke. IEEE Trans Neural Syst Rehabil Eng 2016; 25:1202-1210. [PMID: 27740490 DOI: 10.1109/tnsre.2016.2614521] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
This study investigated the feasibility of predicting intrinsically caused trips (ICTs) in individuals with stroke. Gait kinematics collected from 12 individuals with chronic stroke, who demonstrated ICTs in treadmill walking, were analyzed. A prediction algorithm based on the outlier principle was employed. Sequential forward selection (SFS) and minimum-redundancy-maximum-relevance (mRMR) were used separately to identify the precursors for accurate ICT prediction. The results showed that it was feasible to predict ICTs around 50-260 ms before ICTs occurred in the swing phase by monitoring lower limb kinematics during the preceding stance phase. Both SFS and mRMR were effective in identifying the precursors of ICTs. For 9 out of the 12 subjects, the paretic lower limb's shank orientation in the sagittal plane and the vertical velocity of the paretic foot's center of gravity were important in predicting ICTs accurately; the averaged area under receiver operating characteristic curve achieved 0.95 and above. For the other three subjects, kinematics of the less affected limb or proximal joints in the paretic side were identified as the precursors to an ICT, potentially due to the variations of neuromotor deficits among stroke survivors. Although additional engineering efforts are still needed to address the challenges in making our design clinically practical, the outcome of this study may lead to further proactive engineering mechanisms for ICT avoidance and therefore reduce the risk of falls in individuals with stroke.
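Sequential forward selection (SFS), one of the two precursor-selection methods named in this abstract, can be sketched generically as a greedy search that adds one feature at a time while a score keeps improving. This is a minimal sketch under stated assumptions: the scoring function, feature names, and weights below are hypothetical stand-ins, not the study's outlier-based predictor or its kinematic data.

```python
def sequential_forward_selection(candidates, score, max_features):
    """Greedily add the feature that most improves score(subset).

    Stops when no remaining feature improves the score or when
    max_features have been selected.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Score every one-feature extension of the current subset.
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical scoring function: summed usefulness minus a small
# per-feature penalty. In practice this would be a cross-validated metric.
toy_weights = {"f1": 0.5, "f2": 0.3, "f3": 0.0}


def toy_score(subset):
    return sum(toy_weights[f] for f in subset) - 0.05 * len(subset)


chosen, chosen_score = sequential_forward_selection(toy_weights, toy_score, 3)
```

With these toy weights the search picks `f1`, then `f2`, and stops because adding `f3` would lower the score.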
10. An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset. ScientificWorldJournal 2015; 2015:821798. [PMID: 26491718 PMCID: PMC4601565 DOI: 10.1155/2015/821798] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/02/2015] [Revised: 08/14/2015] [Accepted: 08/20/2015] [Indexed: 11/17/2022] [Open Access]
Abstract
Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Due to this complex nature, selecting features and building a classifier from an MDD are typically more expensive or time-consuming. Therefore, we need a robust feature selection technique for selecting the optimum single subset of the features of the MDD for further analysis or to design a classifier. In this paper, an efficient feature selection algorithm is proposed for the classification of MDD. The proposed multidimensional feature subset selection (MFSS) algorithm yields a unique feature subset for further analysis or to build a classifier, and it offers a computational advantage on MDD compared with the existing feature selection algorithms. The proposed work is applied to benchmark multidimensional datasets. The number of features was reduced to 3% minimum and 30% maximum by using the proposed MFSS. In conclusion, the study results show that MFSS is an efficient feature selection algorithm that does not affect the classification accuracy even for the reduced number of features. Also, the proposed MFSS algorithm is suitable for both problem transformation and algorithm adaptation, and it has great potential in applications that generate multidimensional datasets.
11. Joshi D, Nakamura BH, Hahn ME. High energy spectrogram with integrated prior knowledge for EMG-based locomotion classification. Med Eng Phys 2015; 37:518-24. [DOI: 10.1016/j.medengphy.2015.03.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 09/02/2014] [Revised: 12/22/2014] [Accepted: 03/16/2015] [Indexed: 11/25/2022]
12. Source selection for real-time user intent recognition toward volitional control of artificial legs. IEEE J Biomed Health Inform 2015; 17:907-14. [PMID: 25055369 DOI: 10.1109/jbhi.2012.2236563] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Indexed: 11/08/2022]
Abstract
Various types of data sources have been used to recognize user intent for volitional control of powered artificial legs. However, there is still a debate on what exact data sources are necessary for accurately and responsively recognizing the user's intended tasks. Motivated by this question of wide interest, in this study we aimed to 1) investigate the usefulness of different data sources commonly suggested for user intent recognition and 2) determine an informative set of data sources for volitional control of prosthetic legs. The studied data sources included eight surface electromyography (EMG) signals from the residual thigh muscles of transfemoral (TF) amputees, ground reaction forces/moments from a prosthetic pylon, and kinematic measurements from the residual thigh and prosthetic knee. We then ranked and included data sources based on their usefulness for user intent recognition and, using three source selection algorithms, selected a reduced number of data sources that ensured accurate recognition of the user's intended task. The results showed that EMG signals and ground reaction forces/moments were more informative than prosthesis kinematics. Nine to eleven of all the initial data sources were sufficient to maintain 95% accuracy for recognizing the studied seven tasks without missing additional task transitions in real time. The selected data sources produced consistent system performance across two experimental days for four recruited TF amputee subjects, indicating the potential robustness of the selected data sources. Finally, based on the study results, we suggested a protocol for determining the informative data sources and sensor configurations for future development of volitional control of powered artificial legs.
13. Guha S, Ji Y, Baladandayuthapani V. Bayesian disease classification using copy number data. Cancer Inform 2014; 13:83-91. [PMID: 25336897 PMCID: PMC4196891 DOI: 10.4137/cin.s13785] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2014] [Revised: 07/29/2014] [Accepted: 07/29/2014] [Indexed: 11/30/2022] [Open Access]
Abstract
DNA copy number variations (CNVs) have been shown to be associated with cancer development and progression. The detection of these CNVs has the potential to impact the basic knowledge and treatment of many types of cancers, and can play a role in the discovery and development of molecular-based personalized cancer therapies. One of the most common types of high-resolution chromosomal microarrays is array-based comparative genomic hybridization (aCGH) methods that assay DNA CNVs across the whole genomic landscape in a single experiment. In this article we propose methods to use aCGH profiles to predict disease states. We employ a Bayesian classification model and treat disease states as outcome, and aCGH profiles as covariates in order to identify significant regions of the genome associated with disease subclasses. We propose a principled two-stage method where we first make inferences on the underlying copy number states associated with the aCGH emissions based on hidden Markov model (HMM) formulations to account for serial dependencies in neighboring probes. Subsequently, we infer associations with disease outcomes, conditional on the copy number states, using Bayesian linear variable selection procedures. The selected probes and their effects are parameters that are useful for predicting the disease categories of any additional individuals on the basis of their aCGH profiles. Using simulated datasets, we investigate the method’s accuracy in detecting disease category. Our methodology is motivated by and applied to a breast cancer dataset consisting of aCGH profiles assayed on patients from multiple disease subtypes.
Affiliation(s)
- Subharup Guha
  - Department of Statistics, University of Missouri, Columbia, MO, USA
- Yuan Ji
  - Center for Biomedical Informatics, North Shore University Health System, Evanston, IL, USA
  - Department of Health Studies, The University of Chicago, IL, USA
14. Metsis V, Makedon F, Shen D, Huang H. DNA Copy Number Selection Using Robust Structured Sparsity-Inducing Norms. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:168-181. [PMID: 26355516 DOI: 10.1109/tcbb.2013.141] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
Array comparative genomic hybridization (aCGH) is a newly introduced method for the detection of copy number abnormalities associated with human diseases with special focus on cancer. Specific patterns in DNA copy number variations (CNVs) can be associated with certain disease types and can facilitate prognosis and progress monitoring of the disease. Machine learning techniques have been used to model the problem of tissue typing as a classification problem. Feature selection is an important part of the classification process, because many biological features are not related to the diseases and confuse the classification tasks. Multiple feature selection methods have been proposed in the different domains where classification has been applied. In this work, we present a new feature selection method based on structured sparsity-inducing norms to identify the informative aCGH biomarkers which can help us classify different disease subtypes. To validate the performance of the proposed method, we experimentally compare it with existing feature selection methods on four publicly available aCGH data sets. In all empirical results, the proposed sparse learning based feature selection method consistently outperforms other related approaches. More importantly, we carefully investigate the aCGH biomarkers selected by our method, and the biological evidence in the literature strongly supports our results.
15. Krajewski Z, Tkacz E. Feature Selection of Protein Structural Classification Using SVM Classifier. Biocybern Biomed Eng 2013. [DOI: 10.1016/s0208-5216(13)70055-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
16.
Abstract
Recent studies suggest that the deregulation of pathways, rather than individual genes, may be critical in triggering carcinogenesis. The pathway deregulation is often caused by the simultaneous deregulation of more than one gene in the pathway. This suggests that robust gene pair combinations may exploit the underlying bio-molecular reactions that are relevant to the pathway deregulation and thus they could provide better biomarkers for cancer, as compared to individual genes. In order to validate this hypothesis, in this paper, we used gene pair combinations, called doublets, as input to the cancer classification algorithms, instead of the original expression values, and we showed that the classification accuracy was consistently improved across different datasets and classification algorithms. We validated the proposed approach using nine cancer datasets and five classification algorithms including Prediction Analysis for Microarrays (PAM), C4.5 Decision Trees (DT), Naive Bayesian (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN).
17. Tian Z, Kuang R. Integrative classification and analysis of multiple arrayCGH datasets with probe alignment. Bioinformatics 2010; 26:2313-20. [DOI: 10.1093/bioinformatics/btq428] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] [Open Access]
18. Han X. Nonnegative principal component analysis for cancer molecular pattern discovery. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010; 7:537-549. [PMID: 20671323 DOI: 10.1109/tcbb.2009.36] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 05/29/2023]
Abstract
As a well-established feature selection algorithm, principal component analysis (PCA) is often combined with the state-of-the-art classification algorithms to identify cancer molecular patterns in microarray data. However, the algorithm's global feature selection mechanism prevents it from effectively capturing the latent data structures in the high-dimensional data. In this study, we investigate the benefit of adding nonnegative constraints on PCA and develop a nonnegative principal component analysis algorithm (NPCA) to overcome the global nature of PCA. A novel classification algorithm NPCA-SVM is proposed for microarray data pattern discovery. We report strong classification results from the NPCA-SVM algorithm on five benchmark microarray data sets by direct comparison with other related algorithms. We have also proved mathematically and interpreted biologically that microarray data will inevitably encounter overfitting for an SVM/PCA-SVM learning machine under a Gaussian kernel. In addition, we demonstrate that nonnegative principal component analysis can be used to capture meaningful biomarkers effectively.
Affiliation(s)
- Xiaoxu Han
- Department of Mathematics, Eastern Michigan University, Ypsilanti, MI 48197, USA.
|
19
|
Kim KY, Kim J, Kim HJ, Nam W, Cha IH. A method for detecting significant genomic regions associated with oral squamous cell carcinoma using aCGH. Med Biol Eng Comput 2010; 48:459-68. [DOI: 10.1007/s11517-010-0595-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/17/2009] [Accepted: 02/26/2010] [Indexed: 12/14/2022]
|
20
|
Bandyopadhyay N, Kahveci T, Goodison S, Sun Y, Ranka S. Pathway-Based Feature Selection Algorithm for Cancer Microarray Data. Adv Bioinformatics 2010; 2009:532989. [PMID: 20204186 PMCID: PMC2831238 DOI: 10.1155/2009/532989] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 08/24/2009] [Accepted: 11/30/2009] [Indexed: 01/09/2023] Open
Abstract
Classification of cancers based on gene expression produces better accuracy than classification based on clinical markers. Feature selection improves the accuracy of these classification algorithms by reducing the chance of overfitting caused by the large number of features. We develop a new feature selection method called Biological Pathway-based Feature Selection (BPFS) for microarray data. Unlike most existing methods, our method integrates signaling and gene regulatory pathways with gene expression data to minimize the chance of overfitting and to improve the test accuracy. Thus, BPFS selects a biologically meaningful feature set that is minimally redundant. Our experiments on published breast cancer datasets demonstrate that all of the top 20 genes found by our method are associated with cancer. Furthermore, the classification accuracy of our signature is up to 18% better than that of van't Veer's 70-gene signature, and up to 8% better than that of the best published feature selection method, I-RELIEF.
Affiliation(s)
- Nirmalya Bandyopadhyay
- Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
- Tamer Kahveci
- Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
- Steve Goodison
- Anderson Cancer Center Orlando, Cancer Research Institute Orlando, FL 32827, USA
- Y. Sun
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL 32611, USA
- Sanjay Ranka
- Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
|
21
|
Pok G, Liu JCS, Ryu KH. Effective feature selection framework for cluster analysis of microarray data. Bioinformation 2010; 4:385-9. [PMID: 20975903 PMCID: PMC2951666 DOI: 10.6026/97320630004385] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/25/2009] [Revised: 02/18/2010] [Accepted: 02/24/2010] [Indexed: 11/29/2022] Open
Abstract
The microarray technique has become a standard means of simultaneously examining the expression of all genes measured under different circumstances. As microarray data are typically characterized by high-dimensional features with a small number of samples, feature selection needs to be incorporated to identify a subset of genes that are meaningful for biological interpretation and that account for the sample variation. In this article, we present a simple yet effective feature selection framework suitable for two-dimensional microarray data. Our correlation-based, nonparametric approach allows compact representation of class-specific properties with a small number of genes. We evaluated our method using publicly available experimental data and obtained favorable results.
Affiliation(s)
- Gouchol Pok
- Yanbian University of Science and Technology, Dept. of Computer Science, Yanji, Jilin, China 133000
- Keun Ho Ryu
- Chungbuk National University, DB Bioinformatics Lab, Cheongju, Chungbuk, Korea
|
22
|
Kim KY, Lee GY, Kim J, Jeung HC, Chung HC, Rha SY. Identification of significant regional genetic variations using continuous CNV values in aCGH data. Genomics 2009; 94:317-23. [DOI: 10.1016/j.ygeno.2009.08.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/26/2009] [Revised: 07/20/2009] [Accepted: 08/11/2009] [Indexed: 11/26/2022]
|
23
|
Huang J, Salim A, Lei K, O'Sullivan K, Pawitan Y. Classification of array CGH data using smoothed logistic regression model. Stat Med 2009; 28:3798-810. [DOI: 10.1002/sim.3753] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
|
24
|
|