1
Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. Environmental Science & Technology 2023; 57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]
Abstract
Chemical toxicity evaluations for drugs, consumer products, and environmental chemicals have a critical impact on human health. Traditional animal models to evaluate chemical toxicity are expensive, time-consuming, and often fail to detect toxicants in humans. Computational toxicology is a promising alternative approach that utilizes machine learning (ML) and deep learning (DL) techniques to predict the toxicity potentials of chemicals. Although the applications of ML- and DL-based computational models in chemical toxicity predictions are attractive, many toxicity models are "black boxes" in nature and difficult to interpret by toxicologists, which hampers the chemical risk assessments using these models. The recent progress of interpretable ML (IML) in the computer science field meets this urgent need to unveil the underlying toxicity mechanisms and elucidate the domain knowledge of toxicity models. In this review, we focused on the applications of IML in computational toxicology, including toxicity feature data, model interpretation methods, use of knowledge base frameworks in IML development, and recent applications. The challenges and future directions of IML modeling in toxicology are also discussed. We hope this review can encourage efforts in developing interpretable models with new IML algorithms that can assist new chemical assessments by illustrating toxicity mechanisms in humans.
Affiliation(s)
- Xuelian Jia
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Tong Wang
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Hao Zhu
  - Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
2
He J, Li J, Leung K. Dynamic structural analysis-based epitope prediction of Exendin-4 in aqueous solution. Phys Rev E 2023; 108:024403. [PMID: 37723773 DOI: 10.1103/physreve.108.024403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 07/22/2023] [Indexed: 09/20/2023]
Abstract
The study of epitopes has a broad range of applications in drug discovery, vaccine design, and immunotherapy. In this study, an epitope prediction method was developed based on the dynamic structure of protein antigens. Solvent accessible surface area, charge, and root mean square fluctuation were introduced as the key residue property parameters. The epitope prediction algorithm was established by constructing a three-parameter complex metric of seven-peptide groups. The method was applied to predict the epitopes of Exendin-4, an effective antidiabetic drug. The epitopes of both the natural and C-terminal amidated forms of Exendin-4 were predicted and compared in their folded and intermediate states. In the folded state, the epitopes of natural Exendin-4 (His1-Phe6 and Asp9-Val19) were found to be nearly identical to the epitopes of C-terminal amidated Exendin-4 (His1-Thr7 and Asp9-Val19). In the intermediate state, however, the epitopes of natural Exendin-4 (His1-Gly4, Phe6 and Lys12-Arg20) covered fewer amino acids than the epitopes of C-terminal amidated Exendin-4 (His1-Gly4, Phe6, Asp9-Val19 and Trp25-Lys27). The comparison with the results from other prediction tools demonstrates the reliability of our predicted epitopes of Exendin-4.
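The abstract does not give the formula for the three-parameter metric, so the following is only an illustrative sketch of the general idea. It assumes that "seven-peptide groups" means windows of seven consecutive residues, that each property is min-max normalized, and that the three properties are weighted equally; the function names `min_max` and `window_scores` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def window_scores(sasa, charge, rmsf, window=7):
    """Score every run of `window` consecutive residues.

    Assumption (not from the paper): the per-residue score is the
    equal-weight average of normalized SASA, normalized |charge| and
    normalized RMSF, and a window's score is the mean over its residues.
    """
    if not (len(sasa) == len(charge) == len(rmsf)):
        raise ValueError("all property lists must have the same length")
    n = len(sasa)
    if n < window:
        return []
    s = min_max(sasa)
    c = min_max([abs(q) for q in charge])
    f = min_max(rmsf)
    per_residue = [(a + b + d) / 3 for a, b, d in zip(s, c, f)]
    return [mean(per_residue[i:i + window]) for i in range(n - window + 1)]
```

Under these assumptions, windows with higher scores would be the candidate epitope regions; the threshold for calling a window an epitope would still have to come from the paper itself.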
Affiliation(s)
- Jianfeng He
  - School of Physics, Beijing Institute of Technology, Beijing 100081, People's Republic of China
- Jing Li
  - Research and Development Center, Beijing Genetech Pharmaceutical Co., Ltd., Beijing 102200, People's Republic of China
- Kingsley Leung
  - Uni-Bioscience Pharm Company Limited, Hong Kong, People's Republic of China
3
Huang P, Feng Z, Shu X, Wu A, Wang Z, Hu T, Cao Y, Tu Y, Li Z. A bibliometric and visual analysis of publications on artificial intelligence in colorectal cancer (2002-2022). Front Oncol 2023; 13:1077539. [PMID: 36824138 PMCID: PMC9941644 DOI: 10.3389/fonc.2023.1077539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Accepted: 01/27/2023] [Indexed: 02/10/2023] Open
Abstract
Background Colorectal cancer (CRC) has the third-highest incidence and second-highest mortality rate of all cancers worldwide. Early diagnosis and screening of CRC have been the focus of research in this field. With the continuous development of artificial intelligence (AI) technology, AI has advantages in many aspects of CRC, such as adenoma screening, genetic testing, and prediction of tumor metastasis. Objective This study uses bibliometrics to analyze research in AI in CRC, summarize the field's history and current status of research, and predict future research directions. Method We searched the SCIE database for all literature on CRC and AI. The documents span the period 2002-2022. We used bibliometrics to analyze the data of these papers, such as authors, countries, institutions, and references. Co-authorship, co-citation, and co-occurrence analysis were the main methods of analysis. Citespace, VOSviewer, and SCImago Graphica were used to visualize the results. Result This study selected 1,531 articles on AI in CRC. China published the most articles in this field (580). The U.S. had the highest average number of citations per article (46.13). Mori Y and Ding K were the two authors with the highest number of articles. Scientific Reports, Cancers, and Frontiers in Oncology are the journals that published the most articles in this field. Institutions from China occupy the top 9 positions among the most published institutions. We found that research on AI in this field mainly focuses on colonoscopy-assisted diagnosis, imaging histology, and pathology examination. Conclusion AI in CRC is currently in the development stage with good prospects. AI is currently widely used in colonoscopy, imageomics, and pathology. However, the scope of AI applications is still limited, and there is a lack of inter-institutional collaboration. The pervasiveness of AI technology is the main direction of future development in this field.
Affiliation(s)
- Pan Huang
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Zongfeng Feng
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Xufeng Shu
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Ahao Wu
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Zhonghao Wang
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Tengcheng Hu
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Yi Cao
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Yi Tu
  - Department of Pathology, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Zhengrong Li
  - Department of General Surgery, First Affiliated Hospital of Nanchang University, Nanchang, China
  - Department of Digestive Surgery, Digestive Disease Hospital, The First Affiliated Hospital of Nanchang University, Nanchang, China
  - Medical Innovation Center, The First Affiliated Hospital of Nanchang University, Nanchang, China

*Correspondence: Yi Tu; Zhengrong Li
4
Abstract
Researchers are defining new types of interactions between humans and machine learning algorithms generically called human-in-the-loop machine learning. Depending on who is in control of the learning process, we can identify: active learning, in which the system remains in control; interactive machine learning, in which there is a closer interaction between users and learning systems; and machine teaching, where human domain experts have control over the learning process. Aside from control, humans can also be involved in the learning process in other ways. In curriculum learning human domain experts try to impose some structure on the examples presented to improve the learning; in explainable AI the focus is on the ability of the model to explain to humans why a given solution was chosen. This collaboration between AI models and humans should not be limited only to the learning process; if we go further, we can see other terms that arise such as Usable and Useful AI. In this paper we review the state of the art of the techniques involved in the new forms of relationship between humans and ML algorithms. Our contribution is not merely to list the different approaches, but to provide definitions clarifying confusing, varied and sometimes contradictory terms; to elucidate and determine the boundaries between the different methods; and to correlate all the techniques, searching for the connections and influences between them.
5
Lombardo T, Duquesnoy M, El-Bouysidy H, Årén F, Gallo-Bueno A, Jørgensen PB, Bhowmik A, Demortière A, Ayerbe E, Alcaide F, Reynaud M, Carrasco J, Grimaud A, Zhang C, Vegge T, Johansson P, Franco AA. Artificial Intelligence Applied to Battery Research: Hype or Reality? Chem Rev 2021; 122:10899-10969. [PMID: 34529918 PMCID: PMC9227745 DOI: 10.1021/acs.chemrev.1c00108] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 02/01/2023]
Abstract
This is a critical review of artificial intelligence/machine learning (AI/ML) methods applied to battery research. It aims at providing a comprehensive, authoritative, and critical, yet easily understandable, review of general interest to the battery community. It addresses the concepts, approaches, tools, outcomes, and challenges of using AI/ML as an accelerator for the design and optimization of the next generation of batteries, a current hot topic. It intends to create both accessibility of these tools to the chemistry and electrochemical energy sciences communities and completeness in terms of the different battery R&D aspects covered.
Affiliation(s)
- Teo Lombardo
  - Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
- Marc Duquesnoy
  - Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
- Hassna El-Bouysidy
  - Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Department of Physics, Chalmers University of Technology, SE-41296 Göteborg, Sweden
- Fabian Årén
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Department of Physics, Chalmers University of Technology, SE-41296 Göteborg, Sweden
- Alfonso Gallo-Bueno
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Centre for Cooperative Research on Alternative Energies (CIC energiGUNE), Basque Research and Technology Alliance (BRTA), Alava Technology Park, Albert Einstein 48, 01510 Vitoria-Gasteiz, Spain
- Peter Bjørn Jørgensen
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej, Building 301, 2800 Kgs. Lyngby, Denmark
- Arghya Bhowmik
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej, Building 301, 2800 Kgs. Lyngby, Denmark
- Arnaud Demortière
  - Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
- Elixabete Ayerbe
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - CIDETEC, Basque Research and Technology Alliance (BRTA), Po. Miramón 196, 20014 Donostia-San Sebastián, Spain
- Francisco Alcaide
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - CIDETEC, Basque Research and Technology Alliance (BRTA), Po. Miramón 196, 20014 Donostia-San Sebastián, Spain
- Marine Reynaud
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Centre for Cooperative Research on Alternative Energies (CIC energiGUNE), Basque Research and Technology Alliance (BRTA), Alava Technology Park, Albert Einstein 48, 01510 Vitoria-Gasteiz, Spain
- Javier Carrasco
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Centre for Cooperative Research on Alternative Energies (CIC energiGUNE), Basque Research and Technology Alliance (BRTA), Alava Technology Park, Albert Einstein 48, 01510 Vitoria-Gasteiz, Spain
- Alexis Grimaud
  - Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - UMR CNRS 8260 "Chimie du Solide et Energie", Collège de France, 11 Place Marcelin Berthelot, 75231 Paris Cedex 05, France
  - Sorbonne Universités - UPMC Univ Paris 06, 4 Place Jussieu, F-75005 Paris, France
- Chao Zhang
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Department of Chemistry - Ångström Laboratory, Box 538, 75121 Uppsala, Sweden
- Tejs Vegge
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Department of Energy Conversion and Storage, Technical University of Denmark, Anker Engelunds Vej, Building 301, 2800 Kgs. Lyngby, Denmark
- Patrik Johansson
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Department of Physics, Chalmers University of Technology, SE-41296 Göteborg, Sweden
- Alejandro A Franco
  - Laboratoire de Réactivité et Chimie des Solides (LRCS), UMR CNRS 7314, Université de Picardie Jules Verne, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Réseau sur le Stockage Electrochimique de l'Energie (RS2E), FR CNRS 3459, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - ALISTORE-European Research Institute, FR CNRS 3104, Hub de l'Energie, 15, rue Baudelocque, 80039 Amiens Cedex, France
  - Institut Universitaire de France, 103 Boulevard Saint Michel, 75005 Paris, France
6
7
Yan M, Wang X, Wang B, Chang M, Muhammad I. Bearing remaining useful life prediction using support vector machine and hybrid degradation tracking model. ISA Transactions 2020; 98:471-482. [PMID: 31492470 DOI: 10.1016/j.isatra.2019.08.058] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/18/2019] [Revised: 08/24/2019] [Accepted: 08/28/2019] [Indexed: 06/10/2023]
Abstract
Rolling element bearing is one of the critical components in rotating machines, and its running state determines machinery Remaining Useful Life (RUL). Estimating impending failure and predicting RUL of bearing is beneficial to schedule maintenance strategy and avoid abrupt shutdowns. This paper presents a novel method of RUL prediction of bearings, which can evaluate the degradation stage of bearings through dimensionless measurements and exploit the optimal RUL prediction through hybrid degradation tracing model in degradation stage. Two new measurements reflect the vibration intensity of bearings regarding normal vibration value. They can eliminate individual differences of bearings, improve sensitivity to the incipient defect of bearings, and reduce fluctuation. Moreover, they are helpful to detect the time to start prediction and set dimensionless failure threshold. SVM classifier is used to assess the degradation stage of bearing, which shows a high classification accuracy because of its excellent generalization ability and mathematical foundation. As input, the fitted measurements based on the generalized degradation model are used to train the SVM classifier. As output, five degradation stages are defined. However, actual measurements are used as inputs in the prediction process. According to the classification results, a hybrid degradation tracing model is utilized to exploit the optimal RUL prediction by tracking the degradation process of bearings. The proposed method is validated on the public IMS and PRONOSTIA bearing datasets, and its performance is compared with other methods on PRONOSTIA bearing datasets. The results show that the proposed approach is an effective way for RUL prediction of bearings within the prescribed error range. Given that the proposed measurements are dimensionless, this method can be applied under different operating conditions.
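The abstract does not define its two dimensionless measurements, so the sketch below only illustrates the general idea of expressing vibration intensity relative to a normal (healthy) value. It assumes the indicator is the RMS of each vibration snapshot divided by the mean RMS over an initial healthy period, and that prediction starts at the first snapshot whose indicator reaches a chosen threshold. The names `relative_rms` and `first_exceedance` are hypothetical, and the SVM stage classifier and hybrid degradation model are not reproduced here.

```python
import math


def rms(signal):
    """Root mean square of one vibration snapshot."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def relative_rms(snapshots, n_baseline):
    """Dimensionless indicator: RMS of each snapshot over the healthy baseline RMS.

    Assumption: the first `n_baseline` snapshots represent normal operation.
    """
    values = [rms(s) for s in snapshots]
    baseline = sum(values[:n_baseline]) / n_baseline
    return [v / baseline for v in values]


def first_exceedance(indicator, threshold):
    """Index of the first snapshot at or above the threshold, or None."""
    for i, v in enumerate(indicator):
        if v >= threshold:
            return i
    return None
```

Because the indicator is a ratio to each bearing's own baseline, the same threshold can in principle be reused across bearings and operating conditions, which is the property the abstract attributes to dimensionless measurements.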
Affiliation(s)
- Mingming Yan
  - School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, China
- Xingang Wang
  - School of Control and Engineering, Northeastern University, Qinhuangdao, 066004, China
- Bingxiang Wang
  - School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, China
- Miaoxin Chang
  - School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, China
- Isyaku Muhammad
  - School of Mechanical Engineering and Automation, Northeastern University, Shenyang, 110819, China
8
Frey LJ, Talbert DA. Artificial Intelligence Pipeline to Bridge the Gap between Bench Researchers and Clinical Researchers in Precision Medicine. Med One 2020; 5:10.20900/mo20200001. [PMID: 33511289 PMCID: PMC7839064 DOI: 10.20900/mo20200001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Precision medicine informatics is a field of research that incorporates learning systems that generate new knowledge to improve individualized treatments using integrated data sets and models. Given the ever-increasing volumes of data that are relevant to patient care, artificial intelligence (AI) pipelines need to be a central component of such research to speed discovery. Applying AI methodology to complex multidisciplinary information retrieval can support efforts to discover bridging concepts within collaborating communities. This dovetails with precision medicine research, given the information rich multi-omic data that are used in precision medicine analysis pipelines. In this perspective article we define a prototype AI pipeline to facilitate discovering research connections between bioinformatics and clinical researchers. We propose building knowledge representations that are iteratively improved through AI and human-informed learning feedback loops supported through crowdsourcing. To illustrate this, we will explore the specific use case of nonalcoholic fatty liver disease, a growing health care problem. We will examine AI pipeline construction and utilization in relation to bench-to-bedside bridging concepts with interconnecting knowledge representations applicable to bioinformatics researchers and clinicians.
Affiliation(s)
- Lewis J. Frey
  - Department of Public Health Science, Biomedical Informatics Center, Hollings Cancer Center, Medical University of South Carolina (MUSC), 135 Cannon St, Charleston, SC 29425, USA
  - Health Equity and Rural Outreach Innovation Center (HEROIC), Ralph H. Johnson Veteran Affairs Medical Center, Charleston, SC 29401, USA
- Douglas A. Talbert
  - Department of Computer Science, Tennessee Tech University (TTU), 1 William L Jones Dr, Cookeville, TN 38505, USA
9
Non-convex approximation based l0-norm multiple indefinite kernel feature selection. Appl Intell 2020. [DOI: 10.1007/s10489-018-01407-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
10
Bremer Hinckel BC, Marlais T, Airs S, Bhattacharyya T, Imamura H, Dujardin JC, El-Safi S, Singh OP, Sundar S, Falconar AK, Andersson B, Litvinov S, Miles MA, Mertens P. Refining wet lab experiments with in silico searches: A rational quest for diagnostic peptides in visceral leishmaniasis. PLoS Negl Trop Dis 2019; 13:e0007353. [PMID: 31059497 PMCID: PMC6522066 DOI: 10.1371/journal.pntd.0007353] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/28/2018] [Revised: 05/16/2019] [Accepted: 04/01/2019] [Indexed: 11/19/2022] Open
Abstract
Background The search for diagnostic biomarkers has been profiting from a growing number of high quality sequenced genomes and freely available bioinformatic tools. These can be combined with wet lab experiments for a rational search. Improved, point-of-care diagnostic tests for visceral leishmaniasis (VL), early case detection and surveillance are required. Previous investigations demonstrated the potential of IgG1 as a biomarker for monitoring clinical status in rapid diagnostic tests (RDTs), although using a crude lysate antigen (CLA) as capturing antigen. Replacing the CLA by specific antigens would lead to more robust RDTs. Methodology Immunoblots revealed L. donovani protein bands detected by IgG1 from VL patients. Upon confident identification of these antigens by mass spectrometry (MS), we searched for evidence of constitutive protein expression and presence of antigenic domains or high accessibility to B-cells. Selected candidates had their linear epitopes mapped with in silico algorithms. Multiple high-scoring predicted epitopes from the shortlisted proteins were screened in peptide arrays. The most promising candidate was tested in RDT prototypes using VL and nonendemic healthy control (NEHC) patient sera. Results Over 90% of the proteins identified from the immunoblots did not satisfy the selection criteria and were excluded from the downstream epitope mapping. Screening of predicted epitope peptides from the shortlisted proteins identified the most reactive, for which the sensitivity for IgG1 was 84% (95% CI 60–97%) with Sudanese VL sera on RDT prototypes. None of the sera from NEHCs were positive. Conclusion We employed in silico searches to reduce drastically the output of wet lab experiments, focusing on promising candidates containing selected protein features. By predicting epitopes in silico we screened a large number of peptides using arrays, identifying the most promising one, for which IgG1 sensitivity and specificity, with limited sample size, supported this proof of concept strategy for diagnostics discovery, which can be applied to the development of more robust IgG1 RDTs for monitoring clinical status in VL.

Visceral leishmaniasis (VL) is a neglected tropical disease caused by protozoan parasites of the Leishmania donovani complex. Without treatment, VL is fatal. Although diagnostic techniques, mainly based on the detection of anti-Leishmania antibodies are available, invasive procedures such as microscopy from spleen or bone marrow aspirates are still required for the diagnosis of seronegative VL suspects, for the detection of recurrent cases and to confirm cure after successful treatment. Previous investigations showed the potential of IgG1 as a biomarker of post-chemotherapeutic relapse for VL in rapid diagnostic tests (RDTs) sensitised with crude lysate antigen (CLA). Here we employed in silico tools to search for desired protein features in a large number of L. donovani antigens detected by human IgG1 in western blots. We then employed prediction algorithms to profile epitopes from the shortlisted proteins. We screened a panel of high-scoring peptides in a high-throughput manner using arrays, with low reagent consumption. The most reactive peptide was adapted to RDTs, showing promising results of both sensitivity and specificity. This peptide has the potential of replacing the CLAs in IgG1 RDTs. Thus we believe that in silico tools can be used to optimise wet lab experiments for a rational search of biomarkers.
Affiliation(s)
- Bruno Cesar Bremer Hinckel
  - Coris BioConcept, Gembloux, Belgium
  - Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Tegwen Marlais
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Stephanie Airs
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Tapan Bhattacharyya
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Hideo Imamura
  - Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Sayda El-Safi
  - Faculty of Medicine, University of Khartoum, Khartoum, Sudan
- Om Prakash Singh
  - Department of Medicine, Institute of Medical Sciences, Banaras Hindu University, Varanasi, Uttar Pradesh, India
- Shyam Sundar
  - Department of Medicine, Institute of Medical Sciences, Banaras Hindu University, Varanasi, Uttar Pradesh, India
- Bjorn Andersson
  - Department of Cell- and Molecular Biology, Karolinska Institutet, Stockholm, Sweden
- Michael A. Miles
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
11
Maniruzzaman M, Jahanur Rahman M, Ahammed B, Abedin MM, Suri HS, Biswas M, El-Baz A, Bangeas P, Tsoulfas G, Suri JS. Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms. Computer Methods and Programs in Biomedicine 2019; 176:173-193. [PMID: 31200905 DOI: 10.1016/j.cmpb.2019.04.008] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 12/09/2018] [Revised: 02/28/2019] [Accepted: 04/08/2019] [Indexed: 02/08/2023]
Abstract
OBJECTIVE A colon microarray data is a repository of thousands of gene expressions with different strengths for each cancer cell. It is necessary to detect which genes are responsible for cancer growth. This study presents an exhaustive comparative study of different machine learning (ML) systems which serves two major purposes: (a) identification of high risk differential genes using statistical tests and (b) development of a ML strategy for predicting cancer genes. METHODS Four statistical tests namely: Wilcoxon sign rank sum (WCSRS), t test, Kruskal-Wallis (KW), and F-test were adapted for cancerous gene identification using their p-values. The extracted gene set was used to classify cancer patients using ten classifiers namely: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB), Gaussian process classification (GPC), support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), decision tree (DT), Adaboost (AB), and random forest (RF). Performance was then evaluated using cross-validation protocols and standardized metrics viz. accuracy (ACC) and area under the curve (AUC). RESULTS The colon cancer dataset consists of 2000 genes from 62 patients (40 cancer vs. 22 control). The overall mean ACC of our ML system using all four statistical tests and all ten classifiers was 90.50%. The ML system showed an ACC of 99.81% using a combination WCSRS test and RF-based classifier. This is an improvement of 8% over previously published values in literature. CONCLUSIONS RF-based model with statistical tests for detection of high risk genes showed the best performance for accurate cancer classification in multi-center clinical trials.
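The gene-selection step described above can be illustrated with a small sketch. This is not the authors' pipeline: it assumes a single test (Welch's t statistic) as a stand-in for the four tests named in the abstract, ranks genes by the absolute statistic rather than by p-value, and omits the classifiers entirely. The function names `welch_t` and `rank_genes`, and the label coding (1 for cancer, 0 for control), are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0.0:
        return 0.0  # no spread in either group: treat the gene as uninformative
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, labels, top_k):
    """Return the `top_k` gene names with the largest |t| between classes.

    `expression` maps a gene name to its per-sample expression values;
    `labels` holds 1 (cancer) or 0 (control) for the same samples.
    """
    scores = {}
    for gene, values in expression.items():
        cases = [v for v, y in zip(values, labels) if y == 1]
        controls = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(cases, controls))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In a workflow like the one the abstract describes, the selected genes would then be passed to a classifier and evaluated with cross-validation. Selecting genes inside each cross-validation fold, rather than once on the full dataset, avoids leaking test information into the selection step.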
Affiliation(s)
- Md Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
  - Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Md Jahanur Rahman
  - Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Benojir Ahammed
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mainak Biswas
  - Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA
- Ayman El-Baz
  - Department of Bioengineering, University of Louisville, Louisville, Kentucky, USA
- Petros Bangeas
  - Department of Surgery, Papageorgiou Hospital, Aristotle University Thessaloniki, Greece
- Georgios Tsoulfas
  - Department of Surgery, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Jasjit S Suri
  - Advanced Knowledge Engineering Centre, Global Biomedical Technologies, Inc., Roseville, CA, USA
  - AtheroPoint, Roseville, CA, USA
12
|
Gene selection from large-scale gene expression data based on fuzzy interactive multi-objective binary optimization for medical diagnosis. Biocybern Biomed Eng 2018. [DOI: 10.1016/j.bbe.2018.02.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022]
|
13
|
Banjar H, Adelson D, Brown F, Chaudhri N. Intelligent Techniques Using Molecular Data Analysis in Leukaemia: An Opportunity for Personalized Medicine Support System. BIOMED RESEARCH INTERNATIONAL 2017; 2017:3587309. [PMID: 28812013 PMCID: PMC5547708 DOI: 10.1155/2017/3587309] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/05/2017] [Revised: 06/12/2017] [Accepted: 06/15/2017] [Indexed: 12/05/2022]
Abstract
The use of intelligent techniques in medicine has brought a ray of hope in terms of treating leukaemia patients. Personalized treatment uses patient's genetic profile to select a mode of treatment. This process makes use of molecular technology and machine learning, to determine the most suitable approach to treating a leukaemia patient. Until now, no reviews have been published from a computational perspective concerning the development of personalized medicine intelligent techniques for leukaemia patients using molecular data analysis. This review studies the published empirical research on personalized medicine in leukaemia and synthesizes findings across studies related to intelligence techniques in leukaemia, with specific attention to particular categories of these studies to help identify opportunities for further research into personalized medicine support systems in chronic myeloid leukaemia. A systematic search was carried out to identify studies using intelligence techniques in leukaemia and to categorize these studies based on leukaemia type and also the task, data source, and purpose of the studies. Most studies used molecular data analysis for personalized medicine, but future advancement for leukaemia patients requires molecular models that use advanced machine-learning methods to automate decision-making in treatment management to deliver supportive medical information to the patient in clinical practice.
Affiliation(s)
- Haneen Banjar
- School of Computer Science, University of Adelaide, Adelaide, SA, Australia
- Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
- David Adelson
- School of Molecular and Biomedical Science, University of Adelaide, Adelaide, SA, Australia
- Fred Brown
- School of Computer Science, University of Adelaide, Adelaide, SA, Australia
- Naeem Chaudhri
- Oncology Centre, Section of Hematology, HSCT, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
|
14
|
Kuo RJ, Huang SBL, Zulvia FE, Liao TW. Artificial bee colony-based support vector machines with feature selection and parameter optimization for rule extraction. Knowl Inf Syst 2017. [DOI: 10.1007/s10115-017-1083-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 10/19/2022]
|
15
|
Du W, Cao Z, Song T, Li Y, Liang Y. A feature selection method based on multiple kernel learning with expression profiles of different types. BioData Min 2017; 10:4. [PMID: 28184251 PMCID: PMC5288949 DOI: 10.1186/s13040-017-0124-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/05/2015] [Accepted: 01/11/2017] [Indexed: 11/28/2022] Open
Abstract
Background: With the development of high-throughput technology, researchers can acquire large amounts of expression data of different types from several public databases. Because most of these data have a small number of samples and hundreds or thousands of features, extracting informative features from expression data effectively and robustly using feature selection techniques is challenging and crucial. So far, many feature selection approaches have been proposed and applied to analyse expression data of different types. However, most of these methods are limited to measuring performance on a single type of expression data by accuracy or error rate of classification. Results: In this article, we propose a hybrid feature selection method based on Multiple Kernel Learning (MKL) and evaluate its performance on expression datasets of different types. Firstly, the relevance between features and classifying samples is measured by using the optimizing function of MKL. In this step, an iterative gradient descent process is used to perform the optimization both on the parameters of the Support Vector Machine (SVM) and on the kernel confidence. Then, a set of relevant features is selected by sorting the optimizing function of each feature. Furthermore, we apply an embedded scheme of forward selection to detect compact feature subsets from the relevant feature set. Conclusions: We compare not only the classification accuracy with other methods, but also the stability, similarity and consistency of different algorithms. The proposed method has a satisfactory capability of feature selection for analysing expression datasets of different types using different performance measurements. Electronic supplementary material: The online version of this article (doi:10.1186/s13040-017-0124-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wei Du
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
- Zhongbo Cao
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China; School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun, 130012 China
- Tianci Song
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
- Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
- Yanchun Liang
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China; Zhuhai Laboratory of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai, 519041 China
|
16
|
Potocnakova L, Bhide M, Pulzova LB. An Introduction to B-Cell Epitope Mapping and In Silico Epitope Prediction. J Immunol Res 2016; 2016:6760830. [PMID: 28127568 PMCID: PMC5227168 DOI: 10.1155/2016/6760830] [Citation(s) in RCA: 198] [Impact Index Per Article: 24.8] [Received: 09/14/2016] [Revised: 11/21/2016] [Accepted: 12/13/2016] [Indexed: 01/09/2023] Open
Abstract
Identification of B-cell epitopes is a fundamental step for development of epitope-based vaccines, therapeutic antibodies, and diagnostic tools. Epitope-based antibodies are currently the most promising class of biopharmaceuticals. In the last decade, in-depth in silico analysis and categorization of the experimentally identified epitopes stimulated development of algorithms for epitope prediction. Recently, various in silico tools are employed in attempts to predict B-cell epitopes based on sequence and/or structural data. The main objective of epitope identification is to replace an antigen in the immunization, antibody production, and serodiagnosis. The accurate identification of B-cell epitopes still presents major challenges for immunologists. Advances in B-cell epitope mapping and computational prediction have yielded molecular insights into the process of biorecognition and formation of antigen-antibody complex, which may help to localize B-cell epitopes more precisely. In this paper, we have comprehensively reviewed state-of-the-art experimental methods for B-cell epitope identification, existing databases for epitopes, and novel in silico resources and prediction tools available online. We have also elaborated new trends in the antibody-based epitope prediction. The aim of this review is to assist researchers in identification of B-cell epitopes.
Affiliation(s)
- Lenka Potocnakova
- Laboratory of Biomedical Microbiology and Immunology, Department of Microbiology and Immunology, The University of Veterinary Medicine and Pharmacy in Kosice, 041 81 Kosice, Slovakia
- Mangesh Bhide
- Laboratory of Biomedical Microbiology and Immunology, Department of Microbiology and Immunology, The University of Veterinary Medicine and Pharmacy in Kosice, 041 81 Kosice, Slovakia
- Institute of Neuroimmunology of Slovak Academy of Sciences, 845 10 Bratislava, Slovakia
- Lucia Borszekova Pulzova
- Laboratory of Biomedical Microbiology and Immunology, Department of Microbiology and Immunology, The University of Veterinary Medicine and Pharmacy in Kosice, 041 81 Kosice, Slovakia
|
17
|
Zhongxin W, Gang S, Jing Z, Jia Z. Feature Selection Algorithm Based on Mutual Information and Lasso for Microarray Data. The Open Biotechnology Journal 2016. [DOI: 10.2174/1874070701610010278] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
Abstract
With the development of microarray technology, massive amounts of microarray data are produced by gene expression experiments, providing a new approach for the study of human disease. Because microarray data are high-dimensional, noisy and redundant, it is difficult to mine knowledge from them thoroughly and accurately, and selecting informative genes is also very difficult. Therefore, a new feature selection algorithm for high-dimensional microarray data is proposed in this paper, which mainly involves two steps. In the first step, the mutual information of every gene is calculated, and according to the mutual information values, informative genes are selected as a candidate gene subset and irrelevant genes are filtered out. In the second step, an improved method based on Lasso is used to select informative genes from the candidate gene subset, with the aim of removing redundant genes. Experimental results show that the proposed algorithm selects fewer genes and has better classification ability, stable performance and strong generalization ability. It is an effective gene feature selection algorithm.
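The first (filter) step described in this abstract can be illustrated with a minimal sketch. It assumes expression values have already been discretized (for example into low/high), which the abstract does not specify; the function names `mutual_information` and `filter_by_mutual_information` and the toy gene names are hypothetical, and the Lasso-based second step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the feature values
    py = Counter(ys)            # marginal counts of the class labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def filter_by_mutual_information(features, labels, keep):
    """Return the names of the `keep` features with the highest MI against the labels."""
    scores = {name: mutual_information(values, labels)
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Toy data (assumed, not from the paper): four samples, three discretized genes.
labels = [0, 0, 1, 1]
features = {
    "gene_a": [0, 0, 1, 1],  # identical to the labels: fully informative
    "gene_b": [0, 1, 0, 1],  # independent of the labels
    "gene_c": [0, 0, 0, 1],  # partly informative
}

print(filter_by_mutual_information(features, labels, keep=2))  # ['gene_a', 'gene_c']
```

A filter of this kind scores each gene independently, so it cannot detect redundancy between genes; that is the role the abstract assigns to the second, Lasso-based step.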
|
18
|
Zhang Z, Ma H, Fu H, Zhang C. Scene-free multi-class weather classification on single images. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.05.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 10/21/2022]
|
19
|
Niño-Sandoval TC, Guevara Perez SV, González FA, Jaque RA, Infante-Contreras C. An automatic method for skeletal patterns classification using craniomaxillary variables on a Colombian population. Forensic Sci Int 2015; 261:159.e1-6. [PMID: 26782070 DOI: 10.1016/j.forsciint.2015.12.025] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/08/2015] [Revised: 12/02/2015] [Accepted: 12/15/2015] [Indexed: 10/22/2022]
Abstract
BACKGROUND: The mandibular bone is an important part of forensic facial reconstruction, and it can be lost in skeletonized remains. For this reason, it is necessary to facilitate the identification process by simulating the mandibular position through craniomaxillary measures alone. Different modeling techniques have been applied to this task, but they only contemplate a straight facial profile belonging to skeletal pattern Class I; the 24.5% of the Colombian population corresponding to skeletal patterns Class II and III is not taken into account. In addition, craniofacial measures do not follow a parametric trend or a normal distribution. OBJECTIVE: The aim of this study was to employ an automatic non-parametric method, the Support Vector Machine, to classify skeletal patterns through craniomaxillary variables, in order to simulate the natural mandibular position in a contemporary Colombian sample. MATERIALS AND METHODS: Lateral cephalograms (229) of Colombian young adults of both sexes were collected. Landmark coordinate protocols were used to create craniomaxillary variables. A Support Vector Machine with a linear kernel classifier model was trained on a subset of the available data and evaluated on the remaining samples. The weights of the model were used to select the 10 best variables for classification accuracy. RESULTS: An accuracy of 74.51% was obtained, defined by the Pr-A-N, N-Pr-A, A-N-Pr, A-Te-Pr, A-Pr-Rhi, Rhi-A-Pr, Pr-A-Te, Te-Pr-A, Zm-A-Pr and PNS-A-Pr angles. The Class Precision and the Class Recall showed a correct distinction of Class II from Class III and vice versa. CONCLUSIONS: Support Vector Machines produced an important classification model of skeletal patterns using craniomaxillary variables that are not commonly used in the literature, and the model could be applicable to the 24.5% of the contemporary Colombian sample.
Affiliation(s)
- Tania Camila Niño-Sandoval
- Universidad Nacional de Colombia - Bogotá. Faculty of Dentistry, Oral Health Department. Master of Dentistry. Craniofacial Growth and Development Research Group. Genetics Institute, Cll 53 - Cra. 37 Ed. 426 Of. 213. Bogotá Colombia.
- Sonia V Guevara Perez
- Universidad Nacional de Colombia - Sede Bogotá. Faculty of Dentistry, Oral Health Department-Orthodontics. Craniofacial Growth and Development Research Group. 11001 Bogotá Colombia.
- Fabio A González
- Universidad Nacional de Colombia - Bogotá, Faculty of Engineering, Computing Systems and Industrial Engineering Department, MindLab Research Group, Carrera 30 45-03, Bogotá Colombia.
- Robinson Andrés Jaque
- Universidad Nacional de Colombia - Bogotá, Faculty of Engineering, Computing Systems and Industrial Engineering Department, MindLab Research Group, Carrera 30 45-03, Bogotá Colombia.
- Clementina Infante-Contreras
- Universidad Nacional de Colombia - Bogotá. Faculty of Dentistry, Oral Health Department. Master of Dentistry. Craniofacial Growth and Development Research Group. Genetics Institute, Cll 53 - Cra. 37 Ed. 426 Of. 213. Bogotá Colombia.
|
20
|
Banwait JK, Bastola DR. Contribution of bioinformatics prediction in microRNA-based cancer therapeutics. Adv Drug Deliv Rev 2015; 81:94-103. [PMID: 25450261 DOI: 10.1016/j.addr.2014.10.030] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/16/2014] [Revised: 10/13/2014] [Accepted: 10/30/2014] [Indexed: 12/15/2022]
Abstract
Despite enormous efforts, cancer remains one of the most lethal diseases in the world. With the advancement of high throughput technologies massive amounts of cancer data can be accessed and analyzed. Bioinformatics provides a platform to assist biologists in developing minimally invasive biomarkers to detect cancer, and in designing effective personalized therapies to treat cancer patients. Still, the early diagnosis, prognosis, and treatment of cancer are an open challenge for the research community. MicroRNAs (miRNAs) are small non-coding RNAs that serve to regulate gene expression. The discovery of deregulated miRNAs in cancer cells and tissues has led many to investigate the use of miRNAs as potential biomarkers for early detection, and as a therapeutic agent to treat cancer. Here we describe advancements in computational approaches to predict miRNAs and their targets, and discuss the role of bioinformatics in studying miRNAs in the context of human cancer.
Affiliation(s)
- Jasjit K Banwait
- College of Information Science and Technology, University of Nebraska at Omaha, 1110 South 67th Street, PKI 172, Omaha, NE 68106, USA.
- Dhundy R Bastola
- College of Information Science and Technology, University of Nebraska at Omaha, 1110 South 67th Street, PKI 172, Omaha, NE 68106, USA.
|
21
|
Farquad M, Ravi V, Raju SB. Churn prediction using comprehensible support vector machine: An analytical CRM application. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.01.031] [Citation(s) in RCA: 94] [Impact Index Per Article: 9.4] [Indexed: 11/28/2022]
|
22
|
Luque-Baena RM, Urda D, Subirats JL, Franco L, Jerez JM. Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data. Theor Biol Med Model 2014; 11 Suppl 1:S7. [PMID: 25077572 PMCID: PMC4108856 DOI: 10.1186/1742-4682-11-s1-s7] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022] Open
Abstract
Background: Extracting relevant information from microarray data is a very complex task due to the characteristics of the data sets, as they comprise a large number of features while few samples are generally available. In this sense, feature selection is a very important aspect of the analysis, helping to identify relevant genes and to maximize predictive information. Methods: Due to its simplicity and speed, Stepwise Forward Selection (SFS) is a widely used feature selection technique. In this work, we carry out a comparative study of SFS and Genetic Algorithms (GA) as general frameworks for the analysis of microarray data, with the aim of identifying groups of genes with high predictive capability and biological relevance. Six standard and machine learning-based techniques (Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naive Bayes (NB), C-MANTEC Constructive Neural Network, K-Nearest Neighbors (kNN) and Multilayer Perceptron (MLP)) are used within both frameworks on six freely available public datasets for the task of predicting cancer outcome. Results: Better cancer outcome prediction results were obtained using the GA framework, noting that this approach, in comparison to SFS, leads to a larger selection set, uses a large number of comparisons between genetic profiles, and is thus computationally more intensive. The GA framework also yielded a set of genes that can be considered more biologically relevant. Regarding the different classifiers used, standard feedforward neural networks (MLP), LDA and SVM led to similar and best results, while C-MANTEC and kNN followed closely but with lower accuracy. Further, C-MANTEC, MLP and LDA yielded a more limited set of genes in comparison to SVM, NB and kNN, and in particular C-MANTEC was the most robust classifier with respect to changes in the parameter settings.
Conclusions: This study shows that, if prediction accuracy is the objective, the GA-based approach leads to better results than the SFS approach, independently of the classifier used. Regarding classifiers, even though C-MANTEC did not achieve the best overall results, its performance was competitive, with very robust behaviour with respect to the parameters of the algorithm, and thus it can be considered a candidate technique for future studies.
|
23
|
Huang YH. A note on hyper ellipse method for classifying biological and medical data. Comput Biol Med 2013; 43:1978-86. [DOI: 10.1016/j.compbiomed.2013.08.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2011] [Revised: 08/12/2013] [Accepted: 08/15/2013] [Indexed: 11/28/2022]
|
24
|
|
25
|
Chen ZY, Fan ZP. Parallel multiple kernel learning: a hybrid alternating direction method of multipliers. Knowl Inf Syst 2013. [DOI: 10.1007/s10115-013-0655-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]
|
26
|
Chen ZY, Fan ZP. Dynamic customer lifetime value prediction using longitudinal data: An improved multiple kernel SVR approach. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2013.01.022] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
|
27
|
Wu X, Zhu X, He Y, Arslan AN. PMBC: pattern mining from biological sequences with wildcard constraints. Comput Biol Med 2013; 43:481-92. [PMID: 23566394 DOI: 10.1016/j.compbiomed.2013.02.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/20/2008] [Revised: 02/05/2013] [Accepted: 02/07/2013] [Indexed: 11/25/2022]
Abstract
Patterns/subsequences frequently appearing in sequences provide essential knowledge for domain experts, such as molecular biologists, to discover rules or patterns hidden behind the data. Due to the inherently complex nature of biological data, patterns rarely reproduce and repeat themselves exactly, but rather appear in a slightly different form in each appearance. A gap constraint (in this paper also referred to as a wildcard: a character that can be substituted for any character predefined in an alphabet) provides flexibility for users to capture useful patterns even if their appearances vary in the sequences. In order to find patterns, existing tools require users to explicitly specify gap constraints beforehand. In reality, it is often nontrivial or time-consuming for users to provide proper gap constraint values. In addition, a change made to the gap values may give completely different results and require a separate time-consuming re-mining procedure. Therefore, it is desirable to automatically and efficiently find patterns without involving user-specified gap requirements. In this paper, we study the problem of frequent pattern mining without user-specified gap constraints and propose PMBC (Pattern Mining from Biological sequences with wildcard Constraints) to solve the problem. Given a sequence and a support threshold value (i.e. pattern frequency threshold), PMBC intends to discover all subsequences with support values equal to or greater than the given threshold value. The frequent subsequences then form patterns later on. Two heuristic methods (one-way vs. two-way scans) are proposed to discover frequent subsequences and estimate their frequency in the sequences. Experimental results on both synthetic and real-world DNA sequences demonstrate the performance of both methods for frequent pattern mining and pattern frequency estimation.
Affiliation(s)
- Xindong Wu
- Department of Computer Science, University of Vermont, Burlington, VT 05401, USA.
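The notions of wildcard and support threshold used in the PMBC abstract above can be made concrete with a small sketch. This is not the PMBC algorithm, which discovers patterns without user-specified gaps using one-way and two-way scans; it only counts how often one fixed pattern occurs when a wildcard symbol may stand for any character. The names `support` and `is_frequent`, the `?` wildcard symbol and the toy DNA string are assumptions made for illustration.

```python
def support(sequence, pattern, wildcard="?"):
    """Count the start positions at which `pattern` matches `sequence`.

    A wildcard position in the pattern matches any single character.
    """
    width = len(pattern)
    count = 0
    # range() is empty when the pattern is longer than the sequence.
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if all(p == wildcard or p == s for p, s in zip(pattern, window)):
            count += 1
    return count


def is_frequent(sequence, pattern, threshold, wildcard="?"):
    """A pattern is frequent when its support reaches the given threshold."""
    return support(sequence, pattern, wildcard) >= threshold


# Toy DNA sequence (assumed, not from the paper).
dna = "ACGTACCTAGGT"
print(support(dna, "A?G"))          # 2 (matches "ACG" and "AGG")
print(is_frequent(dna, "AC?T", 2))  # True ("ACGT" and "ACCT")
```

Counting overlapping start positions is one of several possible support definitions; the paper's exact definition may differ.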
|
28
|
Zhu P, Hu Q. Rule extraction from support vector machines based on consistent region covering reduction. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2012.12.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 10/27/2022]
|
29
|
Zhao X, Deng W, Shi Y. Feature Selection with Attributes Clustering by Maximal Information Coefficient. Procedia Computer Science 2013. [DOI: 10.1016/j.procs.2013.05.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/25/2022]
|
30
|
Chen ZY, Fan ZP. Distributed customer behavior prediction using multiplex data: A collaborative MK-SVM approach. Knowl Based Syst 2012. [DOI: 10.1016/j.knosys.2012.04.023] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/30/2022]
|
31
|
Ling Y, Cao Q, Zhang H. Credit scoring using multi-kernel support vector machine and chaos particle swarm optimization. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS 2012. [DOI: 10.1142/s1469026812500198] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Consumer credit scoring is considered as a crucial issue in the credit industry. SVM has been successfully utilized for classification in many areas including credit scoring. Kernel function is vital when applying SVM to classification problem for enhancing the prediction performance. Currently, most of kernel functions used in SVM are single kernel functions such as the radial basis function (RBF) which has been widely used. On the basis of the existing kernel functions, this paper proposes a multi-kernel function to improve the learning and generalization ability of SVM by integrating several single kernel functions. Chaos particle swarm optimization (CPSO) which is a kind of improved PSO algorithm is utilized to optimize parameters and to select features simultaneously. Two UCI credit data sets are used as the experimental data to evaluate the classification performance of the proposed method.
Affiliation(s)
- Yun Ling
- School of Computer and Information Engineering, Zhejiang Gongshang University, No. 18 Xuezheng Road, Hangzhou, 310018, China
- Qiuyan Cao
- School of Computer and Information Engineering, Zhejiang Gongshang University, No. 18 Xuezheng Road, Hangzhou, 310018, China
- Hua Zhang
- School of Computer and Information Engineering, Zhejiang Gongshang University, No. 18 Xuezheng Road, Hangzhou, 310018, China
|
32
|
Evolution strategy based adaptive Lq penalty support vector machines with Gauss kernel for credit risk analysis. Appl Soft Comput 2012. [DOI: 10.1016/j.asoc.2012.04.011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
|
33
|
|
34
|
Accurate Prediction of Coronary Artery Disease Using Reliable Diagnosis System. J Med Syst 2012; 36:3353-73. [DOI: 10.1007/s10916-012-9828-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 12/06/2011] [Accepted: 01/30/2012] [Indexed: 10/14/2022]
|
35
|
Florido JP, Pomares H, Rojas I. Generating balanced learning and test sets for function approximation problems. Int J Neural Syst 2011; 21:247-63. [PMID: 21656926 DOI: 10.1142/s0129065711002791] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
In function approximation problems, one of the most common ways to evaluate a learning algorithm consists in partitioning the original data set (input/output data) into two sets: learning, used for building models, and test, applied for genuine out-of-sample evaluation. When the partition into learning and test sets does not take into account the variability and geometry of the original data, it might lead to non-balanced and unrepresentative learning and test sets and, thus, to wrong conclusions in the accuracy of the learning algorithm. How the partitioning is made is therefore a key issue and becomes more important when the data set is small due to the need of reducing the pessimistic effects caused by the removal of instances from the original data set. Thus, in this work, we propose a deterministic data mining approach for a distribution of a data set (input/output data) into two representative and balanced sets of roughly equal size taking the variability of the data set into consideration with the purpose of allowing both a fair evaluation of learning's accuracy and to make reproducible machine learning experiments usually based on random distributions. The sets are generated using a combination of a clustering procedure, especially suited for function approximation problems, and a distribution algorithm which distributes the data set into two sets within each cluster based on a nearest-neighbor approach. In the experiments section, the performance of the proposed methodology is reported in a variety of situations through an ANOVA-based statistical study of the results.
Affiliation(s)
- J P Florido
- Department of Computer Architecture and Computer Technology, CITIC-UGR, University of Granada, Periodista Daniel Saucedo Aranda, Spain.
|
36
|
Petri T, Küfner R, Zimmer R. Experiment specific expression patterns. J Comput Biol 2011; 18:1423-35. [PMID: 21919744 DOI: 10.1089/cmb.2011.0159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
The differential analysis of genes between microarrays from several experimental conditions or treatments routinely estimates which genes change significantly between groups. As genes are never regulated individually, observed behavior may be a consequence of changes in other genes. Existing approaches like co-expression analysis aim to resolve such patterns from a wide range of experiments. The knowledge of such a background set of experiments can be used to compute expected gene behavior based on known links. It is particularly interesting to detect previously unseen specific effects in other experiments. Here, a new method to spot genes deviating from expected behavior (PAttern DEviation SCOring--Padesco) is devised. It uses linear regression models learned from a background set to arrive at gene specific prediction accuracy distributions. For a given experiment, it is then decided whether each gene is predicted better or worse than expected. This provides a novel way to estimate the experiment specificity of each gene. We propose a validation procedure to estimate the detection of such specific candidates and show that these can be identified with an average accuracy of about 85%.
Affiliation(s)
- Tobias Petri
- LMU Munich, Department of Informatics, Munich, Germany.
|
37
|
Saei AA, Omidi Y. A glance at DNA microarray technology and applications. BIOIMPACTS : BI 2011; 1:75-86. [PMID: 23678411 DOI: 10.5681/bi.2011.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/27/2011] [Revised: 07/13/2011] [Accepted: 07/20/2011] [Indexed: 01/06/2023]
Abstract
INTRODUCTION Because of the huge impact of "OMICS" technologies in the life sciences, many researchers aim to use such high-throughput approaches to address cellular and/or molecular functions in response to any influential intervention at the genomic, proteomic, or metabolomic level. However, the use of such technologies often encounters cybernetic difficulties in extracting knowledge from large amounts of data using the related software. In fact, there is little guidance on data mining for novices. The main goal of this article is to provide a brief review of the different steps of microarray data handling and mining for novices and, finally, to introduce different PC and/or web-based software tools that can be used in preprocessing and/or data mining of microarray data. METHODS To pursue this aim, recently published papers and microarray software were reviewed. RESULTS It was found that defining the true place of genes in cell networks is the main phase in our understanding of the programming and functioning of living cells. This can be obtained with global/selected gene expression profiling. CONCLUSION Studying the regulation patterns of genes in groups, using clustering and classification methods, helps us understand different pathways in the cell, their functions and regulation, and the way one component in the system affects another. These networks can act as starting points for data mining and hypothesis generation, helping us reverse engineer.
Affiliation(s)
- Amir Ata Saei
- Research Center for Pharmaceutical Nanotechnology, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran
|
38
|
|
39
|
|
40
|
Hu Q, Pan W, An S, Ma P, Wei J. An efficient gene selection technique for cancer recognition based on neighborhood mutual information. INT J MACH LEARN CYB 2010. [DOI: 10.1007/s13042-010-0008-6] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 10/19/2022]
|
41
|
Barakat N, Bradley AP, Barakat MNH. Intelligible Support Vector Machines for Diagnosis of Diabetes Mellitus. IEEE Trans Inf Technol Biomed 2010; 14:1114-20. [DOI: 10.1109/titb.2009.2039485] [Citation(s) in RCA: 177] [Impact Index Per Article: 12.6] [Indexed: 11/08/2022]
|
42
|
Guan P, Huang D, He M, Zhou B. Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method. JOURNAL OF EXPERIMENTAL & CLINICAL CANCER RESEARCH : CR 2009; 28:103. [PMID: 19615083 PMCID: PMC2719616 DOI: 10.1186/1756-9966-28-103] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 06/03/2009] [Accepted: 07/18/2009] [Indexed: 01/13/2023]
Abstract
Background A reliable and precise classification is essential for successful diagnosis and treatment of cancer. Gene expression microarrays have provided a high-throughput platform to discover genomic biomarkers for cancer diagnosis and prognosis. Rational use of the available bioinformation can not only effectively remove or suppress noise in gene chips, but also avoid the one-sided results of separate experiments. However, only a few studies have recognized the importance of prior information in cancer classification. Methods Using the support vector machine as the discriminant approach, we proposed a modified method that incorporates prior knowledge into cancer classification based on gene expression data to improve accuracy. A well-known public dataset, the malignant pleural mesothelioma and lung adenocarcinoma gene expression database, was used in this study. Prior knowledge is viewed here as a means of directing the classifier using known lung adenocarcinoma related genes. The procedures were performed with the software R 2.80. Results The modified method performed better after incorporating prior knowledge. Accuracy of the modified method improved from 98.86% to 100% in the training set and from 98.51% to 99.06% in the test set. The standard deviations of the modified method decreased from 0.26% to 0 in the training set and from 3.04% to 2.10% in the test set. Conclusion A method that incorporates prior knowledge into discriminant analysis could effectively improve the capacity and reduce the impact of noise. This idea may have a good future not only in practice but also in methodology.
Affiliation(s)
- Peng Guan
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang 110001, PR China.
|
43
|
Alladi SM, P SS, Ravi V, Murthy US. Colon cancer prediction with genetic profiles using intelligent techniques. Bioinformation 2008; 3:130-3. [PMID: 19238250 PMCID: PMC2639687 DOI: 10.6026/97320630003130] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 06/10/2008] [Revised: 08/28/2008] [Accepted: 09/13/2008] [Indexed: 11/23/2022] Open
Abstract
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection is done using the t-statistic. A comparative study of the class prediction accuracy of 3 different classifiers, viz., support vector machine (SVM), neural nets and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranks as the next best classifier, followed by the multilayer perceptron (MLP). The top 10 genes selected by us for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.
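The feature selection step in the abstract above (rank genes by a t-statistic and keep the top few) can be sketched in a few lines of Python. The abstract does not say which t-statistic variant was used, so the sketch assumes Welch's unequal-variance form; the function names `welch_t` and `top_genes` and the toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances assumed).

    Each group needs at least two values, and the two sample variances
    must not both be zero, or the denominator vanishes.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def top_genes(expr, labels, k=10):
    """Rank genes by |t| between the two classes and return the top k names.

    expr: dict mapping gene name -> list of expression values per sample.
    labels: list of 0/1 class labels, one per sample.
    """
    scores = {}
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(pos, neg))
    # Sort by descending score; ties are broken by gene name.
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]

expr = {
    "g1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # strongly separates the classes
    "g2": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "g3": [3.0, 3.5, 2.5, 4.0, 4.5, 3.6],  # moderate difference
}
labels = [0, 0, 0, 1, 1, 1]
selected = top_genes(expr, labels, k=2)
```

The selected genes would then be passed to a classifier such as an SVM; that step is omitted here to keep the sketch free of third-party dependencies.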
Affiliation(s)
- Subha Mahadevi Alladi
- Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology, Tarnaka, Hyderabad 500007, Andhra Pradesh, India
- Shinde Santosh P
- Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology, Tarnaka, Hyderabad 500007, Andhra Pradesh, India
- Vadlamani Ravi
- Institute for Development and Research in Banking Technology, Castle Hills Road, Masab Tank, Hyderabad 500057, India
- Upadhyayula Suryanarayana Murthy
- Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology, Tarnaka, Hyderabad 500007, Andhra Pradesh, India
|
44
|
Wood SJ, Pantelis C, Velakoulis D, Yücel M, Fornito A, McGorry PD. Progressive changes in the development toward schizophrenia: studies in subjects at increased symptomatic risk. Schizophr Bull 2008; 34:322-9. [PMID: 18199631 PMCID: PMC2632412 DOI: 10.1093/schbul/sbm149] [Citation(s) in RCA: 148] [Impact Index Per Article: 9.3] [Indexed: 11/14/2022]
Abstract
Although the underlying neurobiology of emerging psychotic disorders is not well understood, there is a growing conviction that the study of patients at clinical high risk for the illness will provide important insights. Further, a better understanding of the transition period may help the development of novel therapies. In this review, we summarize the extant neuroimaging and neuropsychological studies of people at clinical high risk for psychosis. By and large, there are few definitive markers that distinguish those who go on to develop the illness from those who do not. The 2 most consistently abnormal brain regions in schizophrenia research, the hippocampi and the lateral ventricles, are not significantly different from healthy controls prior to psychosis onset. However, frontal lobe measures (eg, cortical thickness in the anterior cingulate) do show promise, as do cognitive measures sensitive to prefrontal cortex dysfunction. Further, longitudinal magnetic resonance imaging findings in individuals at ultrahigh risk for developing a psychotic illness show that there are excessive neuroanatomical changes in those who convert to psychosis. These aberrant changes are observed most prominently in medial temporal and prefrontal cortical regions. While the pathological processes underlying such changes remain unclear, speculatively they may reflect anomalies in genetic and/or other endogenous mechanisms responsible for brain maturation, the adverse effects of intense or prolonged stress, or other environmental factors. Active changes during transition to illness may present the potential to intervene and ameliorate these changes with potential benefit clinically.
Affiliation(s)
- Stephen J Wood
- Melbourne Neuropsychiatry Centre, Department of Psychiatry, University of Melbourne, Australia.
|
45
|
|