1
Mishra S, Shaw K, Mishra D, Patil S, Kotecha K, Kumar S, Bajaj S. Improving the Accuracy of Ensemble Machine Learning Classification Models Using a Novel Bit-Fusion Algorithm for Healthcare AI Systems. Front Public Health 2022; 10:858282. [PMID: 35602150 PMCID: PMC9114677 DOI: 10.3389/fpubh.2022.858282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Accepted: 03/15/2022] [Indexed: 12/11/2022] Open
Abstract
Healthcare AI systems exclusively employ classification models for disease detection. However, recent research has shown that single classification models achieve limited accuracy in some cases. Fusing the outputs of multiple classifiers into a single classification framework has been instrumental in achieving greater accuracy and in performing automated big data analysis. The article proposes a bit fusion ensemble algorithm that minimizes the classification error rate and has been tested on various datasets. Five diversified base classifiers, k-nearest neighbor (KNN), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (D.T.), and Naïve Bayesian Classifier (N.B.), are used in the implementation model. The bit fusion algorithm works on the individual inputs it receives from the classifiers. Decision vectors of the base classifiers are weighted and transformed into binary bits by comparison with high-reliability threshold parameters. The output of each base classifier is treated as a soft class vector (CV). These vectors are weighted, transformed, and compared with a high reliability threshold initialized to δ = 0.9. Binary patterns are extracted, and the model is trained and tested again. The standard fusion approach and the proposed bit fusion algorithm are compared by average error rate. The bit fusion algorithm yields error rates of 5.97, 12.6, 4.64, 0, 0, and 27.28 for Leukemia, Breast Cancer, Lung Cancer, Hepatitis, Lymphoma, and Embryonal Tumors, respectively. The model is also trained and tested on datasets from the UCI, UEA, and UCR repositories, which likewise show reduced error rates.
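The thresholding step described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the weights are chosen or how the bits are fused afterwards. The function names `to_bits` and `bit_pattern`, the per-classifier weights, and the flattening of bits into a single pattern are all assumptions made for illustration. Only the threshold δ = 0.9 comes from the abstract.

```python
from typing import List, Sequence

DELTA = 0.9  # reliability threshold quoted in the abstract


def to_bits(soft_vectors: Sequence[Sequence[float]],
            weights: Sequence[float],
            delta: float = DELTA) -> List[List[int]]:
    """Turn soft class vectors into binary bits for one sample.

    soft_vectors: one soft class vector per base classifier.
    weights: one hypothetical weight per base classifier (assumption).
    A bit is 1 when the weighted score reaches the threshold, else 0.
    """
    if len(soft_vectors) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    return [[1 if w * p >= delta else 0 for p in vec]
            for vec, w in zip(soft_vectors, weights)]


def bit_pattern(bits: Sequence[Sequence[int]]) -> List[int]:
    """Flatten per-classifier bits into one pattern (assumed fusion input)."""
    return [b for row in bits for b in row]


if __name__ == "__main__":
    # Three hypothetical classifiers voting on a two-class problem.
    soft = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
    bits = to_bits(soft, weights=[1.0, 1.0, 1.0])
    print(bit_pattern(bits))  # prints [1, 0, 0, 0, 0, 1]
```

In this sketch the second classifier contributes no set bits because neither of its scores reaches δ, which is one way to read the abstract's emphasis on high-reliability decisions. The resulting pattern would then serve as input to the retraining step the abstract mentions.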
Affiliation(s)
- Sashikala Mishra
  - Symbiosis Institute of Technology, Symbiosis International University, Pune, India
- Kailash Shaw
  - Symbiosis Institute of Technology, Symbiosis International University, Pune, India
- Debahuti Mishra
  - Department of Computer Science and Engineering, Siksha O Anusandhan Deemed to be University, Bhubaneshwar, India
- Shruti Patil
  - Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
  - Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Satish Kumar
  - Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Simi Bajaj
  - School of Computer Data and Mathematical Sciences, University of Western Sydney, Sydney, NSW, Australia
2
Adams SH, Anthony JC, Carvajal R, Chae L, Khoo CSH, Latulippe ME, Matusheski NV, McClung HL, Rozga M, Schmid CH, Wopereis S, Yan W. Perspective: Guiding Principles for the Implementation of Personalized Nutrition Approaches That Benefit Health and Function. Adv Nutr 2020; 11:25-34. [PMID: 31504115 PMCID: PMC7442375 DOI: 10.1093/advances/nmz086] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 05/30/2019] [Revised: 07/17/2019] [Accepted: 07/22/2019] [Indexed: 01/05/2023] Open
Abstract
Personalized nutrition (PN) approaches have been shown to help drive behavior change and positively influence health outcomes. This has led to an increase in the development of commercially available PN programs, which utilize various forms of individual-level information to provide services and products for consumers. The lack of a well-accepted definition of PN or an established set of guiding principles for the implementation of PN creates barriers for establishing credibility and efficacy. To address these points, the North American Branch of the International Life Sciences Institute convened a multidisciplinary panel. In this article, a definition for PN is proposed: "Personalized nutrition uses individual-specific information, founded in evidence-based science, to promote dietary behavior change that may result in measurable health benefits." In addition, 10 guiding principles for PN approaches are proposed: 1) define potential users and beneficiaries; 2) use validated diagnostic methods and measures; 3) maintain data quality and relevance; 4) derive data-driven recommendations from validated models and algorithms; 5) design PN studies around validated individual health or function needs and outcomes; 6) provide rigorous scientific evidence for an effect on health or function; 7) deliver user-friendly tools; 8) for healthy individuals, align with population-based recommendations; 9) communicate transparently about potential effects; and 10) protect individual data privacy and act responsibly. These principles are intended to establish a basis for responsible approaches to the evidence-based research and practice of PN and serve as an invitation for further public dialog. Several challenges were identified for PN to continue gaining acceptance, including defining the health-disease continuum, identification of biomarkers, changing regulatory landscapes, accessibility, and measuring success. Although PN approaches hold promise for public health in the future, further research is needed on the accuracy of dietary intake measurement, utilization and standardization of systems approaches, and application and communication of evidence.
Affiliation(s)
- Sean H Adams
  - Arkansas Children's Nutrition Center and Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Lee Chae
  - Brightseed, San Francisco, CA, USA
- Chor San H Khoo
  - International Life Sciences Institute North America, Washington, DC, USA
- Marie E Latulippe
  - International Life Sciences Institute North America, Washington, DC, USA. Address correspondence to MEL.
- Holly L McClung
  - US Army Research Institute of Environmental Medicine, Natick, MA, USA
- Mary Rozga
  - Academy of Nutrition and Dietetics, Chicago, IL, USA
- Suzan Wopereis
  - Research Group Microbiology & Systems Biology, TNO, Zeist, Netherlands
3
4
Chen B, Altman RB. Opportunities for developing therapies for rare genetic diseases: focus on gain-of-function and allostery. Orphanet J Rare Dis 2017; 12:61. [PMID: 28412959 PMCID: PMC5392956 DOI: 10.1186/s13023-017-0614-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 12/10/2016] [Accepted: 03/19/2017] [Indexed: 11/28/2022] Open
Abstract
Background: Advances in next generation sequencing technologies have revolutionized our ability to discover the causes of rare genetic diseases. However, developing treatments for these diseases remains challenging. In fact, when we systematically analyze the US FDA orphan drug list, we find that only 8% of rare diseases have an FDA-designated drug. Our approach leverages three primary insights: first, diseases with gain-of-function mutations and late onset are more likely to have drug options; second, drugs are more often inhibitors than activators; and third, some disease-causing proteins can be rescued by allosteric activators in diseases due to loss-of-function mutations.
Results: We have developed a pipeline that combines natural language processing and human curation to mine promising targets for drug development from the Online Mendelian Inheritance in Man (OMIM) database. This pipeline targets diseases caused by well-characterized gain-of-function mutations or loss-of-function proteins with known allosteric activators. Applying this pipeline across thousands of rare genetic diseases, we discover 34 rare genetic diseases that are promising candidates for drug development.
Conclusion: Our analysis has revealed uneven coverage of rare diseases in the current US FDA orphan drug space. Diseases with gain-of-function mutations or loss-of-function mutations and known allosteric activators should be prioritized for drug treatments.
Electronic supplementary material: The online version of this article (doi:10.1186/s13023-017-0614-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Binbin Chen
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Russ B Altman
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Department of Bioengineering, Stanford University School of Medicine, Stanford, CA, USA
5
Al-Harazi O, Al Insaif S, Al-Ajlan MA, Kaya N, Dzimiri N, Colak D. Integrated Genomic and Network-Based Analyses of Complex Diseases and Human Disease Network. J Genet Genomics 2015; 43:349-67. [PMID: 27318646 DOI: 10.1016/j.jgg.2015.11.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/11/2015] [Revised: 10/22/2015] [Accepted: 11/20/2015] [Indexed: 12/16/2022]
Abstract
A disease phenotype generally reflects various pathobiological processes that interact in a complex network. The highly interconnected nature of the human protein interaction network (interactome) indicates that, at the molecular level, it is difficult to consider diseases as being independent of one another. Recently, genome-wide molecular measurements, data mining and bioinformatics approaches have provided the means to explore human diseases from a molecular basis. The exploration of diseases and a system of disease relationships based on the integration of genome-wide molecular data with the human interactome could offer a powerful perspective for understanding the molecular architecture of diseases. Recently, subnetwork markers have proven to be more robust and reliable than individual biomarker genes selected based on gene expression profiles alone, and achieve higher accuracy in disease classification. We have applied one of these methodologies to idiopathic dilated cardiomyopathy (IDCM) data that we have generated using a microarray and identified significant subnetworks associated with the disease. In this paper, we review the recent endeavours in this direction, and summarize the existing methodologies and computational tools for network-based analysis of complex diseases, of molecular relationships among apparently different disorders, and of the human disease network. We also discuss the future research trends and topics of this promising field.
Affiliation(s)
- Olfat Al-Harazi
  - Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Sadiq Al Insaif
  - Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Monirah A Al-Ajlan
  - Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia; College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
- Namik Kaya
  - Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Nduna Dzimiri
  - Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
- Dilek Colak
  - Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
6
Wang J, Zhang Y, Marian C, Ressom HW. Identification of aberrant pathways and network activities from high-throughput data. Brief Bioinform 2012; 13:406-19. [PMID: 22287794 PMCID: PMC3404398 DOI: 10.1093/bib/bbs001] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 09/06/2011] [Revised: 01/03/2012] [Indexed: 02/06/2023] Open
Abstract
Many complex diseases such as cancer are associated with changes in biological pathways and molecular networks rather than being caused by single gene alterations. A major challenge in the diagnosis and treatment of such diseases is to identify characteristic aberrancies in the biological pathways and molecular network activities and elucidate their relationship to the disease. This review presents recent progress in using high-throughput biological assays to decipher aberrant pathways and network activities. In particular, this review provides specific examples in which high-throughput data have been applied to identify relationships between diseases and aberrant pathways and network activities. The achievements in this field have been remarkable, but many challenges have yet to be addressed.
7
Quiala E, Cañal MJ, Rodríguez R, Yagüe N, Chávez M, Barbón R, Valledor L. Proteomic profiling of Tectona grandis L. leaf. Proteomics 2012; 12:1039-44. [DOI: 10.1002/pmic.201100183] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/07/2022]
Affiliation(s)
- Elisa Quiala
  - Instituto de Biotecnología de Las Plantas; Universidad Central “Marta Abreu” de Las Villas, Santa Clara; Villa Clara; CP; Cuba
- María Jesús Cañal
  - Area de Fisiología Vegetal; Dpto. Biología de Organismos y Sistemas, Universidad de Oviedo; C/Catedrático Rodrigo Uría; Oviedo, Asturias; Spain
- Roberto Rodríguez
  - Area de Fisiología Vegetal; Dpto. Biología de Organismos y Sistemas, Universidad de Oviedo; C/Catedrático Rodrigo Uría; Oviedo, Asturias; Spain
- Norma Yagüe
  - Area de Fisiología Vegetal; Dpto. Biología de Organismos y Sistemas, Universidad de Oviedo; C/Catedrático Rodrigo Uría; Oviedo, Asturias; Spain
- Maité Chávez
  - Instituto de Biotecnología de Las Plantas; Universidad Central “Marta Abreu” de Las Villas, Santa Clara; Villa Clara; CP; Cuba
- Raúl Barbón
  - Instituto de Biotecnología de Las Plantas; Universidad Central “Marta Abreu” de Las Villas, Santa Clara; Villa Clara; CP; Cuba