1. Gupta A, Kumar S, Kumar A. Big Data in Bioinformatics and Computational Biology: Basic Insights. Methods Mol Biol 2024; 2719:153-166. [PMID: 37803117 DOI: 10.1007/978-1-0716-3461-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
The human genome was first sequenced in 1994. It took 10 years of cooperation between numerous international research organizations to reveal a preliminary human DNA sequence. Genomics labs can now sequence an entire genome in only a few days. Here, we discuss how the advent of high-performance sequencing platforms has paved the way for Big Data in biology and contributed to the development of modern bioinformatics, which in turn has helped to expand the scope of biology and allied sciences. New technologies and methodologies for the storage, management, analysis, and visualization of big data have been shown to be necessary. Not only does modern bioinformatics have to deal with the challenge of processing massive amounts of heterogeneous data, but it also has to deal with different ways of interpreting and presenting those results, as well as the use of different software programs and file formats. This chapter attempts to present solutions to these problems. In order to store massive amounts of data and complete search queries within a reasonable time, new database management systems other than relational ones will be necessary. Emerging advanced programming approaches, such as machine learning, Hadoop, and MapReduce, aim to provide the capacity to easily construct one's own scripts for data processing and address the issue of the diversity of genomic and proteomic data formats in bioinformatics.
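As a rough illustration of the MapReduce style mentioned in this abstract, the following is a minimal single-process sketch in Python that counts k-mers across sequencing reads. It is not taken from the chapter: the function names (`map_kmers`, `shuffle`, `reduce_counts`, `count_kmers`) and the two toy reads are assumptions made for the example, and a real Hadoop job would distribute the map and reduce steps across machines.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every window of length k in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle, and reduce in sequence on a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


# Hypothetical toy reads, used only to exercise the pipeline.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, 3)
print(counts)
```

Because each read is mapped independently and each key is reduced independently, both steps could in principle be run in parallel, which is the property that frameworks such as Hadoop exploit.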
Affiliation(s)
- Aanchal Gupta
  - University Institute of Biotechnology, Chandigarh University, Mohali, Punjab, India
- Shubham Kumar
  - University Institute of Biotechnology, Chandigarh University, Mohali, Punjab, India
- Ashwani Kumar
  - University Institute of Biotechnology, Chandigarh University, Mohali, Punjab, India
2. Chadha A, Dara R, Pearl DL, Sharif S, Poljak Z. Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques. Prev Vet Med 2023; 216:105924. [PMID: 37224663 DOI: 10.1016/j.prevetmed.2023.105924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 03/17/2023] [Accepted: 04/21/2023] [Indexed: 05/26/2023]
Abstract
Over the past decades, avian influenza (AI) outbreaks have been reported across different parts of the globe, resulting in large-scale economic and livestock loss and, in some cases raising concerns about their zoonotic potential. The virulence and pathogenicity of H5Nx (e.g., H5N1, H5N2) AI strains for poultry could be inferred through various approaches, and it has been frequently performed by detecting certain pathogenicity markers in their haemagglutinin (HA) gene. The utilization of predictive modeling methods represents a possible approach to exploring this genotypic-phenotypic relationship for assisting experts in determining the pathogenicity of circulating AI viruses. Therefore, the main objective of this study was to evaluate the predictive performance of different machine learning (ML) techniques for in-silico prediction of pathogenicity of H5Nx viruses in poultry, using complete genetic sequences of the HA gene. We annotated 2137 H5Nx HA gene sequences based on the presence of the polybasic HA cleavage site (HACS) with 46.33% and 53.67% of sequences previously identified as highly pathogenic (HP) and low pathogenic (LP), respectively. We compared the performance of different ML classifiers (e.g., logistic regression (LR) with the lasso and ridge regularization, random forest (RF), K-nearest neighbor (KNN), Naïve Bayes (NB), support vector machine (SVM), and convolutional neural network (CNN)) for pathogenicity classification of raw H5Nx nucleotide and protein sequences using a 10-fold cross-validation technique. We found that different ML techniques can be successfully used for the pathogenicity classification of H5 sequences with ∼99% classification accuracy. 
Our results indicate that for pathogenicity classification of (1) aligned deoxyribonucleic acid (DNA) and protein sequences, the NB classifier had the lowest accuracies of 98.41% (+/-0.89) and 98.31% (+/-1.06), respectively; (2) aligned DNA and protein sequences, the LR (L1/L2), KNN, SVM (radial basis function (RBF)) and CNN classifiers had the highest accuracies of 99.20% (+/-0.54) and 99.20% (+/-0.38), respectively; (3) unaligned DNA and protein sequences, CNNs achieved accuracies of 98.54% (+/-0.68) and 99.20% (+/-0.50), respectively. ML methods show potential for regular classification of H5Nx virus pathogenicity for poultry species, particularly when sequences containing regular markers were frequently present in the training dataset.
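The 10-fold cross-validation procedure described in this abstract can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: it uses a 1-nearest-neighbour classifier on 2-mer counts, which is an assumed feature encoding, and the sequences are synthetic toy strings invented for the example rather than real HA sequences. All function names are hypothetical.

```python
from collections import Counter


def kmer_profile(seq, k=2):
    """Count overlapping k-mers so sequences of different lengths are comparable."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def distance(a, b):
    """Manhattan distance between two k-mer profiles (absent k-mers count as 0)."""
    return sum(abs(a[key] - b[key]) for key in set(a) | set(b))


def k_fold_indices(n, n_folds):
    """Yield (train, test) index lists; fold f tests items with index % n_folds == f."""
    for f in range(n_folds):
        test = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        yield train, test


def predict_1nn(train_x, train_y, query):
    """Return the label of the training profile closest to the query profile."""
    nearest = min(range(len(train_x)), key=lambda i: distance(train_x[i], query))
    return train_y[nearest]


def cross_validate(seqs, labels, n_folds=10, k=2):
    """Return one accuracy value per fold."""
    profiles = [kmer_profile(s, k) for s in seqs]
    accuracies = []
    for train, test in k_fold_indices(len(seqs), n_folds):
        train_x = [profiles[i] for i in train]
        train_y = [labels[i] for i in train]
        hits = sum(predict_1nn(train_x, train_y, profiles[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return accuracies


# Synthetic toy strings (NOT data from the study): "HP" items carry a run of
# basic residues (R/K), "LP" items do not; a varying first residue adds noise.
seqs, labels = [], []
for aa in "ASTNDEGVIL":
    seqs.append(aa + "PQRERRRKKRGLF")
    labels.append("HP")
    seqs.append(aa + "PQRETRGLF")
    labels.append("LP")

accs = cross_validate(seqs, labels)
mean_acc = sum(accs) / len(accs)
print(f"mean accuracy over {len(accs)} folds: {mean_acc:.2f}")
```

The toy classes are trivially separable, so the sketch only demonstrates the mechanics of splitting, training, and scoring per fold; it says nothing about the accuracies reported in the study.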
Affiliation(s)
- Akshay Chadha
  - School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Rozita Dara
  - School of Computer Science, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- David L Pearl
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Shayan Sharif
  - Department of Pathobiology, University of Guelph, Guelph, Ontario N1G 2W1, Canada
- Zvonimir Poljak
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, Ontario N1G 2W1, Canada
3. Khan S, Khan HU, Nazir S. Systematic analysis of healthcare big data analytics for efficient care and disease diagnosing. Sci Rep 2022; 12:22377. [PMID: 36572709 PMCID: PMC9792582 DOI: 10.1038/s41598-022-26090-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/09/2022] [Accepted: 12/09/2022] [Indexed: 12/27/2022] Open
Abstract
Big data has revolutionized the world by providing tremendous opportunities for a variety of applications. It contains a gigantic amount of data, especially a plethora of data types that has been significantly useful in diverse research domains. In the healthcare domain, researchers use computational devices to extract enriched relevant information from this data and develop smart applications to solve real-life problems in a timely fashion. Electronic health (eHealth) and mobile health (mHealth) facilities, along with the availability of new computational models, have enabled doctors and researchers to extract relevant information and visualize healthcare big data in a new spectrum. Digital transformation of healthcare systems through the use of information systems, medical technology, and handheld and smart wearable devices has posed many challenges to researchers and caretakers in the form of storage, minimizing treatment cost, and processing time (to extract enriched information, and minimize error rates to make optimum decisions). In this research work, the existing literature is analysed and assessed to identify gaps that affect the overall performance of the available healthcare applications. It also aims to suggest enhanced solutions to address these gaps. In this comprehensive systematic research work, the existing literature reported from 2011 to 2021 is thoroughly analysed to identify the efforts made to help doctors and practitioners diagnose diseases using healthcare big data analytics. A set of research questions is formulated to analyse the relevant articles, identify the key features and optimum management solutions, and later use these analyses to achieve effective outcomes.
The results of this systematic mapping conclude that, despite the considerable efforts made in the domain of healthcare big data analytics, newer hybrid machine learning-based systems and cloud computing-based models should be adopted to reduce treatment cost and simulation time and to achieve improved quality of care. This systematic mapping will also enhance the capabilities of doctors, practitioners, researchers, and policymakers to use this study as evidence for future research.
Affiliation(s)
- Sulaiman Khan
  - Department of Accounting and Information Systems, College of Business and Economics, Qatar University, Doha, Qatar
- Habib Ullah Khan
  - Department of Accounting and Information Systems, College of Business and Economics, Qatar University, Doha, Qatar
- Shah Nazir
  - Department of Computer Science, University of Swabi, Swabi, Pakistan
4. Wang Z, Huang J, Xie D, He D, Lu A, Liang C. Toward Overcoming Treatment Failure in Rheumatoid Arthritis. Front Immunol 2021; 12:755844. [PMID: 35003068 PMCID: PMC8732378 DOI: 10.3389/fimmu.2021.755844] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 08/09/2021] [Accepted: 12/06/2021] [Indexed: 12/29/2022] Open
Abstract
Rheumatoid arthritis (RA) is an autoimmune disorder characterized by inflammation and bone erosion. The exact mechanism of RA is still unknown, but various immune cytokines, signaling pathways and effector cells are involved. Disease-modifying antirheumatic drugs (DMARDs) are commonly used in RA treatment and classified into different categories. Nevertheless, RA treatment is based on a "trial-and-error" approach, and a substantial proportion of patients show failed therapy for each DMARD. Over the past decades, great efforts have been made to overcome treatment failure, including identification of biomarkers, exploration of the reasons for loss of efficacy, development of sequential or combinational DMARDs strategies and approval of new DMARDs. Here, we summarize these efforts, which would provide valuable insights for accurate RA clinical medication. While gratifying, researchers realize that these efforts are still far from enough to recommend specific DMARDs for individual patients. Precision medicine is an emerging medical model that proposes a highly individualized and tailored approach for disease management. In this review, we also discuss the potential of precision medicine for overcoming RA treatment failure, with the introduction of various cutting-edge technologies and big data.
Affiliation(s)
- Zhuqian Wang
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
  - Institute of Integrated Bioinfomedicine and Translational Science (IBTS), School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
  - Law Sau Fai Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Jie Huang
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Duoli Xie
  - Institute of Integrated Bioinfomedicine and Translational Science (IBTS), School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
  - Law Sau Fai Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
- Dongyi He
  - Institute of Arthritis Research in Integrative Medicine, Shanghai Academy of Traditional Chinese Medicine, Shanghai, China
  - Department of Rheumatology, Shanghai Guanghua Hospital of Integrative Medicine, Shanghai, China
- Aiping Lu
  - Institute of Integrated Bioinfomedicine and Translational Science (IBTS), School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
  - Law Sau Fai Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
  - Institute of Arthritis Research in Integrative Medicine, Shanghai Academy of Traditional Chinese Medicine, Shanghai, China
  - Guangdong-Hong Kong-Macau Joint Lab on Chinese Medicine and Immune Disease Research, Guangzhou, China
- Chao Liang
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
  - Institute of Integrated Bioinfomedicine and Translational Science (IBTS), School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
  - Law Sau Fai Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, Hong Kong SAR, China
5. Possas C, Marques ETA, Risi JB, Homma A. COVID-19 and Future Disease X in Circular Economy Transition: Redesigning Pandemic Preparedness to Prevent a Global Disaster. Circular Economy and Sustainability 2021; 1:1463-1478. [PMID: 34888566 PMCID: PMC8238518 DOI: 10.1007/s43615-021-00060-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/01/2021] [Accepted: 05/06/2021] [Indexed: 01/13/2023]
Abstract
The COVID-19 pandemic exposed a world surprisingly unprepared to respond to the new epidemiological scenario, even in the developed countries, in spite of warnings from scientists since the 1990s. These alerts warned of the risks of an exponential increase in the emergence of potentially pandemic zoonotic infectious diseases related to disruptive ecological niches in different regions of the globe, such as H1N1 Influenza, SARS, MERS, Zika, avian flu, swine flu, and Ebola, and also of the risks of a future and more lethal Disease X. We examine this global public health failure in anticipating and responding to the pandemic, stressing the urgent need for an innovative global pandemic preparedness system in the current transition from a linear economy to a circular economy. Evidence provided here indicates that this novel preventive-based and resource-saving preparedness system could contribute to reversing the detrimental impacts of the pandemic on the global economy and increase its resilience. Individual protection, contact tracing, and lockdown have proved to be only partially effective in responding to the spillover of viral zoonosis into the human population, and for most of these pathogens, vaccines are not yet available. As for COVID-19 vaccines, in spite of the extraordinary investments and unprecedented advances in innovative vaccines in a few months, most of these products are expected to be available to more vulnerable developing countries’ populations only by mid-2022. Furthermore, even when these vaccines are available, constraints such as low efficacy, waning immunity, new concerning COVID-19 variants, adverse events, and vaccine hesitancy might restrict their public health impact and could contribute to aggravating the pandemic scenario.
Considering these constraints and the severe global economic and social crises resulting from the lack of adequate preparedness and delayed effective response to COVID-19 and possibly to a future Disease X, we propose a pro-active global eco-social pandemic preparedness system. This novel system, based on the One Health paradigm and on artificial intelligence and machine learning, is expected to incorporate “spillover” foresight and management into global preparedness and timely response. Designed to mitigate damage from outbreaks and minimize human morbidity and mortality, this approach to pandemic foresight and preparedness will be key to preventing a global disaster.
Affiliation(s)
- Cristina Possas
  - Bio-Manguinhos, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Ernesto T A Marques
  - Aggeu Magalhães Institute, Oswaldo Cruz Foundation, Pernambuco, Brazil
  - Department of Infectious Diseases and Microbiology, University of Pittsburgh, Pittsburgh, USA
- Akira Homma
  - Bio-Manguinhos, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
6. Pashazadeh A, Navimipour NJ. Big data handling mechanisms in the healthcare applications: A comprehensive and systematic literature review. J Biomed Inform 2018; 82:47-62. [PMID: 29655946 DOI: 10.1016/j.jbi.2018.03.014] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 06/22/2017] [Revised: 11/19/2017] [Accepted: 03/23/2018] [Indexed: 01/08/2023]
Abstract
Healthcare provides many services such as diagnosis, treatment, and prevention of diseases, illnesses, injuries, and other physical and mental disorders. Large-scale distributed data processing applications in healthcare operate, as a basic concept, on large amounts of data. Big data application functions are therefore a main part of healthcare operations, but there has been no comprehensive and systematic survey that studies and evaluates the important techniques in this field. This paper therefore aims to provide a comprehensive, detailed, and systematic study of the state-of-the-art mechanisms for big data in healthcare applications in five categories: machine learning, cloud-based, heuristic-based, agent-based, and hybrid mechanisms. It also presents a systematic literature review (SLR) of big data applications in the healthcare literature up to the end of 2016. Initially, 205 papers were identified, but a paper selection process reduced the number to 29 important studies.
Affiliation(s)
- Asma Pashazadeh
  - Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Nima Jafari Navimipour
  - Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
7
Abstract
Protein complexes are known to play a major role in controlling cellular activity in a living being. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast and a few other model organisms. Such protein complex identification methods, when applied to large human PPIs, often perform poorly. We introduce a novel method called ComFiR to detect such protein complexes and further rank diseased complexes based on a query disease. We show that it performs better in identifying protein complexes from human PPI data. The method is evaluated in terms of positive predictive value, sensitivity, and accuracy. We also introduce a ranking approach and show its application to Alzheimer's disease.