1
Ab Rasid AM, Muazu Musa R, Abdul Majeed APP, Musawi Maliki ABH, Abdullah MR, Mohd Razmaan MA, Abu Osman NA. Physical fitness and motor ability parameters as predictors for skateboarding performance: A logistic regression modelling analysis. PLoS One 2024; 19:e0296467. [PMID: 38329954 PMCID: PMC10852284 DOI: 10.1371/journal.pone.0296467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Accepted: 12/13/2023] [Indexed: 02/10/2024]
Abstract
The identification and prediction of athletic talent are pivotal in the development of successful sporting careers. Traditional assessment methods have proven unreliable because of their inherent subjectivity, prompting the rise of data-driven techniques favoured for their objectivity. This evolution in statistical analysis facilitates the extraction of pertinent athlete information, enabling the recognition of athletes' potential for excellence in their respective sports. In the current study, we applied a logistic regression-based machine learning pipeline (LR) to identify potential skateboarding athletes from a combination of fitness and motor skills performance variables. Forty-five skateboarders recruited from a variety of skateboarding parks were evaluated on various skateboarding tricks, and their fitness and motor abilities were assessed using the stork stance test, dynamic balance, sit-ups, the plank test, the standing broad jump, and the vertical jump. The performances of the skateboarders were clustered, and the LR model was developed to classify the skateboarders into the resulting classes. The cluster analysis identified two groups: high- and low-potential skateboarders. The LR model achieved a mean accuracy of 90%, indicating excellent prediction of the skateboarder classes. Further sensitivity analysis revealed that static and dynamic balance, lower body strength, and endurance were the factors that contributed most to the model's performance. These factors are therefore essential for successful performance in skateboarding. The application of machine learning in talent prediction can greatly assist coaches and other relevant stakeholders in making informed decisions regarding athlete performance.
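The two-stage workflow described in this abstract (cluster the performances, then train a classifier to predict the resulting groups) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the single predictor `balance_score` and the helpers `two_means`, `fit_logistic` and `predict` are hypothetical names, and a one-dimensional 2-means step stands in for whichever clustering method the study used.

```python
import math

def two_means(values, iters=20):
    """1-D k-means with k=2; returns a 0/1 label per value (1 = higher-centre cluster)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer of the two centres.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low_group = [v for v, g in zip(values, labels) if g == 0]
        high_group = [v for v, g in zip(values, labels) if g == 1]
        # Move each centre to the mean of its members.
        if low_group:
            lo = sum(low_group) / len(low_group)
        if high_group:
            hi = sum(high_group) / len(high_group)
    return labels

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class 1 when the predicted probability is at least 0.5."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

# Synthetic trick-performance scores (hypothetical), used only to form the two groups.
performance = [2.1, 2.4, 1.9, 2.2, 7.8, 8.1, 7.5, 8.3]
groups = two_means(performance)

# Synthetic standardised predictor (hypothetical), e.g. a balance-test z-score.
balance_score = [-1.2, -0.8, -1.0, -0.6, 0.9, 1.1, 0.7, 1.3]
w, b = fit_logistic(balance_score, groups)
predicted = [predict(w, b, x) for x in balance_score]
accuracy = sum(p == g for p, g in zip(predicted, groups)) / len(groups)
```

In practice the accuracy would be estimated on held-out data rather than on the training sample used here.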
Affiliation(s)
- Aina Munirah Ab Rasid
  - Centre for Fundamental and Continuing Education, Department of Credited Co-Curriculum, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
- Rabiu Muazu Musa
  - Centre for Fundamental and Continuing Education, Department of Credited Co-Curriculum, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
- Anwar P. P. Abdul Majeed
  - School of Robotics, XJTLU Entrepreneur College (Taicang), Xi’an Jiaotong-Liverpool University, Suzhou, China
- Mohamad Razali Abdullah
  - Faculty of Health Science, Universiti Sultan Zainal Abidin, Kuala Nerus, Terengganu, Malaysia
- Mohd Azraai Mohd Razmaan
  - Innovative Manufacturing, Mechatronics and Sports Laboratory, Faculty of Manufacturing and Mechatronic Engineering Technology, Universiti Malaysia Pahang, Pekan, Pahang, Malaysia
- Noor Azuan Abu Osman
  - Centre for Applied Biomechanics, Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
2
Hasegawa S, Sawada T, Serizawa T. Identification of Water-Soluble Polymers through Machine Learning of Fluorescence Signals from Multiple Peptide Sensors. ACS Applied Bio Materials 2023; 6:4598-4602. [PMID: 37889623 PMCID: PMC10664068 DOI: 10.1021/acsabm.3c00736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/22/2023] [Accepted: 10/23/2023] [Indexed: 10/29/2023]
Abstract
Recently, there has been growing concern about the discharge of water-soluble polymers (especially synthetic polymers) into the environment. Therefore, the identification of water-soluble polymers in water samples is becoming increasingly crucial. In this study, a chemical tongue system to simply and precisely identify water-soluble polymers using multiple fluorescently responsive peptide sensors was demonstrated. Fluorescence spectra obtained from the mixture of each peptide sensor and water-soluble polymer were changed depending on the combination of the polymer species and peptide sensors. Water-soluble polymers were successfully identified through the supervised or unsupervised machine learning of multidimensional fluorescence signals from the peptide sensors.
Affiliation(s)
- Shion Hasegawa
  - Department of Chemical Science and Engineering, School of Materials and Chemical Technology, Tokyo Institute of Technology, 2-12-1-H121 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Toshiki Sawada
  - Department of Chemical Science and Engineering, School of Materials and Chemical Technology, Tokyo Institute of Technology, 2-12-1-H121 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
- Takeshi Serizawa
  - Department of Chemical Science and Engineering, School of Materials and Chemical Technology, Tokyo Institute of Technology, 2-12-1-H121 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
3
Sousa H, Musa RM, Clemente FM, Sarmento H, Gouveia ÉR. Physical predictors for retention and dismissal of professional soccer head coaches: an analysis of locomotor variables using logistic regression pipeline. Front Sports Act Living 2023; 5:1301845. [PMID: 38053523 PMCID: PMC10694450 DOI: 10.3389/fspor.2023.1301845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 11/07/2023] [Indexed: 12/07/2023]
Abstract
Introduction: Soccer has enormous global popularity, increasing pressure on clubs to optimize performance. In the event of failure, the tendency is to replace the head coach (HC). This study aimed to examine the physical effects of mid-season HC replacements, investigating which external load variables can predict retention or dismissal. Methods: Data were collected in training and matches of a professional adult male soccer team during three complete seasons (2020/21-2022/23). The sample included 6 different HCs (48.8 ± 7.4 years of age; 11.2 ± 3.9 years as an HC). The 4 weeks and 4 games before and after the replacement of HCs were analysed. External load variables were collected with Global Positioning System (GPS) devices. A logistic regression (LR) model was developed to classify the HCs' retention or dismissal. A sensitivity analysis was also conducted to determine the specific locomotor variables that could predict the likelihood of HC retention or dismissal. Results: In competition, locomotor performance was better under the dismissed HCs, whereas the new HC had better values during training. The LR model demonstrated a good prediction accuracy of 80%, with a recall and precision of 85% and 78%, respectively, amongst other model performance indicators. Meters per minute in games was the only significant variable that could serve as a potential physical marker to signal performance decline and predict the potential dismissal of an HC, with an odds ratio of 32.4%. Discussion: An in-depth analysis and further studies are needed to understand the effects of other factors on HC replacement or retention.
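For readers less familiar with the indicators reported here, the sketch below shows how accuracy, precision and recall follow from a binary confusion matrix, and how an odds ratio is obtained from a logistic regression coefficient. The counts are hypothetical and are not the study's data; the function names `classification_metrics` and `odds_ratio` are likewise illustrative.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # share of predicted positives that are correct
        "recall": tp / (tp + fn),     # share of actual positives that are found
    }

def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a predictor: exp(beta)."""
    return math.exp(coefficient)

# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
neutral = odds_ratio(0.0)  # a zero coefficient leaves the odds unchanged
```

An odds ratio above 1 means the odds of the modelled outcome rise as the predictor increases, and a value below 1 means they fall.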
Affiliation(s)
- Honorato Sousa
  - Research Unit for Sport and Physical Activity, Faculty of Sport Sciences and Physical Education, University of Coimbra, Coimbra, Portugal
- Rabiu Muazu Musa
  - Centre for Fundamental and Continuing Education, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
- Filipe Manuel Clemente
  - Escola Superior Desporto e Lazer, Instituto Politécnico de Viana do Castelo, Rua Escola Industrial e Comercial de Nun’Álvares, Viana do Castelo, Portugal
  - Gdansk University of Physical Education and Sport, Gdańsk, Poland
  - Sport Physical Activity and Health Research & Innovation Center SPRINT, Melgaço, Portugal
- Hugo Sarmento
  - Research Unit for Sport and Physical Activity, Faculty of Sport Sciences and Physical Education, University of Coimbra, Coimbra, Portugal
- Élvio R. Gouveia
  - Department of Physical Education and Sport, University of Madeira, Funchal, Portugal
  - LARSyS, Interactive Technologies Institute, Funchal, Portugal
4
Greenberg ZF, Graim KS, He M. Towards artificial intelligence-enabled extracellular vesicle precision drug delivery. Adv Drug Deliv Rev 2023:114974. [PMID: 37356623 DOI: 10.1016/j.addr.2023.114974] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/06/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 06/27/2023]
Abstract
Extracellular vesicles (EVs), particularly exosomes, have recently emerged in nanomedicine as a drug delivery approach owing to their superior biocompatibility, circulating stability, and bioavailability in vivo. However, EV heterogeneity makes molecular targeting precision a critical challenge. Deciphering the key molecular drivers that control EV tissue targeting specificity is greatly needed. Artificial intelligence (AI) brings powerful prediction ability for guiding the rational design of engineered EVs in precision control for drug delivery. This review focuses on cutting-edge nano-delivery via integrating large-scale EV data with AI to develop AI-directed EV therapies and illuminate the clinical translation potential. We briefly review the current status of EVs in drug delivery, including the current frontier, limitations, and considerations to advance the field. Subsequently, we detail the future of AI in drug delivery and its impact on precision EV delivery. Our review discusses the current universal challenge of standardization and critical considerations when using AI combined with EVs for precision drug delivery. Finally, we conclude this review with a perspective on future clinical translation led by a combined effort of AI and EV research.
Affiliation(s)
- Zachary F Greenberg
  - Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA
- Kiley S Graim
  - Department of Computer & Information Science & Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, 32610, USA
- Mei He
  - Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA
5
Nghiem N, Atkinson J, Nguyen BP, Tran-Duy A, Wilson N. Predicting high health-cost users among people with cardiovascular disease using machine learning and nationwide linked social administrative datasets. Health Economics Review 2023; 13:9. [PMID: 36738348 PMCID: PMC9898915 DOI: 10.1186/s13561-023-00422-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 01/23/2023] [Indexed: 06/18/2023]
Abstract
OBJECTIVES: To optimise planning of public health services, the impact of high-cost users needs to be considered. However, most existing statistical models for costs omit many clinical and social variables from administrative data that are associated with elevated health care resource use and are increasingly available. This study aimed to use machine learning approaches and big data to predict high-cost users among people with cardiovascular disease (CVD). METHODS: We used nationally representative linked datasets in New Zealand to predict the prevalent CVD cases with the highest costs, i.e. those belonging to the top quintiles by cost. We compared the performance of four popular machine learning models (L1-regularised logistic regression, classification trees, k-nearest neighbours (KNN) and random forest) with the traditional regression models. RESULTS: The machine learning models had far better accuracy in predicting high health-cost users compared with the logistic models. The F1 score (the harmonic mean of sensitivity and positive predictive value) of the machine learning models ranged from 30.6% to 41.2% (compared with 8.6-9.1% for the logistic models). Previous health costs, income, age, chronic health conditions, deprivation, and receiving a social security benefit were among the most important predictors of the CVD high-cost users. CONCLUSIONS: This study provides additional evidence that machine learning can be used as a tool together with big data in health economics for identification of new risk factors and prediction of high-cost users with CVD. As such, machine learning may potentially assist with health services planning and preventive measures to improve population health while potentially saving healthcare costs.
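The F1 score cited in the results combines sensitivity (Se) and positive predictive value (PPV) as a harmonic mean, which is why it stays low whenever either component is low. The standard definition is written out below; the numbers in the worked line are hypothetical and are not taken from the study.

```latex
F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Se}}{\mathrm{PPV} + \mathrm{Se}}
    = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}

% Hypothetical example: PPV = 0.50, Se = 0.25
F_1 = \frac{2 \cdot 0.50 \cdot 0.25}{0.50 + 0.25} = \frac{0.25}{0.75} \approx 0.33
```

Here TP, FP and FN denote true positives, false positives and false negatives. Because a rare outcome such as top-quintile cost tends to depress PPV, F1 values well below the overall accuracy are not unusual in this kind of problem.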
Affiliation(s)
- Nhung Nghiem
  - Department of Public Health, University of Otago, Wellington, New Zealand
- June Atkinson
  - Department of Public Health, University of Otago, Wellington, New Zealand
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
- An Tran-Duy
  - Centre for Health Policy, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Australia
- Nick Wilson
  - Department of Public Health, University of Otago, Wellington, New Zealand
6
Baldrighi GN, Nova A, Bernardinelli L, Fazia T. A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software. Life (Basel, Switzerland) 2022; 12:life12122030. [PMID: 36556394 PMCID: PMC9781110 DOI: 10.3390/life12122030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/09/2022]
Abstract
Genotype imputation has become an essential prerequisite when performing association analysis. It is a computational technique that allows us to infer genetic markers that have not been directly genotyped, thereby increasing statistical power in subsequent association studies, which consequently has a crucial impact on the identification of causal variants. Many features need to be considered when choosing the proper algorithm for imputation, including the target sample on which it is performed, i.e., related individuals, unrelated individuals, or both. Problems could arise when dealing with a target sample made up of mixed data, composed of both related and unrelated individuals, especially since the scientific literature on this topic is not sufficiently clear. To shed light on this issue, we examined existing algorithms and software for performing phasing and imputation on mixed human data from SNP arrays, specifically when related subjects belong to trios. By discussing the advantages and limitations of the current algorithms, we identified LD-based methods as being the most suitable for reconstruction of haplotypes in this specific context, and we proposed a feasible pipeline that can be used for imputing genotypes in both phased and unphased human data.
7
Szymborski J, Emad A. RAPPPID: towards generalizable protein interaction prediction with AWD-LSTM twin networks. Bioinformatics 2022; 38:3958-3967. [PMID: 35771595 DOI: 10.1093/bioinformatics/btac429] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/13/2021] [Revised: 04/30/2022] [Accepted: 06/27/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Computational methods for the prediction of protein-protein interactions (PPIs), while important tools for researchers, are plagued by challenges in generalizing to unseen proteins. Datasets used for modelling protein-protein predictions are particularly predisposed to information leakage and sampling biases. RESULTS: In this study, we introduce RAPPPID, a method for the Regularized Automatic Prediction of Protein-Protein Interactions using Deep Learning. RAPPPID is a twin Averaged Weight-Dropped Long Short-Term Memory network which employs multiple regularization methods during training to learn generalized weights. Testing on stringent interaction datasets composed of proteins not seen during training, RAPPPID outperforms state-of-the-art methods. Further experiments show that RAPPPID's performance holds regardless of the particular proteins in the testing set and that its performance is higher for experimentally supported edges. This study serves to demonstrate that appropriate regularization is an important component of overcoming the challenges of creating models for PPI prediction that generalize to unseen proteins. Additionally, as part of this study, we provide datasets corresponding to several data splits of various strictness, in order to facilitate assessment of PPI reconstruction methods by others in the future. AVAILABILITY AND IMPLEMENTATION: Code and datasets are freely available at https://github.com/jszym/rapppid and Zenodo.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
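The leakage concern raised in this abstract comes from the same protein appearing in both training and test pairs. One common way to build a strict split, sketched below, is to hold out a set of proteins and keep only test pairs whose members are both held out. This is an illustrative assumption about how such a split can be made, not the procedure used to build the RAPPPID datasets; `strict_split` and the protein identifiers are hypothetical.

```python
import random

def strict_split(pairs, test_fraction=0.3, seed=0):
    """Split interaction pairs so that no test protein appears in any training pair."""
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    # Training pairs: neither protein is held out.
    train = [pr for pr in pairs
             if pr[0] not in test_proteins and pr[1] not in test_proteins]
    # Test pairs: both proteins are held out.
    test = [pr for pr in pairs
            if pr[0] in test_proteins and pr[1] in test_proteins]
    # Pairs mixing a held-out and a training protein are discarded.
    return train, test, test_proteins

# Hypothetical interaction pairs.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("A", "F")]
train, test, test_proteins = strict_split(pairs)
train_proteins = {p for pair in train for p in pair}
```

The cost of this strictness is that mixed pairs are thrown away, so the usable dataset shrinks as the held-out fraction grows.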
Affiliation(s)
- Joseph Szymborski
  - Department of Electrical and Computer Engineering, McGill University, Montréal, QC H3A 0G4, Canada; Mila, Québec AI Institute, Montréal, QC H2S 3H1, Canada
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montréal, QC H3A 0G4, Canada; Mila, Québec AI Institute, Montréal, QC H2S 3H1, Canada; The Rosalind and Morris Goodman Cancer Institute, Montréal, QC H3A 1A3, Canada
8
Wang CW, Chang CC, Lee YC, Lin YJ, Lo SC, Hsu PC, Liou YA, Wang CH, Chao TK. Weakly supervised deep learning for prediction of treatment effectiveness on ovarian cancer from histopathology images. Comput Med Imaging Graph 2022; 99:102093. [PMID: 35752000 DOI: 10.1016/j.compmedimag.2022.102093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Revised: 05/13/2022] [Accepted: 06/03/2022] [Indexed: 11/30/2022]
Abstract
Despite the progress made during the last two decades in the surgery and chemotherapy of ovarian cancer, more than 70% of patients with advanced disease experience recurrence and die. Surgical debulking of tumors following chemotherapy is the conventional treatment for advanced carcinoma, but patients receiving such treatment remain at great risk of recurrence and of developing drug resistance, and only about 30% of the women affected will be cured. Bevacizumab is a humanized monoclonal antibody that blocks VEGF signaling in cancer, inhibits angiogenesis and causes tumor shrinkage, and has recently been approved by the FDA as a monotherapy for advanced ovarian cancer in combination with chemotherapy. Considering the cost, the potential toxicity, and the finding that only a portion of patients will benefit from these drugs, the identification of new predictive methods for the treatment of ovarian cancer remains an urgent unmet medical need. In this study, we develop weakly supervised deep learning approaches to accurately predict the therapeutic effect of bevacizumab in ovarian cancer patients from hematoxylin and eosin stained histopathological whole slide images, without any pathologist-provided locally annotated regions. To the authors' best knowledge, this is the first model demonstrated to be effective for predicting the therapeutic response of patients with epithelial ovarian cancer to bevacizumab. Quantitative evaluation on a whole section dataset shows that the proposed method achieves high accuracy, 0.882 ± 0.06; precision, 0.921 ± 0.04; recall, 0.912 ± 0.03; and F-measure, 0.917 ± 0.07 using 5-fold cross validation, and outperforms two state-of-the-art deep learning approaches (Coudray et al., 2018; Campanella et al., 2019). For an independent TMA testing set, the three proposed methods obtain promising results with high recall (sensitivity) of 0.946, 0.893 and 0.964, respectively.
The results suggest that the proposed method could be useful for guiding treatment by sparing patients without a positive therapeutic response from further treatment while keeping patients with a positive response in the treatment process. Furthermore, according to a Cox proportional hazards analysis, patients for whom the proposed model predicted the treatment to be ineffective had a significantly higher risk of cancer recurrence (hazard ratio = 13.727, p < 0.05) than patients for whom it was predicted to be effective.
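The "mean ± spread over 5-fold cross validation" style of reporting used above can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the per-fold accuracies are hypothetical, the helper names `k_fold_indices` and `summarise` are made up for the example, and it is an assumption that the spread is a sample standard deviation (the abstract does not say).

```python
import statistics

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def summarise(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)

folds = list(k_fold_indices(12, k=5))

# Hypothetical per-fold accuracies; in practice each comes from training on
# train_idx and scoring on test_idx.
fold_accuracies = [0.8, 0.9, 0.7, 0.8, 0.8]
mean_acc, sd_acc = summarise(fold_accuracies)
```

Each sample lands in exactly one test fold, so every case contributes to the evaluation once while never being scored by a model that was trained on it.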
Affiliation(s)
- Ching-Wei Wang
  - Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan; Graduate Institute of Applied Science and Technology, National Taiwan University of Science and Technology, Taipei, Taiwan
- Cheng-Chang Chang
  - Department of Gynecology and Obstetrics, Tri-Service General Hospital, Taipei, Taiwan; Graduate Institute of Medical Sciences, National Defense Medical Center, Taipei, Taiwan
- Yu-Ching Lee
  - Graduate Institute of Applied Science and Technology, National Taiwan University of Science and Technology, Taipei, Taiwan
- Yi-Jia Lin
  - Department of Pathology, Tri-Service General Hospital, Taipei, Taiwan; Institute of Pathology and Parasitology, National Defense Medical Center, Taipei, Taiwan
- Shih-Chang Lo
  - Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
- Po-Chao Hsu
  - Department of Gynecology and Obstetrics, Tri-Service General Hospital, Taipei, Taiwan; Graduate Institute of Medical Sciences, National Defense Medical Center, Taipei, Taiwan
- Yi-An Liou
  - Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
- Chih-Hung Wang
  - Department of Otolaryngology-Head and Neck Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Tai-Kuang Chao
  - Department of Pathology, Tri-Service General Hospital, Taipei, Taiwan; Institute of Pathology and Parasitology, National Defense Medical Center, Taipei, Taiwan
9
Rajendran P, Pramanik M. High frame rate (∼3 Hz) circular photoacoustic tomography using single-element ultrasound transducer aided with deep learning. Journal of Biomedical Optics 2022; 27:066005. [PMID: 36452448 PMCID: PMC9209813 DOI: 10.1117/1.jbo.27.6.066005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/27/2022] [Accepted: 06/01/2022] [Indexed: 05/29/2023]
Abstract
SIGNIFICANCE: In circular scanning photoacoustic tomography (PAT), it takes several minutes to generate an image of acceptable quality, especially with a single-element ultrasound transducer (UST). The imaging speed can be enhanced by faster scanning (with high repetition rate light sources) and by using multiple USTs. However, artifacts arising from the sparse signal acquisition and low signal-to-noise ratio at higher scanning speeds limit the imaging speed. Thus, there is a need to improve the imaging speed of PAT systems without hampering the quality of the PAT image. AIM: To improve the frame rate (or imaging speed) of the PAT system by using deep learning (DL). APPROACH: To improve the frame rate (or imaging speed) of the PAT system, we propose a novel U-Net-based DL framework to reconstruct PAT images from fast scanning data. RESULTS: The efficiency of the network was evaluated on both single- and multiple-UST-based PAT systems. Both phantom and in vivo imaging demonstrate that the network can improve the imaging frame rate by approximately sixfold in single-UST-based PAT systems and by approximately twofold in multi-UST-based PAT systems. CONCLUSIONS: We proposed an innovative method to improve the frame rate (or imaging speed) by using DL; with this method, a fastest frame rate of ∼3 Hz is achieved without hampering the quality of the reconstructed image.
Affiliation(s)
- Manojit Pramanik
  - Nanyang Technological University, School of Chemical and Biomedical Engineering, Singapore
10
Chintalapudi N, Angeloni U, Battineni G, di Canio M, Marotta C, Rezza G, Sagaro GG, Silenzi A, Amenta F. LASSO Regression Modeling on Prediction of Medical Terms among Seafarers’ Health Documents Using Tidy Text Mining. Bioengineering (Basel) 2022; 9:bioengineering9030124. [PMID: 35324813 PMCID: PMC8945331 DOI: 10.3390/bioengineering9030124] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/07/2022] [Revised: 03/02/2022] [Accepted: 03/16/2022] [Indexed: 12/31/2022]
Abstract
Generally, seafarers face a higher risk of illnesses and accidents than land workers. In most cases, there are no medical professionals on board seagoing vessels, which makes disease diagnosis even more difficult. When this occurs, onshore doctors may be able to provide medical advice through telemedicine by receiving better symptomatic and clinical details in the health abstracts of seafarers. The adoption of text mining techniques can assist in extracting diagnostic information from clinical texts. We applied lexicon-based sentiment analysis to explore the automatic labeling of positive and negative healthcare terms in seafarers’ text healthcare documents. This was motivated by the lack of experimental evaluations using computational techniques. To classify diseases and their associated symptoms, the LASSO regression algorithm was applied to these text documents. The frequency of symptomatic data for each disease can be visualized by analyzing TF-IDF values. The proposed approach classifies text documents with 93.8% accuracy using a LASSO regression model. It is possible to classify text documents effectively with tidy text mining libraries. In addition to delivering health assistance, this method can be used to classify diseases and establish health observatories. Knowledge developed in the present work will be applied to establish an Epidemiological Observatory of Seafarers’ Pathologies and Injuries. This Observatory will be a collaborative initiative of the Italian Ministry of Health, University of Camerino, and International Radio Medical Centre (C.I.R.M.), the Italian TMAS.
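The TF-IDF weighting mentioned in this abstract can be illustrated with a short sketch. It uses one common definition (term frequency as a within-document proportion, inverse document frequency as the natural log of documents over documents containing the term); whether the study used exactly this variant is an assumption, and the tokenised "documents" and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: tf-idf weight} dict per tokenised document."""
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenised symptom notes.
docs = [
    ["fever", "cough", "fever"],
    ["cough", "headache"],
    ["rash", "itching", "rash"],
]
scores = tf_idf(docs)
```

A term that is frequent in one document but rare across the collection (here "fever" in the first note) receives a higher weight than a term shared by several documents (here "cough"), which is what makes the values useful for picking out disease-specific symptoms.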
Affiliation(s)
- Nalini Chintalapudi
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
  - Correspondence: ; Tel.: +39-35-33776704
- Ulrico Angeloni
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Gopi Battineni
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
- Marzio di Canio
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
  - Research Department, International Radio Medical Centre (C.I.R.M.), 00144 Rome, Italy
- Claudia Marotta
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Giovanni Rezza
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Getu Gamo Sagaro
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
- Andrea Silenzi
  - General Directorate of Health Prevention, Ministry of Health, 00144 Rome, Italy; (U.A.); (C.M.); (G.R.); (A.S.)
- Francesco Amenta
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy; (G.B.); (M.d.C.); (G.G.S.); (F.A.)
  - Research Department, International Radio Medical Centre (C.I.R.M.), 00144 Rome, Italy
11
Li F, Zhou Y, Zhang Y, Yin J, Qiu Y, Gao J, Zhu F. POSREG: proteomic signature discovered by simultaneously optimizing its reproducibility and generalizability. Brief Bioinform 2022; 23:6532538. [PMID: 35183059 DOI: 10.1093/bib/bbac040] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 11/30/2021] [Revised: 01/21/2022] [Accepted: 01/27/2022] [Indexed: 12/17/2022]
Abstract
Mass spectrometry-based proteomics has become indispensable in the current exploration of complex and dynamic biological processes. Instrument development has largely ensured the effective production of proteomic data, which necessitates commensurate advances in the statistical framework used to discover the optimal proteomic signature. Current frameworks mainly emphasize the generalizability of the identified signature in predicting independent data but neglect the reproducibility among signatures identified from independently repeated trials on different sub-datasets. These problems have seriously restricted the wide application of the proteomic technique in molecular biology and related fields. Thus, it is crucial to enable the generalizable and reproducible discovery of the proteomic signature with the subsequent indication of phenotype association. However, no such tool has yet been made available. Herein, an online tool, POSREG, was constructed to identify the optimal signature for a set of proteomic data. It works by (i) identifying proteomic signatures of good reproducibility and aggregating them into an ensemble feature ranking by ensemble learning, (ii) assessing the generalizability of the ensemble feature ranking to acquire the optimal signature and (iii) indicating the phenotype association of the discovered signature. POSREG is unique in its capacity to discover the proteomic signature by simultaneously optimizing its reproducibility and generalizability. It is now accessible free of charge without any registration or login requirement at https://idrblab.org/posreg/.
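To make "reproducibility among signatures identified from independently repeated trials" concrete, the sketch below scores agreement between feature sets with the mean pairwise Jaccard index. This is one simple, widely used stability measure chosen for illustration; it is an assumption, not a description of how POSREG itself quantifies reproducibility, and the protein identifiers and function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two feature sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_pairwise_jaccard(signatures):
    """Average Jaccard index over all pairs of signatures (needs at least two)."""
    pairs = list(combinations(signatures, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical signatures selected in three repeated trials on different sub-datasets.
signatures = [
    ["P1", "P2", "P3"],
    ["P1", "P2", "P4"],
    ["P1", "P3", "P4"],
]
stability = mean_pairwise_jaccard(signatures)
```

A value of 1 would mean every trial selected the same proteins, while values near 0 indicate that the selected signature depends heavily on which sub-dataset was sampled.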
Collapse
Affiliation(s)
- Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Ying Zhou
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang 310000, China
| | - Ying Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang 310000, China
| | - Jianqing Gao
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| |
|
12
|
Jamdade R, Upadhyay M, Al Shaer K, Al Harthi E, Al Sallani M, Al Jasmi M, Al Ketbi A. Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification. Plants (Basel, Switzerland) 2021; 10:plants10122741. [PMID: 34961211 PMCID: PMC8708657 DOI: 10.3390/plants10122741] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/19/2021] [Revised: 09/16/2021] [Accepted: 09/23/2021] [Indexed: 06/14/2023]
Abstract
Arabia is the largest peninsula in the world, with >3000 species of vascular plants. Little effort has been made to generate a multi-locus marker barcode library to identify and discriminate the recorded plant species. This study aimed to determine the reliability of the available Arabian plant barcodes (>1500; rbcL and matK) in the public repository (NCBI GenBank) using unsupervised and supervised methods. A comparative analysis was carried out with the standard dataset (FINBOL) to assess the reliability of the methods and markers. Our analysis suggests that, among the unsupervised methods, TaxonDNA's All Species Barcode criterion (ASB) exhibits the highest accuracy for rbcL barcodes, followed by the matK barcodes, using the aligned dataset (FINBOL). However, for the Arabian plant barcode dataset (GBMA), the supervised method performed better than the unsupervised method, with the Random Forest and K-Nearest Neighbor (gappy kernel) classifiers proving robust. These classifiers successfully recognized true species from both barcode markers belonging to the aligned and alignment-free datasets, respectively. The multi-class classifier showed high species resolution following the two classifiers, though its performance declined when employed to recognize true species. Similar results were observed for the FINBOL dataset through the supervised learning approach; overall, the matK marker showed higher accuracy than rbcL. However, the lower rate of species identification for matK in the GBMA data could be due to the higher evolutionary rate or to gaps and missing data, as observed for the ASB criterion in the FINBOL dataset. Further, a lower number of sequences and singletons could also affect the rate of species resolution, as observed in the GBMA dataset. The GBMA dataset lacks sufficient species membership. We would encourage taxonomists from the Arabian Peninsula to join our campaign on the Arabian Barcode of Life at the Barcode of Life Data (BOLD) systems. Our efforts together could help improve the rate of species identification for the Arabian vascular plants.
Affiliation(s)
- Rahul Jamdade
- Sharjah Seed Bank and Herbarium, Environment and Protected Areas Authority, Sharjah P.O. Box 2926, United Arab Emirates; (K.A.S.); (E.A.H.); (M.A.S.); (M.A.J.); (A.A.K.)
| | - Maulik Upadhyay
- Population Genomics Group, Department of Veterinary Sciences, Ludwig Maximillians University, 80539 Munich, Germany;
| | - Khawla Al Shaer
- Sharjah Seed Bank and Herbarium, Environment and Protected Areas Authority, Sharjah P.O. Box 2926, United Arab Emirates; (K.A.S.); (E.A.H.); (M.A.S.); (M.A.J.); (A.A.K.)
| | - Eman Al Harthi
- Sharjah Seed Bank and Herbarium, Environment and Protected Areas Authority, Sharjah P.O. Box 2926, United Arab Emirates; (K.A.S.); (E.A.H.); (M.A.S.); (M.A.J.); (A.A.K.)
| | - Mariam Al Sallani
- Sharjah Seed Bank and Herbarium, Environment and Protected Areas Authority, Sharjah P.O. Box 2926, United Arab Emirates; (K.A.S.); (E.A.H.); (M.A.S.); (M.A.J.); (A.A.K.)
| | - Mariam Al Jasmi
- Sharjah Seed Bank and Herbarium, Environment and Protected Areas Authority, Sharjah P.O. Box 2926, United Arab Emirates; (K.A.S.); (E.A.H.); (M.A.S.); (M.A.J.); (A.A.K.)
| | - Asma Al Ketbi
- Sharjah Seed Bank and Herbarium, Environment and Protected Areas Authority, Sharjah P.O. Box 2926, United Arab Emirates; (K.A.S.); (E.A.H.); (M.A.S.); (M.A.J.); (A.A.K.)
| |
|
13
|
Guo MG, Sosa DN, Altman RB. Challenges and opportunities in network-based solutions for biological questions. Brief Bioinform 2021; 23:6438103. [PMID: 34849568 PMCID: PMC8769687 DOI: 10.1093/bib/bbab437] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/23/2021] [Revised: 09/09/2021] [Accepted: 09/22/2021] [Indexed: 11/28/2022] Open
Abstract
Network biology is useful for modeling complex biological phenomena; it has attracted attention with the advent of novel graph-based machine learning methods. However, biological applications of network methods often suffer from inadequate follow-up. In this perspective, we discuss obstacles for contemporary network approaches—particularly focusing on challenges representing biological concepts, applying machine learning methods, and interpreting and validating computational findings about biology—in an effort to catalyze actionable biological discovery.
Affiliation(s)
- Margaret G Guo
- Stanford Program in Biomedical Informatics, Stanford University, Stanford, CA, USA; Program in Epithelial Biology, Stanford University, Stanford, CA, USA
| | - Daniel N Sosa
- Stanford Program in Biomedical Informatics, Stanford University, Stanford, CA, USA
| | - Russ B Altman
- Department of Bioengineering, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
| |
|
14
|
Wang CW, Huang SC, Lee YC, Shen YJ, Meng SI, Gaol JL. Deep learning for bone marrow cell detection and classification on whole-slide images. Med Image Anal 2021; 75:102270. [PMID: 34710655 DOI: 10.1016/j.media.2021.102270] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 08/21/2020] [Revised: 10/06/2021] [Accepted: 10/13/2021] [Indexed: 12/19/2022]
Abstract
Bone marrow (BM) examination is an essential step in both diagnosing and managing numerous hematologic disorders. BM nucleated differential count (NDC) analysis, as part of BM examination, holds the most fundamental and crucial information. However, automated BM NDC analysis on whole-slide images (WSIs) faces many challenges, including the large dimensions of the data to process and complicated cell types with subtle differences. To the best of the authors' knowledge, this is the first study on fully automatic BM NDC using WSIs with 40x objective magnification, which can replace traditional manual counting relying on light microscopy via an oil-immersion 100x objective lens with a total magnification of 1000x. In this study, we develop an efficient and fully automatic hierarchical deep learning framework for BM NDC WSI analysis in seconds. The proposed hierarchical framework consists of (1) a deep learning model for rapid localization of BM particles and cellular trails generating regions of interest (ROI) for further analysis, (2) a patch-based deep learning model for cell identification of 16 cell types, including megakaryocytes, mitotic cells, and four stages of erythroblasts, which have not been demonstrated in previous studies, and (3) a fast stitching model for integrating patch-based results and producing final outputs. In evaluation, the proposed method is first tested on a dataset with a total of 12,426 annotated cells using cross validation, achieving high recall and accuracy of 0.905 ± 0.078 and 0.989 ± 0.006, respectively, and taking only 44 seconds to perform BM NDC analysis for a WSI. To further examine the generalizability of our model, we conduct an evaluation on a second independent dataset with a total of 3005 cells, and the results show that the proposed method also obtains high recall and accuracy of 0.842 and 0.988, respectively. In comparison with the existing small-image-based benchmark methods, the proposed method demonstrates superior performance in recall, accuracy and computational time.
Affiliation(s)
- Ching-Wei Wang
- Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 106, Taiwan; Graduate Institute of Applied Science and Technology, National Taiwan University of Science and Technology, Taipei, 106, Taiwan.
| | - Sheng-Chuan Huang
- Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, 100, Taiwan; Department of Hematology and Oncology, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan; Department of Clinical Pathology, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan
| | - Yu-Ching Lee
- Graduate Institute of Applied Science and Technology, National Taiwan University of Science and Technology, Taipei, 106, Taiwan
| | - Yu-Jie Shen
- Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 106, Taiwan
| | - Shwu-Ing Meng
- Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, 100, Taiwan
| | - Jeff L Gaol
- Graduate Institute of Biomedical Engineering, National Taiwan University of Science and Technology, Taipei, 106, Taiwan
| |
|
15
|
Pourashraf T, Shokri S, Yousefi M, Ahmadi A, Azar PA. Implementing Machine Learning in Laboratory Synthesis by Hybrid of SVR Model and Optimization Algorithms. Advanced Theory and Simulations 2021. [DOI: 10.1002/adts.202100225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Affiliation(s)
- Tolou Pourashraf
- Department of Chemistry Science and Research Branch Islamic Azad University Tehran 1477893855 Iran
| | - Saeid Shokri
- Technology and Innovation Group Research Institute of Petroleum Industry (RIPI) Tehran 1485733111 Iran
| | - Mohammad Yousefi
- Department of Chemistry Faculty of Pharmaceutical Chemistry Tehran Medical Sciences Islamic Azad University Tehran 1949635881 Iran
| | - Abbas Ahmadi
- Department of Chemistry Faculty of Science Karaj Branch Islamic Azad University Karaj 3149968111 Iran
| | - Parviz Aberoomand Azar
- Department of Chemistry Science and Research Branch Islamic Azad University Tehran 1477893855 Iran
| |
|
16
|
Molina Mora JA, Montero-Manso P, García-Batán R, Campos-Sánchez R, Vilar-Fernández J, García F. A first perturbome of Pseudomonas aeruginosa: Identification of core genes related to multiple perturbations by a machine learning approach. Biosystems 2021; 205:104411. [PMID: 33757842 DOI: 10.1016/j.biosystems.2021.104411] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/24/2020] [Revised: 03/11/2021] [Accepted: 03/12/2021] [Indexed: 01/27/2023]
Abstract
Tolerance to stress conditions is vital for organismal survival, including that of bacteria exposed to specific environmental conditions, antibiotics, and other perturbations. Some studies have described common modulation and shared genes during the stress response to different types of disturbances (termed the perturbome), leading to the idea of central control at the molecular level. We implemented a robust machine learning approach to identify and describe genes associated with multiple perturbations, or the perturbome, in a Pseudomonas aeruginosa PAO1 model. Using microarray datasets from the Gene Expression Omnibus (GEO), we evaluated six approaches to rank and select genes: with two methodologies for building training and testing datasets, a single data partition (SP method) or multiple partitions (MP method), we evaluated three classification algorithms (Support Vector Machine, SVM; K-Nearest Neighbor, KNN; and Random Forest, RF). Gene expression patterns and topological features at the systems level were included to describe the perturbome elements. We were able to select and describe 46 core response genes associated with multiple perturbations in P. aeruginosa PAO1, which can be considered a first report of the P. aeruginosa perturbome. Molecular annotations, patterns in expression levels, and topological features in molecular networks revealed biological functions of biosynthesis, binding, and metabolism, many of them related to DNA damage repair and aerobic respiration in the context of tolerance to stress. We also discuss different issues related to the implemented and assessed algorithms, including data partitioning, classification approaches, and metrics. Altogether, this work offers a different and robust framework to select genes using a machine learning approach.
Affiliation(s)
- Jose Arturo Molina Mora
- Centro de Investigacion en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica, San Jose, Costa Rica.
| | | | - Raquel García-Batán
- Centro de Investigacion en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica, San Jose, Costa Rica.
| | - Rebeca Campos-Sánchez
- Centro de Investigación en Biología Celular y Molecular (CIBCM), Universidad de Costa Rica, San José, Costa Rica.
| | | | - Fernando García
- Centro de Investigacion en Enfermedades Tropicales (CIET) and Facultad de Microbiología, Universidad de Costa Rica, San Jose, Costa Rica.
| |
|
17
|
Syed M, Syed S, Sexton K, Syeda HB, Garza M, Zozus M, Syed F, Begum S, Syed AU, Sanford J, Prior F. Application of Machine Learning in Intensive Care Unit (ICU) Settings Using MIMIC Dataset: Systematic Review. Informatics-Basel 2021; 8. [PMID: 33981592 DOI: 10.3390/informatics8010016] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/16/2022]
Abstract
Modern Intensive Care Units (ICUs) provide continuous monitoring of critically ill patients susceptible to many complications affecting morbidity and mortality. ICU settings require a high staff-to-patient ratio and generate a large volume of data. For clinicians, real-time interpretation of data and decision-making is a challenging task. Machine Learning (ML) techniques in ICUs are making headway in the early detection of high-risk events due to increased processing power and freely available datasets such as the Medical Information Mart for Intensive Care (MIMIC). We conducted a systematic literature review to evaluate the effectiveness of applying ML in ICU settings using the MIMIC dataset. A total of 322 articles were reviewed and a quantitative descriptive analysis was performed on 61 qualified articles that applied ML techniques in ICU settings using MIMIC data. We assembled the qualified articles to provide insights into the areas of application, clinical variables used, and treatment outcomes that can pave the way for further adoption of this promising technology and possible use in routine clinical decision-making. The lessons learned from our review can provide guidance to researchers on the application of ML techniques to increase their rate of adoption in healthcare.
Affiliation(s)
- Mahanazuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Shorabuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Kevin Sexton
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
- Department of Surgery, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
- Department of Health Policy and Management, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Hafsa Bareen Syeda
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Maryam Garza
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Meredith Zozus
- Department of Population Health Sciences, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229, USA
| | - Farhanuddin Syed
- Shadan Institute of Medical Sciences, College of Medicine, Hyderabad, Telangana 500086, India
| | - Salma Begum
- Department of Information Technology, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Abdullah Usama Syed
- Department of Information Science, University of Arkansas at Little Rock (UALR), Little Rock, Arkansas 72205, USA
| | - Joseph Sanford
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
- Department of Anesthesiology, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| | - Fred Prior
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas 72205, USA
| |
|
18
|
Lüftinger L, Májek P, Beisken S, Rattei T, Posch AE. Learning From Limited Data: Towards Best Practice Techniques for Antimicrobial Resistance Prediction From Whole Genome Sequencing Data. Front Cell Infect Microbiol 2021; 11:610348. [PMID: 33659219 PMCID: PMC7917081 DOI: 10.3389/fcimb.2021.610348] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/25/2020] [Accepted: 01/11/2021] [Indexed: 01/20/2023] Open
Abstract
Antimicrobial resistance prediction from whole genome sequencing (WGS) data is an emerging application of machine learning, promising to improve antimicrobial resistance surveillance and outbreak monitoring. Despite significant reductions in sequencing cost, the availability and sampling diversity of WGS data with matched antimicrobial susceptibility testing (AST) profiles required for training WGS-AST prediction models remain limited. Best practice machine learning techniques are required to ensure that trained models generalize to independent data for optimal predictive performance. Limited data restricts the choice of machine learning training and evaluation methods and can result in overestimation of model performance. We demonstrate that the widely used random k-fold cross-validation method is ill-suited for application to small bacterial genomics datasets and offer an alternative cross-validation method based on genomic distance. We benchmarked three machine learning architectures previously applied to the WGS-AST problem on a set of 8,704 genome assemblies from five clinically relevant pathogens across 77 species-compound combinations collated from public databases. We show that individual models can be effectively ensembled to improve model performance. By combining models via stacked generalization with cross-validation, a model ensembling technique suitable for small datasets, we improved the average sensitivity and specificity of individual models by 1.77% and 3.20%, respectively. Furthermore, stacked models exhibited improved robustness and were thus less prone to outlier performance drops than individual component models. In this study, we highlight best practice techniques for antimicrobial resistance prediction from WGS data and introduce the combination of genome distance aware cross-validation and stacked generalization for robust and accurate WGS-AST.
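One way to see why random k-fold splitting can overestimate performance is to compare it with a split that keeps closely related genomes together. The sketch below is not the authors' procedure. It assumes the genomes have already been clustered by genomic distance, and it places each cluster entirely in one fold so that near-identical genomes never appear on both the training and the test side. The function name `group_k_fold` and the cluster labels are hypothetical.

```python
from collections import defaultdict

def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group is split across sides.

    Assumes n_splits is no larger than the number of distinct groups.
    """
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)

    # Greedy balancing: largest groups first, each into the currently smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(members, key=lambda g: (-len(members[g]), g)):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(members[g])

    for i in range(n_splits):
        test = sorted(folds[i])
        train = sorted(j for k in range(n_splits) if k != i for j in folds[k])
        yield train, test

# Hypothetical cluster labels, e.g. from clustering genomes by pairwise distance.
clusters = ["A", "A", "A", "B", "B", "C", "C", "D", "E"]
splits = list(group_k_fold(clusters, n_splits=3))
```

In a stacked-generalization setup, the same grouped folds could also be used to produce the out-of-fold predictions that train the meta-model, so that the ensemble is not fitted on predictions leaked from related genomes.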
Affiliation(s)
- Lukas Lüftinger
- Ares Genetics GmbH, Vienna, Austria
- Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | | | | | - Thomas Rattei
- Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | | |
|
19
|
Bobak CA, Kang L, Workman L, Bateman L, Khan MS, Prins M, May L, Franchina FA, Baard C, Nicol MP, Zar HJ, Hill JE. Breath can discriminate tuberculosis from other lower respiratory illness in children. Sci Rep 2021; 11:2704. [PMID: 33526828 PMCID: PMC7851130 DOI: 10.1038/s41598-021-80970-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/11/2020] [Accepted: 12/28/2020] [Indexed: 01/30/2023] Open
Abstract
Pediatric tuberculosis (TB) remains a global health crisis. Despite progress, pediatric patients remain difficult to diagnose, with approximately half of all childhood TB patients lacking bacterial confirmation. In this pilot study (n = 31), we identify a 4-compound breathprint and subsequent machine learning model that accurately classifies children with confirmed TB (n = 10) from children with another lower respiratory tract infection (LRTI) (n = 10) with a sensitivity of 80% and specificity of 100% observed across cross validation folds. Importantly, we demonstrate that the breathprint identified an additional nine of eleven patients who had unconfirmed clinical TB and whose symptoms improved while treated for TB. While more work is necessary to validate the utility of using patient breath to diagnose pediatric TB, it shows promise as a triage instrument or paired as part of an aggregate diagnostic scheme.
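The reported sensitivity and specificity can be related to the group sizes with a small worked example. The confusion-matrix counts below are an assumption: the abstract gives only the percentages and the group sizes (10 confirmed TB, 10 LRTI), and the counts shown are simply the ones consistent with those figures when predictions are pooled across folds.

```python
def sensitivity(tp, fn):
    """Fraction of true cases that the classifier flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-cases that the classifier clears: TN / (TN + FP)."""
    return tn / (tn + fp)

# Assumed pooled counts consistent with the abstract's percentages.
tp, fn = 8, 2    # confirmed TB (n = 10)
tn, fp = 10, 0   # other LRTI (n = 10)

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

# Nine of eleven unconfirmed clinical TB patients were also identified.
unconfirmed_rate = 9 / 11
```

With groups of ten, each misclassified child moves either figure by ten percentage points, which is one reason the abstract describes the study as a pilot that needs further validation.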
Affiliation(s)
- Carly A. Bobak
- Thayer School of Engineering, Dartmouth College, Hanover, NH USA; Geisel School of Medicine, Dartmouth College, Hanover, NH USA
| | - Lili Kang
- Thayer School of Engineering, Dartmouth College, Hanover, NH USA
| | - Lesley Workman
- Department of Pediatrics and Child Health, MRC Unit on Child and Adolescent Health, University of Cape Town and Red Cross War Memorial Children’s Hospital, Cape Town, South Africa
| | - Lindy Bateman
- Department of Pediatrics and Child Health, MRC Unit on Child and Adolescent Health, University of Cape Town and Red Cross War Memorial Children’s Hospital, Cape Town, South Africa
| | - Mohammad S. Khan
- Thayer School of Engineering, Dartmouth College, Hanover, NH USA
| | - Margaretha Prins
- Department of Pediatrics and Child Health, MRC Unit on Child and Adolescent Health, University of Cape Town and Red Cross War Memorial Children’s Hospital, Cape Town, South Africa
| | - Lloyd May
- Thayer School of Engineering, Dartmouth College, Hanover, NH USA
| | - Flavio A. Franchina
- Thayer School of Engineering, Dartmouth College, Hanover, NH USA; Molecular Systems, Organic and Biological Analytical Chemistry Group, University of Liège, Liège, Belgium
| | - Cynthia Baard
- Department of Pediatrics and Child Health, MRC Unit on Child and Adolescent Health, University of Cape Town and Red Cross War Memorial Children’s Hospital, Cape Town, South Africa
| | - Mark P. Nicol
- Division of Medical Microbiology and Institute for Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa; School of Biomedical Sciences, University of Western Australia, Perth, Australia
| | - Heather J. Zar
- Department of Pediatrics and Child Health, MRC Unit on Child and Adolescent Health, University of Cape Town and Red Cross War Memorial Children’s Hospital, Cape Town, South Africa
| | - Jane E. Hill
- Thayer School of Engineering, Dartmouth College, Hanover, NH USA
| |
|
20
|
Alafeef M, Srivastava I, Pan D. Machine Learning for Precision Breast Cancer Diagnosis and Prediction of the Nanoparticle Cellular Internalization. ACS Sens 2020; 5:1689-1698. [PMID: 32466640 DOI: 10.1021/acssensors.0c00329] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/14/2022]
Abstract
In the field of theranostics, diagnostic nanoparticles are designed to collect highly patient-selective disease profiles, which are then leveraged by a set of nanotherapeutics to improve the therapeutic results. Despite their early promise, high interpatient and intratumoral heterogeneities make any rational design and analysis of these theranostic platforms extremely problematic. Recent advances in deep-learning-based tools may help bridge this gap, using pattern recognition algorithms for better diagnostic precision and therapeutic outcome. Triple-negative breast cancer (TNBC) is a conundrum because of its complex molecular diversity, which makes its diagnosis and therapy challenging. To address these challenges, we propose a method to predict the cellular internalization of nanoparticles (NPs) against different cancer stages using artificial intelligence. Here, we demonstrate for the first time that a combination of a machine-learning (ML) algorithm and characteristic cellular uptake responses for individual cancer cell types can be successfully used to classify various cancer cell types. Utilizing this approach, we can optimize the nanomaterials to get an optimum structure-internalization response for a given particle. This methodology predicted the structure-internalization response of the evaluated nanoparticles with remarkable accuracy (Q² = 0.9). We anticipate that it can reduce the effort by minimizing the number of nanoparticles that need to be tested and could be utilized as a screening tool for designing nanotherapeutics. Following this, we have proposed a diagnostic nanomaterial-based platform used to assemble a patient-specific cancer profile with the assistance of ML. The platform is composed of eight carbon nanoparticles (CNPs) with multifarious surface chemistries that can differentiate healthy breast cells from cancerous cells and then subclassify TNBC cells vs non-TNBC cells, within the TNBC group. The artificial neural network (ANN) algorithm has been successfully used in identifying the type of cancer cells from 36 unknown cancer samples with an overall accuracy of >98%, providing potential applications in cancer diagnostics.
Affiliation(s)
- Maha Alafeef
- Bioengineering Department, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Biomedical Engineering Department, Jordan University of Science and Technology, Irbid 22110, Jordan
| | - Indrajit Srivastava
- Bioengineering Department, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Dipanjan Pan
- Bioengineering Department, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Departments of Diagnostic Radiology and Nuclear Medicine and Pediatrics and Chemical, Biochemical and Environmental Engineering, University of Maryland, Baltimore, Maryland 21250, United States
- University of Maryland Baltimore County, Baltimore, Maryland 21250, United States
| |
|
21
|
Machine learning-based lifetime breast cancer risk reclassification compared with the BOADICEA model: impact on screening recommendations. Br J Cancer 2020; 123:860-867. [PMID: 32565540 PMCID: PMC7463251 DOI: 10.1038/s41416-020-0937-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/02/2019] [Revised: 05/13/2020] [Accepted: 05/29/2020] [Indexed: 12/17/2022] Open
Abstract
Background: The clinical utility of machine-learning (ML) algorithms for breast cancer risk prediction and screening practices is unknown. We compared classification of lifetime breast cancer risk based on ML and the BOADICEA model. We explored the differences in risk classification and their clinical impact on screening practices. Methods: We used three different ML algorithms and the BOADICEA model to estimate lifetime breast cancer risk in a sample of 112,587 individuals from 2481 families from the Oncogenetic Unit, Geneva University Hospitals. Performance of algorithms was evaluated using the area under the receiver operating characteristic (AU-ROC) curve. Risk reclassification was compared for 36,146 breast cancer-free women of ages 20–80. The impact on recommendations for mammography surveillance was based on the Swiss Surveillance Protocol. Results: The predictive accuracy of ML-based algorithms (0.843 ≤ AU-ROC ≤ 0.889) was superior to that of BOADICEA (AU-ROC = 0.639), and the ML-based algorithms reclassified 35.3% of women into different risk categories. The largest reclassification (20.8%) was observed in women characterised as ‘near population’ risk by BOADICEA. Reclassification had the largest impact on screening practices of women younger than 50. Conclusion: ML-based reclassification of lifetime breast cancer risk occurred in approximately one in three women. Reclassification is important for younger women because it impacts clinical decision-making for the initiation of screening.
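The AU-ROC values compared above have a simple interpretation: the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition. The scores and labels are made-up illustrative values, not data from the study, and the function name `auroc` is an assumption.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores for three cases (label 1) and three non-cases (label 0).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
area = auroc(scores, labels)
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the sense in which the reported 0.843 to 0.889 range exceeds 0.639. The pairwise loop is quadratic in sample size, so a rank-based formulation would be preferable for a cohort of the size described.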
|
22
|
Mayhew MB, Buturovic L, Luethy R, Midic U, Moore AR, Roque JA, Shaller BD, Asuni T, Rawling D, Remmel M, Choi K, Wacker J, Khatri P, Rogers AJ, Sweeney TE. A generalizable 29-mRNA neural-network classifier for acute bacterial and viral infections. Nat Commun 2020; 11:1177. [PMID: 32132525 PMCID: PMC7055276 DOI: 10.1038/s41467-020-14975-w] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 06/18/2019] [Accepted: 02/13/2020] [Indexed: 02/07/2023] Open
Abstract
Improved identification of bacterial and viral infections would reduce morbidity from sepsis, reduce antibiotic overuse, and lower healthcare costs. Here, we develop a generalizable host-gene-expression-based classifier for acute bacterial and viral infections. We use training data (N = 1069) from 18 retrospective transcriptomic studies. Using only 29 preselected host mRNAs, we train a neural-network classifier with a bacterial-vs.-other area under the receiver-operating characteristic curve (AUROC) of 0.92 (95% CI 0.90–0.93) and a viral-vs.-other AUROC of 0.92 (95% CI 0.90–0.93). We then apply this classifier, inflammatix-bacterial-viral-noninfected-version 1 (IMX-BVN-1), without retraining, to an independent cohort (N = 163). In this cohort, IMX-BVN-1 AUROCs are: bacterial-vs.-other 0.86 (95% CI 0.77–0.93), and viral-vs.-other 0.85 (95% CI 0.76–0.93). In patients enrolled within 36 h of hospital admission (N = 70), IMX-BVN-1 AUROCs are: bacterial-vs.-other 0.92 (95% CI 0.83–0.99), and viral-vs.-other 0.91 (95% CI 0.82–0.98). With further study, IMX-BVN-1 could provide a tool for assessing patients with suspected infection and sepsis at hospital admission. Diagnosing acute infections based on transcriptional host response shows promise, but generalizability is wanting. Here, the authors use a co-normalization framework to train a classifier to diagnose acute infections and apply it to independent data on a targeted diagnostic platform.
Affiliation(s)
- Michael B Mayhew
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
- Roland Luethy
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
- Uros Midic
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
- Andrew R Moore
  - Department of Medicine, Stanford University, Palo Alto, CA, 94305, USA
- Jonasel A Roque
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Stanford University, Palo Alto, CA, 94305, USA
- Brian D Shaller
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Stanford University, Palo Alto, CA, 94305, USA
- Tola Asuni
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Stanford University, Palo Alto, CA, 94305, USA
- David Rawling
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
- Melissa Remmel
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
- Kirindi Choi
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
- James Wacker
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
- Purvesh Khatri
  - Institute for Immunity, Transplantation and Infections, Stanford University, Palo Alto, CA, 94305, USA
  - Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Palo Alto, CA, 94305, USA
- Angela J Rogers
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Medicine, Stanford University, Palo Alto, CA, 94305, USA
- Timothy E Sweeney
  - Inflammatix, Inc., 863 Mitten Rd, Suite 104, Burlingame, CA, 94010, USA
|
23
|
Nair JKR, Saeed UA, McDougall CC, Sabri A, Kovacina B, Raidu BVS, Khokhar RA, Probst S, Hirsh V, Chankowsky J, Van Kempen LC, Taylor J. Radiogenomic Models Using Machine Learning Techniques to Predict EGFR Mutations in Non-Small Cell Lung Cancer. Can Assoc Radiol J 2020; 72:109-119. [PMID: 32063026 DOI: 10.1177/0846537119899526] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 12/31/2022]
Abstract
BACKGROUND: The purpose of this study was to build radiogenomics models from texture signatures derived from computed tomography (CT) and 18F-FDG PET-CT (FDG PET-CT) images of non-small cell lung cancer (NSCLC) with and without epidermal growth factor receptor (EGFR) mutations. METHODS: Fifty patients diagnosed with NSCLC between 2011 and 2015 and with known EGFR mutation status were retrospectively identified. Texture features extracted from pretreatment CT and FDG PET-CT images by manual contouring of the primary tumor were used to develop multivariate logistic regression (LR) models to predict EGFR mutations in exon 19 and exon 20. RESULTS: An LR model evaluating FDG PET-texture features was able to differentiate EGFR mutant from wild type with an area under the curve (AUC), sensitivity, specificity, and accuracy of 0.87, 0.76, 0.66, and 0.71, respectively. The model derived from CT texture features had an AUC, sensitivity, specificity, and accuracy of 0.83, 0.84, 0.73, and 0.78, respectively. FDG PET-texture features that could discriminate between mutations in EGFR exon 19 and 21 demonstrated AUC, sensitivity, specificity, and accuracy of 0.86, 0.84, 0.73, and 0.78, respectively. Based on CT texture features, the AUC, sensitivity, specificity, and accuracy were 0.75, 0.81, 0.69, and 0.75, respectively. CONCLUSION: Non-small cell lung cancer texture analysis using FDG-PET and CT images can identify tumors with mutations in EGFR. Imaging signatures could be valuable for pretreatment assessment and prognosis in precision therapy.
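For readers less familiar with the metrics reported above, the following sketch shows how sensitivity, specificity, and accuracy follow from a binary confusion matrix. The labels and predictions are invented toy values, and `binary_metrics` is a hypothetical helper, not code from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity and accuracy for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # mutant called mutant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # wild type called wild type
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example: 1 = EGFR mutant, 0 = wild type (assumed values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Unlike these three threshold-dependent metrics, the AUC summarises performance across all decision thresholds, which is why a model can have a higher AUC yet lower accuracy than another.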
Affiliation(s)
- Jay Kumar Raghavan Nair
  - Department of Radiology, McGill University Health Centre, Montreal, Québec, Canada
  - Department of Radiology, McMaster University Faculty of Health Sciences, Hamilton, Ontario, Canada
  - Department of Radiology, University of Calgary, Calgary, Alberta, Canada
- Umar Abid Saeed
  - Department of Radiology, McGill University Health Centre, Montreal, Québec, Canada
  - Department of Radiology, University of Calgary, Calgary, Alberta, Canada
- Connor C McDougall
  - Department of Mechanical Engineering, University of Calgary, Calgary, Alberta, Canada
- Ali Sabri
  - Department of Radiology, McMaster University, Hamilton, Ontario, Canada
  - Department of Radiology, Jewish General Hospital, Montreal, Québec, Canada
- Bojan Kovacina
  - Department of Radiology, Jewish General Hospital, Montreal, Québec, Canada
- B V S Raidu
  - Raidu Analysts and Associates, Mumbai, India
- Riaz Ahmed Khokhar
  - Department of Radiology, McGill University Health Centre, Montreal, Québec, Canada
  - Department of Surgery, Khokhar Medical Centre, Rawalpindi, Pakistan
- Stephan Probst
  - Department of Nuclear Medicine, Jewish General Hospital, Montreal, Québec, Canada
- Vera Hirsh
  - Department of Oncology, McGill University Health Centre, Montreal, Québec, Canada
- Jeffrey Chankowsky
  - Department of Radiology, McGill University Health Centre, Montreal, Québec, Canada
- Léon C Van Kempen
  - Department of Pathology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
  - Department of Pathology, Jewish General Hospital, Montreal, Québec, Canada
- Jana Taylor
  - Department of Radiology, McGill University Health Centre, Montreal, Québec, Canada
|
24
|
Huang EW, Bhope A, Lim J, Sinha S, Emad A. Tissue-guided LASSO for prediction of clinical drug response using preclinical samples. PLoS Comput Biol 2020; 16:e1007607. [PMID: 31967990 PMCID: PMC6975549 DOI: 10.1371/journal.pcbi.1007607] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/06/2019] [Accepted: 12/15/2019] [Indexed: 12/12/2022]
Abstract
Prediction of clinical drug response (CDR) of cancer patients, based on their clinical and molecular profiles obtained prior to administration of the drug, can play a significant role in individualized medicine. Machine learning models have the potential to address this issue but training them requires data from a large number of patients treated with each drug, limiting their feasibility. While large databases of drug response and molecular profiles of preclinical in-vitro cancer cell lines (CCLs) exist for many drugs, it is unclear whether preclinical samples can be used to predict CDR of real patients. We designed a systematic approach to evaluate how well different algorithms, trained on gene expression and drug response of CCLs, can predict CDR of patients. Using data from two large databases, we evaluated various linear and non-linear algorithms, some of which utilized information on gene interactions. Then, we developed a new algorithm called TG-LASSO that explicitly integrates information on samples' tissue of origin with gene expression profiles to improve prediction performance. Our results showed that regularized regression methods provide better prediction performance. However, including the network information or common methods of including information on the tissue of origin did not improve the results. On the other hand, TG-LASSO improved the predictions and distinguished resistant and sensitive patients for 7 out of 13 drugs. Additionally, TG-LASSO identified genes associated with the drug response, including known targets and pathways involved in the drugs' mechanism of action. Moreover, genes identified by TG-LASSO for multiple drugs in a tissue were associated with patient survival. In summary, our analysis suggests that preclinical samples can be used to predict CDR of patients and identify biomarkers of drug sensitivity and survival.
Affiliation(s)
- Edward W. Huang
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, United States of America
- Ameya Bhope
  - Department of Electrical and Computer Engineering, McGill University, Canada
- Jing Lim
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, United States of America
- Saurabh Sinha
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Illinois, United States of America
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Illinois, United States of America
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Illinois, United States of America
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Canada
|
25
|
|
26
|
Picart-Armada S, Barrett SJ, Willé DR, Perera-Lluna A, Gutteridge A, Dessailly BH. Benchmarking network propagation methods for disease gene identification. PLoS Comput Biol 2019; 15:e1007276. [PMID: 31479437 PMCID: PMC6743778 DOI: 10.1371/journal.pcbi.1007276] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/24/2019] [Revised: 09/13/2019] [Accepted: 07/16/2019] [Indexed: 12/17/2022]
Abstract
In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks, six performance metrics and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.
The use of biological network data has proven its effectiveness in many areas of computational biology. Networks consist of nodes, usually genes or proteins, and edges that connect pairs of nodes, representing information such as physical interactions, regulatory roles or co-occurrence. In order to find new candidate nodes for a given biological property, the so-called network propagation algorithms start from the set of known nodes with that property and leverage the connections from the biological network to make predictions. Here, we assess the performance of several network propagation algorithms to find sensible gene targets for 22 common non-cancerous diseases, i.e. those that have been found promising enough to start clinical trials with any compound. We focus on obtaining performance metrics that reflect a practical scenario in drug development where only a small set of genes can be assayed. We found that the presence of protein complexes biased the performance estimates, leading to over-optimistic conclusions, and introduced two novel strategies to address it. Our results indicate that network propagation is still a viable approach to find drug targets, but that special care needs to be taken with the validation strategy. Algorithms benefitted from the use of a larger (although noisier) network and of direct evidence data, rather than indirect genetic associations to disease.
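As an illustration of the network propagation idea described above, the sketch below implements random walk with restart, one common diffusion scheme. It is not necessarily one of the 12 algorithms benchmarked in the paper. The toy graph, the seed set, the restart probability of 0.3, and the function name `random_walk_with_restart` are all assumptions made for this example.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed labels over an undirected graph given as {node: [neighbours]}.

    At each step a walker jumps back to a seed with probability `restart`,
    otherwise it moves to a uniformly chosen neighbour. The stationary
    probabilities are returned as propagation scores.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            nbrs = adj[u]
            if not nbrs:
                # Isolated node: keep its walking mass in place (self-loop).
                nxt[u] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network: A is the only known (seed) gene.
adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
seeds = {"A"}
scores = random_walk_with_restart(adj, seeds)

# Rank the non-seed genes by propagated score, highest first.
ranked = sorted((n for n in adj if n not in seeds), key=lambda n: -scores[n])
print(ranked)
```

In a benchmark setting, the ranked list would be compared against held-out known targets. As the abstract notes, how genes are split into folds (for example, keeping members of one protein complex together) strongly affects the resulting performance estimate.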
Affiliation(s)
- Sergio Picart-Armada
  - B2SLab, Departament d’Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, Spain
  - Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain
  - Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, Spain
- Alexandre Perera-Lluna
  - B2SLab, Departament d’Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, Spain
  - Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain
  - Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, Spain
- Alex Gutteridge
  - Computational Biology and Statistics, GSK, Stevenage, United Kingdom
|
27
|
Zhang W, Li W, Zhang J, Wang N. Data Integration of Hybrid Microarray and Single Cell Expression Data to Enhance Gene Network Inference. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190104142228] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
Background:
Gene Regulatory Network (GRN) inference algorithms aim to explore causal interactions between genes and transcription factors. High-throughput transcriptomics data, including DNA microarray and single cell expression data, contain complementary information for network inference.
Objective:
To enhance GRN inference, data integration across various types of expression data becomes an economic and efficient solution.
Method:
In this paper, a novel E-alpha integration rule-based ensemble inference algorithm is proposed to merge complementary information from microarray and single cell expression data. This paper implements a Gradient Boosting Tree (GBT) inference algorithm to compute importance scores for candidate gene-gene pairs. The proposed E-alpha rule quantitatively evaluates the credibility levels of each information source and determines the final ranked list.
Results:
Two groups of in silico gene networks are applied to illustrate the effectiveness of the proposed E-alpha integration. Experimental outcomes with size50 and size100 in silico gene networks suggest that the proposed E-alpha rule significantly improves performance metrics compared with a single information source.
Conclusion:
In GRN inference, the integration of hybrid expression data using the E-alpha rule provides a more feasible and efficient way to enhance performance metrics than solely increasing sample sizes.
Affiliation(s)
- Wei Zhang
  - Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
- Wenchao Li
  - Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
- Jianming Zhang
  - Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
- Ning Wang
  - Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, 310013, China
|