1
Hasegawa N, Sugiyama M, Igarashi K. Random forest machine-learning algorithm classifies white- and brown-rot fungi according to the number of the genes encoding Carbohydrate-Active enZyme families. Appl Environ Microbiol 2024; 90:e0048224. [PMID: 38832775 PMCID: PMC11267879 DOI: 10.1128/aem.00482-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/04/2024] [Indexed: 06/05/2024] Open
Abstract
Wood-rotting fungi play an important role in the global carbon cycle because they are the only known organisms that digest wood, the largest carbon stock in nature. In the present study, we used linear discriminant analysis and random forest (RF) machine learning algorithms to predict white- or brown-rot decay modes from the numbers of genes encoding Carbohydrate-Active enZymes with over 98% accuracy. Unlike other algorithms, RF identified specific genes involved in cellulose and lignin degradation, including auxiliary activities (AAs) family 9 lytic polysaccharide monooxygenases, glycoside hydrolase family 7 cellobiohydrolases, and AA family 2 peroxidases, as critical factors. This study sheds light on the complex interplay between genetic information and decay modes and underscores the potential of RF for comparative genomics studies of wood-rotting fungi. IMPORTANCE Wood-rotting fungi are categorized as either white- or brown-rot modes based on the coloration of decomposed wood. The process of classification can be influenced by human biases. The random forest machine learning algorithm effectively distinguishes between white- and brown-rot fungi based on the presence of Carbohydrate-Active enZyme genes. These findings not only aid in the classification of wood-rotting fungi but also facilitate the identification of the enzymes responsible for degrading woody biomass.
Affiliation(s)
- Natsuki Hasegawa
  - Department of Biomaterial Sciences, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Masashi Sugiyama
  - Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
  - UT7 Next Life Research Group, The University of Tokyo, Tokyo, Japan
- Kiyohiko Igarashi
  - Department of Biomaterial Sciences, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
  - UT7 Next Life Research Group, The University of Tokyo, Tokyo, Japan
2
Levin MA, Kia A, Timsina P, Cheng FY, Nguyen KAN, Kohli-Seth R, Lin HM, Ouyang Y, Freeman R, Reich DL. Real-Time Machine Learning Alerts to Prevent Escalation of Care: A Nonrandomized Clustered Pragmatic Clinical Trial. Crit Care Med 2024; 52:1007-1020. [PMID: 38380992 DOI: 10.1097/ccm.0000000000006243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
OBJECTIVES Machine learning algorithms can outperform older methods in predicting clinical deterioration, but rigorous prospective data on their real-world efficacy are limited. We hypothesized that real-time machine learning generated alerts sent directly to front-line providers would reduce escalations. DESIGN Single-center prospective pragmatic nonrandomized clustered clinical trial. SETTING Academic tertiary care medical center. PATIENTS Adult patients admitted to four medical-surgical units. Assignment to intervention or control arms was determined by initial unit admission. INTERVENTIONS Real-time alerts stratified according to predicted likelihood of deterioration sent either to the primary team or directly to the rapid response team (RRT). Clinical care and interventions were at the providers' discretion. For the control units, alerts were generated but not sent, and standard RRT activation criteria were used. MEASUREMENTS AND MAIN RESULTS The primary outcome was the rate of escalation per 1000 patient bed days. Secondary outcomes included the frequency of orders for fluids, medications, and diagnostic tests, and combined in-hospital and 30-day mortality. Propensity score modeling with stabilized inverse probability of treatment weight (IPTW) was used to account for differences between groups. Data from 2740 patients enrolled between July 2019 and March 2020 were analyzed (1488 intervention, 1252 control). Average age was 66.3 years and 1428 participants (52%) were female. The rate of escalation was 12.3 vs. 11.3 per 1000 patient bed days (difference, 1.0; 95% CI, -2.8 to 4.7), and the IPTW-adjusted incidence rate ratio was 1.43 (95% CI, 1.16-1.78; p < 0.001). Patients in the intervention group were more likely to receive cardiovascular medication orders (16.1% vs. 11.3%; difference, 4.7%; 95% CI, 2.1-7.4%), with an IPTW-adjusted relative risk (RR) of 1.74 (95% CI, 1.39-2.18; p < 0.001). Combined in-hospital and 30-day mortality was lower in the intervention group (7% vs. 9.3%; difference, -2.4%; 95% CI, -4.5% to -0.2%), with an IPTW-adjusted RR of 0.76 (95% CI, 0.58-0.99; p = 0.045). CONCLUSIONS Real-time machine learning alerts do not reduce the rate of escalation but may reduce mortality.
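The primary outcome above is a rate per 1000 patient bed days. As a minimal sketch of that arithmetic, the Python below computes such a rate together with a crude (unadjusted) difference and ratio. The event and bed-day counts are hypothetical, chosen only so that the rates come out at 12.3 and 11.3; the abstract does not report the underlying counts. The function names are illustrative, and the crude ratio is not the IPTW-adjusted incidence rate ratio reported by the authors.

```python
def rate_per_1000_bed_days(events: int, bed_days: float) -> float:
    """Return the event rate expressed per 1000 patient bed days."""
    if bed_days <= 0:
        raise ValueError("bed_days must be positive")
    return 1000.0 * events / bed_days


def crude_comparison(rate_a: float, rate_b: float) -> tuple[float, float]:
    """Return (rate difference, crude rate ratio) for two rates."""
    if rate_b == 0:
        raise ValueError("rate_b must be non-zero to form a ratio")
    return rate_a - rate_b, rate_a / rate_b


# Hypothetical counts (NOT taken from the trial): 123 and 113 escalations
# over 10,000 bed days each, picked only to reproduce the reported rates.
intervention_rate = rate_per_1000_bed_days(events=123, bed_days=10_000)
control_rate = rate_per_1000_bed_days(events=113, bed_days=10_000)
difference, crude_ratio = crude_comparison(intervention_rate, control_rate)

print(f"intervention: {intervention_rate:.1f} per 1000 bed days")
print(f"control:      {control_rate:.1f} per 1000 bed days")
print(f"difference:   {difference:.1f}, crude ratio: {crude_ratio:.2f}")
```

A crude ratio of this kind ignores differences between the groups, which is why the trial reports a propensity-weighted estimate instead.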
Affiliation(s)
- Matthew A Levin
  - Department of Anesthesiology, Perioperative, and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY
  - Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY
  - Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY
  - Institute for Critical Care Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
  - Department of Anesthesiology and Yale Center for Analytical Sciences, Yale School of Medicine, New Haven, CT
- Arash Kia
  - Department of Anesthesiology, Perioperative, and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
  - Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY
- Prem Timsina
  - Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY
- Fu-Yuan Cheng
  - Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY
- Kim-Anh-Nhi Nguyen
  - Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY
- Roopa Kohli-Seth
  - Institute for Critical Care Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Hung-Mo Lin
  - Department of Anesthesiology and Yale Center for Analytical Sciences, Yale School of Medicine, New Haven, CT
- Yuxia Ouyang
  - Department of Anesthesiology, Perioperative, and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Robert Freeman
  - Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY
- David L Reich
  - Department of Anesthesiology, Perioperative, and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
3
Lee CS, Lin CR, Chua HH, Wu JF, Chang KC, Ni YH, Chang MH, Chen HL. Gut Bifidobacterium longum is associated with better native liver survival in patients with biliary atresia. JHEP Rep 2024; 6:101090. [PMID: 39006502 PMCID: PMC11246047 DOI: 10.1016/j.jhepr.2024.101090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 03/08/2024] [Accepted: 04/03/2024] [Indexed: 07/16/2024] Open
Abstract
Background & Aims The gut microbiome plays an important role in liver diseases, but its specific impact on biliary atresia (BA) remains to be explored. We aimed to investigate the microbial signature in the early life of patients with BA and to analyze its influence on long-term outcomes. Methods Fecal samples (n = 42) were collected from infants with BA before and after Kasai portoenterostomy (KPE). The stool microbiota was analyzed using 16S rRNA next-generation sequencing and compared with that of age-matched healthy controls (HCs). Shotgun metagenomic sequencing analysis was employed to confirm the bacterial composition in 10 fecal samples before KPE. The correlation of the microbiome signature with liver function and long-term outcomes was assessed. Results In the 16S rRNA next-generation sequencing analysis of fecal microbiota, the alpha and beta diversity analyses revealed significant differences between HCs and patients with BA before and after KPE. The difference in microbial composition analyzed by linear discriminant analysis and random forest classification revealed that the abundance of Bifidobacterium longum (B. longum) was significantly lower in patients before and after KPE than in HCs. The abundance of B. longum was negatively correlated with the gamma-glutamyltransferase level after KPE (p <0.05). Patients with early detectable B. longum had significantly lower total and direct bilirubin 3 months after KPE (p <0.005) and had a significantly lower liver transplantation rate (hazard ratio: 0.16, 95% CI 0.03-0.83, p = 0.029). Shotgun metagenomic sequencing also revealed that patients with BA and detectable B. longum had reduced total and direct bilirubin after KPE. Conclusion The gut microbiome of patients with BA differed from that of HCs, with a notable abundance of B. longum in early infancy correlating with better long-term outcomes. Impact and implications Bifidobacterium longum (B. longum) is a beneficial bacterium commonly found in the human gut. It has been studied for its potential impacts on various health conditions. In patients with biliary atresia, we found that a greater abundance of B. longum in the fecal microbiome is associated with improved clinical outcomes. This suggests that early colonization and increasing B. longum levels in the gut could be a therapeutic strategy to improve the prognosis of patients with biliary atresia.
Affiliation(s)
- Chee-Seng Lee
  - Department of Pediatrics, National Taiwan University Hospital Hsin-Chu Branch, Hsin-Chu, Taiwan
  - Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- Chia-Ray Lin
  - Department of Pediatrics, National Taiwan University College of Medicine and Children's Hospital, Taipei, Taiwan
- Huey-Huey Chua
  - Department of Pediatrics, National Taiwan University College of Medicine and Children's Hospital, Taipei, Taiwan
- Jia-Feng Wu
  - Department of Pediatrics, National Taiwan University College of Medicine and Children's Hospital, Taipei, Taiwan
- Kai-Chi Chang
  - Department of Pediatrics, National Taiwan University College of Medicine and Children's Hospital, Taipei, Taiwan
- Yen-Hsuan Ni
  - Department of Pediatrics, National Taiwan University College of Medicine and Children's Hospital, Taipei, Taiwan
  - Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
  - Center of Genomic and Precision Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
  - Medical Microbiota Center, National Taiwan University College of Medicine, Taipei, Taiwan
- Mei-Hwei Chang
  - Department of Pediatrics, National Taiwan University College of Medicine and Children's Hospital, Taipei, Taiwan
- Huey-Ling Chen
  - Department of Pediatrics, National Taiwan University College of Medicine and Children's Hospital, Taipei, Taiwan
  - Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
  - Department and Graduate Institute of Medical Education and Bioethics, National Taiwan University College of Medicine, Taipei, Taiwan
4
Jiam ML, Xin KZ, Ha PK, Jiam NT. A supervised machine learning model for identifying predictive factors for recommending head and neck cancer surgery. Head Neck 2024; 46:1001-1008. [PMID: 38344931 DOI: 10.1002/hed.27674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 01/08/2024] [Accepted: 01/23/2024] [Indexed: 04/10/2024] Open
Abstract
BACKGROUND New patient referrals are often processed by practice coordinators with little-to-no medical background. Treatment delays due to incorrect referral processing, however, have detrimental consequences. Identifying variables that are associated with a higher likelihood of surgical oncological resection may improve patient referral processing and expedite the time to treatment. The study objective is to develop a supervised machine learning (ML) platform that identifies relevant variables associated with head and neck surgical resection. METHODS A retrospective cohort study was conducted on 64 222 patient datapoints from the SEER database. RESULTS The random forest ML model correctly classified patients who were offered head and neck surgery with an 81% accuracy rate. The sensitivity and specificity rates were 86% and 71%. The positive and negative predictive values were 85% and 73%. CONCLUSIONS ML modeling accurately predicts head and neck cancer surgery recommendations based on patient and cancer information from a large population-based dataset. ML adjuncts for referral processing may decrease the time to treatment for patients with cancer.
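The accuracy, sensitivity, specificity, and predictive values quoted above all derive from a single confusion matrix. The sketch below shows how they relate, assuming a binary "surgery recommended" label. The four counts are hypothetical: they were picked so the derived rates land near the reported figures, and they are not the study's data, which the abstract does not give. The `ConfusionMatrix` class is an illustrative helper, not part of the authors' platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive = surgery recommended)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # Fraction of actual positives that were predicted positive.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of actual negatives that were predicted negative.
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value: predicted positives that were correct.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive value: predicted negatives that were correct.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, NOT from the SEER analysis.
cm = ConfusionMatrix(tp=172, fn=28, tn=71, fp=29)
for name in ("accuracy", "sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(cm, name):.3f}")
```

With these illustrative counts, about two thirds of cases are positive, which is why accuracy sits between the sensitivity and the specificity.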
Affiliation(s)
- Max L Jiam
  - School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Kevin Z Xin
  - Department of Radiology, University of California - Irvine, Irvine, California, USA
- Patrick K Ha
  - Department of Otolaryngology - Head & Neck Surgery, University of California - San Francisco, San Francisco, California, USA
- Nicole T Jiam
  - Department of Otolaryngology - Head & Neck Surgery, University of California - San Francisco, San Francisco, California, USA
  - Department of Otolaryngology - Head & Neck Surgery, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts, USA
5
Manrique PD, Leus IV, López CA, Mehla J, Malloci G, Gervasoni S, Vargiu AV, Kinthada RK, Herndon L, Hengartner NW, Walker JK, Rybenkov VV, Ruggerone P, Zgurskaya HI, Gnanakaran S. Predicting permeation of compounds across the outer membrane of P. aeruginosa using molecular descriptors. Commun Chem 2024; 7:84. [PMID: 38609430 PMCID: PMC11015012 DOI: 10.1038/s42004-024-01161-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 03/27/2024] [Indexed: 04/14/2024] Open
Abstract
The ability of Gram-negative pathogens to adapt and protect themselves against antibiotics has increasingly become a public health threat. Data-driven models identifying molecular properties that correlate with outer membrane (OM) permeation and growth inhibition while avoiding efflux could guide the discovery of novel classes of antibiotics. Here we evaluate 174 molecular descriptors in 1260 antimicrobial compounds and study their correlations with antibacterial activity in Gram-negative Pseudomonas aeruginosa. The descriptors are derived from traditional approaches quantifying the compounds' intrinsic physicochemical properties, together with bacterium-specific descriptors from ensemble docking of compounds targeting specific MexB binding pockets and from all-atom molecular dynamics simulations in different subregions of the OM model. Using these descriptors and the measured inhibitory concentrations, we design a statistical protocol to identify predictors of OM permeation/inhibition. We find consistent rules across most of our data highlighting the role of the interaction between the compounds and the OM. An implementation of the rules uncovered in our study is shown, and it demonstrates the accuracy of our approach in a set of previously unseen compounds. Our analysis sheds new light on the key properties drug candidates need to effectively permeate/inhibit P. aeruginosa, and opens the gate to similar data-driven studies in other Gram-negative pathogens.
Affiliation(s)
- Pedro D Manrique
  - Physics Department, George Washington University, Washington, 20052, DC, USA.
- Inga V Leus
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, 73019, OK, USA
- César A López
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, 87545, NM, USA
- Jitender Mehla
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, 73019, OK, USA
- Giuliano Malloci
  - Department of Physics, University of Cagliari, Monserrato, 20052, CA, Italy
- Silvia Gervasoni
  - Department of Physics, University of Cagliari, Monserrato, 20052, CA, Italy
- Attilio V Vargiu
  - Department of Physics, University of Cagliari, Monserrato, 20052, CA, Italy
- Rama K Kinthada
  - Department of Pharmacology and Physiology, Saint Louis University, St. Louis, 63103, MO, USA
- Liam Herndon
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, 87545, NM, USA
- Nicolas W Hengartner
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, 87545, NM, USA
- John K Walker
  - Department of Pharmacology and Physiology, Saint Louis University, St. Louis, 63103, MO, USA
- Valentin V Rybenkov
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, 73019, OK, USA
- Paolo Ruggerone
  - Department of Physics, University of Cagliari, Monserrato, 20052, CA, Italy
- Helen I Zgurskaya
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, 73019, OK, USA
- S Gnanakaran
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, 87545, NM, USA.
6
Rudar J, Kruczkiewicz P, Vernygora O, Golding GB, Hajibabaei M, Lung O. Sequence signatures within the genome of SARS-CoV-2 can be used to predict host source. Microbiol Spectr 2024; 12:e0358423. [PMID: 38436242 PMCID: PMC10986507 DOI: 10.1128/spectrum.03584-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 02/11/2024] [Indexed: 03/05/2024] Open
Abstract
We conducted an in silico analysis to better understand the potential factors impacting host adaptation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in white-tailed deer, humans, and mink due to the strong evidence of sustained transmission within these hosts. Classification models trained on single nucleotide and amino acid differences between samples effectively identified white-tailed deer-, human-, and mink-derived SARS-CoV-2. For example, the balanced accuracy score of Extremely Randomized Trees classifiers was 0.984 ± 0.006. Eighty-eight commonly identified predictive mutations are found at sites under strong positive and negative selective pressure. A large fraction of sites under selection (86.9%) or identified by machine learning (87.1%) are found in genes other than the spike. Some locations encoded by these gene regions are predicted to be B- and T-cell epitopes or are implicated in modulating the immune response suggesting that host adaptation may involve the evasion of the host immune system, modulation of the class-I major-histocompatibility complex, and the diminished recognition of immune epitopes by CD8+ T cells. Our selection and machine learning analysis also identified that silent mutations, such as C7303T and C9430T, play an important role in discriminating deer-derived samples across multiple clades. Finally, our investigation into the origin of the B.1.641 lineage from white-tailed deer in Canada discovered an additional human sequence from Michigan related to the B.1.641 lineage sampled near the emergence of this lineage. These findings demonstrate that machine-learning approaches can be used in combination with evolutionary genomics to identify factors possibly involved in the cross-species transmission of viruses and the emergence of novel viral lineages. IMPORTANCE Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible virus capable of infecting and establishing itself in human and wildlife populations, such as white-tailed deer. This fact highlights the importance of developing novel ways to identify genetic factors that contribute to its spread and adaptation to new host species. This is especially important since these populations can serve as reservoirs that potentially facilitate the re-introduction of new variants into human populations. In this study, we apply machine learning and phylogenetic methods to uncover biomarkers of SARS-CoV-2 adaptation in mink and white-tailed deer. We find evidence demonstrating that both non-synonymous and silent mutations can be used to differentiate animal-derived sequences from human-derived ones and each other. This evidence also suggests that host adaptation involves the evasion of the immune system and the suppression of antigen presentation. Finally, the methods developed here are general and can be used to investigate host adaptation in viruses other than SARS-CoV-2.
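The abstract reports a balanced accuracy score for its host classifiers. Under its usual definition, balanced accuracy is the unweighted mean of per-class recall, so a rare host class counts as much as a common one. The sketch below implements that definition in plain Python. The host labels and predictions are made up for illustration and are not drawn from the study, and the abstract does not state which software computed the published score.

```python
from collections import Counter
from typing import Sequence


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    totals = Counter(y_true)  # samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced example: 6 human, 2 deer, 2 mink.
y_true = ["human"] * 6 + ["deer"] * 2 + ["mink"] * 2
y_pred = ["human"] * 6 + ["deer", "human"] + ["mink"] * 2

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy:    {plain_accuracy:.3f}")
print(f"balanced accuracy: {score:.3f}")
```

In this toy case one misclassified deer sample lowers balanced accuracy more than plain accuracy, because deer recall drops to one half while the majority class is unaffected.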
Affiliation(s)
- Josip Rudar
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
- Peter Kruczkiewicz
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
- Oksana Vernygora
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
- G. Brian Golding
  - Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Mehrdad Hajibabaei
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
- Oliver Lung
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
  - Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
7
Wu G, Zaker A, Ebrahimi A, Tripathi S, Mer AS. Text-mining-based feature selection for anticancer drug response prediction. BIOINFORMATICS ADVANCES 2024; 4:vbae047. [PMID: 38606185 PMCID: PMC11009020 DOI: 10.1093/bioadv/vbae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 04/13/2024]
Abstract
Motivation Predicting anticancer treatment response from baseline genomic data is a critical obstacle in personalized medicine. Machine learning methods are commonly used for predicting drug response from gene expression data. In the process of constructing these machine learning models, one of the most significant challenges is identifying appropriate features among a massive number of genes. Results In this study, we utilize features (genes) extracted by text-mining the scientific literature. Using two independent cancer pharmacogenomic datasets, we demonstrate that text-mining-based features outperform traditional feature selection techniques in machine learning tasks. In addition, our analysis reveals that text-mining feature-based machine learning models trained on in vitro data also perform well when predicting the response of in vivo cancer models. Our results demonstrate that text-mining-based feature selection is an easy-to-implement approach that is suitable for building machine learning models for anticancer drug response prediction. Availability and implementation https://github.com/merlab/text_features.
Affiliation(s)
- Grace Wu
  - Division of Engineering Science, University of Toronto, Toronto, M5S2E4, Canada
- Arvin Zaker
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
- Amirhosein Ebrahimi
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
- Shivanshi Tripathi
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
- Arvind Singh Mer
  - Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H8M5, Canada
  - Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H8M5, Canada
  - School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa, K1N6N5, Canada
8
Andorra M, Freire A, Zubizarreta I, de Rosbo NK, Bos SD, Rinas M, Høgestøl EA, de Rodez Benavent SA, Berge T, Brune-Ingebretse S, Ivaldi F, Cellerino M, Pardini M, Vila G, Pulido-Valdeolivas I, Martinez-Lapiscina EH, Llufriu S, Saiz A, Blanco Y, Martinez-Heras E, Solana E, Bäcker-Koduah P, Behrens J, Kuchling J, Asseyer S, Scheel M, Chien C, Zimmermann H, Motamedi S, Kauer-Bonin J, Brandt A, Saez-Rodriguez J, Alexopoulos LG, Paul F, Harbo HF, Shams H, Oksenberg J, Uccelli A, Baeza-Yates R, Villoslada P. Predicting disease severity in multiple sclerosis using multimodal data and machine learning. J Neurol 2024; 271:1133-1149. [PMID: 38133801 PMCID: PMC10896787 DOI: 10.1007/s00415-023-12132-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 10/28/2023] [Accepted: 11/22/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND Multiple sclerosis patients would benefit from machine learning algorithms that integrate clinical, imaging and multimodal biomarkers to define the risk of disease activity. METHODS We have analysed a prospective multi-centric cohort of 322 MS patients and 98 healthy controls from four MS centres, collecting disability scales at baseline and 2 years later. Imaging data included brain MRI and optical coherence tomography, and omics included genotyping, cytomics and phosphoproteomic data from peripheral blood mononuclear cells. Predictors of clinical outcomes were searched using Random Forest algorithms. Assessment of the algorithm performance was conducted in an independent prospective cohort of 271 MS patients from a single centre. RESULTS We found algorithms for predicting confirmed disability accumulation for the different scales, no evidence of disease activity (NEDA), onset of immunotherapy and the escalation from low- to high-efficacy therapy with intermediate to high accuracy. This accuracy was achieved for most of the predictors using clinical data alone or in combination with imaging data. Still, in some cases, the addition of omics data slightly increased algorithm performance. Accuracies were comparable in both cohorts. CONCLUSION Combining clinical, imaging and omics data with machine learning helps identify MS patients at risk of disability worsening.
Affiliation(s)
- Magi Andorra
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Ana Freire
  - School of Management, Pompeu Fabra University, Barcelona, Spain
  - UPF Barcelona School of Management, Balmes 132, 08008, Barcelona, Spain
- Irati Zubizarreta
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Nicole Kerlero de Rosbo
  - Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy
  - IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Steffan D Bos
  - University of Oslo, Oslo, Norway
  - Oslo University Hospital, Oslo, Norway
- Melanie Rinas
  - Institute for Computational Biomedicine, Heidelberg University Hospital, and Heidelberg University, Heidelberg, Germany
- Einar A Høgestøl
  - University of Oslo, Oslo, Norway
  - Oslo University Hospital, Oslo, Norway
- Tone Berge
  - Oslo University Hospital, Oslo, Norway
  - Oslo Metropolitan University, Oslo, Norway
- Federico Ivaldi
  - Department of Internal Medicine, University of Genoa, Genoa, Italy
- Maria Cellerino
  - Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy
- Matteo Pardini
  - Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy
  - IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Gemma Vila
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Irene Pulido-Valdeolivas
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Elena H Martinez-Lapiscina
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Sara Llufriu
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Albert Saiz
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Yolanda Blanco
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Eloy Martinez-Heras
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Elisabeth Solana
  - Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
- Susanna Asseyer
  - Charité Universitaetsmedizin Berlin, Berlin, Germany
  - Max Delbrueck Center for Molecular Medicine, Berlin, Germany
- Claudia Chien
  - Charité Universitaetsmedizin Berlin, Berlin, Germany
  - Max Delbrueck Center for Molecular Medicine, Berlin, Germany
- Hanna Zimmermann
  - Charité Universitaetsmedizin Berlin, Berlin, Germany
  - Max Delbrueck Center for Molecular Medicine, Berlin, Germany
- Alex Brandt
  - Charité Universitaetsmedizin Berlin, Berlin, Germany
- Julio Saez-Rodriguez
  - Institute for Computational Biomedicine, Heidelberg University Hospital, and Heidelberg University, Heidelberg, Germany
- Leonidas G Alexopoulos
  - ProtATonce Ltd, Athens, Greece
  - School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece
- Friedemann Paul
  - Charité Universitaetsmedizin Berlin, Berlin, Germany
  - Max Delbrueck Center for Molecular Medicine, Berlin, Germany
- Hanne F Harbo
  - University of Oslo, Oslo, Norway
  - Oslo University Hospital, Oslo, Norway
- Hengameh Shams
  - Department of Neurology, University of California, San Francisco, USA
- Jorge Oksenberg
  - Department of Neurology, University of California, San Francisco, USA
- Antonio Uccelli
  - Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy
  - IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Pablo Villoslada
  - Department of Medicine and Life Sciences, Pompeu Fabra University, Barcelona, Spain.
  - Hospital del Mar Research Institute, Barcelona, Spain.
9
Torigoe T, Takahashi M, Heravizadeh O, Ikeda K, Nakatani K, Bamba T, Izumi Y. Predicting Retention Time in Unified-Hydrophilic-Interaction/Anion-Exchange Liquid Chromatography High-Resolution Tandem Mass Spectrometry (Unified-HILIC/AEX/HRMS/MS) for Comprehensive Structural Annotation of Polar Metabolome. Anal Chem 2024; 96:1275-1283. [PMID: 38186224 DOI: 10.1021/acs.analchem.3c04618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
The accuracy of the structural annotation of unidentified peaks obtained in metabolomic analysis using liquid chromatography/tandem mass spectrometry (LC/MS/MS) can be enhanced using retention time (RT) information as well as precursor and product ions. Unified-hydrophilic-interaction/anion-exchange liquid chromatography high-resolution tandem mass spectrometry (unified-HILIC/AEX/HRMS/MS) has been recently developed as an innovative method ideal for nontargeted polar metabolomics. However, the RT prediction for unified-HILIC/AEX has not been developed because of the complex separation mechanism characterized by the continuous transition of the separation modes from HILIC to AEX. In this study, we propose an RT prediction model of unified-HILIC/AEX/HRMS/MS, which enables the comprehensive structural annotation of polar metabolites. With training data for 203 polar metabolites, we ranked the feature importance using a random forest among 12,420 molecular descriptors (MDs) and constructed an RT prediction model with 26 selected MDs. The accuracy of the RT model was evaluated using test data for 51 polar metabolites, and 86.3% of the ΔRTs (difference between measured and predicted RTs) were within ±1.50 min, with a mean absolute error of 0.80 min, indicating high RT prediction accuracy. Nontargeted metabolomic data from the NIST SRM 1950-Metabolites in frozen human plasma were analyzed using the developed RT model and in silico MS/MS prediction, resulting in a successful structural estimation of 216 polar metabolites, in addition to the 62 identified based on standards. The proposed model can help accelerate the structural annotation of unknown hydrophilic metabolites, which is a key issue in metabolomic research.
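The two accuracy figures quoted in this abstract (the share of ΔRTs within ±1.50 min and the mean absolute error) can be computed as in the minimal sketch below. The function name `rt_metrics` and the four retention times are hypothetical illustrations, not data or code from the paper. For scale, 86.3% of the 51 test metabolites corresponds to 44 compounds (44/51 ≈ 0.863).

```python
from statistics import mean


def rt_metrics(measured, predicted, tolerance=1.50):
    """Return (mean absolute error, fraction of |delta RT| <= tolerance).

    delta RT is defined as measured minus predicted retention time (minutes).
    """
    if not measured or len(measured) != len(predicted):
        raise ValueError("measured and predicted must be non-empty and equal in length")
    deltas = [m - p for m, p in zip(measured, predicted)]
    mae = mean(abs(d) for d in deltas)
    within = sum(abs(d) <= tolerance for d in deltas) / len(deltas)
    return mae, within


# Hypothetical retention times in minutes (illustration only).
measured = [2.0, 5.5, 9.0, 12.0]
predicted = [2.5, 5.0, 11.0, 12.0]
mae, within = rt_metrics(measured, predicted)
print(f"MAE = {mae:.2f} min, within tolerance = {within:.1%}")
```

The tolerance defaults to the ±1.50 min window stated in the abstract; any other window is passed explicitly.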
Affiliation(s)
- Taihei Torigoe
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Masatomo Takahashi
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Omidreza Heravizadeh
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Kazuki Ikeda
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Kohta Nakatani
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Takeshi Bamba
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
| | - Yoshihiro Izumi
- Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
|
10
|
Ke TM, Lophatananon A, Muir KR. An Integrative Pancreatic Cancer Risk Prediction Model in the UK Biobank. Biomedicines 2023; 11:3206. [PMID: 38137427 PMCID: PMC10740416 DOI: 10.3390/biomedicines11123206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 11/20/2023] [Accepted: 11/26/2023] [Indexed: 12/24/2023] Open
Abstract
Pancreatic cancer (PaCa) is a lethal cancer with an increasing incidence, highlighting the need for early prevention strategies. There is a lack of a comprehensive PaCa predictive model derived from large prospective cohorts. Therefore, we have developed an integrated PaCa risk prediction model using data from the UK Biobank, incorporating lifestyle-related, genetic-related, and medical history-related variables for application in healthcare settings. We used a machine learning-based random forest approach and a traditional multivariable logistic regression method to develop a PaCa predictive model for different purposes. Additionally, we employed dynamic nomograms to visualize the probability of PaCa risk in the prediction model. The top five influential features in the random forest model were age, polygenic risk score (PRS), pancreatitis, diabetes mellitus (DM), and smoking. The significant risk variables in the logistic regression model included male gender (OR = 1.17), age (OR = 1.10), non-O blood type (OR = 1.29), higher PRS (Q5 vs. Q1, OR = 2.03), smoking (OR = 1.82), alcohol consumption (OR = 1.27), pancreatitis (OR = 3.99), DM (OR = 2.57), and gallbladder-related disease (OR = 2.07). The area under the receiver operating characteristic curve (AUC) of the logistic regression model is 0.78. Internal validation and calibration performed well in both models. Our integrative PaCa risk prediction model with the PRS effectively stratifies individuals at future risk of PaCa, aiding targeted prevention efforts and supporting community-based cancer prevention initiatives.
Affiliation(s)
| | | | - Kenneth R. Muir
- Division of Population Health, Health Services Research and Primary Care, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PT, UK; (T.-M.K.); (A.L.)
|
11
|
Nguyen LN, Le TH, Nguyen LQ, Tran VQ. Machine learning approaches for predicting Cracking Tolerance Index (CTIndex) of asphalt concrete containing reclaimed asphalt pavement. PLoS One 2023; 18:e0287255. [PMID: 37883340 PMCID: PMC10602248 DOI: 10.1371/journal.pone.0287255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 06/01/2023] [Indexed: 10/28/2023] Open
Abstract
Cracking is one of the main types of damage to asphalt concrete. Repeated loads, the deterioration or aging of material combinations, or structural factors can contribute to the development of cracks. Asphalt concrete's crack resistance is represented by the CT Index. A total of 107 CT Index data samples were measured in the University of Transport Technology's laboratory. These data samples are used to establish a database from which a Machine Learning (ML) model for predicting the CT Index of asphalt concrete can be built. To find the best-performing ML model, three well-known ML methods are compared: Random Forest (RF), K-Nearest Neighbors (KNN), and Multivariate Adaptive Regression Splines (MARS). Monte Carlo simulation is used to verify the accuracy of the ML models, which is assessed with the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and coefficient of determination (R2). The RF model is the most effective ML model, according to analysis and evaluation of the performance indicators. According to SHapley Additive exPlanations (SHAP) applied to the RF model, the input Aggregate content passing 4.75 mm sieve (AP4.75) has a significant effect on the variation of the CT Index value. The remaining inputs follow in descending order of influence: Reclaimed Asphalt Pavement content (RAP) > Bitumen content (BC) > Flash point (FP) > Softening point > Rejuvenator content (RC) > Aggregate content passing 0.075 mm sieve (AP0.075) > Penetration at 25°C (P). The results of this study contribute to selecting a suitable AI approach to quickly and accurately determine the CT Index of asphalt concrete mixtures.
Affiliation(s)
| | - Thanh-Hai Le
- University of Transport Technology, Hanoi, Vietnam
|
12
|
Nguyen QTN, Nguyen P, Wang C, Phuc PT, Lin R, Hung C, Kuo N, Cheng Y, Lin S, Hsieh Z, Cheng C, Hsu M, Hsu JC. Machine learning approaches for predicting 5-year breast cancer survival: A multicenter study. Cancer Sci 2023; 114:4063-4072. [PMID: 37489252 PMCID: PMC10551582 DOI: 10.1111/cas.15917] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Revised: 06/27/2023] [Accepted: 07/05/2023] [Indexed: 07/26/2023] Open
Abstract
The study used clinical data to develop a prediction model for breast cancer survival. Breast cancer prognostic factors were explored using machine learning techniques. We conducted a retrospective study using data from the Taipei Medical University Clinical Research Database, which contains electronic medical records from three affiliated hospitals in Taiwan. The study included female patients aged over 20 years who were diagnosed with primary breast cancer and had medical records in hospitals between January 1, 2009 and December 31, 2020. The data were divided into training and external testing datasets. Nine different machine learning algorithms were applied to develop the models. The performances of the algorithms were measured using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. A total of 3914 patients were included in the study. The highest AUC of 0.95 was observed with the artificial neural network model (accuracy, 0.90; sensitivity, 0.71; specificity, 0.73; PPV, 0.28; NPV, 0.94; and F1-score, 0.37). Other models showed relatively high AUC, ranging from 0.75 to 0.83. According to the optimal model results, cancer stage, tumor size, diagnosis age, surgery, and body mass index were the most critical factors for predicting breast cancer survival. The study successfully established accurate 5-year survival predictive models for breast cancer. Furthermore, the study found key factors that could affect breast cancer survival in Taiwanese women. Its results might be used as a reference for the clinical practice of breast cancer treatment.
Affiliation(s)
- Quynh Thi Nhu Nguyen
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
| | - Phung‐Anh Nguyen
- Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei City, Taiwan
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
| | - Chun‐Jung Wang
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
| | - Phan Thanh Phuc
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
| | - Ruo‐Kai Lin
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
| | - Chin‐Sheng Hung
- Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taipei City, Taiwan
| | - Nei‐Hui Kuo
- Oncology Center, Taipei Medical University Hospital, Taipei City, Taiwan
| | - Yu‐Wen Cheng
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
| | - Shwu‐Jiuan Lin
- School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
| | - Zong‐You Hsieh
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
| | - Chi‐Tsun Cheng
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
| | - Min‐Huei Hsu
- Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
| | - Jason C. Hsu
- Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei City, Taiwan
- Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
- International Ph.D. Program in Biotech and Healthcare Management, College of Management, Taipei Medical University, Taipei City, Taiwan
|
13
|
Fradera-Soler M, Mravec J, Harholt J, Grace OM, Jørgensen B. Cell wall polysaccharide and glycoprotein content tracks growth-form diversity and an aridity gradient in the leaf-succulent genus Crassula. PHYSIOLOGIA PLANTARUM 2023; 175:e14007. [PMID: 37882271 DOI: 10.1111/ppl.14007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 06/22/2023] [Accepted: 08/14/2023] [Indexed: 10/27/2023]
Abstract
Cell wall traits are believed to be a key component of the succulent syndrome, an adaptive syndrome to drought, yet the variability of such traits remains largely unknown. In this study, we surveyed the leaf polysaccharide and glycoprotein composition in a wide sampling of Crassula species that occur naturally along an aridity gradient in southern Africa, and we interpreted its adaptive significance in relation to growth form and arid adaptation. To study the glycomic diversity, we sampled leaf material from 56 Crassula taxa and performed comprehensive microarray polymer profiling to obtain the relative content of cell wall polysaccharides and glycoproteins. This analysis was complemented by the determination of monosaccharide composition and immunolocalization in leaf sections using glycan-targeting antibodies. We found that compact and non-compact Crassula species occupy distinct phenotypic spaces in terms of leaf glycomics, particularly in regard to rhamnogalacturonan I, its arabinan side chains, and arabinogalactan proteins (AGPs). Moreover, these cell wall components also correlated positively with increasing aridity, which suggests that they are likely advantageous in terms of arid adaptation. These differences point to compact Crassula species having more elastic cell walls with plasticizing properties, which can be interpreted as an adaptation toward increased drought resistance. Furthermore, we report an intracellular pool of AGPs associated with oil bodies and calcium oxalate crystals, which could be a peculiarity of Crassula and could be linked to increased drought resistance. Our results indicate that glycomics may be underlying arid adaptation and drought resistance in succulent plants.
Affiliation(s)
- Marc Fradera-Soler
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Royal Botanic Gardens, London, UK
| | - Jozef Mravec
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Plant Science and Biodiversity Center, Institute of Plant Genetics and Biotechnology, Slovak Academy of Sciences, Nitra, Slovakia
| | | | - Olwen M Grace
- Royal Botanic Gardens, London, UK
- Royal Botanic Garden Edinburgh, Edinburgh, UK
| | - Bodil Jørgensen
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
|
14
|
Veeramani A, Zhang AS, Blackburn AZ, Etzel CM, DiSilvestro KJ, McDonald CL, Daniels AH. An Artificial Intelligence Approach to Predicting Unplanned Intubation Following Anterior Cervical Discectomy and Fusion. Global Spine J 2023; 13:1849-1855. [PMID: 35132907 PMCID: PMC10556901 DOI: 10.1177/21925682211053593] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 12/15/2022] Open
Abstract
STUDY DESIGN Level III retrospective database study. OBJECTIVES The purpose of this study is to determine if machine learning algorithms are effective in predicting unplanned intubation following anterior cervical discectomy and fusion (ACDF). METHODS The National Surgical Quality Initiative Program (NSQIP) was queried to select patients who had undergone ACDF. Machine learning analysis was conducted in Python and multivariate regression analysis was conducted in R. C-Statistics area under the curve (AUC) and prediction accuracy were used to measure the classifier's effectiveness in distinguishing cases. RESULTS In total, 54 502 patients met the study criteria. Of these patients, .51% underwent an unplanned re-intubation. Machine learning algorithms accurately classified between 72%-100% of the test cases with AUC values of between .52-.77. Multivariable regression indicated that the number of levels fused, male sex, COPD, American Society of Anesthesiologists (ASA) > 2, increased operating time, Age > 65, pre-operative weight loss, dialysis, and disseminated cancer were associated with increased risk of unplanned intubation. CONCLUSIONS The models presented here achieved high accuracy in predicting risk factors for re-intubation following ACDF surgery. Machine learning analysis may be useful in identifying patients who are at a higher risk of unplanned post-operative re-intubation and their treatment plans can be modified to prophylactically prevent respiratory compromise and consequently unplanned re-intubation.
Affiliation(s)
- Ashwin Veeramani
- Department of Orthopedic Surgery, Rhode Island Hospital, Warren Alpert Medical School of Brown University, Providence, RI, USA
| | - Andrew S Zhang
- Department of Orthopedic Surgery, Rhode Island Hospital, Warren Alpert Medical School of Brown University, Providence, RI, USA
| | - Amy Z. Blackburn
- Department of Orthopedic Surgery, Rhode Island Hospital, Warren Alpert Medical School of Brown University, Providence, RI, USA
| | - Christine M. Etzel
- Department of Orthopedic Surgery, Rhode Island Hospital, Warren Alpert Medical School of Brown University, Providence, RI, USA
| | - Kevin J. DiSilvestro
- Department of Orthopedic Surgery, Rhode Island Hospital, Warren Alpert Medical School of Brown University, Providence, RI, USA
| | - Christopher L. McDonald
- Department of Orthopedic Surgery, Rhode Island Hospital, Warren Alpert Medical School of Brown University, Providence, RI, USA
| | - Alan H. Daniels
- Department of Orthopedic Surgery, Rhode Island Hospital, Warren Alpert Medical School of Brown University, Providence, RI, USA
|
15
|
Fife DA, D'Onofrio J. Common, uncommon, and novel applications of random forest in psychological research. Behav Res Methods 2023; 55:2447-2466. [PMID: 35915361 DOI: 10.3758/s13428-022-01901-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Accepted: 06/05/2022] [Indexed: 01/08/2023]
Abstract
Recent reform efforts have pushed toward a better understanding of the distinction between exploratory and confirmatory research, and appropriate use of each. As some utilize more exploratory tools, it may be tempting to employ multiple linear regression models. In this paper, we advocate for the use of random forest (RF) models. RF is able to obtain better predictive performance than traditional regression, while also inherently protecting against overfitting as well as detecting nonlinear effects and interactions among predictors. Given the advantages of RF compared to other statistical procedures, it is a tool commonly used within a plethora of industries, including stock trading, banking, pharmaceuticals, and patient healthcare planning. However, we find RF is used within the field of psychology comparatively less frequently. In the current paper, we advocate for RF as an important statistical tool within the context of behavioral and psychological research. In hopes of increasing the use of RF in the field of psychology, we provide information pertaining to the limitations one might confront in using RF and how to overcome such limitations. Moreover, we discuss various methods for how to optimally utilize RF with psychological data, such as nonparametric modeling, interaction and nonlinearity detection, variable selection, prediction and classification modeling, and assessing parameters of Monte Carlo simulations. Throughout, we illustrate the use of RF with visualization strategies, aimed to make RF models more comprehensible and intuitive.
|
16
|
Patro A, Perkins EL, Ortega CA, Lindquist NR, Dawant BM, Gifford R, Haynes DS, Chowdhury N. Machine Learning Approach for Screening Cochlear Implant Candidates: Comparing With the 60/60 Guideline. Otol Neurotol 2023; 44:e486-e491. [PMID: 37400135 PMCID: PMC10524241 DOI: 10.1097/mao.0000000000003927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2023]
Abstract
OBJECTIVE To develop a machine learning-based referral guideline for patients undergoing cochlear implant candidacy evaluation (CICE) and to compare with the widely used 60/60 guideline. STUDY DESIGN Retrospective cohort. SETTING Tertiary referral center. PATIENTS 772 adults undergoing CICE from 2015 to 2020. INTERVENTIONS Variables included demographics, unaided thresholds, and word recognition score. A random forest classification model was trained on patients undergoing CICE, and bootstrap cross-validation was used to assess the modeling approach's performance. MAIN OUTCOME MEASURES The machine learning-based referral tool was evaluated against the 60/60 guideline based on ability to identify CI candidates under traditional and expanded criteria. RESULTS Of 587 patients with complete data, 563 (96%) met candidacy at our center, and the 60/60 guideline identified 512 (87%) patients. In the random forest model, word recognition score; thresholds at 3000, 2000, and 125; and age at CICE had the largest impact on candidacy (mean decrease in Gini coefficient, 2.83, 1.60, 1.20, 1.17, and 1.16, respectively). The 60/60 guideline had a sensitivity of 0.91, a specificity of 0.42, and an accuracy of 0.89 (95% confidence interval, 0.86-0.91). The random forest model obtained higher sensitivity (0.96), specificity (1.00), and accuracy (0.96; 95% confidence interval, 0.95-0.98). Across 1,000 bootstrapped iterations, the model yielded a median sensitivity of 0.92 (interquartile range [IQR], 0.85-0.98), specificity of 1.00 (IQR, 0.88-1.00), accuracy of 0.93 (IQR, 0.85-0.97), and area under the curve of 0.96 (IQR, 0.93-0.98). CONCLUSIONS A novel machine learning-based screening model is highly sensitive, specific, and accurate in predicting CI candidacy. Bootstrapping confirmed that this approach is potentially generalizable with consistent results.
Affiliation(s)
- Ankita Patro
- Department of Otolaryngology–Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Elizabeth L. Perkins
- Department of Otolaryngology–Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
| | | | - Nathan R. Lindquist
- Department of Otolaryngology–Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Benoit M. Dawant
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee
| | - René Gifford
- Department of Hearing and Speech Science, Vanderbilt University Medical Center, Nashville, Tennessee
| | - David S. Haynes
- Department of Otolaryngology–Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Naweed Chowdhury
- Department of Otolaryngology–Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee
|
17
|
Tian L, Wu W, Yu T. Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features. Biomolecules 2023; 13:1153. [PMID: 37509188 PMCID: PMC10377046 DOI: 10.3390/biom13071153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 06/26/2023] [Accepted: 06/30/2023] [Indexed: 07/30/2023] Open
Abstract
Random Forest (RF) is a widely used machine learning method with good performance on classification and regression tasks. It works well under low sample size situations, which benefits applications in the field of biology. For example, gene expression data often involve much larger numbers of features (p) compared to the size of samples (n). Though the predictive accuracy using RF is often high, there are some problems when selecting important genes using RF. The important genes selected by RF are usually scattered on the gene network, which conflicts with the biological assumption of functional consistency between effective features. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF) for identifying highly connected important features by involving the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs and achieve equivalent classification accuracy to RF. To evaluate the capability of our proposed method, we conducted simulation experiments and applied the method to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas, and a human embryonic stem cell RNA-seq dataset (GSE93593). The resulting high classification accuracy, connectivity of selected sub-graphs, and interpretable feature selection results suggest the method is a helpful addition to graph-based classification models and feature selection procedures.
Affiliation(s)
- Leqi Tian
- School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
| | - Wenbin Wu
- School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Tianwei Yu
- School of Data Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Guangdong Provincial Key Laboratory of Big Data Computing, Shenzhen 518172, China
|
18
|
Ribeiro C, Farmer CK, de Magalhães JP, Freitas AA. Predicting lifespan-extending chemical compounds for C. elegans with machine learning and biologically interpretable features. Aging (Albany NY) 2023; 15:6073-6099. [PMID: 37450404 PMCID: PMC10373959 DOI: 10.18632/aging.204866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 06/19/2023] [Indexed: 07/18/2023]
Abstract
Recently, there has been a growing interest in the development of pharmacological interventions targeting ageing, as well as in the use of machine learning for analysing ageing-related data. In this work, we use machine learning methods to analyse data from DrugAge, a database of chemical compounds (including drugs) modulating lifespan in model organisms. To this end, we created four types of datasets for predicting whether or not a compound extends the lifespan of C. elegans (the most frequent model organism in DrugAge), using four different types of predictive biological features, based on: compound-protein interactions, interactions between compounds and proteins encoded by ageing-related genes, and two types of terms annotated for proteins targeted by the compounds, namely Gene Ontology (GO) terms and physiology terms from the WormBase's Phenotype Ontology. To analyse these datasets, we used a combination of feature selection methods in a data pre-processing phase and the well-established random forest algorithm for learning predictive models from the selected features. In addition, we interpreted the most important features in the two best models in light of the biology of ageing. One noteworthy feature was the GO term "Glutathione metabolic process", which plays an important role in cellular redox homeostasis and detoxification. We also predicted the most promising novel compounds for extending lifespan from a list of previously unlabelled compounds. These include nitroprusside, which is used as an antihypertensive medication. Overall, our work opens avenues for future work in employing machine learning to predict novel life-extending compounds.
Affiliation(s)
- Caio Ribeiro
- School of Computing, University of Kent, Canterbury, Kent, UK
| | | | - João Pedro de Magalhães
- Genomics of Ageing and Rejuvenation Lab, Institute of Inflammation and Ageing, University of Birmingham, Birmingham, UK
| | - Alex A. Freitas
- School of Computing, University of Kent, Canterbury, Kent, UK
|
19
|
Mulder FAA, Tenori L, Licari C, Luchinat C. Practical considerations for rapid and quantitative NMR-based metabolomics. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2023; 352:107462. [PMID: 37141802 DOI: 10.1016/j.jmr.2023.107462] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/23/2022] [Revised: 03/23/2023] [Accepted: 04/21/2023] [Indexed: 05/06/2023]
Abstract
NMR is a key technology for metabolomics because of its robustness and reproducibility. Herein we discuss practical considerations that extend the utility of NMR spectroscopy. First, the long T1 spin relaxation times of small molecules limit high-throughput data acquisition because most experimental time is lost while waiting for signal recovery. In principle, the addition of a small amount of commercially available paramagnetic gadolinium chelate allows cost-effective and efficient high-throughput mixture analysis with correct concentration determination. However, idle time caused by slow temperature regulation during sample exchanges poses a further constraint. We show how, with proper care, NMR sample scanning times can be reduced additionally by a factor of two. Lastly, we describe how equidistant bucketing is a simple and fast procedure for metabolomic fingerprinting. The combination of these advancements helps to make NMR metabolomics more versatile than it is today.
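Equidistant bucketing, mentioned at the end of this abstract, can be illustrated with the minimal sketch below: the chemical-shift axis is cut into equal-width intervals and the intensities in each interval are summed to give a fingerprint vector. The function name, the 0.04 ppm bucket width, and the five-point spectrum are hypothetical choices for illustration; the abstract does not specify the authors' parameters or implementation.

```python
def equidistant_buckets(ppm, intensity, start, stop, width):
    """Sum intensities into equal-width chemical-shift buckets on [start, stop)."""
    n_buckets = int(round((stop - start) / width))
    buckets = [0.0] * n_buckets
    for shift, value in zip(ppm, intensity):
        if start <= shift < stop:
            # Clamp guards against floating-point rounding at the upper edge.
            index = min(int((shift - start) / width), n_buckets - 1)
            buckets[index] += value
    return buckets


# Hypothetical five-point spectrum (illustration only).
ppm = [0.01, 0.03, 0.05, 0.07, 0.11]
intensity = [1.0, 2.0, 3.0, 4.0, 5.0]
buckets = equidistant_buckets(ppm, intensity, start=0.0, stop=0.12, width=0.04)

# Scale to unit total so fingerprints from different samples are comparable.
total = sum(buckets)
fingerprint = [b / total for b in buckets]
print(buckets, fingerprint)
```

Scaling to unit total intensity is one common normalization choice and is an assumption here, not something the abstract states.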
Affiliation(s)
| | - Leonardo Tenori
- Magnetic Resonance Center (CERM) and Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, Florence, Italy; Consorzio Interuniversitario Risonanze Magnetiche di Metallo Proteine (CIRMMP), Sesto Fiorentino, Florence, Italy
| | - Cristina Licari
- Magnetic Resonance Center (CERM) and Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, Florence, Italy; Consorzio Interuniversitario Risonanze Magnetiche di Metallo Proteine (CIRMMP), Sesto Fiorentino, Florence, Italy
| | - Claudio Luchinat
- Magnetic Resonance Center (CERM) and Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, Florence, Italy; Consorzio Interuniversitario Risonanze Magnetiche di Metallo Proteine (CIRMMP), Sesto Fiorentino, Florence, Italy; GiottoBiotech s.r.l., Sesto Fiorentino, Florence, Italy.
| |
Collapse
|
20
|
Kwak S, Lee HJ, Kim S, Park JB, Lee SP, Kim HK, Kim YJ. Machine learning reveals sex-specific associations between cardiovascular risk factors and incident atherosclerotic cardiovascular disease. Sci Rep 2023; 13:9364. [PMID: 37291421 PMCID: PMC10250402 DOI: 10.1038/s41598-023-36450-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 06/03/2023] [Indexed: 06/10/2023] Open
Abstract
We aimed to investigate sex-specific associations between cardiovascular risk factors and atherosclerotic cardiovascular disease (ASCVD) risk using machine learning. We studied 258,279 individuals (132,505 [51.3%] men and 125,774 [48.7%] women) without documented ASCVD who underwent national health screening. A random forest model was developed using 16 variables to predict the 10-year ASCVD in each sex. The association between cardiovascular risk factors and 10-year ASCVD probabilities was examined using partial dependency plots. During the 10-year follow-up, 12,319 (4.8%) individuals developed ASCVD, with a higher incidence in men than in women (5.3% vs. 4.2%, P < 0.001). The performance of the random forest model was similar to that of the pooled cohort equations (area under the receiver operating characteristic curve, men: 0.733 vs. 0.727; women: 0.769 vs. 0.762). Age and body mass index were the two most important predictors in the random forest model for both sexes. In partial dependency plots, advanced age and increased waist circumference were more strongly associated with higher probabilities of ASCVD in women. In contrast, ASCVD probabilities increased more steeply with higher total cholesterol and low-density lipoprotein (LDL) cholesterol levels in men. These sex-specific associations were verified in the conventional Cox analyses. In conclusion, there were significant sex differences in the association between cardiovascular risk factors and ASCVD events. While higher total cholesterol or LDL cholesterol levels were more strongly associated with the risk of ASCVD in men, older age and increased waist circumference were more strongly associated with the risk of ASCVD in women.
Affiliation(s)
- Soongu Kwak
- Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Daehak-ro 101, Jongno-gu, Seoul, 03080, Republic of Korea
- Hyun-Jung Lee
- Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Daehak-ro 101, Jongno-gu, Seoul, 03080, Republic of Korea.
- Division of Cardiology, Severance Cardiovascular Hospital, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea.
- Seungyeon Kim
- Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Daehak-ro 101, Jongno-gu, Seoul, 03080, Republic of Korea.
- College of Pharmacy, Dankook University, Dandae-ro 119, Dongnam-gu, Cheonan-si, Chungcheongnam-do, 31116, Republic of Korea.
- Jun-Bean Park
- Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Daehak-ro 101, Jongno-gu, Seoul, 03080, Republic of Korea
- Seung-Pyo Lee
- Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Daehak-ro 101, Jongno-gu, Seoul, 03080, Republic of Korea
- Hyung-Kwan Kim
- Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Daehak-ro 101, Jongno-gu, Seoul, 03080, Republic of Korea
- Yong-Jin Kim
- Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, Seoul National University College of Medicine, Daehak-ro 101, Jongno-gu, Seoul, 03080, Republic of Korea
21
Pieplow C, Wessel G. Functional annotation of a hugely expanded nanos repertoire in Lytechinus variegatus, the green sea urchin. Mol Reprod Dev 2023; 90:310-322. [PMID: 37039283 PMCID: PMC10225336 DOI: 10.1002/mrd.23684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 02/17/2023] [Accepted: 03/18/2023] [Indexed: 04/12/2023]
Abstract
Nanos genes encode essential RNA-binding proteins involved in germline determination and germline stem cell maintenance. When examining diverse classes of echinoderms, typically three, sometimes four, nanos genes are present. In this analysis, we identify and annotate nine nanos orthologs in the green sea urchin, Lytechinus variegatus (Lv). All nine genes are transcribed and grouped into three distinct classes. Class one includes the germline Nanos, with one member: Nanos2. Class two includes Nanos3-like genes, with significant sequence similarity to Nanos3 in the purple sea urchin, Strongylocentrotus purpuratus (Sp), but with wildly variable expression patterns. The third class includes several previously undescribed nanos zinc-finger genes that may be the result of duplications of Nanos2. All nine nanos transcripts occupy unique genomic loci and are expressed with unique temporal profiles during development. Importantly, here we describe and characterize the unique genomic location, conservation, and phylogeny of the Lv ortholog of the well-studied Sp Nanos2. However, in addition to the conserved germline functioning Nanos2, the green sea urchin appears to be an outlier in the echinoderm phyla with eight additional nanos genes. We hypothesize that this expansion of nanos gene members may be the result of a previously uncharacterized L1-class transposon encoded on the opposite strand of a nanos2 pseudogene present on chromosome 12 in this species. The expansion of nanos genes described here represents intriguing insights into germline specification and nanos evolution in this species of sea urchin.
Affiliation(s)
- Cosmo Pieplow
- MCB Department, Division of Biomedicine, Brown University, Providence RI 02912
- Gary Wessel
- MCB Department, Division of Biomedicine, Brown University, Providence RI 02912
22
Wang D, Tang G, Zhao L, Wang M, Chen L, Zhao C, Liang Z, Chen J, Cao Y, Yao J. Potential roles of the rectum keystone microbiota in modulating the microbial community and growth performance in goat model. J Anim Sci Biotechnol 2023; 14:55. [PMID: 37029437 PMCID: PMC10080759 DOI: 10.1186/s40104-023-00850-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/04/2022] [Accepted: 02/05/2023] [Indexed: 04/09/2023] Open
Abstract
BACKGROUND Ruminal microbiota in early life plays critical roles in the life-time health and productivity of ruminant animals. However, understanding of the relationship between gut microbiota and ruminant phenotypes is very limited. Here, the relationship between the rectum microbiota, their primary metabolites, and the growth rate of a total of 76 young dairy goats (6-month-old) was analyzed, and then the 10 goats with the highest or lowest growth rates, respectively, were further compared for differences in the rectum microbiota, metabolites, and the animals' immune parameters, to investigate the potential mechanisms by which the rectum microbiota contributes to health and growth rate. RESULTS The analysis of Spearman correlation and microbial co-occurrence network indicated that some keystone rectum microbiota, including unclassified Prevotellaceae, Faecalibacterium and Succinivibrio, were key modulators shaping the rectum microbiota and were closely correlated with rectum SCFA production and serum IgG, which contribute to the health and growth rate of young goats. In addition, random forest machine learning analysis suggested that six bacterial taxa in feces could be used as potential biomarkers for differentiating high or low growth rate goats, with 98.3% accuracy of prediction. Moreover, the rectum microbiota played more important roles in gut fermentation in early life (6-month-old) than in the adult stage (19-month-old) of goats. CONCLUSION We concluded that the rectum microbiota was associated with the health and growth rate of young goats, and can be a focus in the design of early-life gut microbial interventions.
Affiliation(s)
- Dangdang Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Guangfu Tang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Lichao Zhao
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Mengya Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Luyu Chen
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Congcong Zhao
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Ziqi Liang
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Jie Chen
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yangchun Cao
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China.
- Junhu Yao
- College of Animal Science and Technology, Northwest A&F University, Yangling, 712100, Shaanxi, China.
23
Spänig S, Michel A, Heider D. Unsupervised encoding selection through ensemble pruning for biomedical classification. BioData Min 2023; 16:10. [PMID: 36927546 PMCID: PMC10018861 DOI: 10.1186/s13040-022-00317-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 11/27/2022] [Indexed: 03/18/2023] Open
Abstract
BACKGROUND Owing to the rising levels of multi-resistant pathogens, antimicrobial peptides, an alternative strategy to classic antibiotics, have received more attention. A crucial part of this is the costly identification and validation. With the ever-growing amount of annotated peptides, researchers leverage artificial intelligence to circumvent the cumbersome, wet-lab-based identification and automate the detection of promising candidates. However, the prediction of a peptide's function is not limited to antimicrobial efficiency. To date, multiple studies successfully classified additional properties, e.g., antiviral or cell-penetrating effects. In this light, ensemble classifiers are employed aiming to further improve the prediction. Although we recently presented a workflow to significantly diminish the initial encoding choice, a fully unsupervised encoding selection, considering various machine learning models, is still lacking. RESULTS We developed a workflow, automatically selecting encodings and generating classifier ensembles by employing sophisticated pruning methods. We observed that the Pareto frontier pruning is a good method to create encoding ensembles for the datasets at hand. In addition, encodings combined with the Decision Tree classifier as the base model are often superior. However, our results also demonstrate that none of the ensemble building techniques is outstanding for all datasets. CONCLUSION The workflow conducts multiple pruning methods to evaluate ensemble classifiers composed of a wide range of peptide encodings and base models. Consequently, researchers can use the workflow for unsupervised encoding selection and ensemble creation. Ultimately, the extensible workflow can be used as a plugin for the PEPTIDE REACToR, further establishing it as a versatile tool in the domain.
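Illustrative note (not from the paper): the abstract does not state which objectives the Pareto frontier pruning uses. As a generic sketch only, the code below keeps the candidates that are not dominated on two objectives that are both maximized. The function name `pareto_front`, the candidate names, and the two placeholder objective scores are hypothetical.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated on two maximized objectives.

    Each candidate is (name, objective_a, objective_b). The objectives are
    placeholders; the abstract does not say which ones the workflow optimizes.
    """
    front = []
    for name, a, b in candidates:
        # A candidate is dominated if another one is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(
            (a2 >= a and b2 >= b) and (a2 > a or b2 > b)
            for _, a2, b2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical encodings scored on two made-up objectives.
models = [
    ("enc1", 0.90, 0.10),
    ("enc2", 0.85, 0.30),
    ("enc3", 0.80, 0.20),  # dominated by enc2
    ("enc4", 0.70, 0.40),
]
kept = pareto_front(models)
```

Pruning to the non-dominated set avoids committing to a single weighting of the two objectives, at the cost of possibly keeping several candidates.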
Affiliation(s)
- Sebastian Spänig
- Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Alexander Michel
- Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Dominik Heider
- Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany.
24
Banaye Yazdipour A, Masoorian H, Ahmadi M, Mohammadzadeh N, Ayyoubzadeh SM. Predicting the toxicity of nanoparticles using artificial intelligence tools: a systematic review. Nanotoxicology 2023; 17:62-77. [PMID: 36883698 DOI: 10.1080/17435390.2023.2186279] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/09/2023]
Abstract
Nanoparticles have been used extensively in different scientific fields. Due to the possible destructive effects of nanoparticles on the environment or biological systems, their toxicity evaluation is a crucial phase in studying nanomaterial safety. However, experimental approaches for toxicity assessment of various nanoparticles are expensive and time-consuming. Thus, an alternative technique, such as artificial intelligence (AI), could be valuable for predicting nanoparticle toxicity. Therefore, in this review, AI tools were investigated for the toxicity assessment of nanomaterials. To this end, a systematic search was performed on the PubMed, Web of Science, and Scopus databases. Articles were included or excluded based on pre-defined inclusion and exclusion criteria, and duplicate studies were excluded. Finally, twenty-six studies were included. The majority of the studies were conducted on metal oxide and metallic nanoparticles. In addition, Random Forest (RF) and Support Vector Machine (SVM) were the most frequently used methods in the included studies. Most of the models demonstrated acceptable performance. Overall, AI could provide a robust, fast, and low-cost tool for the evaluation of nanoparticle toxicity.
Affiliation(s)
- Alireza Banaye Yazdipour
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran; Students' Scientific Research Center (SSRC), Tehran University of Medical Sciences, Tehran, Iran
- Hoorie Masoorian
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Mahnaz Ahmadi
- Department of Pharmaceutics and Pharmaceutical Nanotechnology, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Niloofar Mohammadzadeh
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Ayyoubzadeh
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
25
Rudar J, Golding GB, Kremer SC, Hajibabaei M. Decision Tree Ensembles Utilizing Multivariate Splits Are Effective at Investigating Beta Diversity in Medically Relevant 16S Amplicon Sequencing Data. Microbiol Spectr 2023; 11:e0206522. [PMID: 36877086 PMCID: PMC10100742 DOI: 10.1128/spectrum.02065-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 02/11/2023] [Indexed: 03/07/2023] Open
Abstract
Developing an understanding of how microbial communities vary across conditions is an important analytical step. We used 16S rRNA data isolated from human stool samples to investigate whether learned dissimilarities, such as those produced using unsupervised decision tree ensembles, can be used to improve the analysis of the composition of bacterial communities in patients suffering from Crohn's disease and adenomas/colorectal cancers. We also introduce a workflow capable of learning dissimilarities, projecting them into a lower dimensional space, and identifying features that impact the location of samples in the projections. For example, when used with the centered log ratio transformation, our new workflow (TreeOrdination) could identify differences in the microbial communities of Crohn's disease patients and healthy controls. Further investigation of our models elucidated the global impact amplicon sequence variants (ASVs) had on the locations of samples in the projected space and how each ASV impacted individual samples in this space. Furthermore, this approach can be used to integrate patient data easily into the model and results in models that generalize well to unseen data. Models employing multivariate splits can improve the analysis of complex high-throughput sequencing data sets because they are better able to learn about the underlying structure of the data set. IMPORTANCE There is an ever-increasing level of interest in accurately modeling and understanding the roles that commensal organisms play in human health and disease. We show that learned representations can be used to create informative ordinations. We also demonstrate that the application of modern model introspection algorithms can be used to investigate and quantify the impacts of taxa in these ordinations, and that the taxa identified by these approaches have been associated with immune-mediated inflammatory diseases and colorectal cancer.
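Illustrative note (not from the paper): the centered log ratio (CLR) transformation mentioned above subtracts the mean log abundance from each log abundance within a sample. The sketch below is a minimal stdlib-only version; the function name `clr`, the pseudocount used to handle zero counts, and the toy counts are assumptions of this example and are not described in the abstract.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's feature counts.

    The pseudocount added to every count (to avoid log(0)) is an assumption
    of this sketch, not a choice documented in the abstract.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical ASV counts for a single stool sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
```

By construction the transformed values of a sample sum to zero, so they describe each feature relative to the sample's geometric mean rather than in absolute terms.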
Affiliation(s)
- Josip Rudar
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Stefan C. Kremer
- School of Computer Science, University of Guelph, Guelph, Ontario, Canada
- Mehrdad Hajibabaei
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
26
Mavragani A, Bozio C, Butterfield K, Reynolds S, Reese SE, Ball S, Steffens A, Demarco M, McEvoy C, Thompson M, Rowley E, Porter RM, Fink RV, Irving SA, Naleway A. Accuracy of COVID-19-Like Illness Diagnoses in Electronic Health Record Data: Retrospective Cohort Study. JMIR Form Res 2023; 7:e39231. [PMID: 36383633 PMCID: PMC9848441 DOI: 10.2196/39231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 07/13/2022] [Accepted: 09/30/2022] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Electronic health record (EHR) data provide a unique opportunity to study the epidemiology of COVID-19, clinical outcomes of the infection, comparative effectiveness of therapies, and vaccine effectiveness but require a well-defined computable phenotype of COVID-19-like illness (CLI). OBJECTIVE The objective of this study was to evaluate the performance of pathogen-specific and other acute respiratory illness (ARI) International Statistical Classification of Diseases-9 and -10 codes in identifying COVID-19 cases in emergency department (ED) or urgent care (UC) and inpatient settings. METHODS We conducted a retrospective observational cohort study using EHR, claims, and laboratory information system data of ED or UC and inpatient encounters from 4 health systems in the United States. Patients who were aged ≥18 years, had an ED or UC or inpatient encounter for an ARI, and underwent a SARS-CoV-2 polymerase chain reaction test between March 1, 2020, and March 31, 2021, were included. We evaluated various CLI definitions using combinations of International Statistical Classification of Diseases-10 codes as follows: COVID-19-specific codes; CLI definition used in VISION network studies; ARI signs, symptoms, and diagnosis codes only; signs and symptoms of ARI only; and random forest model definitions. We evaluated the sensitivity, specificity, positive predictive value, and negative predictive value of each CLI definition using a positive SARS-CoV-2 polymerase chain reaction test as the reference standard. We evaluated the performance of each CLI definition for distinct hospitalization and ED or UC cohorts. RESULTS Among 90,952 hospitalizations and 137,067 ED or UC visits, 5627 (6.19%) and 9866 (7.20%) were positive for SARS-CoV-2, respectively. COVID-19-specific codes had high sensitivity (91.6%) and specificity (99.6%) in identifying patients with SARS-CoV-2 positivity among hospitalized patients. 
The VISION CLI definition maintained high sensitivity (95.8%) but lowered specificity (45.5%). By contrast, signs and symptoms of ARI had low sensitivity and positive predictive value (28.9% and 11.8%, respectively) but higher specificity and negative predictive value (85.3% and 94.7%, respectively). ARI diagnoses, signs, and symptoms alone had low predictive performance. All CLI definitions had lower sensitivity for ED or UC encounters. Random forest approaches identified distinct CLI definitions with high performance for hospital encounters and moderate performance for ED or UC encounters. CONCLUSIONS COVID-19-specific codes have high sensitivity and specificity in identifying adults with positive SARS-CoV-2 test results. Separate combinations of COVID-19-specific codes and ARI codes enhance the utility of CLI definitions in studies using EHR data in hospital and ED or UC settings.
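Illustrative note (not from the paper): sensitivity, specificity, positive predictive value, and negative predictive value all follow from the four cells of a 2x2 table comparing a CLI definition against the reference test. The sketch below uses the standard formulas; the function name `diagnostic_metrics` and the counts are hypothetical, since the abstract reports percentages rather than cell counts.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures against a reference test."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly flagged
        "specificity": tn / (tn + fp),  # negatives correctly cleared
        "ppv": tp / (tp + fp),          # flagged cases that are true positives
        "npv": tn / (tn + fn),          # cleared cases that are true negatives
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=880)
```

Sensitivity and specificity depend only on the definition and the test, whereas PPV and NPV also shift with how common positive results are in the cohort.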
Affiliation(s)
- Catherine Bozio
- Centers for Disease Control and Prevention, Atlanta, GA, United States
- Sue Reynolds
- Centers for Disease Control and Prevention, Atlanta, GA, United States
- Andrea Steffens
- Centers for Disease Control and Prevention, Atlanta, GA, United States
- Mark Thompson
- Centers for Disease Control and Prevention, Atlanta, GA, United States
- Rachael M Porter
- Centers for Disease Control and Prevention, Atlanta, GA, United States
- Stephanie A Irving
- Science Programs Department, Kaiser Permanente Center for Health Research, Portland, OR, United States
- Allison Naleway
- Science Programs Department, Kaiser Permanente Center for Health Research, Portland, OR, United States
27
Bowe AK, Lightbody G, Staines A, Murray DM. Big data, machine learning, and population health: predicting cognitive outcomes in childhood. Pediatr Res 2023; 93:300-307. [PMID: 35681091 PMCID: PMC7614199 DOI: 10.1038/s41390-022-02137-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/02/2022] [Revised: 05/05/2022] [Accepted: 05/17/2022] [Indexed: 11/09/2022]
Abstract
The application of machine learning (ML) to address population health challenges has received much less attention than its application in the clinical setting. One such challenge is addressing disparities in early childhood cognitive development-a complex public health issue rooted in the social determinants of health, exacerbated by inequity, characterised by intergenerational transmission, and which will continue unabated without novel approaches to address it. Early life, the period of optimal neuroplasticity, presents a window of opportunity for early intervention to improve cognitive development. Unfortunately for many, this window will be missed, and intervention may never occur or occur only when overt signs of cognitive delay manifest. In this review, we explore the potential value of ML and big data analysis in the early identification of children at risk for poor cognitive outcome, an area where there is an apparent dearth of research. We compare and contrast traditional statistical methods with ML approaches, provide examples of how ML has been used to date in the field of neurodevelopmental disorders, and present a discussion of the opportunities and risks associated with its use at a population level. The review concludes by highlighting potential directions for future research in this area. IMPACT: To date, the application of machine learning to address population health challenges in paediatrics lags behind other clinical applications. This review provides an overview of the public health challenge we face in addressing disparities in childhood cognitive development and focuses on the cornerstone of early intervention. Recent advances in our ability to collect large volumes of data, and in analytic capabilities, provide a potential opportunity to improve current practices in this field. This review explores the potential role of machine learning and big data analysis in the early identification of children at risk for poor cognitive outcomes.
Affiliation(s)
- Andrea K. Bowe
- INFANT Research Centre, University College Cork, Cork, Ireland
- Gordon Lightbody
- INFANT Research Centre, University College Cork, Cork, Ireland; Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Anthony Staines
- School of Nursing, Psychotherapy, and Community Health, Dublin City University, Dublin, Ireland
- Deirdre M. Murray
- INFANT Research Centre, University College Cork, Cork, Ireland
28
Ma X, Jiang S, Zhang Z, Wang H, Song C, He J. Long‐term collar deployment leads to bias in soil respiration measurements. Methods Ecol Evol 2023. [DOI: 10.1111/2041-210x.14056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
Affiliation(s)
- Xiaoliang Ma
- State Key Laboratory of Herbage Improvement and Grassland Agro‐Ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Shengjing Jiang
- State Key Laboratory of Herbage Improvement and Grassland Agro‐Ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Zhiqi Zhang
- Institute of Ecology, College of Urban and Environmental Sciences, Peking University, Beijing, China
- Hao Wang
- State Key Laboratory of Herbage Improvement and Grassland Agro‐Ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Chao Song
- State Key Laboratory of Herbage Improvement and Grassland Agro‐Ecosystems, College of Ecology, Lanzhou University, Lanzhou, China
- Jin‐Sheng He
- State Key Laboratory of Herbage Improvement and Grassland Agro‐Ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, China
- Institute of Ecology, College of Urban and Environmental Sciences, Peking University, Beijing, China
29
Evaluating Histological Subtypes Classification of Primary Lung Cancers on Unenhanced Computed Tomography Based on Random Forest Model. JOURNAL OF HEALTHCARE ENGINEERING 2023; 2023:8964676. [PMID: 36794098 PMCID: PMC9925238 DOI: 10.1155/2023/8964676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 07/07/2022] [Accepted: 01/21/2023] [Indexed: 02/08/2023]
Abstract
Lung cancer is the leading cause of cancer-related death in many countries, and an accurate histopathological diagnosis is of great importance in subsequent treatment. The aim of this study was to establish the random forest (RF) model based on radiomic features to automatically classify and predict lung adenocarcinoma (ADC), lung squamous cell carcinoma (SCC), and small cell lung cancer (SCLC) on unenhanced computed tomography (CT) images. Eight hundred and fifty-two patients (mean age: 61.4, range: 29-87, male/female: 536/316) with preoperative unenhanced CT and postoperative histopathologically confirmed primary lung cancers, including 525 patients with ADC, 161 patients with SCC, and 166 patients with SCLC, were included in this retrospective study. Radiomic features were extracted, selected, and then used to establish the RF classification model to analyse and classify primary lung cancers into three subtypes, including ADC, SCC, and SCLC according to histopathological results. The training (446 ADC, 137 SCC, and 141 SCLC) and testing cohorts (79 ADC, 24 SCC, and 25 SCLC) accounted for 85% and 15% of the whole datasets, respectively. The prediction performance of the RF classification model was evaluated by F1 scores and the receiver operating characteristic (ROC) curve. On the testing cohort, the areas under the ROC curve (AUC) of the RF model in classifying ADC, SCC, and SCLC were 0.74, 0.77, and 0.88, respectively. The F1 scores achieved 0.80, 0.40, and 0.73 in ADC, SCC, and SCLC, respectively, and the weighted average F1 score was 0.71. In addition, for the RF classification model, the precisions were 0.72, 0.64, and 0.70; the recalls were 0.86, 0.29, and 0.76; and the specificities were 0.55, 0.96, and 0.92 in ADC, SCC, and SCLC. 
The primary lung cancers were feasibly and effectively classified into ADC, SCC, and SCLC based on the combination of RF classification model and radiomic features, which has the potential for noninvasive predicting histological subtypes of primary lung cancers.
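Illustrative note (not from the paper): the per-class figures in the abstract can be related with the usual definitions, F1 = 2PR / (P + R) and a support-weighted average over classes. The sketch below plugs in the rounded precision, recall, F1, and testing-cohort sizes quoted above; the variable and function names are hypothetical. Because the inputs are rounded, recomputed values may differ slightly from the reported ones (for ADC the recomputed F1 comes out near 0.78 rather than the reported 0.80).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract (testing cohort).
reported = {
    "ADC": {"precision": 0.72, "recall": 0.86, "f1": 0.80, "n_test": 79},
    "SCC": {"precision": 0.64, "recall": 0.29, "f1": 0.40, "n_test": 24},
    "SCLC": {"precision": 0.70, "recall": 0.76, "f1": 0.73, "n_test": 25},
}

# F1 recomputed from the rounded precision and recall of each class.
recomputed = {
    label: round(f1(v["precision"], v["recall"]), 2)
    for label, v in reported.items()
}

# Support-weighted average of the reported per-class F1 scores.
total = sum(v["n_test"] for v in reported.values())
weighted_f1 = sum(v["f1"] * v["n_test"] for v in reported.values()) / total
```

The weighted average rounds to the 0.71 reported in the abstract, which is consistent with weighting each class by its share of the 128 test patients.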
30
Risi E, Lisanti C, Vignoli A, Biagioni C, Paderi A, Cappadona S, Monte FD, Moretti E, Sanna G, Livraghi L, Malorni L, Benelli M, Puglisi F, Luchinat C, Tenori L, Biganzoli L. Risk assessment of disease recurrence in early breast cancer: A serum metabolomic study focused on elderly patients. Transl Oncol 2022; 27:101585. [PMID: 36403505 PMCID: PMC9676351 DOI: 10.1016/j.tranon.2022.101585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 10/28/2022] [Accepted: 11/08/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND We previously showed that metabolomics predicts relapse in early breast cancer (eBC) patients, unselected by age. This study aims to identify a "metabolic signature" that differentiates eBC from advanced breast cancer (aBC) patients, and to investigate its potential prognostic role in an elderly population. METHODS Serum samples from elderly breast cancer (BC) patients enrolled in 3 onco-geriatric trials were retrospectively analyzed via proton nuclear magnetic resonance (1H NMR) spectroscopy. Three nuclear magnetic resonance (NMR) spectra were acquired for each serum sample: NOESY1D, CPMG, and Diffusion-edited. Random Forest (RF) models to predict BC relapse were built on NMR spectra, and the resulting RF risk scores were evaluated by Kaplan-Meier curves. RESULTS Serum samples from 140 eBC patients and 27 aBC patients were retrieved. In the eBC cohort, median age was 76 years; 77% of patients had luminal, 10% HER2-positive and 13% triple negative (TN) BC. Forty-two percent of patients had tumors >2 cm, and 43% had positive axillary nodes. Using NOESY1D spectra, the RF classifier discriminated free-from-recurrence eBC from aBC with sensitivity, specificity and accuracy of 81%, 67% and 70%, respectively. We tested the NOESY1D spectra of each eBC patient on the RF models already calculated. We found that patients classified as "high risk" had a higher risk of disease recurrence (hazard ratio (HR) 3.42, 95% confidence interval (CI) 1.58-7.37) than patients at low risk. CONCLUSIONS This analysis suggests that a "metabolic signature", identified employing NMR fingerprinting, is able to predict the risk of disease recurrence in elderly patients with eBC independently of standard clinicopathological features.
Affiliation(s)
- Emanuela Risi
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Camilla Lisanti
  - CRO Aviano - National Cancer Institute - IRCCS, Medical Oncology and Cancer Prevention, Aviano, Italy
- Alessia Vignoli
  - Magnetic Resonance Center (CERM), University of Florence, Sesto Fiorentino, Italy
- Agnese Paderi
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Silvia Cappadona
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Francesca Del Monte
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Erica Moretti
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Giuseppina Sanna
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Luca Livraghi
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Luca Malorni
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
- Fabio Puglisi
  - CRO Aviano - National Cancer Institute - IRCCS, Medical Oncology and Cancer Prevention, Aviano, Italy
- Claudio Luchinat
  - Magnetic Resonance Center (CERM), University of Florence, Sesto Fiorentino, Italy
- Leonardo Tenori
  - Magnetic Resonance Center (CERM), University of Florence, Sesto Fiorentino, Italy
- Laura Biganzoli
  - Sandro Pitigliani Medical Oncology Department, Hospital of Prato, Prato, Italy
  - Corresponding author
|
31
|
Mao Y, Zhu Z, Pan S, Lin W, Liang J, Huang H, Li L, Wen J, Chen G. Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real-world retrospective cohort study. J Diabetes Investig 2022; 14:309-320. [PMID: 36345236 PMCID: PMC9889616 DOI: 10.1111/jdi.13937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 10/04/2022] [Accepted: 10/16/2022] [Indexed: 11/11/2022] Open
Abstract
AIMS/INTRODUCTION To compare the application value of different machine learning (ML) algorithms for diabetes risk prediction. MATERIALS AND METHODS This is a 3-year retrospective cohort study with a total of 3,687 participants included in the data analysis. Modeling variable screening and predictive model building were carried out using logistic regression (LR) analysis and 10-fold cross-validation, respectively. In total, six different ML algorithms, including random forests, light gradient boosting machine, extreme gradient boosting, adaptive boosting (AdaBoost), multi-layer perceptrons and Gaussian naive Bayes, were used for model construction. Model performance was mainly evaluated by the area under the receiver operating characteristic curve. The best performing ML model was selected for comparison with the traditional LR model and visualized using Shapley additive explanations. RESULTS A total of eight risk factors most associated with the development of diabetes were identified by univariate and multivariate LR analysis, and they were visualized in the form of a nomogram. Among the six different ML models, the random forests model had the best predictive performance. After 10-fold cross-validation, its optimal model had an area under the receiver operating characteristic curve of 0.855 (95% confidence interval [CI] 0.823-0.886) in the training set and 0.835 (95% CI 0.779-0.892) in the test set. For the traditional LR model, the area under the receiver operating characteristic curve was 0.840 (95% CI 0.814-0.866) in the training set and 0.834 (95% CI 0.785-0.884) in the test set. CONCLUSIONS In real-world epidemiological research, combining traditional variable screening with an ML algorithm to construct a diabetes risk prediction model has satisfactory clinical application value.
Affiliation(s)
- Yaqian Mao
  - Department of Internal Medicine, Fujian Provincial Hospital South Branch, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Zheng Zhu
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Shuyao Pan
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Wei Lin
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Jixing Liang
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Huibin Huang
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Liantao Li
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Junping Wen
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Gang Chen
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Fujian Provincial Key Laboratory of Medical Analysis, Fujian Academy of Medical, Fuzhou, China
|
32
|
UAV-based classification of maritime Antarctic vegetation types using GEOBIA and random forest. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
33
|
Neto NGB, O'Rourke SA, Zhang M, Fitzgerald HK, Dunne A, Monaghan MG. Non-invasive classification of macrophage polarisation by 2P-FLIM and machine learning. eLife 2022; 11:77373. [PMID: 36254592 PMCID: PMC9578711 DOI: 10.7554/elife.77373] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/26/2022] [Accepted: 09/25/2022] [Indexed: 11/13/2022] Open
Abstract
In this study, we utilise fluorescence lifetime imaging of NAD(P)H-based cellular autofluorescence as a non-invasive modality to classify two contrasting states of human macrophages by proxy of their governing metabolic state. Macrophages derived from human blood-circulating monocytes were polarised using established protocols and metabolically challenged using small molecules to validate their responding metabolic actions in extracellular acidification and oxygen consumption. Large field-of-view images of individual polarised macrophages were obtained using fluorescence lifetime imaging microscopy (FLIM). These were challenged in real time with small-molecule perturbations of metabolism during imaging. We uncovered FLIM parameters that are pronounced under the action of carbonyl cyanide-p-trifluoromethoxyphenylhydrazone (FCCP), which strongly stratifies the phenotype of polarised human macrophages; however, this performance is impacted by donor variability when analysing the data at a single-cell level. The stratification and parameters emanating from a full field-of-view and single-cell FLIM approach serve as the basis for machine learning models. Applying a random forests model, we identify three strongly governing FLIM parameters, achieving an area under the receiver operating characteristic curve (ROC-AUC) value of 0.944 and an out-of-bag (OOB) error rate of 16.67% when classifying human macrophages in a full field-of-view image. To conclude, 2P-FLIM with the integration of machine learning models is shown to be a powerful technique for analysis of both human macrophage metabolism and polarisation at the full field-of-view (FoV) and single-cell level.
Affiliation(s)
- Nuno G B Neto
  - Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
  - Trinity Centre for Biomedical Engineering, Trinity Biomedical Science Institute, Trinity College Dublin, Dublin, Ireland
- Sinead A O'Rourke
  - Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
  - Trinity Centre for Biomedical Engineering, Trinity Biomedical Science Institute, Trinity College Dublin, Dublin, Ireland
  - School of Biochemistry & Immunology and School of Medicine, Trinity Biomedical Science Institute, Trinity College Dublin, Dublin, Ireland
- Mimi Zhang
  - School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
- Hannah K Fitzgerald
  - School of Biochemistry & Immunology and School of Medicine, Trinity Biomedical Science Institute, Trinity College Dublin, Dublin, Ireland
- Aisling Dunne
  - School of Biochemistry & Immunology and School of Medicine, Trinity Biomedical Science Institute, Trinity College Dublin, Dublin, Ireland
  - Advanced Materials for BioEngineering Research (AMBER) Centre, Trinity College Dublin and Royal College of Surgeons in Ireland, Dublin, Ireland
- Michael G Monaghan
  - Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
  - Trinity Centre for Biomedical Engineering, Trinity Biomedical Science Institute, Trinity College Dublin, Dublin, Ireland
  - Advanced Materials for BioEngineering Research (AMBER) Centre, Trinity College Dublin and Royal College of Surgeons in Ireland, Dublin, Ireland
  - CURAM SFI Research Centre for Medical Devices, National University of Ireland, Galway, Ireland
|
34
|
Betz LT, Rosen M, Salokangas RKR, Kambeitz J. Disentangling the impact of childhood abuse and neglect on depressive affect in adulthood: A machine learning approach in a general population sample. J Affect Disord 2022; 315:17-26. [PMID: 35882299 DOI: 10.1016/j.jad.2022.07.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 07/15/2022] [Accepted: 07/19/2022] [Indexed: 11/26/2022]
Abstract
BACKGROUND Different types of childhood maltreatment (CM) are key risk factors for psychopathology. Specifically, there is evidence for a unique role of emotional abuse in affective psychopathology in children and youth; however, its predictive power for depressive symptomatology in adulthood is still unknown. Additionally, emotional abuse encompasses several facets, but the strength of their individual contribution to depressive affect has not been examined. METHOD Here, we used a machine learning (ML) approach based on Random Forests to assess the performance of domain scores and individual items from the Childhood Trauma Questionnaire (CTQ) in predicting self-reported levels of depressive affect in an adult general population sample. Models were generated in a training sample (N = 769) and validated in an independent test sample (N = 466). Using state-of-the-art methods from interpretable ML, we identified the most predictive domains and facets of CM for adult depressive affect. RESULTS Models based on individual CM items explained more variance in the independent test sample than models based on CM domain scores (R² = 7.6% vs. 6.4%). Emotional abuse, particularly its more subjective components such as reactions to and appraisal of the abuse, emerged as the strongest predictors of adult depressive affect. LIMITATIONS Assessment of CM was retrospective and lacked information on timing and duration. Moreover, reported rates of CM and depressive affect were comparatively low. CONCLUSIONS Our findings corroborate the strong role of subjective experience in CM-related psychopathology across the lifespan that necessitates greater attention in research, policy, and clinical practice.
Affiliation(s)
- Linda T Betz
  - Department of Psychiatry and Psychotherapy, Faculty of Medicine and University Hospital of Cologne, University of Cologne, Cologne, Germany
- Marlene Rosen
  - Department of Psychiatry and Psychotherapy, Faculty of Medicine and University Hospital of Cologne, University of Cologne, Cologne, Germany
- Joseph Kambeitz
  - Department of Psychiatry and Psychotherapy, Faculty of Medicine and University Hospital of Cologne, University of Cologne, Cologne, Germany
|
35
|
Behnamian S, Esposito U, Holland G, Alshehab G, Dobre AM, Pirooznia M, Brimacombe CS, Elhaik E. Temporal population structure, a genetic dating method for ancient Eurasian genomes from the past 10,000 years. CELL REPORTS METHODS 2022; 2:100270. [PMID: 36046618 PMCID: PMC9421539 DOI: 10.1016/j.crmeth.2022.100270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Revised: 06/17/2022] [Accepted: 07/19/2022] [Indexed: 11/21/2022]
Abstract
Radiocarbon dating is the gold standard in archeology to estimate the age of skeletons, a key to studying their origins. Many published ancient genomes lack reliable and direct dates, which results in obscure and contradictory reports. We developed the temporal population structure (TPS), a DNA-based dating method for genomes ranging from the Late Mesolithic to today, and applied it to 3,591 ancient and 1,307 modern Eurasians. TPS predictions aligned with the known dates and correctly accounted for kin relationships. TPS dating of poorly dated Eurasian samples resolved conflicting reports in the literature, as illustrated by one test case. We also demonstrated how TPS improved the ability to study phenotypic traits over time. TPS can be used when radiocarbon dating is unfeasible or uncertain or to develop alternative hypotheses for samples younger than 10,000 years ago, a limitation that may be resolved over time as ancient data accumulate.
Affiliation(s)
- Sara Behnamian
  - Department of Biology, Lund University, 22362 Lund, Sweden
- Umberto Esposito
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
- Grace Holland
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
- Ghadeer Alshehab
  - Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 3JD, UK
- Ann M. Dobre
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
- Mehdi Pirooznia
  - National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD 20892, USA
- Conrad S. Brimacombe
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, UK
  - Department of Anthropology and Archaeology, University of Bristol, Bristol BS8 1TH, UK
- Eran Elhaik
  - Department of Biology, Lund University, 22362 Lund, Sweden
|
36
|
Damigos G, Zacharaki EI, Zerva N, Pavlopoulos A, Chatzikyrkou K, Koumenti A, Moustakas K, Pantos C, Mourouzis I, Lourbopoulos A. Machine learning based analysis of stroke lesions on mouse tissue sections. J Cereb Blood Flow Metab 2022; 42:1463-1477. [PMID: 35209753 PMCID: PMC9274860 DOI: 10.1177/0271678x221083387] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
An unbiased, automated and reliable method for analysis of brain lesions in tissue after ischemic stroke is missing. Manual infarct volumetry is laborious and prone to human error, while threshold-based semi-automated approaches are biased by many false-positive and false-negative data. We therefore developed a novel machine learning, atlas-based method for fully automated stroke analysis in mouse brain slices stained with 2% triphenyltetrazolium chloride (2% TTC), named "StrokeAnalyst", which runs on a user-friendly graphical interface. StrokeAnalyst registers subject images on a common spatial domain (a novel mouse TTC brain atlas of 80 average mathematical images), calculates pixel-based, tissue-intensity statistics (z-scores), applies outlier-detection and machine learning (Random Forest) models to increase accuracy of lesion detection, and produces volumetry data and detailed neuroanatomical information per lesion. We validated StrokeAnalyst in two separate experimental sets using the filament stroke model. StrokeAnalyst detects stroke lesions in a rater-independent and reproducible way, correctly detects hemispheric volumes even in the presence of post-stroke edema and significantly minimizes false-positive errors compared to threshold-based approaches (false-positive rate 1.2-2.3%, p < 0.05). It can process scanner-acquired, and even smartphone-captured or PDF-retrieved, images. Overall, StrokeAnalyst surpasses all previous TTC-volumetry approaches and increases quality, reproducibility and reliability of stroke detection in relevant preclinical models.
Affiliation(s)
- Gerasimos Damigos
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
  - Department of Electrical and Computer Engineering, University of Patras, Patras, Greece
- Evangelia I Zacharaki
  - Department of Electrical and Computer Engineering, University of Patras, Patras, Greece
- Nefeli Zerva
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
- Angelos Pavlopoulos
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
- Konstantina Chatzikyrkou
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
- Argyro Koumenti
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
- Constantinos Pantos
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
- Iordanis Mourouzis
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
- Athanasios Lourbopoulos
  - Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
  - Institute for Stroke and Dementia Research (ISD), University of Munich Medical Center, Munich, Germany
  - Neurointensive Care Unit, Schoen Klinik Bad Aibling, Germany
|
37
|
Boueiz A, Xu Z, Chang Y, Masoomi A, Gregory A, Lutz S, Qiao D, Crapo JD, Dy JG, Silverman EK, Castaldi PJ. Machine Learning Prediction of Progression in Forced Expiratory Volume in 1 Second in the COPDGene® Study. CHRONIC OBSTRUCTIVE PULMONARY DISEASES (MIAMI, FLA.) 2022; 9:349-365. [PMID: 35649102 PMCID: PMC9448009 DOI: 10.15326/jcopdf.2021.0275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 05/18/2022] [Indexed: 05/24/2023]
Abstract
BACKGROUND The heterogeneous nature of chronic obstructive pulmonary disease (COPD) complicates the identification of the predictors of disease progression. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features. METHODS We included 4496 smokers with available data from their enrollment and 5-year follow-up visits in the COPD Genetic Epidemiology (COPDGene®) study. We constructed linear regression (LR) and supervised random forest models to predict 5-year progression in forced expiratory volume in 1 second (FEV1) from 46 baseline features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results in the COPDGene 10-year follow-up visit. RESULTS Predicting the change in FEV1 over time is more challenging than simply predicting the future absolute FEV1 level. For random forest, R-squared was 0.15 and the area under the receiver operating characteristic (ROC) curve for the prediction of participants in the top quartile of observed progression was 0.71 in testing; the corresponding values in validation were 0.10 and 0.70. Random forest provided slightly better performance than LR. The accuracy was best for Global initiative for chronic Obstructive Lung Disease (GOLD) grades 1-2 participants, and it was harder to achieve accurate prediction in advanced stages of the disease. Predictive variables differed in their relative importance, including across predictions stratified by GOLD grade. CONCLUSION Random forest, along with deep phenotyping, predicts FEV1 progression with reasonable accuracy. There is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk for rapid disease progression. Such findings may be useful in the selection of patient populations for targeted clinical trials.
Affiliation(s)
- Adel Boueiz
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - *These authors contributed equally
- Zhonghui Xu
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - *These authors contributed equally
- Yale Chang
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
- Aria Masoomi
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
- Andrew Gregory
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
- Sharon Lutz
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, Massachusetts, United States
- Dandi Qiao
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
- James D. Crapo
  - Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, Colorado, United States
- Jennifer G. Dy
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
- Edwin K. Silverman
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
- Peter J. Castaldi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
  - Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
|
38
|
Kurata H, Tsukiyama S, Manavalan B. iACVP: markedly enhanced identification of anti-coronavirus peptides using a dataset-specific word2vec model. Brief Bioinform 2022; 23:6623727. [PMID: 35772910 DOI: 10.1093/bib/bbac265] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/06/2022] [Revised: 05/23/2022] [Accepted: 06/06/2022] [Indexed: 01/22/2023] Open
Abstract
The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using conventional features, a binary profile and a word-embedding word2vec (W2V), we systematically explored five different machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance on different datasets. The two main controlling factors were: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome and (ii) a systematic search was conducted and determined the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples. Therefore, our proposed method, named iACVP, consistently provides better prediction performance compared with existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server accessible via the following link: http://kurata35.bio.kyutech.ac.jp/iACVP.
Affiliation(s)
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Sho Tsukiyama
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
|
39
|
Soogun AO, Kharsany ABM, Zewotir T, North D, Ogunsakin RE. Identifying Potential Factors Associated with High HIV viral load in KwaZulu-Natal, South Africa using Multiple Correspondence Analysis and Random Forest Analysis. BMC Med Res Methodol 2022; 22:174. [PMID: 35715730 PMCID: PMC9206247 DOI: 10.1186/s12874-022-01625-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Accepted: 04/27/2022] [Indexed: 12/02/2022] Open
Abstract
Background Sustainable Human Immunodeficiency Virus (HIV) virological suppression is crucial to achieving the Joint United Nations Programme on HIV/AIDS (UNAIDS) 95-95-95 treatment targets to reduce the risk of onward HIV transmission. Exploratory data analysis is an integral part of statistical analysis which aids variable selection from complex survey data for further confirmatory analysis. Methods In this study, we report participants’ epidemiological and biological factors associated with high HIV RNA viral load (HHVL) from an HIV Incidence Provincial Surveillance System (HIPSS) sequential cross-sectional survey conducted between 2014 and 2015 in KwaZulu-Natal, South Africa. Using multiple correspondence analysis (MCA) and random forest analysis (RFA), we analyzed the linkage between socio-demographic, behavioral, psycho-social, and biological factors associated with HHVL, defined as ≥400 copies per mL. Results Out of 3956 participants in 2014 and 3868 in 2015, 50.1% and 41%, respectively, had HHVL. MCA and RFA revealed that knowledge of HIV status, ART use, ARV dosage, current CD4 cell count, perceived risk of contracting HIV, number of lifetime HIV tests, number of lifetime sex partners, and ever being diagnosed with TB were consistently identified as potential factors associated with high HIV viral load in the 2014 and 2015 surveys. Based on MCA findings, the variable categories associated with HHVL were: did not know HIV status, not on ART, on multiple dosages of ARV, a lower perceived risk of contracting HIV, and two or more lifetime sexual partners. Conclusion The high proportion of individuals with HHVL suggests that the UNAIDS 95-95-95 goal of HIV viral suppression is less likely to be achieved. Based on performance and visualization evaluation, MCA was selected as the best and essential exploration tool for identifying and understanding categorical variables’ significant associations and interactions to enhance individual epidemiological understanding of high HIV viral load. When faced with complex survey data and challenges of variable selection in research, exploratory data analysis with robust graphical visualization and reliability that can reveal diverse structures should be considered. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01625-6.
Affiliation(s)
- Adenike O Soogun
  - School of Mathematics, Statistics and Computer Science, College of Agriculture Engineering and Science, University of KwaZulu-Natal, Westville Campus, Durban, South Africa
  - Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
- Ayesha B M Kharsany
  - Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, Durban, South Africa
- Temesgen Zewotir
  - School of Mathematics, Statistics and Computer Science, College of Agriculture Engineering and Science, University of KwaZulu-Natal, Westville Campus, Durban, South Africa
- Delia North
  - School of Mathematics, Statistics and Computer Science, College of Agriculture Engineering and Science, University of KwaZulu-Natal, Westville Campus, Durban, South Africa
- Ropo Ebenezer Ogunsakin
  - Biostatistics Unit, Discipline of Public Health Medicine, School of Nursing & Public Health, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
|
40
|
Quah Y, Yi-Le JC, Park NH, Lee YY, Lee EB, Jang SH, Kim MJ, Rhee MH, Lee SJ, Park SC. Serum biomarker-based osteoporosis risk prediction and the systemic effects of Trifolium pratense ethanolic extract in a postmenopausal model. Chin Med 2022; 17:70. [PMID: 35701790 PMCID: PMC9199188 DOI: 10.1186/s13020-022-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 05/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background In recent years, a soaring number of marketed Trifolium pratense (red clover) extract products has indicated that a rising number of consumers are turning to natural alternatives to manage postmenopausal symptoms. T. pratense ethanolic extract (TPEE) has shown considerable potential for use in the treatment of menopause complications, including osteoporosis and hormone-dependent diseases. Early diagnosis of osteoporosis can increase the chance of efficient treatment and reduce fracture risks. Currently, the most common diagnosis of osteoporosis is performed by using dual-energy x-ray absorptiometry (DXA). However, the major limitation of DXA is that it is too inaccessible and expensive in rural areas to be used for primary care inspection. Hence, serum biomarkers can serve as meaningful and accessible data for osteoporosis diagnosis. Methods The present study systematically elucidated the anti-osteoporosis and estrogenic activities of TPEE in ovariectomized (OVX) rats by evaluating the bone microstructure, uterus index, serum and bone biomarkers, and osteoblastic and osteoclastic gene expression. Leveraging the pool of serum biomarkers obtained in this study, we used recursive feature elimination with cross-validation (RFECV) to select useful biomarkers for osteoporosis prediction. Then, using the key features extracted, we employed five classification algorithms (extreme gradient boosting (XGBoost), random forest, support vector machine, artificial neural network, and decision tree) to predict the bone quality in terms of T-score. Results TPEE treatments down-regulated nuclear factor kappa-B ligand and alkaline phosphatase, and up-regulated estrogen receptor β gene expression. Additionally, a reduced serum level of C-terminal telopeptides of type 1 collagen and an improvement in the estrogen-dependent characteristics of the uterus on the lining of the lumen were observed in the TPEE intervention group. Among the tested classifiers, XGBoost stood out as the best performing classification model with the highest F1-score and lowest standard deviation. Conclusions The present study demonstrates that TPEE treatment showed therapeutic benefits in the prevention of osteoporosis at the transcriptional level and maintained the estrogen-dependent characteristics of the uterus. Our study revealed that, when the number of features is limited, RFECV paired with an XGBoost model could serve as a powerful tool to readily evaluate and diagnose postmenopausal osteoporosis. Supplementary Information The online version contains supplementary material available at 10.1186/s13020-022-00622-7.
Affiliation(s)
- Yixian Quah
  - College of Veterinary Medicine and Cardiovascular Research Institute, Kyungpook National University, 80 Daehak-ro, Daegu, 41566, Republic of Korea; Reproductive and Development Toxicology Research Group, Korea Institute of Toxicology, Daejeon, Republic of Korea
- Jireh Chan Yi-Le
  - Centre of IoT and Big Data, Universiti Tunku Abdul Rahman, 31900, Kampar, Perak, Malaysia
- Na-Hye Park
  - Laboratory Animal Center, Daegu-Gyeongbuk Medical Innovation Foundation, Daegu, Republic of Korea
- Yuan Yee Lee
  - College of Veterinary Medicine and Cardiovascular Research Institute, Kyungpook National University, 80 Daehak-ro, Daegu, 41566, Republic of Korea
- Eon-Bee Lee
  - College of Veterinary Medicine and Cardiovascular Research Institute, Kyungpook National University, 80 Daehak-ro, Daegu, 41566, Republic of Korea
- Seung-Hee Jang
  - Teazen Co. Ltd., Gyegok-myeon, Haenam-gun, Jeollanam-do, 59017, Republic of Korea
- Min-Jeong Kim
  - Teazen Co. Ltd., Gyegok-myeon, Haenam-gun, Jeollanam-do, 59017, Republic of Korea
- Man Hee Rhee
  - College of Veterinary Medicine and Cardiovascular Research Institute, Kyungpook National University, 80 Daehak-ro, Daegu, 41566, Republic of Korea
- Seung-Jin Lee
  - Reproductive and Development Toxicology Research Group, Korea Institute of Toxicology, Daejeon, Republic of Korea
- Seung-Chun Park
  - College of Veterinary Medicine and Cardiovascular Research Institute, Kyungpook National University, 80 Daehak-ro, Daegu, 41566, Republic of Korea
|
41
|
Antikainen AA, Heinonen M, Lähdesmäki H. Modeling binding specificities of transcription factor pairs with random forests. BMC Bioinformatics 2022; 23:212. [PMID: 35659235 PMCID: PMC9166390 DOI: 10.1186/s12859-022-04734-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 05/12/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Transcription factors (TFs) bind regulatory DNA regions with sequence specificity, form complexes and regulate gene expression. In cooperative TF-TF binding, two transcription factors bind onto a shared DNA binding site as a pair. Previous work has demonstrated pairwise TF-TF-DNA interactions with position weight matrices (PWMs), which may however not sufficiently take into account the complexity and flexibility of pairwise binding.
Results
We propose two random forest (RF) methods for joint TF-TF binding site prediction: and . We train models with previously published large-scale CAP-SELEX DNA libraries, which comprise DNA sequences enriched for binding of a selected TF pair. builds a random forest with sub-sequences selected from CAP-SELEX DNA reads with previously proposed pairwise PWM. outperforms (area under receiver operating characteristics curve, AUROC, 0.75) the current state-of-the-art method i.e. orientation and spacing specific pairwise PWMs (AUROC 0.59). Thus, may be utilized to improve prediction accuracy for pre-determined binding preferences. However, pairwise TF binding is currently considered flexible; a pair may bind DNA with different orientations and amounts of dinucleotide gaps or overlap between the two motifs. Thus, we developed , which utilizes random forests by considering simultaneously multiple orientations and spacings of the two factors. Our approach outperforms (AUROC 0.78) PWMs, as well as (p<0.00195). provides an approach for predicting TF-TF binding sites without prior knowledge on pairwise binding preferences. However, more research is needed to assess eligibility for practical applications.
Conclusions
Random forest is well suited for modeling pairwise TF-TF-DNA binding specificities, and provides an improvement to pairwise binding site prediction accuracy.
|
42
|
Provable Boolean interaction recovery from tree ensemble obtained via random forests. Proc Natl Acad Sci U S A 2022; 119:e2118636119. [PMID: 35609192 PMCID: PMC9295780 DOI: 10.1073/pnas.2118636119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open
Abstract
Significance
Random Forests (RFs) are among the most successful machine-learning algorithms in terms of prediction accuracy. In many domain problems, however, the primary goal is not prediction but understanding the data-generation process, in particular finding important features and feature interactions. There exists strong empirical evidence that RF-based methods, in particular iterative RF (iRF), are very successful in terms of detecting feature interactions. In this work, we propose a biologically motivated Boolean interaction model. Using this model, we complement the existing empirical evidence with theoretical evidence for the ability of iRF-type methods to select desirable interactions. Our theoretical analysis also yields deeper insights into the general interaction selection mechanism of decision-tree algorithms and the importance of feature subsampling.
|
43
|
Petrosyan Y, Mesana TG, Sun LY. Prediction of acute kidney injury risk after cardiac surgery: using a hybrid machine learning algorithm. BMC Med Inform Decis Mak 2022; 22:137. [PMID: 35585624 PMCID: PMC9118758 DOI: 10.1186/s12911-022-01859-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Accepted: 04/20/2022] [Indexed: 11/17/2022] Open
Abstract
Background
Acute kidney injury (AKI) is a serious complication after cardiac surgery. We derived and internally validated a preoperative machine learning model to predict cardiac surgery-associated AKI of any severity and compared its performance with parametric statistical models.
Methods
We conducted a retrospective study of adult patients who underwent major cardiac surgery requiring cardiopulmonary bypass between November 1st, 2009 and March 31st, 2015. AKI was defined according to the KDIGO criteria as stage 1 or greater, within 7 days of surgery. We randomly split the cohort into derivation and validation datasets. We developed three AKI risk models: (1) a hybrid machine learning (ML) algorithm, using Random Forests for variable selection, followed by high performance logistic regression; (2) a traditional logistic regression model; and (3) an enhanced logistic regression model with 500 bootstraps and backward variable selection. For each model, we assigned risk scores to each retained covariate and assessed model discrimination (C statistic) and calibration (Hosmer–Lemeshow goodness-of-fit test) in the validation datasets.
Results
Of 6522 included patients, 1760 (27.0%) developed AKI. The best performance was achieved by the hybrid ML algorithm to predict AKI of any severity. The ML and enhanced statistical models remained robust after internal validation (C statistic = 0.75, Hosmer–Lemeshow p = 0.804; and AUC = 0.74, Hosmer–Lemeshow p = 0.347, respectively).
Conclusions
We demonstrated that a hybrid ML model provides higher accuracy without sacrificing parsimony, computational efficiency, or interpretability, when compared with parametric statistical models. This score-based model can easily be used at the bedside to identify high-risk patients who may benefit from intensive perioperative monitoring and personalized management strategies.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12911-022-01859-w.
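The C statistic used in the abstract above to assess discrimination can be illustrated with a minimal sketch. It is the proportion of event/non-event pairs in which the patient with the event received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy risks and outcomes are hypothetical and are not taken from the study.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Concordance (C) statistic: share of event/non-event pairs ranked correctly."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0   # concordant pair
        elif e == n:
            score += 0.5   # tied pair counts as half
    return score / (len(events) * len(non_events))


# Hypothetical predicted risks and observed outcomes (1 = AKI, 0 = no AKI).
toy_risks = [0.10, 0.40, 0.35, 0.80, 0.65, 0.20]
toy_outcomes = [0, 0, 1, 1, 1, 0]
print(c_statistic(toy_risks, toy_outcomes))
```

For a binary outcome this quantity equals the area under the ROC curve, which is why the abstract can report a C statistic for one model and an AUC for the other.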
Affiliation(s)
- Yelena Petrosyan
  - Cardiocore Big Data Research Unit, University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, ON, K1Y 4W7, Canada
- Thierry G Mesana
  - Cardiocore Big Data Research Unit, University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, ON, K1Y 4W7, Canada
- Louise Y Sun
  - Cardiocore Big Data Research Unit, University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, ON, K1Y 4W7, Canada; Division of Cardiac Anesthesiology, University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, ON, K1Y 4W7, Canada; School of Epidemiology and Public Health, University of Ottawa, 600 Peter Morand Cres, Ottawa, ON, K1G 5Z3, Canada
|
44
|
Vignoli A, Tenori L, Luchinat C. An omics approach to study trace metals in sera of hemodialysis patients treated with erythropoiesis stimulating agents. Metallomics 2022; 14:6572376. [PMID: 35451491 DOI: 10.1093/mtomcs/mfac028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 04/20/2022] [Indexed: 11/12/2022]
Abstract
Hemodialysis (HD) represents a life-sustaining treatment for patients with end stage renal disease. However, it is associated with several complications, including anemia. Erythropoiesis stimulating agents (ESA) are often administered to HD patients with renal anemia, but a relevant proportion of them fail to respond to the therapy. Since trace metals are involved in several biological processes and their blood levels can be altered by hemodialysis, we studied the possible association of serum trace metal concentrations and ratios with the administration of, and response to, ESA. For this study, data and sample information of 110 HD patients were downloaded from the UC San Diego Metabolomics Workbench public repository (PR000565). The blood serum levels (and ratios) of antimony, cadmium, copper, manganese, molybdenum, nickel, selenium, tin and zinc were studied by applying an omics statistical approach. The Random Forest model was able to discriminate HD-dependent patients treated and not treated with ESA, with an accuracy of 71.7% (95% CI 71.5-71.9%). Logistic regression analysis identified alterations of Mn, Mo, Cd, Sn, and several of their ratios as characteristic of patients treated with ESA. Moreover, patients with a scarce response to ESA were characterized by reduced Mn to Ni and Mn to Sb ratios. In conclusion, our results show that trace metals, in particular manganese, play a role in the mechanisms underlying human response to ESA, and, if further confirmed, the re-equilibration of their physiological levels could contribute to a better management of HD patients, hopefully reducing their morbidity and mortality.
Affiliation(s)
- Alessia Vignoli
  - Magnetic Resonance Center (CERM) and Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, 50019, Italy; Consorzio Interuniversitario Risonanze Magnetiche MetalloProteine (CIRMMP), Sesto Fiorentino, 50019, Italy
- Leonardo Tenori
  - Magnetic Resonance Center (CERM) and Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, 50019, Italy; Consorzio Interuniversitario Risonanze Magnetiche MetalloProteine (CIRMMP), Sesto Fiorentino, 50019, Italy
- Claudio Luchinat
  - Magnetic Resonance Center (CERM) and Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, 50019, Italy; Consorzio Interuniversitario Risonanze Magnetiche MetalloProteine (CIRMMP), Sesto Fiorentino, 50019, Italy
|
45
|
Gadot R, Anand A, Lovin BD, Sweeney AD, Patel AJ. Predicting surgical decision-making in vestibular schwannoma using tree-based machine learning. Neurosurg Focus 2022; 52:E8. [DOI: 10.3171/2022.1.focus21708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Accepted: 01/19/2022] [Indexed: 11/06/2022]
Abstract
OBJECTIVE
Vestibular schwannomas (VSs) are the most common neoplasm of the cerebellopontine angle in adults. Though these lesions are generally slow growing, their growth patterns and associated symptoms can be unpredictable, which may complicate the decision to pursue conservative management versus active intervention. Additionally, surgical decision-making can be controversial because of limited high-quality evidence and multiple quality-of-life considerations. Machine learning (ML) is a powerful tool that utilizes data sets to essentialize multidimensional clinical processes. In this study, the authors trained multiple tree-based ML algorithms to predict the decision for active treatment versus MRI surveillance of VS in a single institutional cohort. In doing so, they sought to assess which preoperative variables carried the most weight in driving the decision for intervention and could be used to guide future surgical decision-making through an evidence-based approach.
METHODS
The authors reviewed the records of patients who had undergone evaluation by neurosurgery and otolaryngology with subsequent active treatment (resection or radiation) for unilateral VS in the period from 2009 to 2021, as well as those of patients who had been evaluated for VS and were managed conservatively throughout 2021. Clinical presentation, radiographic data, and management plans were abstracted from each patient record from the time of first evaluation until the last follow-up or surgery. Each encounter with the patient was treated as an instance involving a management decision that depended on demographics, symptoms, and tumor profile. Decision tree and random forest classifiers were trained and tested to predict the decision for treatment versus imaging surveillance on the basis of unseen data using an 80/20 pseudorandom split. Predictor variables were tuned to maximize performance based on lowest Gini impurity indices. Model performance was optimized using fivefold cross-validation.
RESULTS
One hundred twenty-four patients with 198 rendered decisions concerning management were included in the study. In the decision tree analysis, only a maximum tumor dimension threshold of 1.6 cm and progressive symptoms were required to predict the decision for treatment with 85% accuracy. Optimizing maximum dimension thresholds and including age at presentation boosted accuracy to 88%. Random forest analysis (n = 500 trees) predicted the decision for treatment with 80% accuracy. Factors with the highest variable importance based on multiple measures of importance, including mean minimal conditional depth and largest Gini impurity reduction, were maximum tumor dimension, age at presentation, Koos grade, and progressive symptoms at presentation.
CONCLUSIONS
Tree-based ML was used to predict which factors drive the decision for active treatment of VS with 80%–88% accuracy. The most important factors were maximum tumor dimension, age at presentation, Koos grade, and progressive symptoms. These results can assist in surgical decision-making and patient counseling. They also demonstrate the power of ML algorithms in extracting useful insights from limited data sets.
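The Gini impurity criterion mentioned in the methods above can be sketched in a few lines. The example computes the impurity reduction obtained by splitting on a single threshold, here 1.6 cm to echo the tumor-dimension threshold reported in the results. The function names and the toy measurements are hypothetical illustrations, not the authors' code or data.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_reduction(values, labels, threshold):
    """Impurity decrease from splitting into values <= threshold and values > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical maximum tumor dimensions (cm) and management decisions.
toy_dimension_cm = [0.8, 1.1, 1.4, 1.5, 1.7, 2.0, 2.4, 3.1]
toy_decision = ["surveil", "surveil", "surveil", "treat",
                "treat", "treat", "treat", "treat"]
print(gini_reduction(toy_dimension_cm, toy_decision, threshold=1.6))
```

A tree learner evaluates many candidate thresholds in this way and keeps the one with the largest reduction. Averaging such reductions per variable across the trees of a forest gives one of the importance measures the abstract refers to.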
Affiliation(s)
- Ron Gadot
  - Department of Neurosurgery, Baylor College of Medicine
- Adrish Anand
  - Department of Neurosurgery, Baylor College of Medicine
- Benjamin D. Lovin
  - Department of Otolaryngology-Head and Neck Surgery, Baylor College of Medicine, Houston
- Alex D. Sweeney
  - Department of Otolaryngology-Head and Neck Surgery, Baylor College of Medicine, Houston
- Akash J. Patel
  - Department of Neurosurgery, Baylor College of Medicine
  - Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital, Houston, Texas
|
46
|
Tarimo CS, Bhuyan SS, Zhao Y, Ren W, Mohammed A, Li Q, Gardner M, Mahande MJ, Wang Y, Wu J. Prediction of low Apgar score at five minutes following labor induction intervention in vaginal deliveries: machine learning approach for imbalanced data at a tertiary hospital in North Tanzania. BMC Pregnancy Childbirth 2022; 22:275. [PMID: 35365129 PMCID: PMC8976377 DOI: 10.1186/s12884-022-04534-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 02/28/2022] [Indexed: 11/18/2022] Open
Abstract
Background
Prediction of low Apgar score for vaginal deliveries following labor induction intervention is critical for improving neonatal health outcomes. We set out to investigate important attributes and train popular machine learning (ML) algorithms to correctly classify neonates with low Apgar scores from an imbalanced learning perspective.
Methods
We analyzed 7716 induced vaginal deliveries from the electronic birth registry of the Kilimanjaro Christian Medical Centre (KCMC), of which 733 (9.5%) involved neonates with a low (< 7) Apgar score. The ‘extra-tree classifier’ was used to assess feature importance. We used Area Under Curve (AUC), recall, precision, F-score, Matthews Correlation Coefficient (MCC), balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK) to evaluate the performance of six selected machine learning classifiers. To address class imbalance, we examined three widely used resampling techniques: the Synthetic Minority Oversampling Technique (SMOTE), Random Oversampling Examples (ROS), and random undersampling (RUS). We applied Decision Curve Analysis (DCA) to evaluate the net benefit of the selected classifiers.
Results
Birth weight, maternal age, and gestational age were found to be important predictors of low Apgar score following induced vaginal delivery. The SMOTE, ROS, and RUS techniques were more effective at improving recall than the other metrics in all the models under investigation. A slight improvement was observed in the F1 score, BA, and BM. DCA revealed potential benefits of applying the boosting method for predicting low Apgar scores among the tested models.
Conclusion
There is an opportunity for more algorithms to be tested to come up with theoretical guidance on more effective rebalancing techniques suitable for this particular imbalance ratio. Future research should prioritize a debate on which performance indicators to rely on when dealing with imbalanced or skewed data.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12884-022-04534-0.
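Several of the performance indicators listed in the methods above can be computed directly from a confusion matrix, as in the following minimal sketch. The function name `imbalance_metrics` and the counts are hypothetical and are not the study's results. The sketch assumes every row and column total of the confusion matrix is non-zero.

```python
import math


def imbalance_metrics(tp, fp, fn, tn):
    """Metrics for a binary classifier; assumes no row or column total is zero."""
    recall = tp / (tp + fn)        # sensitivity, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "balanced_accuracy": (recall + specificity) / 2,
        "informedness": recall + specificity - 1,   # bookmaker informedness (BM)
        "markedness": precision + npv - 1,          # markedness (MK)
    }


# Hypothetical counts for a rare positive class (100 positives, 900 negatives).
toy = imbalance_metrics(tp=30, fp=20, fn=70, tn=880)
for name, value in toy.items():
    print(f"{name}: {value:.3f}")
```

With these toy counts plain accuracy would be 91% even though only 30% of positives are found, which is why imbalance-aware metrics are reported. The squared MCC equals the product of informedness and markedness, so these three indicators tend to move together.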
Affiliation(s)
- Clifford Silver Tarimo
  - Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China; Department of Science and Laboratory Technology, Dar es Salaam Institute of Technology, P.O. Box 2958, Dar es Salaam, Tanzania
- Soumitra S Bhuyan
  - Rutgers University-New Brunswick, Edward J. Bloustein, School of Planning and Public Policy, New Brunswick, USA
- Yizhen Zhao
  - Luoyang Orthopedic Traumatological Hospital of Henan Province, Luoyang, China
- Weicun Ren
  - Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China; College of Sanquan, Xinxiang Medical University, Xinxiang, People's Republic of China
- Akram Mohammed
  - Center for Biomedical Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Quanman Li
  - Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China
- Marilyn Gardner
  - Department of Public Health, Western Kentucky University, 1906 College Heights Blvd, Bowling Green, KY, 42101, USA
- Michael Johnson Mahande
  - Institute of Public Health, Kilimanjaro Christian Medical University College, P.O. Box 2240, Moshi, Tanzania
- Yuhui Wang
  - Centre for Financial and Corporate Integrity, Coventry University, Coventry, UK
- Jian Wu
  - Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China; Henan Province Engineering Research Center of Health Economics & Health Technology Assessment, Henan Province, China
|
47
|
Rudar J, Porter TM, Wright M, Golding GB, Hajibabaei M. LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data. BMC Bioinformatics 2022; 23:110. [PMID: 35361114 PMCID: PMC8969335 DOI: 10.1186/s12859-022-04631-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/12/2021] [Accepted: 03/07/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Identification of biomarkers, which are measurable characteristics of biological datasets, can be challenging. Although amplicon sequence variants (ASVs) can be considered potential biomarkers, identifying important ASVs in high-throughput sequencing datasets is challenging. Noise, algorithmic failures to account for specific distributional properties, and feature interactions can complicate the discovery of ASV biomarkers. In addition, these issues can impact the replicability of various models and elevate false-discovery rates. Contemporary machine learning approaches can be leveraged to address these issues. Ensembles of decision trees are particularly effective at classifying the types of data commonly generated in high-throughput sequencing (HTS) studies due to their robustness when the number of features in the training data is orders of magnitude larger than the number of samples. In addition, when combined with appropriate model introspection algorithms, machine learning algorithms can also be used to discover and select potential biomarkers. However, the construction of these models could introduce various biases which potentially obfuscate feature discovery.
Results
We developed a decision tree ensemble, LANDMark, which uses oblique and non-linear cuts at each node. In synthetic and toy tests LANDMark consistently ranked as the best classifier and often outperformed the Random Forest classifier. When trained on the full metabarcoding dataset obtained from Canada’s Wood Buffalo National Park, LANDMark was able to create highly predictive models and achieved an overall balanced accuracy score of 0.96 ± 0.06. The use of recursive feature elimination did not impact LANDMark’s generalization performance and, when trained on data from the BE amplicon, it was able to outperform the Linear Support Vector Machine, Logistic Regression models, and Stochastic Gradient Descent models (p ≤ 0.05). Finally, LANDMark distinguishes itself due to its ability to learn smoother non-linear decision boundaries.
Conclusions
Our work introduces LANDMark, a meta-classifier which blends the characteristics of several machine learning models into a decision tree and ensemble learning framework. To our knowledge, this is the first study to apply this type of ensemble approach to amplicon sequencing data and we have shown that analyzing these datasets using LANDMark can produce highly predictive and consistent models.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-022-04631-z.
Affiliation(s)
- Josip Rudar
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Teresita M Porter
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Michael Wright
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- G Brian Golding
  - Department of Biology, McMaster University, 1280 Main St. West, Hamilton, ON, L8S 4K1, Canada
- Mehrdad Hajibabaei
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
|
48
|
An Integrated Taxonomic Approach Points towards a Single-Species Hypothesis for Santolina (Asteraceae) in Corsica and Sardinia. BIOLOGY 2022; 11:biology11030356. [PMID: 35336730 PMCID: PMC8945001 DOI: 10.3390/biology11030356] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Revised: 01/26/2022] [Accepted: 02/22/2022] [Indexed: 12/04/2022]
Abstract
Simple Summary
Systematics is the branch of biology that studies the relationships among organisms and their evolution, while taxonomy is the science of classification. In this work, a systematic and taxonomic investigation about three plant species of Santolina, commonly known as lavender-cotton, is presented. Two of these species occur exclusively in Corsica and Sardinia, two of the main islands of the Mediterranean Sea, while a third one is a common ornamental plant, known only as cultivated. By integrating several approaches, we find out that the two putative species from Corsica and Sardinia are actually very similar from many points of view. A two-species hypothesis is no longer supported according to our results, so that these plants should be reclassified as a single species. This study demonstrates the importance of integrating different sources of information to produce reliable classifications (i.e. taxonomic hypotheses). In addition, our study is useful to better understand plant evolution in the context of the Mediterranean Basin, one of the world’s biodiversity hotspots.
Abstract
Santolina is a plant genus of dwarf aromatic shrubs that includes about 26 species native to the western Mediterranean Basin. In Corsica and Sardinia, two of the main islands of the Mediterranean, Santolina corsica (tetraploid) and S. insularis (hexaploid) are reported. Along with the cultivated pentaploid S. chamaecyparissus, these species form a group of taxa that is hard to distinguish only by morphology. Molecular (using ITS, trnH-psbA, trnL-trnF, trnQ-rps16, rps15-ycf1, psbM-trnD, and trnS-trnG), cypsela morpho-colorimetric, morphometric, and niche similarity analyses were conducted to investigate the diversity of plants belonging to this species group. Our results confute the current taxonomic hypothesis and suggest considering S. corsica and S. insularis as a single species. Moreover, molecular and morphometric results highlight the strong affinity between S. chamaecyparissus and the Santolina populations endemic to Corsica and Sardinia. Finally, the populations from south-western Sardinia, due to their high differentiation in the studied plastid markers and the different climatic niche with respect to all the other populations, could be considered as an evolutionary significant unit.
|
49
|
Construction of a Diagnostic Model for Lymph Node Metastasis of the Papillary Thyroid Carcinoma Using Preoperative Ultrasound Features and Imaging Omics. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:1872412. [PMID: 35178222 PMCID: PMC8846989 DOI: 10.1155/2022/1872412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 12/14/2021] [Accepted: 01/07/2022] [Indexed: 11/17/2022]
Abstract
In this paper, the sample population mainly comprised 337 patients who had undergone surgery for lymph node metastasis of papillary thyroid carcinoma (PTC). To provide a clinical reference for intelligent decision-making on treatment plans and for improving prognosis, we constructed five early diagnosis models based on ultrasound features, imaging features, and combined features. The model integrated with the broad learning system (BLS) showed the best performance, with an area under the curve (AUC) of 0.857 (95% confidence interval (CI): 0.811–0.902) and an accuracy of 0.805 (95% CI: 0.759–0.850). For demographic and clinical features, the prediction effect was also good, with an AUC above 0.700.
|
50
|
Künzel SR, Saarinen TF, Liu EW, Sekhon JS. Linear Aggregation in Tree-based Estimators. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2026780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Sören R. Künzel
- Department of Statistics, University of California, Berkeley
|