Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Touw WG, Bayjanov JR, Overmars L, Backus L, Boekhorst J, Wels M, van Hijum SAFT. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? Brief Bioinform 2012;14:315-26. [PMID: 22786785 PMCID: PMC3659301 DOI: 10.1093/bib/bbs034] [Citation(s) in RCA: 204] [Impact Index Per Article: 17.0] [Indexed: 12/13/2022] Open



Cited by Other Article(s)

Hasegawa N, Sugiyama M, Igarashi K. Random forest machine-learning algorithm classifies white- and brown-rot fungi according to the number of the genes encoding Carbohydrate-Active enZyme families. Appl Environ Microbiol 2024;90:e0048224. [PMID: 38832775 PMCID: PMC11267879 DOI: 10.1128/aem.00482-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/04/2024] [Indexed: 06/05/2024] Open

Levin MA, Kia A, Timsina P, Cheng FY, Nguyen KAN, Kohli-Seth R, Lin HM, Ouyang Y, Freeman R, Reich DL. Real-Time Machine Learning Alerts to Prevent Escalation of Care: A Nonrandomized Clustered Pragmatic Clinical Trial. Crit Care Med 2024;52:1007-1020. [PMID: 38380992 DOI: 10.1097/ccm.0000000000006243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]

Abstract

OBJECTIVES

Machine learning algorithms can outperform older methods in predicting clinical deterioration, but rigorous prospective data on their real-world efficacy are limited. We hypothesized that real-time machine learning generated alerts sent directly to front-line providers would reduce escalations.

DESIGN

Single-center prospective pragmatic nonrandomized clustered clinical trial.

SETTING

Academic tertiary care medical center.

PATIENTS

Adult patients admitted to four medical-surgical units. Assignment to intervention or control arms was determined by initial unit admission.

INTERVENTIONS

Real-time alerts stratified according to predicted likelihood of deterioration sent either to the primary team or directly to the rapid response team (RRT). Clinical care and interventions were at the providers' discretion. For the control units, alerts were generated but not sent, and standard RRT activation criteria were used.

MEASUREMENTS AND MAIN RESULTS

The primary outcome was the rate of escalation per 1000 patient bed days. Secondary outcomes included the frequency of orders for fluids, medications, and diagnostic tests, and combined in-hospital and 30-day mortality. Propensity score modeling with stabilized inverse probability of treatment weight (IPTW) was used to account for differences between groups. Data from 2740 patients enrolled between July 2019 and March 2020 were analyzed (1488 intervention, 1252 control). Average age was 66.3 years and 1428 participants (52%) were female. The rate of escalation was 12.3 vs. 11.3 per 1000 patient bed days (difference, 1.0; 95% CI, -2.8 to 4.7) and IPTW adjusted incidence rate ratio 1.43 (95% CI, 1.16-1.78; p < 0.001). Patients in the intervention group were more likely to receive cardiovascular medication orders (16.1% vs. 11.3%; 4.7%; 95% CI, 2.1-7.4%) and IPTW adjusted relative risk (RR) (1.74; 95% CI, 1.39-2.18; p < 0.001). Combined in-hospital and 30-day-mortality was lower in the intervention group (7% vs. 9.3%; -2.4%; 95% CI, -4.5% to -0.2%) and IPTW adjusted RR (0.76; 95% CI, 0.58-0.99; p = 0.045).

CONCLUSIONS

Real-time machine learning alerts do not reduce the rate of escalation but may reduce mortality.
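The abstract above adjusts for group differences with stabilized inverse probability of treatment weights (IPTW). As a rough illustration only, the weighting step can be sketched as follows. The function name, the toy inputs, and the assumption that propensity scores have already been estimated elsewhere are all hypothetical and are not taken from the study.

```python
def stabilized_iptw(treated, propensity):
    """Return one stabilized IPTW weight per patient.

    treated    -- list of 0/1 flags (1 = intervention arm); hypothetical input
    propensity -- list of estimated P(treated = 1 | covariates), each in (0, 1)
    """
    if len(treated) != len(propensity):
        raise ValueError("inputs must have the same length")
    # Marginal probability of treatment: the numerator that stabilizes the weight.
    p_treated = sum(treated) / len(treated)
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t == 1:
            weights.append(p_treated / e)
        else:
            weights.append((1.0 - p_treated) / (1.0 - e))
    return weights


# Hypothetical toy cohort: two intervention and two control patients.
toy_weights = stabilized_iptw([1, 1, 0, 0], [0.8, 0.5, 0.5, 0.2])
```

Patients whose observed arm was unlikely given their covariates receive larger weights, so the weighted groups resemble each other more closely than the raw groups do.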


Lee CS, Lin CR, Chua HH, Wu JF, Chang KC, Ni YH, Chang MH, Chen HL. Gut Bifidobacterium longum is associated with better native liver survival in patients with biliary atresia. JHEP Rep 2024;6:101090. [PMID: 39006502 PMCID: PMC11246047 DOI: 10.1016/j.jhepr.2024.101090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 03/08/2024] [Accepted: 04/03/2024] [Indexed: 07/16/2024] Open

Abstract

Background & Aims

The gut microbiome plays an important role in liver diseases, but its specific impact on biliary atresia (BA) remains to be explored. We aimed to investigate the microbial signature in the early life of patients with BA and to analyze its influence on long-term outcomes.

Methods

Fecal samples (n = 42) were collected from infants with BA before and after Kasai portoenterostomy (KPE). The stool microbiota was analyzed using 16S rRNA next-generation sequencing and compared with that of age-matched healthy controls (HCs). Shotgun metagenomic sequencing analysis was employed to confirm the bacterial composition in 10 fecal samples before KPE. The correlation of the microbiome signature with liver function and long-term outcomes was assessed.

Results

In the 16S rRNA next-generation sequencing analysis of fecal microbiota, the alpha and beta diversity analyses revealed significant differences between HCs and patients with BA before and after KPE. The difference in microbial composition analyzed by linear discriminant analysis and random forest classification revealed that the abundance of Bifidobacterium longum (B. longum) was significantly lower in patients before and after KPE than in HCs. The abundance of B. longum was negatively correlated with the gamma-glutamyltransferase level after KPE (p <0.05). Patients with early detectable B. longum had significantly lower total and direct bilirubin 3 months after KPE (p <0.005) and had a significantly lower liver transplantation rate (hazard ratio: 0.16, 95% CI 0.03-0.83, p = 0.029). Shotgun metagenomic sequencing also revealed that patients with BA and detectable B. longum had reduced total and direct bilirubin after KPE.

Conclusion

The gut microbiome of patients with BA differed from that of HCs, with a notable abundance of B. longum in early infancy correlating with better long-term outcomes.

Impact and implications

Bifidobacterium longum (B. longum) is a beneficial bacterium commonly found in the human gut. It has been studied for its potential impacts on various health conditions. In patients with biliary atresia, we found that a greater abundance of B. longum in the fecal microbiome is associated with improved clinical outcomes. This suggests that early colonization and increasing B. longum levels in the gut could be a therapeutic strategy to improve the prognosis of patients with biliary atresia.


Jiam ML, Xin KZ, Ha PK, Jiam NT. A supervised machine learning model for identifying predictive factors for recommending head and neck cancer surgery. Head Neck 2024;46:1001-1008. [PMID: 38344931 DOI: 10.1002/hed.27674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 01/08/2024] [Accepted: 01/23/2024] [Indexed: 04/10/2024] Open

Manrique PD, Leus IV, López CA, Mehla J, Malloci G, Gervasoni S, Vargiu AV, Kinthada RK, Herndon L, Hengartner NW, Walker JK, Rybenkov VV, Ruggerone P, Zgurskaya HI, Gnanakaran S. Predicting permeation of compounds across the outer membrane of P. aeruginosa using molecular descriptors. Commun Chem 2024;7:84. [PMID: 38609430 PMCID: PMC11015012 DOI: 10.1038/s42004-024-01161-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 03/27/2024] [Indexed: 04/14/2024] Open

Rudar J, Kruczkiewicz P, Vernygora O, Golding GB, Hajibabaei M, Lung O. Sequence signatures within the genome of SARS-CoV-2 can be used to predict host source. Microbiol Spectr 2024;12:e0358423. [PMID: 38436242 PMCID: PMC10986507 DOI: 10.1128/spectrum.03584-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 02/11/2024] [Indexed: 03/05/2024] Open

Abstract

We conducted an in silico analysis to better understand the potential factors impacting host adaptation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in white-tailed deer, humans, and mink due to the strong evidence of sustained transmission within these hosts. Classification models trained on single nucleotide and amino acid differences between samples effectively identified white-tailed deer-, human-, and mink-derived SARS-CoV-2. For example, the balanced accuracy score of Extremely Randomized Trees classifiers was 0.984 ± 0.006. Eighty-eight commonly identified predictive mutations are found at sites under strong positive and negative selective pressure. A large fraction of sites under selection (86.9%) or identified by machine learning (87.1%) are found in genes other than the spike. Some locations encoded by these gene regions are predicted to be B- and T-cell epitopes or are implicated in modulating the immune response suggesting that host adaptation may involve the evasion of the host immune system, modulation of the class-I major-histocompatibility complex, and the diminished recognition of immune epitopes by CD8+ T cells. Our selection and machine learning analysis also identified that silent mutations, such as C7303T and C9430T, play an important role in discriminating deer-derived samples across multiple clades. Finally, our investigation into the origin of the B.1.641 lineage from white-tailed deer in Canada discovered an additional human sequence from Michigan related to the B.1.641 lineage sampled near the emergence of this lineage. 
These findings demonstrate that machine-learning approaches can be used in combination with evolutionary genomics to identify factors possibly involved in the cross-species transmission of viruses and the emergence of novel viral lineages.

IMPORTANCE

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible virus capable of infecting and establishing itself in human and wildlife populations, such as white-tailed deer. This fact highlights the importance of developing novel ways to identify genetic factors that contribute to its spread and adaptation to new host species. This is especially important since these populations can serve as reservoirs that potentially facilitate the re-introduction of new variants into human populations. In this study, we apply machine learning and phylogenetic methods to uncover biomarkers of SARS-CoV-2 adaptation in mink and white-tailed deer. We find evidence demonstrating that both non-synonymous and silent mutations can be used to differentiate animal-derived sequences from human-derived ones and each other. This evidence also suggests that host adaptation involves the evasion of the immune system and the suppression of antigen presentation. Finally, the methods developed here are general and can be used to investigate host adaptation in viruses other than SARS-CoV-2.
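The abstract above reports a balanced accuracy score for its three-host classifiers. Balanced accuracy is the unweighted mean of per-class recall, which keeps a large class (for example, human-derived samples) from dominating the score. The sketch below is a minimal pure-Python illustration; the function name and the toy labels are hypothetical and do not reproduce the study's data or pipeline.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: one deer sample is misclassified as human.
y_true = ["deer", "deer", "human", "human", "human", "human", "mink", "mink"]
y_pred = ["deer", "human", "human", "human", "human", "human", "mink", "mink"]
score = balanced_accuracy(y_true, y_pred)  # (0.5 + 1.0 + 1.0) / 3
```

On these toy labels plain accuracy is 7/8, while balanced accuracy is lower because the single error falls in the smallest class.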


Wu G, Zaker A, Ebrahimi A, Tripathi S, Mer AS. Text-mining-based feature selection for anticancer drug response prediction. BIOINFORMATICS ADVANCES 2024;4:vbae047. [PMID: 38606185 PMCID: PMC11009020 DOI: 10.1093/bioadv/vbae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 04/13/2024]

Andorra M, Freire A, Zubizarreta I, de Rosbo NK, Bos SD, Rinas M, Høgestøl EA, de Rodez Benavent SA, Berge T, Brune-Ingebretse S, Ivaldi F, Cellerino M, Pardini M, Vila G, Pulido-Valdeolivas I, Martinez-Lapiscina EH, Llufriu S, Saiz A, Blanco Y, Martinez-Heras E, Solana E, Bäcker-Koduah P, Behrens J, Kuchling J, Asseyer S, Scheel M, Chien C, Zimmermann H, Motamedi S, Kauer-Bonin J, Brandt A, Saez-Rodriguez J, Alexopoulos LG, Paul F, Harbo HF, Shams H, Oksenberg J, Uccelli A, Baeza-Yates R, Villoslada P. Predicting disease severity in multiple sclerosis using multimodal data and machine learning. J Neurol 2024;271:1133-1149. [PMID: 38133801 PMCID: PMC10896787 DOI: 10.1007/s00415-023-12132-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 10/28/2023] [Accepted: 11/22/2023] [Indexed: 12/23/2023]

Affiliation(s)

Magi Andorra Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Ana Freire School of Management, Pompeu Fabra University, Barcelona, Spain UPF Barcelona School of Management, Balmes 132, 08008, Barcelona, Spain
Irati Zubizarreta Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Nicole Kerlero de Rosbo Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy IRCCS Ospedale Policlinico San Martino, Genoa, Italy
Steffan D Bos University of Oslo, Oslo, Norway Oslo University Hospital, Oslo, Norway
Melanie Rinas Institute for Computational Biomedicine, Heidelberg University Hospital, and Heidelberg University, Heidelberg, Germany
Einar A Høgestøl University of Oslo, Oslo, Norway Oslo University Hospital, Oslo, Norway
Sigrid A de Rodez Benavent University of Oslo, Oslo, Norway Oslo University Hospital, Oslo, Norway
Tone Berge Oslo University Hospital, Oslo, Norway Oslo Metropolitan University, Oslo, Norway
Synne Brune-Ingebretse University of Oslo, Oslo, Norway Oslo University Hospital, Oslo, Norway
Federico Ivaldi Department of Internal Medicine, University of Genoa, Genoa, Italy
Maria Cellerino Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy
Matteo Pardini Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy IRCCS Ospedale Policlinico San Martino, Genoa, Italy
Gemma Vila Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Irene Pulido-Valdeolivas Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Elena H Martinez-Lapiscina Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Sara Llufriu Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Albert Saiz Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Yolanda Blanco Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Eloy Martinez-Heras Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Elisabeth Solana Institut d'Investigacions Biomediques August Pi Sunyer (IDIBAPS) and Hospital Clinic Barcelona, Barcelona, Spain
Priscilla Bäcker-Koduah Charité Universitaetsmedizin Berlin, Berlin, Germany
Janina Behrens Charité Universitaetsmedizin Berlin, Berlin, Germany
Joseph Kuchling Charité Universitaetsmedizin Berlin, Berlin, Germany
Susanna Asseyer Charité Universitaetsmedizin Berlin, Berlin, Germany Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Michael Scheel Charité Universitaetsmedizin Berlin, Berlin, Germany
Claudia Chien Charité Universitaetsmedizin Berlin, Berlin, Germany Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Hanna Zimmermann Charité Universitaetsmedizin Berlin, Berlin, Germany Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Seyedamirhosein Motamedi Charité Universitaetsmedizin Berlin, Berlin, Germany
Josef Kauer-Bonin Charité Universitaetsmedizin Berlin, Berlin, Germany
Alex Brandt Charité Universitaetsmedizin Berlin, Berlin, Germany
Julio Saez-Rodriguez Institute for Computational Biomedicine, Heidelberg University Hospital, and Heidelberg University, Heidelberg, Germany
Leonidas G Alexopoulos ProtATonce Ltd, Athens, Greece School of Mechanical Engineering, National Technical University of Athens, Zografou, Greece
Friedemann Paul Charité Universitaetsmedizin Berlin, Berlin, Germany Max Delbrueck Center for Molecular Medicine, Berlin, Germany
Hanne F Harbo University of Oslo, Oslo, Norway Oslo University Hospital, Oslo, Norway
Hengameh Shams Department of Neurology, University of California, San Francisco, USA
Jorge Oksenberg Department of Neurology, University of California, San Francisco, USA
Antonio Uccelli Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa, Italy IRCCS Ospedale Policlinico San Martino, Genoa, Italy
Ricardo Baeza-Yates School of Engineering, Pompeu Fabra University, Barcelona, Spain
Pablo Villoslada Department of Medicine and Life Sciences, Pompeu Fabra University, Barcelona, Spain Hospital del Mar Research Institute, Barcelona, Spain


Torigoe T, Takahashi M, Heravizadeh O, Ikeda K, Nakatani K, Bamba T, Izumi Y. Predicting Retention Time in Unified-Hydrophilic-Interaction/Anion-Exchange Liquid Chromatography High-Resolution Tandem Mass Spectrometry (Unified-HILIC/AEX/HRMS/MS) for Comprehensive Structural Annotation of Polar Metabolome. Anal Chem 2024;96:1275-1283. [PMID: 38186224 DOI: 10.1021/acs.analchem.3c04618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]

Affiliation(s)

Taihei Torigoe Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
Masatomo Takahashi Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
Omidreza Heravizadeh Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
Kazuki Ikeda Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
Kohta Nakatani Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
Takeshi Bamba Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
Yoshihiro Izumi Department of Systems Life Sciences, Graduate School of Systems Life Sciences, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan Division of Metabolomics/Mass Spectrometry Center, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan


Ke TM, Lophatananon A, Muir KR. An Integrative Pancreatic Cancer Risk Prediction Model in the UK Biobank. Biomedicines 2023;11:3206. [PMID: 38137427 PMCID: PMC10740416 DOI: 10.3390/biomedicines11123206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 11/20/2023] [Accepted: 11/26/2023] [Indexed: 12/24/2023] Open

Nguyen LN, Le TH, Nguyen LQ, Tran VQ. Machine learning approaches for predicting Cracking Tolerance Index (CTIndex) of asphalt concrete containing reclaimed asphalt pavement. PLoS One 2023;18:e0287255. [PMID: 37883340 PMCID: PMC10602248 DOI: 10.1371/journal.pone.0287255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 06/01/2023] [Indexed: 10/28/2023] Open

Nguyen QTN, Nguyen P, Wang C, Phuc PT, Lin R, Hung C, Kuo N, Cheng Y, Lin S, Hsieh Z, Cheng C, Hsu M, Hsu JC. Machine learning approaches for predicting 5-year breast cancer survival: A multicenter study. Cancer Sci 2023;114:4063-4072. [PMID: 37489252 PMCID: PMC10551582 DOI: 10.1111/cas.15917] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Revised: 06/27/2023] [Accepted: 07/05/2023] [Indexed: 07/26/2023] Open

Affiliation(s)

Quynh Thi Nhu Nguyen School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
Phung‐Anh Nguyen Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei City, Taiwan Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
Chun‐Jung Wang School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
Phan Thanh Phuc Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
Ruo‐Kai Lin School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
Chin‐Sheng Hung Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taipei City, Taiwan
Nei‐Hui Kuo Oncology Center, Taipei Medical University Hospital, Taipei City, Taiwan
Yu‐Wen Cheng School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
Shwu‐Jiuan Lin School of Pharmacy, College of Pharmacy, Taipei Medical University, Taipei City, Taiwan
Zong‐You Hsieh Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
Chi‐Tsun Cheng Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
Min‐Huei Hsu Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan
Jason C. Hsu Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei City, Taiwan Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei City, Taiwan Research Center of Health Care Industry Data Science, College of Management, Taipei Medical University, Taipei City, Taiwan International Ph.D. Program in Biotech and Healthcare Management, College of Management, Taipei Medical University, Taipei City, Taiwan


Fradera-Soler M, Mravec J, Harholt J, Grace OM, Jørgensen B. Cell wall polysaccharide and glycoprotein content tracks growth-form diversity and an aridity gradient in the leaf-succulent genus Crassula. PHYSIOLOGIA PLANTARUM 2023;175:e14007. [PMID: 37882271 DOI: 10.1111/ppl.14007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 06/22/2023] [Accepted: 08/14/2023] [Indexed: 10/27/2023]

Veeramani A, Zhang AS, Blackburn AZ, Etzel CM, DiSilvestro KJ, McDonald CL, Daniels AH. An Artificial Intelligence Approach to Predicting Unplanned Intubation Following Anterior Cervical Discectomy and Fusion. Global Spine J 2023;13:1849-1855. [PMID: 35132907 PMCID: PMC10556901 DOI: 10.1177/21925682211053593] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 12/15/2022] Open

Fife DA, D'Onofrio J. Common, uncommon, and novel applications of random forest in psychological research. Behav Res Methods 2023;55:2447-2466. [PMID: 35915361 DOI: 10.3758/s13428-022-01901-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Accepted: 06/05/2022] [Indexed: 01/08/2023]

Patro A, Perkins EL, Ortega CA, Lindquist NR, Dawant BM, Gifford R, Haynes DS, Chowdhury N. Machine Learning Approach for Screening Cochlear Implant Candidates: Comparing With the 60/60 Guideline. Otol Neurotol 2023;44:e486-e491. [PMID: 37400135 PMCID: PMC10524241 DOI: 10.1097/mao.0000000000003927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/05/2023]

Abstract

OBJECTIVE

To develop a machine learning-based referral guideline for patients undergoing cochlear implant candidacy evaluation (CICE) and to compare with the widely used 60/60 guideline.

STUDY DESIGN

Retrospective cohort.

SETTING

Tertiary referral center.

PATIENTS

772 adults undergoing CICE from 2015 to 2020.

INTERVENTIONS

Variables included demographics, unaided thresholds, and word recognition score. A random forest classification model was trained on patients undergoing CICE, and bootstrap cross-validation was used to assess the modeling approach's performance.

MAIN OUTCOME MEASURES

The machine learning-based referral tool was evaluated against the 60/60 guideline based on ability to identify CI candidates under traditional and expanded criteria.

RESULTS

Of 587 patients with complete data, 563 (96%) met candidacy at our center, and the 60/60 guideline identified 512 (87%) patients. In the random forest model, word recognition score; thresholds at 3000, 2000, and 125; and age at CICE had the largest impact on candidacy (mean decrease in Gini coefficient, 2.83, 1.60, 1.20, 1.17, and 1.16, respectively). The 60/60 guideline had a sensitivity of 0.91, a specificity of 0.42, and an accuracy of 0.89 (95% confidence interval, 0.86-0.91). The random forest model obtained higher sensitivity (0.96), specificity (1.00), and accuracy (0.96; 95% confidence interval, 0.95-0.98). Across 1,000 bootstrapped iterations, the model yielded a median sensitivity of 0.92 (interquartile range [IQR], 0.85-0.98), specificity of 1.00 (IQR, 0.88-1.00), accuracy of 0.93 (IQR, 0.85-0.97), and area under the curve of 0.96 (IQR, 0.93-0.98).

CONCLUSIONS

A novel machine learning-based screening model is highly sensitive, specific, and accurate in predicting CI candidacy. Bootstrapping confirmed that this approach is potentially generalizable with consistent results.
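The sensitivity, specificity, and accuracy figures quoted in the abstract above are all derived from the four cells of a confusion matrix. The sketch below shows that arithmetic. The cell counts are an assumption: they were chosen only because they reproduce the rounded values reported for the 60/60 guideline (0.91, 0.42, 0.89) among 587 patients, and they are not taken from the paper.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # true candidates correctly flagged
    specificity = tn / (tn + fp)               # non-candidates correctly passed over
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts (hypothetical): 563 candidates and 24 non-candidates.
sens, spec, acc = screening_metrics(tp=512, fn=51, tn=10, fp=14)
print(round(sens, 2), round(spec, 2), round(acc, 2))  # prints 0.91 0.42 0.89
```

With so few non-candidates in the cohort, specificity rests on a small denominator, so a handful of patients can move it substantially.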


Tian L, Wu W, Yu T. Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features. Biomolecules 2023;13:1153. [PMID: 37509188 PMCID: PMC10377046 DOI: 10.3390/biom13071153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 06/26/2023] [Accepted: 06/30/2023] [Indexed: 07/30/2023] Open

Ribeiro C, Farmer CK, de Magalhães JP, Freitas AA. Predicting lifespan-extending chemical compounds for C. elegans with machine learning and biologically interpretable features. Aging (Albany NY) 2023;15:6073-6099. [PMID: 37450404 PMCID: PMC10373959 DOI: 10.18632/aging.204866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 06/19/2023] [Indexed: 07/18/2023]

Mulder FAA, Tenori L, Licari C, Luchinat C. Practical considerations for rapid and quantitative NMR-based metabolomics. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2023;352:107462. [PMID: 37141802 DOI: 10.1016/j.jmr.2023.107462] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/23/2022] [Revised: 03/23/2023] [Accepted: 04/21/2023] [Indexed: 05/06/2023]

Kwak S, Lee HJ, Kim S, Park JB, Lee SP, Kim HK, Kim YJ. Machine learning reveals sex-specific associations between cardiovascular risk factors and incident atherosclerotic cardiovascular disease. Sci Rep 2023;13:9364. [PMID: 37291421 PMCID: PMC10250402 DOI: 10.1038/s41598-023-36450-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 06/03/2023] [Indexed: 06/10/2023] Open

Pieplow C, Wessel G. Functional annotation of a hugely expanded nanos repertoire in Lytechinus variegatus, the green sea urchin. Mol Reprod Dev 2023;90:310-322. [PMID: 37039283 PMCID: PMC10225336 DOI: 10.1002/mrd.23684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 02/17/2023] [Accepted: 03/18/2023] [Indexed: 04/12/2023]

Wang D, Tang G, Zhao L, Wang M, Chen L, Zhao C, Liang Z, Chen J, Cao Y, Yao J. Potential roles of the rectum keystone microbiota in modulating the microbial community and growth performance in goat model. J Anim Sci Biotechnol 2023;14:55. [PMID: 37029437 PMCID: PMC10080759 DOI: 10.1186/s40104-023-00850-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/04/2022] [Accepted: 02/05/2023] [Indexed: 04/09/2023] Open

Spänig S, Michel A, Heider D. Unsupervised encoding selection through ensemble pruning for biomedical classification. BioData Min 2023;16:10. [PMID: 36927546 PMCID: PMC10018861 DOI: 10.1186/s13040-022-00317-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 11/27/2022] [Indexed: 03/18/2023] Open

Abstract

BACKGROUND

Owing to the rising levels of multi-resistant pathogens, antimicrobial peptides, an alternative strategy to classic antibiotics, have received more attention. A crucial part of this strategy is the costly identification and validation of candidate peptides. With the ever-growing amount of annotated peptides, researchers leverage artificial intelligence to circumvent the cumbersome, wet-lab-based identification and automate the detection of promising candidates. However, the prediction of a peptide's function is not limited to antimicrobial efficiency. To date, multiple studies have successfully classified additional properties, e.g., antiviral or cell-penetrating effects. In this light, ensemble classifiers are employed with the aim of further improving the prediction. Although we recently presented a workflow to significantly diminish the initial encoding choice, an entirely unsupervised encoding selection, considering various machine learning models, is still lacking.

RESULTS

We developed a workflow, automatically selecting encodings and generating classifier ensembles by employing sophisticated pruning methods. We observed that the Pareto frontier pruning is a good method to create encoding ensembles for the datasets at hand. In addition, encodings combined with the Decision Tree classifier as the base model are often superior. However, our results also demonstrate that none of the ensemble building techniques is outstanding for all datasets.

CONCLUSION

The workflow conducts multiple pruning methods to evaluate ensemble classifiers composed from a wide range of peptide encodings and base models. Consequently, researchers can use the workflow for unsupervised encoding selection and ensemble creation. Ultimately, the extensible workflow can be used as a plugin for the PEPTIDE REACToR, further establishing it as a versatile tool in the domain.

Banaye Yazdipour A, Masoorian H, Ahmadi M, Mohammadzadeh N, Ayyoubzadeh SM. Predicting the toxicity of nanoparticles using artificial intelligence tools: a systematic review. Nanotoxicology 2023;17:62-77. [PMID: 36883698 DOI: 10.1080/17435390.2023.2186279] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 03/09/2023]

Rudar J, Golding GB, Kremer SC, Hajibabaei M. Decision Tree Ensembles Utilizing Multivariate Splits Are Effective at Investigating Beta Diversity in Medically Relevant 16S Amplicon Sequencing Data. Microbiol Spectr 2023;11:e0206522. [PMID: 36877086 PMCID: PMC10100742 DOI: 10.1128/spectrum.02065-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 02/11/2023] [Indexed: 03/07/2023] Open

Abstract

Developing an understanding of how microbial communities vary across conditions is an important analytical step. We used 16S rRNA data isolated from human stool samples to investigate whether learned dissimilarities, such as those produced using unsupervised decision tree ensembles, can be used to improve the analysis of the composition of bacterial communities in patients suffering from Crohn's disease and adenomas/colorectal cancers. We also introduce a workflow capable of learning dissimilarities, projecting them into a lower dimensional space, and identifying features that impact the location of samples in the projections. For example, when used with the centered log ratio transformation, our new workflow (TreeOrdination) could identify differences in the microbial communities of Crohn's disease patients and healthy controls. Further investigation of our models elucidated the global impact amplicon sequence variants (ASVs) had on the locations of samples in the projected space and how each ASV impacted individual samples in this space. Furthermore, this approach can be used to integrate patient data easily into the model and results in models that generalize well to unseen data. Models employing multivariate splits can improve the analysis of complex high-throughput sequencing data sets because they are better able to learn about the underlying structure of the data set. IMPORTANCE There is an ever-increasing level of interest in accurately modeling and understanding the roles that commensal organisms play in human health and disease. We show that learned representations can be used to create informative ordinations. We also demonstrate that the application of modern model introspection algorithms can be used to investigate and quantify the impacts of taxa in these ordinations, and that the taxa identified by these approaches have been associated with immune-mediated inflammatory diseases and colorectal cancer.

Mavragani A, Bozio C, Butterfield K, Reynolds S, Reese SE, Ball S, Steffens A, Demarco M, McEvoy C, Thompson M, Rowley E, Porter RM, Fink RV, Irving SA, Naleway A. Accuracy of COVID-19-Like Illness Diagnoses in Electronic Health Record Data: Retrospective Cohort Study. JMIR Form Res 2023;7:e39231. [PMID: 36383633 PMCID: PMC9848441 DOI: 10.2196/39231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 07/13/2022] [Accepted: 09/30/2022] [Indexed: 01/21/2023] Open

Abstract

BACKGROUND

Electronic health record (EHR) data provide a unique opportunity to study the epidemiology of COVID-19, clinical outcomes of the infection, comparative effectiveness of therapies, and vaccine effectiveness but require a well-defined computable phenotype of COVID-19-like illness (CLI).

OBJECTIVE

The objective of this study was to evaluate the performance of pathogen-specific and other acute respiratory illness (ARI) International Statistical Classification of Diseases-9 and -10 codes in identifying COVID-19 cases in emergency department (ED) or urgent care (UC) and inpatient settings.

METHODS

We conducted a retrospective observational cohort study using EHR, claims, and laboratory information system data of ED or UC and inpatient encounters from 4 health systems in the United States. Patients who were aged ≥18 years, had an ED or UC or inpatient encounter for an ARI, and underwent a SARS-CoV-2 polymerase chain reaction test between March 1, 2020, and March 31, 2021, were included. We evaluated various CLI definitions using combinations of International Statistical Classification of Diseases-10 codes as follows: COVID-19-specific codes; CLI definition used in VISION network studies; ARI signs, symptoms, and diagnosis codes only; signs and symptoms of ARI only; and random forest model definitions. We evaluated the sensitivity, specificity, positive predictive value, and negative predictive value of each CLI definition using a positive SARS-CoV-2 polymerase chain reaction test as the reference standard. We evaluated the performance of each CLI definition for distinct hospitalization and ED or UC cohorts.

RESULTS

Among 90,952 hospitalizations and 137,067 ED or UC visits, 5627 (6.19%) and 9866 (7.20%) were positive for SARS-CoV-2, respectively. COVID-19-specific codes had high sensitivity (91.6%) and specificity (99.6%) in identifying patients with SARS-CoV-2 positivity among hospitalized patients. The VISION CLI definition maintained high sensitivity (95.8%) but lowered specificity (45.5%). By contrast, signs and symptoms of ARI had low sensitivity and positive predictive value (28.9% and 11.8%, respectively) but higher specificity and negative predictive value (85.3% and 94.7%, respectively). ARI diagnoses, signs, and symptoms alone had low predictive performance. All CLI definitions had lower sensitivity for ED or UC encounters. Random forest approaches identified distinct CLI definitions with high performance for hospital encounters and moderate performance for ED or UC encounters.

CONCLUSIONS

COVID-19-specific codes have high sensitivity and specificity in identifying adults with positive SARS-CoV-2 test results. Separate combinations of COVID-19-specific codes and ARI codes enhance the utility of CLI definitions in studies using EHR data in hospital and ED or UC settings.
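The predictive values quoted in the results above depend on prevalence as well as on sensitivity and specificity. The following is a minimal sketch of that relationship using Bayes' rule. It assumes that the signs-and-symptoms figures (sensitivity 28.9%, specificity 85.3%) refer to the hospitalized cohort, which the abstract does not state explicitly, and the function name `predictive_values` is purely illustrative. Because the published percentages are rounded, the computed values land close to the reported 11.8% and 94.7% rather than exactly on them.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence among hospitalizations, taken from the counts in the abstract.
prevalence = 5627 / 90952

# Assumption: these rounded figures describe the hospitalized cohort.
ppv, npv = predictive_values(sensitivity=0.289, specificity=0.853, prevalence=prevalence)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The sketch illustrates why a definition with moderate specificity can still have a low positive predictive value when only about 6% of encounters are test-positive.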

Bowe AK, Lightbody G, Staines A, Murray DM. Big data, machine learning, and population health: predicting cognitive outcomes in childhood. Pediatr Res 2023;93:300-307. [PMID: 35681091 PMCID: PMC7614199 DOI: 10.1038/s41390-022-02137-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/02/2022] [Revised: 05/05/2022] [Accepted: 05/17/2022] [Indexed: 11/09/2022]

Abstract

The application of machine learning (ML) to address population health challenges has received much less attention than its application in the clinical setting. One such challenge is addressing disparities in early childhood cognitive development-a complex public health issue rooted in the social determinants of health, exacerbated by inequity, characterised by intergenerational transmission, and which will continue unabated without novel approaches to address it. Early life, the period of optimal neuroplasticity, presents a window of opportunity for early intervention to improve cognitive development. Unfortunately for many, this window will be missed, and intervention may never occur or occur only when overt signs of cognitive delay manifest. In this review, we explore the potential value of ML and big data analysis in the early identification of children at risk for poor cognitive outcome, an area where there is an apparent dearth of research. We compare and contrast traditional statistical methods with ML approaches, provide examples of how ML has been used to date in the field of neurodevelopmental disorders, and present a discussion of the opportunities and risks associated with its use at a population level. The review concludes by highlighting potential directions for future research in this area. IMPACT: To date, the application of machine learning to address population health challenges in paediatrics lags behind other clinical applications. This review provides an overview of the public health challenge we face in addressing disparities in childhood cognitive development and focuses on the cornerstone of early intervention. Recent advances in our ability to collect large volumes of data, and in analytic capabilities, provide a potential opportunity to improve current practices in this field. This review explores the potential role of machine learning and big data analysis in the early identification of children at risk for poor cognitive outcomes.

Ma X, Jiang S, Zhang Z, Wang H, Song C, He J. Long-term collar deployment leads to bias in soil respiration measurements. Methods Ecol Evol 2023. [DOI: 10.1111/2041-210x.14056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]

Evaluating Histological Subtypes Classification of Primary Lung Cancers on Unenhanced Computed Tomography Based on Random Forest Model. Journal of Healthcare Engineering 2023;2023:8964676. [PMID: 36794098 PMCID: PMC9925238 DOI: 10.1155/2023/8964676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 07/07/2022] [Accepted: 01/21/2023] [Indexed: 02/08/2023]

Abstract

Lung cancer is the leading cause of cancer-related death in many countries, and an accurate histopathological diagnosis is of great importance in subsequent treatment. The aim of this study was to establish the random forest (RF) model based on radiomic features to automatically classify and predict lung adenocarcinoma (ADC), lung squamous cell carcinoma (SCC), and small cell lung cancer (SCLC) on unenhanced computed tomography (CT) images. Eight hundred and fifty-two patients (mean age: 61.4, range: 29-87, male/female: 536/316) with preoperative unenhanced CT and postoperative histopathologically confirmed primary lung cancers, including 525 patients with ADC, 161 patients with SCC, and 166 patients with SCLC, were included in this retrospective study. Radiomic features were extracted, selected, and then used to establish the RF classification model to analyse and classify primary lung cancers into three subtypes, including ADC, SCC, and SCLC according to histopathological results. The training (446 ADC, 137 SCC, and 141 SCLC) and testing cohorts (79 ADC, 24 SCC, and 25 SCLC) accounted for 85% and 15% of the whole datasets, respectively. The prediction performance of the RF classification model was evaluated by F1 scores and the receiver operating characteristic (ROC) curve. On the testing cohort, the areas under the ROC curve (AUC) of the RF model in classifying ADC, SCC, and SCLC were 0.74, 0.77, and 0.88, respectively. The F1 scores achieved 0.80, 0.40, and 0.73 in ADC, SCC, and SCLC, respectively, and the weighted average F1 score was 0.71. In addition, for the RF classification model, the precisions were 0.72, 0.64, and 0.70; the recalls were 0.86, 0.29, and 0.76; and the specificities were 0.55, 0.96, and 0.92 in ADC, SCC, and SCLC. 
Primary lung cancers were feasibly and effectively classified into ADC, SCC, and SCLC based on the combination of the RF classification model and radiomic features, which has the potential for noninvasive prediction of the histological subtypes of primary lung cancers.
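The weighted average F1 score reported above can be checked from the per-class F1 scores and the testing cohort sizes. The sketch below assumes that the weights are the class counts in the testing cohort (79 ADC, 24 SCC, 25 SCLC), which is the usual definition of a support-weighted average but is not stated in the abstract; the function name `weighted_f1` is illustrative.

```python
def weighted_f1(f1_by_class, support_by_class):
    """Average per-class F1 scores, weighting each class by its number of test cases."""
    total = sum(support_by_class.values())
    return sum(f1_by_class[c] * support_by_class[c] for c in f1_by_class) / total


# Per-class F1 scores and testing cohort sizes quoted in the abstract.
f1 = {"ADC": 0.80, "SCC": 0.40, "SCLC": 0.73}
support = {"ADC": 79, "SCC": 24, "SCLC": 25}

print(round(weighted_f1(f1, support), 2))  # prints 0.71
```

Under this assumption the result agrees with the reported weighted average of 0.71, and it shows how the majority ADC class dominates the summary score despite the low F1 for SCC.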

Risi E, Lisanti C, Vignoli A, Biagioni C, Paderi A, Cappadona S, Monte FD, Moretti E, Sanna G, Livraghi L, Malorni L, Benelli M, Puglisi F, Luchinat C, Tenori L, Biganzoli L. Risk assessment of disease recurrence in early breast cancer: A serum metabolomic study focused on elderly patients. Transl Oncol 2022;27:101585. [PMID: 36403505 PMCID: PMC9676351 DOI: 10.1016/j.tranon.2022.101585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 10/28/2022] [Accepted: 11/08/2022] [Indexed: 11/18/2022] Open

Mao Y, Zhu Z, Pan S, Lin W, Liang J, Huang H, Li L, Wen J, Chen G. Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real-world retrospective cohort study. J Diabetes Investig 2022;14:309-320. [PMID: 36345236 PMCID: PMC9889616 DOI: 10.1111/jdi.13937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 10/04/2022] [Accepted: 10/16/2022] [Indexed: 11/11/2022] Open

UAV-based classification of maritime Antarctic vegetation types using GEOBIA and random forest. Ecol Inform 2022. [DOI: 10.1016/j.ecoinf.2022.101768] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]

Neto NGB, O'Rourke SA, Zhang M, Fitzgerald HK, Dunne A, Monaghan MG. Non-invasive classification of macrophage polarisation by 2P-FLIM and machine learning. eLife 2022;11:77373. [PMID: 36254592 PMCID: PMC9578711 DOI: 10.7554/elife.77373] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/26/2022] [Accepted: 09/25/2022] [Indexed: 11/13/2022] Open

Betz LT, Rosen M, Salokangas RKR, Kambeitz J. Disentangling the impact of childhood abuse and neglect on depressive affect in adulthood: A machine learning approach in a general population sample. J Affect Disord 2022;315:17-26. [PMID: 35882299 DOI: 10.1016/j.jad.2022.07.042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 07/15/2022] [Accepted: 07/19/2022] [Indexed: 11/26/2022]

Behnamian S, Esposito U, Holland G, Alshehab G, Dobre AM, Pirooznia M, Brimacombe CS, Elhaik E. Temporal population structure, a genetic dating method for ancient Eurasian genomes from the past 10,000 years. Cell Reports Methods 2022;2:100270. [PMID: 36046618 PMCID: PMC9421539 DOI: 10.1016/j.crmeth.2022.100270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2021] [Revised: 06/17/2022] [Accepted: 07/19/2022] [Indexed: 11/21/2022]

Damigos G, Zacharaki EI, Zerva N, Pavlopoulos A, Chatzikyrkou K, Koumenti A, Moustakas K, Pantos C, Mourouzis I, Lourbopoulos A. Machine learning based analysis of stroke lesions on mouse tissue sections. J Cereb Blood Flow Metab 2022;42:1463-1477. [PMID: 35209753 PMCID: PMC9274860 DOI: 10.1177/0271678x221083387] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]

Boueiz A, Xu Z, Chang Y, Masoomi A, Gregory A, Lutz S, Qiao D, Crapo JD, Dy JG, Silverman EK, Castaldi PJ. Machine Learning Prediction of Progression in Forced Expiratory Volume in 1 Second in the COPDGene® Study. Chronic Obstructive Pulmonary Diseases (Miami, Fla.) 2022;9:349-365. [PMID: 35649102 PMCID: PMC9448009 DOI: 10.15326/jcopdf.2021.0275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 05/18/2022] [Indexed: 05/24/2023]

Abstract

BACKGROUND

The heterogeneous nature of chronic obstructive pulmonary disease (COPD) complicates the identification of the predictors of disease progression. We aimed to improve the prediction of disease progression in COPD by using machine learning and incorporating a rich dataset of phenotypic features.

METHODS

We included 4496 smokers with available data from their enrollment and 5-year follow-up visits in the COPD Genetic Epidemiology (COPDGene®) study. We constructed linear regression (LR) and supervised random forest models to predict 5-year progression in forced expiratory volume in 1 second (FEV₁) from 46 baseline features. Using cross-validation, we randomly partitioned participants into training and testing samples. We also validated the results in the COPDGene 10-year follow-up visit.

RESULTS

Predicting the change in FEV₁ over time is more challenging than simply predicting the future absolute FEV₁ level. For random forest, R-squared was 0.15 and the area under the receiver operating characteristic (ROC) curve for the prediction of participants in the top quartile of observed progression was 0.71 in testing; the corresponding values in validation were 0.10 and 0.70. Random forest provided slightly better performance than LR. The accuracy was best for Global initiative for chronic Obstructive Lung Disease (GOLD) grades 1-2 participants, and it was harder to achieve accurate prediction in advanced stages of the disease. Predictive variables differed in their relative importance as well as for the predictions by GOLD.

CONCLUSION

Random forest, along with deep phenotyping, predicts FEV₁ progression with reasonable accuracy. There is significant room for improvement in future models. This prediction model facilitates the identification of smokers at increased risk for rapid disease progression. Such findings may be useful in the selection of patient populations for targeted clinical trials.

Affiliation(s)

Adel Boueiz: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States; Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States. *These authors contributed equally
Zhonghui Xu: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States. *These authors contributed equally
Yale Chang: Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
Aria Masoomi: Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
Andrew Gregory: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
Sharon Lutz: Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, Massachusetts, United States
Dandi Qiao: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
James D. Crapo: Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, Colorado, United States
Jennifer G. Dy: Department of Electrical and Computer Engineering, Northeastern University, Boston, Massachusetts, United States
Edwin K. Silverman: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States; Pulmonary and Critical Care Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
Peter J. Castaldi: Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States; Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States
For the COPDGene Investigators

Kurata H, Tsukiyama S, Manavalan B. iACVP: markedly enhanced identification of anti-coronavirus peptides using a dataset-specific word2vec model. Brief Bioinform 2022;23:6623727. [PMID: 35772910 DOI: 10.1093/bib/bbac265] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/06/2022] [Revised: 05/23/2022] [Accepted: 06/06/2022] [Indexed: 01/22/2023] Open

Soogun AO, Kharsany ABM, Zewotir T, North D, Ogunsakin RE. Identifying Potential Factors Associated with High HIV viral load in KwaZulu-Natal, South Africa using Multiple Correspondence Analysis and Random Forest Analysis. BMC Med Res Methodol 2022;22:174. [PMID: 35715730 PMCID: PMC9206247 DOI: 10.1186/s12874-022-01625-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/09/2021] [Accepted: 04/27/2022] [Indexed: 12/02/2022] Open

Abstract

Background

Sustainable Human Immunodeficiency Virus (HIV) virological suppression is crucial to achieving the Joint United Nations Programme on HIV/AIDS (UNAIDS) 95-95-95 treatment targets to reduce the risk of onward HIV transmission. Exploratory data analysis is an integral part of statistical analysis which aids variable selection from complex survey data for further confirmatory analysis.

Methods

In this study, we examine participants’ epidemiological and biological factors associated with high HIV RNA viral load (HHVL), using data from an HIV Incidence Provincial Surveillance System (HIPSS) sequential cross-sectional survey conducted between 2014 and 2015 in KwaZulu-Natal, South Africa. Using multiple correspondence analysis (MCA) and random forest analysis (RFA), we analyzed the linkage between socio-demographic, behavioral, psycho-social, and biological factors associated with HHVL, defined as ≥400 copies per mL.

Results

Out of 3956 participants in 2014 and 3868 in 2015, 50.1% and 41%, respectively, had HHVL. MCA and RFA revealed that knowledge of HIV status, ART use, ARV dosage, current CD4 cell count, perceived risk of contracting HIV, number of lifetime HIV tests, number of lifetime sex partners, and ever having been diagnosed with TB were consistently identified as potential factors associated with high HIV viral load in the 2014 and 2015 surveys. Based on MCA findings, the categories of variables associated with HHVL were: not knowing HIV status, not being on ART, being on multiple dosages of ARV, a lower perceived risk of contracting HIV, and having two or more lifetime sexual partners.

Conclusion

The high proportion of individuals with HHVL suggests that the UNAIDS 95-95-95 goal of HIV viral suppression is less likely to be achieved. Based on performance and visualization evaluation, MCA was selected as the best and essential exploration tool for identifying and understanding categorical variables’ significant associations and interactions to enhance individual epidemiological understanding of high HIV viral load. When faced with complex survey data and the challenges of variable selection in research, exploratory data analysis with robust graphical visualization and reliability that can reveal diverse structures should be considered.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-022-01625-6.

Quah Y, Yi-Le JC, Park NH, Lee YY, Lee EB, Jang SH, Kim MJ, Rhee MH, Lee SJ, Park SC. Serum biomarker-based osteoporosis risk prediction and the systemic effects of Trifolium pratense ethanolic extract in a postmenopausal model. Chin Med 2022;17:70. [PMID: 35701790 PMCID: PMC9199188 DOI: 10.1186/s13020-022-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 05/11/2022] [Indexed: 11/10/2022] Open

Abstract

Background

In recent years, a soaring number of marketed Trifolium pratense (red clover) extract products has indicated that a rising number of consumers are turning to natural alternatives to manage postmenopausal symptoms. T. pratense ethanolic extract (TPEE) has shown immense potential for use in the treatment of menopause complications, including osteoporosis and hormone-dependent diseases. Early diagnosis of osteoporosis can increase the chance of efficient treatment and reduce fracture risks. Currently, the most common diagnosis of osteoporosis is performed by using dual-energy x-ray absorptiometry (DXA). However, the major limitation of DXA is that it is inaccessible and expensive in rural areas to be used for primary care inspection. Hence, serum biomarkers can serve as meaningful and accessible data for osteoporosis diagnosis.

Methods

The present study systematically elucidated the anti-osteoporosis and estrogenic activities of TPEE in ovariectomized (OVX) rats by evaluating the bone microstructure, uterus index, serum and bone biomarkers, and osteoblastic and osteoclastic gene expression. Leveraging a pool of serum biomarkers obtained from this study, recursive feature elimination with a cross-validation method (RFECV) was used to select useful biomarkers for osteoporosis prediction. Then, using the key features extracted, we employed five classification algorithms (extreme gradient boosting (XGBoost), random forest, support vector machine, artificial neural network, and decision tree) to predict the bone quality in terms of T-score.

Results

TPEE treatments down-regulated nuclear factor kappa-B ligand, alkaline phosphatase, and up-regulated estrogen receptor β gene expression. Additionally, reduced serum C-terminal telopeptides of type 1 collagen level and improvement in the estrogen dependent characteristics of the uterus on the lining of the lumen were observed in the TPEE intervention group. Among the tested classifiers, XGBoost stood out as the best performing classification model with the highest F1-score and lowest standard deviation.

Conclusions

The present study demonstrates that TPEE treatment showed therapeutic benefits in the prevention of osteoporosis at the transcriptional level and maintained the estrogen-dependent characteristics of the uterus. Our study revealed that, in the case of a limited number of features, RFECV paired with an XGBoost model could serve as a powerful tool to readily evaluate and diagnose postmenopausal osteoporosis.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13020-022-00622-7.

Antikainen AA, Heinonen M, Lähdesmäki H. Modeling binding specificities of transcription factor pairs with random forests. BMC Bioinformatics 2022;23:212. [PMID: 35659235 PMCID: PMC9166390 DOI: 10.1186/s12859-022-04734-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 05/12/2022] [Indexed: 11/10/2022] Open

Abstract

Background

Transcription factors (TFs) bind regulatory DNA regions with sequence specificity, form complexes and regulate gene expression. In cooperative TF-TF binding, two transcription factors bind onto a shared DNA binding site as a pair. Previous work has demonstrated pairwise TF-TF-DNA interactions with position weight matrices (PWMs), which may however not sufficiently take into account the complexity and flexibility of pairwise binding.

Results

We propose two random forest (RF) methods for joint TF-TF binding site prediction: and . We train models with previously published large-scale CAP-SELEX DNA libraries, which comprise DNA sequences enriched for binding of a selected TF pair. builds a random forest with sub-sequences selected from CAP-SELEX DNA reads with previously proposed pairwise PWM. outperforms (area under receiver operating characteristics curve, AUROC, 0.75) the current state-of-the-art method i.e. orientation and spacing specific pairwise PWMs (AUROC 0.59). Thus, may be utilized to improve prediction accuracy for pre-determined binding preferences. However, pairwise TF binding is currently considered flexible; a pair may bind DNA with different orientations and amounts of dinucleotide gaps or overlap between the two motifs. Thus, we developed , which utilizes random forests by considering simultaneously multiple orientations and spacings of the two factors. Our approach outperforms (AUROC 0.78) PWMs, as well as (p<0.00195). provides an approach for predicting TF-TF binding sites without prior knowledge on pairwise binding preferences. However, more research is needed to assess eligibility for practical applications.

Conclusions

Random forest is well suited for modeling pairwise TF-TF-DNA binding specificities, and provides an improvement to pairwise binding site prediction accuracy.

Provable Boolean interaction recovery from tree ensemble obtained via random forests. Proc Natl Acad Sci U S A 2022;119:e2118636119. [PMID: 35609192 PMCID: PMC9295780 DOI: 10.1073/pnas.2118636119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open

Petrosyan Y, Mesana TG, Sun LY. Prediction of acute kidney injury risk after cardiac surgery: using a hybrid machine learning algorithm. BMC Med Inform Decis Mak 2022;22:137. [PMID: 35585624 PMCID: PMC9118758 DOI: 10.1186/s12911-022-01859-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Accepted: 04/20/2022] [Indexed: 11/17/2022] Open

Abstract

Background

Acute kidney injury (AKI) is a serious complication after cardiac surgery. We derived and internally validated a Machine Learning preoperative model to predict cardiac surgery-associated AKI of any severity and compared its performance with parametric statistical models.

Methods

We conducted a retrospective study of adult patients who underwent major cardiac surgery requiring cardiopulmonary bypass between November 1st, 2009 and March 31st, 2015. AKI was defined according to the KDIGO criteria as stage 1 or greater, within 7 days of surgery. We randomly split the cohort into derivation and validation datasets. We developed three AKI risk models: (1) a hybrid machine learning (ML) algorithm, using Random Forests for variable selection, followed by high performance logistic regression; (2) a traditional logistic regression model and (3) an enhanced logistic regression model with 500 bootstraps, with backward variable selection. For each model, we assigned risk scores to each of the retained covariates and assessed model discrimination (C statistic) and calibration (Hosmer–Lemeshow goodness-of-fit test) in the validation datasets.

Results

Of 6522 included patients, 1760 (27.0%) developed AKI. The best performance was achieved by the hybrid ML algorithm to predict AKI of any severity. The ML and enhanced statistical models remained robust after internal validation (C statistic = 0.75; Hosmer–Lemeshow p = 0.804, and AUC = 0.74, Hosmer–Lemeshow p = 0.347, respectively).

Conclusions

We demonstrated that a hybrid ML model provides higher accuracy without sacrificing parsimony, computational efficiency, or interpretability, when compared with parametric statistical models. This score-based model can easily be used at the bedside to identify high-risk patients who may benefit from intensive perioperative monitoring and personalized management strategies.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12911-022-01859-w.
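The discrimination measure named in the Methods above, the C statistic, can be illustrated with a minimal sketch. For a binary outcome it equals the area under the ROC curve: the fraction of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy data are assumptions made for illustration; they are not taken from the cited study.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (C) statistic for a binary outcome.

    Counts the share of (event, non-event) pairs in which the event has
    the higher score; tied scores contribute 0.5.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: 1 = outcome occurred, scores = predicted risks.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.3, 0.6, 0.6, 0.2, 0.4]
print(round(c_statistic(y_true, y_score), 3))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formulation would be preferable for large cohorts.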


Vignoli A, Tenori L, Luchinat C. An omics approach to study trace metals in sera of hemodialysis patients treated with erythropoiesis stimulating agents. Metallomics 2022;14:6572376. [PMID: 35451491 DOI: 10.1093/mtomcs/mfac028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 04/20/2022] [Indexed: 11/12/2022]

Gadot R, Anand A, Lovin BD, Sweeney AD, Patel AJ. Predicting surgical decision-making in vestibular schwannoma using tree-based machine learning. Neurosurg Focus 2022;52:E8. [DOI: 10.3171/2022.1.focus21708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2021] [Accepted: 01/19/2022] [Indexed: 11/06/2022]

Abstract

OBJECTIVE

Vestibular schwannomas (VSs) are the most common neoplasm of the cerebellopontine angle in adults. Though these lesions are generally slow growing, their growth patterns and associated symptoms can be unpredictable, which may complicate the decision to pursue conservative management versus active intervention. Additionally, surgical decision-making can be controversial because of limited high-quality evidence and multiple quality-of-life considerations. Machine learning (ML) is a powerful tool that utilizes data sets to essentialize multidimensional clinical processes. In this study, the authors trained multiple tree-based ML algorithms to predict the decision for active treatment versus MRI surveillance of VS in a single institutional cohort. In doing so, they sought to assess which preoperative variables carried the most weight in driving the decision for intervention and could be used to guide future surgical decision-making through an evidence-based approach.

METHODS

The authors reviewed the records of patients who had undergone evaluation by neurosurgery and otolaryngology with subsequent active treatment (resection or radiation) for unilateral VS in the period from 2009 to 2021, as well as those of patients who had been evaluated for VS and were managed conservatively throughout 2021. Clinical presentation, radiographic data, and management plans were abstracted from each patient record from the time of first evaluation until the last follow-up or surgery. Each encounter with the patient was treated as an instance involving a management decision that depended on demographics, symptoms, and tumor profile. Decision tree and random forest classifiers were trained and tested to predict the decision for treatment versus imaging surveillance on the basis of unseen data using an 80/20 pseudorandom split. Predictor variables were tuned to maximize performance based on lowest Gini impurity indices. Model performance was optimized using fivefold cross-validation.

RESULTS

One hundred twenty-four patients with 198 rendered decisions concerning management were included in the study. In the decision tree analysis, only a maximum tumor dimension threshold of 1.6 cm and progressive symptoms were required to predict the decision for treatment with 85% accuracy. Optimizing maximum dimension thresholds and including age at presentation boosted accuracy to 88%. Random forest analysis (n = 500 trees) predicted the decision for treatment with 80% accuracy. Factors with the highest variable importance based on multiple measures of importance, including mean minimal conditional depth and largest Gini impurity reduction, were maximum tumor dimension, age at presentation, Koos grade, and progressive symptoms at presentation.

CONCLUSIONS

Tree-based ML was used to predict which factors drive the decision for active treatment of VS with 80%–88% accuracy. The most important factors were maximum tumor dimension, age at presentation, Koos grade, and progressive symptoms. These results can assist in surgical decision-making and patient counseling. They also demonstrate the power of ML algorithms in extracting useful insights from limited data sets.
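The abstract above refers to choosing splits by lowest Gini impurity. As a rough sketch of that general idea (not the authors' pipeline), the code below scores candidate thresholds on a single numeric predictor by the weighted Gini impurity of the two resulting groups and keeps the lowest. The helper names `gini`, `split_impurity`, and `best_threshold`, and the toy measurements, are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity after splitting at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return the purest split."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


# Hypothetical toy data: a size measurement and a binary decision (1 = treat).
sizes = [0.5, 1.0, 2.0, 3.0]
decisions = [0, 0, 1, 1]
t = best_threshold(sizes, decisions)
print(t, split_impurity(sizes, decisions, t))
```

A full decision tree repeats this search over all predictors at every node; a random forest additionally restricts each search to a random subset of predictors and averages many such trees.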

Tarimo CS, Bhuyan SS, Zhao Y, Ren W, Mohammed A, Li Q, Gardner M, Mahande MJ, Wang Y, Wu J. Prediction of low Apgar score at five minutes following labor induction intervention in vaginal deliveries: machine learning approach for imbalanced data at a tertiary hospital in North Tanzania. BMC Pregnancy Childbirth 2022;22:275. [PMID: 35365129 PMCID: PMC8976377 DOI: 10.1186/s12884-022-04534-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 02/28/2022] [Indexed: 11/18/2022] Open

Abstract

Background

Prediction of low Apgar score for vaginal deliveries following labor induction intervention is critical for improving neonatal health outcomes. We set out to investigate important attributes and train popular machine learning (ML) algorithms to correctly classify neonates with low Apgar scores from an imbalanced learning perspective.

Methods

We analyzed 7716 induced vaginal deliveries from the electronic birth registry of the Kilimanjaro Christian Medical Centre (KCMC), of which 733 (9.5%) involved neonates with a low (< 7) Apgar score. The ‘extra-tree classifier’ was used to assess feature importance. We used Area Under Curve (AUC), recall, precision, F-score, Matthews Correlation Coefficient (MCC), balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK) to evaluate the performance of six selected machine learning classifiers. To address class imbalance, we examined three widely used resampling techniques: the Synthetic Minority Oversampling Technique (SMOTE), Random Oversampling Examples (ROS), and Random Undersampling (RUS). We applied Decision Curve Analysis (DCA) to evaluate the net benefit of the selected classifiers.

Results

Birth weight, maternal age, and gestational age were found to be important predictors of a low Apgar score following induced vaginal delivery. The SMOTE, ROS, and RUS techniques were more effective at improving recall than other metrics in all the models under investigation. A slight improvement was observed in the F1 score, BA, and BM. DCA revealed potential benefits of applying the Boosting method for predicting low Apgar scores among the tested models.

Conclusion

More algorithms should be tested to develop theoretical guidance on which rebalancing techniques are most effective for this particular imbalance ratio. Future research should prioritize a debate on which performance indicators to rely on when dealing with imbalanced or skewed data.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12884-022-04534-0.
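Several of the evaluation metrics listed in the Methods above (recall, precision, F1, MCC, BA, BM, MK) are simple functions of a binary confusion matrix. The sketch below computes them from the four cell counts using their standard definitions. The function name `imbalance_metrics` and the counts are hypothetical and are not taken from the cited study; the code assumes every row and column total of the matrix is non-zero.

```python
from math import sqrt


def imbalance_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics commonly reported for imbalanced binary data.

    Assumes all marginal totals are non-zero (no division-by-zero handling).
    """
    recall = tp / (tp + fn)        # sensitivity, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
        "informedness": recall + specificity - 1,  # bookmaker informedness (BM)
        "markedness": precision + npv - 1,         # markedness (MK)
        "mcc": mcc,
    }


# Hypothetical counts for a rare positive class (50 positives in 1000 cases).
for name, value in imbalance_metrics(tp=40, fp=60, fn=10, tn=890).items():
    print(f"{name}: {value:.3f}")
```

With these counts, plain accuracy would be 93% even though precision is only 40%, which is why metrics such as BA, BM, MK, and MCC are preferred for skewed classes. MCC squared equals BM times MK, so MCC summarizes both.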


Rudar J, Porter TM, Wright M, Golding GB, Hajibabaei M. LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data. BMC Bioinformatics 2022;23:110. [PMID: 35361114 PMCID: PMC8969335 DOI: 10.1186/s12859-022-04631-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/12/2021] [Accepted: 03/07/2022] [Indexed: 11/10/2022] Open

Abstract

Background

Identification of biomarkers, which are measurable characteristics of biological datasets, can be challenging. Although amplicon sequence variants (ASVs) can be considered potential biomarkers, identifying important ASVs in high-throughput sequencing datasets is challenging. Noise, algorithmic failures to account for specific distributional properties, and feature interactions can complicate the discovery of ASV biomarkers. In addition, these issues can impact the replicability of various models and elevate false-discovery rates. Contemporary machine learning approaches can be leveraged to address these issues. Ensembles of decision trees are particularly effective at classifying the types of data commonly generated in high-throughput sequencing (HTS) studies due to their robustness when the number of features in the training data is orders of magnitude larger than the number of samples. In addition, when combined with appropriate model introspection algorithms, machine learning algorithms can also be used to discover and select potential biomarkers. However, the construction of these models could introduce various biases which potentially obfuscate feature discovery.

Results

We developed a decision tree ensemble, LANDMark, which uses oblique and non-linear cuts at each node. In synthetic and toy tests LANDMark consistently ranked as the best classifier and often outperformed the Random Forest classifier. When trained on the full metabarcoding dataset obtained from Canada’s Wood Buffalo National Park, LANDMark was able to create highly predictive models and achieved an overall balanced accuracy score of 0.96 ± 0.06. The use of recursive feature elimination did not impact LANDMark’s generalization performance and, when trained on data from the BE amplicon, it was able to outperform the Linear Support Vector Machine, Logistic Regression models, and Stochastic Gradient Descent models (p ≤ 0.05). Finally, LANDMark distinguishes itself due to its ability to learn smoother non-linear decision boundaries.

Conclusions

Our work introduces LANDMark, a meta-classifier which blends the characteristics of several machine learning models into a decision tree and ensemble learning framework. To our knowledge, this is the first study to apply this type of ensemble approach to amplicon sequencing data and we have shown that analyzing these datasets using LANDMark can produce highly predictive and consistent models.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04631-z.


An Integrated Taxonomic Approach Points towards a Single-Species Hypothesis for Santolina (Asteraceae) in Corsica and Sardinia. BIOLOGY 2022;11:biology11030356. [PMID: 35336730 PMCID: PMC8945001 DOI: 10.3390/biology11030356] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Revised: 01/26/2022] [Accepted: 02/22/2022] [Indexed: 12/04/2022]

Abstract

Simple Summary

Systematics is the branch of biology that studies the relationships among organisms and their evolution, while taxonomy is the science of classification. In this work, a systematic and taxonomic investigation of three plant species of Santolina, commonly known as lavender-cotton, is presented. Two of these species occur exclusively in Corsica and Sardinia, two of the main islands of the Mediterranean Sea, while a third one is a common ornamental plant, known only as cultivated. By integrating several approaches, we find that the two putative species from Corsica and Sardinia are actually very similar from many points of view. A two-species hypothesis is no longer supported according to our results, so that these plants should be reclassified as a single species. This study demonstrates the importance of integrating different sources of information to produce reliable classifications (i.e. taxonomic hypotheses). In addition, our study is useful to better understand plant evolution in the context of the Mediterranean Basin, one of the world’s biodiversity hotspots.

Abstract

Santolina is a plant genus of dwarf aromatic shrubs that includes about 26 species native to the western Mediterranean Basin. In Corsica and Sardinia, two of the main islands of the Mediterranean, Santolina corsica (tetraploid) and S. insularis (hexaploid) are reported. Along with the cultivated pentaploid S. chamaecyparissus, these species form a group of taxa that is hard to distinguish only by morphology. Molecular (using ITS, trnH-psbA, trnL-trnF, trnQ-rps16, rps15-ycf1, psbM-trnD, and trnS-trnG), cypsela morpho-colorimetric, morphometric, and niche similarity analyses were conducted to investigate the diversity of plants belonging to this species group. Our results confute the current taxonomic hypothesis and suggest considering S. corsica and S. insularis as a single species. Moreover, molecular and morphometric results highlight the strong affinity between S. chamaecyparissus and the Santolina populations endemic to Corsica and Sardinia. Finally, the populations from south-western Sardinia, due to their high differentiation in the studied plastid markers and the different climatic niche with respect to all the other populations, could be considered as an evolutionarily significant unit.


Construction of a Diagnostic Model for Lymph Node Metastasis of the Papillary Thyroid Carcinoma Using Preoperative Ultrasound Features and Imaging Omics. JOURNAL OF HEALTHCARE ENGINEERING 2022;2022:1872412. [PMID: 35178222 PMCID: PMC8846989 DOI: 10.1155/2022/1872412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 12/14/2021] [Accepted: 01/07/2022] [Indexed: 11/17/2022]

Künzel SR, Saarinen TF, Liu EW, Sekhon JS. Linear Aggregation in Tree-based Estimators. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2026780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]