1
Ouchi K, Yoshimaru D, Takemura A, Yamamoto S, Hayashi R, Higo N, Obara M, Sugase-Miyamoto Y, Tsurugizawa T. Multi-scale hierarchical brain regions detect individual and interspecies variations of structural connectivity in macaque monkeys and humans. Neuroimage 2024; 302:120901. [PMID: 39447715 DOI: 10.1016/j.neuroimage.2024.120901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Revised: 10/01/2024] [Accepted: 10/22/2024] [Indexed: 10/26/2024]
Abstract
Macaques are representative animal models in translational research. However, differences in the shape and location of brain regions between macaques and humans prevent direct comparison of brain structure. Here, we calculated structural connectivity (SC) with multi-scale hierarchical regions of interest (ROIs) that parcellate the human and macaque brain into 8 (level 1 ROIs), 28 (level 2 ROIs), or 46 (level 3 ROIs) regions, each consisting of anatomically and functionally defined level 4 ROIs (a parcellation of the brain into around 100 regions). SC with the level 1 ROIs showed lower individual and interspecies variation in macaques and humans. SC with the level 2 and 3 ROIs showed that several regions in the frontal, temporal, and parietal lobes have distinct connectivity in macaques and humans. The lateral frontal cortex, motor cortex, and auditory cortex were shown to be important areas for interspecies differences. These results provide insights into the use of macaques as animal models in translational studies.
Affiliation(s)
- Kazuya Ouchi
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba-City, Ibaraki 305-8568, Japan; Faculty of Engineering, Information and Systems, University of Tsukuba, Ibaraki 305-8573, Japan
- Daisuke Yoshimaru
- Faculty of Engineering, Information and Systems, University of Tsukuba, Ibaraki 305-8573, Japan; Jikei University School of Medicine, 3-25-8 Nishishinbashi, Minato City Tokyo 105-8461, Japan
- Aya Takemura
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba-City, Ibaraki 305-8568, Japan
- Shinya Yamamoto
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba-City, Ibaraki 305-8568, Japan; Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1, Tennodai, Tsukuba, Ibaraki 305-8577, Japan
- Ryusuke Hayashi
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba-City, Ibaraki 305-8568, Japan
- Noriyuki Higo
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba-City, Ibaraki 305-8568, Japan
- Makoto Obara
- Philips Japan, 2-13-37 Kohnan, Minato-ku 108-8507, Tokyo, Japan
- Yasuko Sugase-Miyamoto
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba-City, Ibaraki 305-8568, Japan
- Tomokazu Tsurugizawa
- Human Informatics and Interaction Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba-City, Ibaraki 305-8568, Japan; Faculty of Engineering, Information and Systems, University of Tsukuba, Ibaraki 305-8573, Japan; Jikei University School of Medicine, 3-25-8 Nishishinbashi, Minato City Tokyo 105-8461, Japan.
2
Wang JH, Hou PL, Chen YH. Multicategory Survival Outcomes Classification via Overlapping Group Screening Process Based on Multinomial Logistic Regression Model With Application to TCGA Transcriptomic Data. Cancer Inform 2024; 23:11769351241286710. [PMID: 39385930 PMCID: PMC11462568 DOI: 10.1177/11769351241286710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Accepted: 09/05/2024] [Indexed: 10/12/2024]
Abstract
Objectives When classifying multicategory survival outcomes of cancer patients, it is crucial to identify biomarkers that affect specific outcome categories. The classification of multicategory survival outcomes from transcriptomic data has been thoroughly investigated in computational biology. Nevertheless, several challenges must be addressed, including the ultra-high-dimensional feature space, feature contamination, and data imbalance, all of which contribute to the instability of the diagnostic model. Furthermore, although most methods achieve accurate predictive performance for binary classification with high-dimensional transcriptomic data, their extension to multi-class classification is not straightforward. Methods We employ the One-versus-One strategy to transform multi-class classification into multiple binary classifications, and utilize the overlapping group screening procedure with binary logistic regression to include pathway information for identifying important genes and gene-gene interactions for multicategory survival outcomes. Results A series of simulation studies is conducted to compare the classification accuracy of our proposed approach with some existing machine learning methods. In practical data applications, we utilize the random oversampling procedure to tackle class imbalance issues. We then apply the proposed method to analyze transcriptomic data from various cancers in The Cancer Genome Atlas, such as kidney renal papillary cell carcinoma, lung adenocarcinoma, and head and neck squamous cell carcinoma. Our aim is to establish an accurate microarray-based multicategory cancer diagnosis model. The numerical results illustrate that the new proposal effectively enhances cancer diagnosis compared to approaches that neglect pathway information. Conclusions We showcase the effectiveness of the proposed method in terms of class prediction accuracy through evaluations on simulated synthetic datasets as well as real dataset applications. We also identified the cancer-related gene-gene interaction biomarkers and reported the corresponding network structure. According to the identified major genes and gene-gene interactions, we can predict for each patient the probabilities that he/she belongs to each of the survival outcome classes.
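The One-versus-One decomposition mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the overlapping group screening step and the pathway information are omitted, the binary learner is a plain logistic regression fitted by gradient descent, and the function names (`fit_logistic`, `fit_one_vs_one`, `predict_one_vs_one`) and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a binary logistic regression (labels 0/1) by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def fit_one_vs_one(X, y):
    """Train one binary model per unordered pair of classes."""
    models = {}
    for a, c in combinations(sorted(set(y)), 2):
        # Keep only samples of the two classes; class c is coded as 1.
        pairs = [(xi, 1 if yi == c else 0) for xi, yi in zip(X, y) if yi in (a, c)]
        models[(a, c)] = fit_logistic([p[0] for p in pairs], [p[1] for p in pairs])
    return models


def predict_one_vs_one(models, x):
    """Each pairwise model casts one vote; the most-voted class is returned."""
    votes = Counter()
    for (a, c), (w, b) in models.items():
        z = b + sum(wj * xj for wj, xj in zip(w, x))
        votes[c if z > 0 else a] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three well-separated outcome classes, two features.
X = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.5],
     [4.0, 0.0], [4.5, 0.3], [3.8, 0.4],
     [0.0, 4.0], [0.3, 4.5], [0.4, 3.9]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
models = fit_one_vs_one(X, y)
```

With K classes the scheme trains K(K-1)/2 binary models, so three classes give three pairwise classifiers. Majority voting is one common way to combine them; the abstract also mentions per-class probabilities, which this sketch does not attempt to produce.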
Affiliation(s)
- Jie-Huei Wang
- Department of Mathematics, National Chung Cheng University, Chiayi City, Taiwan
- Po-Lin Hou
- Department of Mathematics, National Chung Cheng University, Chiayi City, Taiwan
- Yi-Hau Chen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
3
Hoffmann M, Poschenrieder J, Incudini M, Baier S, Fritz A, Maier A, Hartung M, Hoffmann C, Trummer N, Adamowicz K, Picciani M, Scheibling E, Harl M, Lesch I, Frey H, Kayser S, Wissenberg P, Schwartz L, Hafner L, Acharya A, Hackl L, Grabert G, Lee SG, Cho G, Cloward M, Jankowski J, Lee H, Tsoy O, Wenke N, Pedersen A, Bønnelykke K, Mandarino A, Melograna F, Schulz L, Climente-González H, Wilhelm M, Iapichino L, Wienbrandt L, Ellinghaus D, Van Steen K, Grossi M, Furth P, Hennighausen L, Di Pierro A, Baumbach J, Kacprowski T, List M, Blumenthal D. Network medicine-based epistasis detection in complex diseases: ready for quantum computing. Nucleic Acids Res 2024; 52:10144-10160. [PMID: 39175109 PMCID: PMC11417373 DOI: 10.1093/nar/gkae697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 07/12/2024] [Accepted: 08/01/2024] [Indexed: 08/24/2024]
Abstract
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant than those found by existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
Affiliation(s)
- Markus Hoffmann
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2 a) Technical University of Munich, D-85748 Garching, Germany
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
- Julian M Poschenrieder
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Massimiliano Incudini
- Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137 Verona, Italy
- Sylvie Baier
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Amelie Fritz
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
- Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Michael Hartung
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Christian Hoffmann
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Nico Trummer
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Mario Picciani
- Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
- Evelyn Scheibling
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Maximilian V Harl
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Department of Health Sciences and Technology, Neuroscience Center Zürich (ZNZ), Swiss Federal Institute of Technology (ETH Zürich), Zürich 8092, Switzerland
- Ingmar Lesch
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Hunor Frey
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Simon Kayser
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Paul Wissenberg
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Leon Schwartz
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Leon Hafner
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2 a) Technical University of Munich, D-85748 Garching, Germany
- Aakriti Acharya
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
- Lena Hackl
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Gordon Grabert
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
- Sung-Gwon Lee
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, Korea
- Gyuhyeok Cho
- Department of Chemistry, Gwangju Institute of Science and Technology, Gwangju, Korea
- Jakub Jankowski
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
- Hye Kyung Lee
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
- Olga Tsoy
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Nina Wenke
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Anders Gorm Pedersen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
- Klaus Bønnelykke
- Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Antonio Mandarino
- International Centre for Theory of Quantum Technologies, University of Gdańsk, 80-309 Gdańsk, Poland
- Federico Melograna
- BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
- Laura Schulz
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
- Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
- Munich Data Science Institute (MDSI), Technical University of Munich, Garching, Germany
- Luigi Iapichino
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
- Lars Wienbrandt
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
- David Ellinghaus
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
- Kristel Van Steen
- BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
- Michele Grossi
- European Organization for Nuclear Research (CERN), Geneva 1211, Switzerland
- Priscilla A Furth
- Departments of Oncology & Medicine, Georgetown University, Washington, DC, USA
- Lothar Hennighausen
- Institute for Advanced Study (Lichtenbergstrasse 2 a) Technical University of Munich, D-85748 Garching, Germany
- National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA
- Alessandra Di Pierro
- Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137 Verona, Italy
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Denmark
- Tim Kacprowski
- Department of Health Sciences and Technology, Neuroscience Center Zürich (ZNZ), Swiss Federal Institute of Technology (ETH Zürich), Zürich 8092, Switzerland
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Markus List
- Data Science in Systems Biology, School of Life Sciences, Technical University of Munich, Freising, Germany
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
4
Li HF, Wang JT, Zhao Q, Zhang YM. BLUPmrMLM: A Fast mrMLM Algorithm in Genome-wide Association Studies. Genomics, Proteomics & Bioinformatics 2024; 22:qzae020. [PMID: 39348630 DOI: 10.1093/gpbjnl/qzae020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Revised: 12/13/2023] [Accepted: 01/10/2024] [Indexed: 10/02/2024]
Abstract
Multilocus genome-wide association study has become the state-of-the-art tool for dissecting the genetic architecture of complex and multiomic traits. However, most existing multilocus methods require relatively long computational time when analyzing large datasets. To address this issue, in this study, we proposed a fast mrMLM method, namely, best linear unbiased prediction multilocus random-SNP-effect mixed linear model (BLUPmrMLM). First, genome-wide single-marker scanning in mrMLM was replaced by vectorized Wald tests based on the best linear unbiased prediction (BLUP) values of marker effects and their variances in BLUPmrMLM. Then, adaptive best subset selection (ABESS) was used to identify potentially associated markers on each chromosome to reduce computational time when estimating marker effects via empirical Bayes. Finally, shared memory and parallel computing schemes were used to reduce the computational time. In simulation studies, BLUPmrMLM outperformed GEMMA, EMMAX, mrMLM, and FarmCPU as well as the control method (BLUPmrMLM with ABESS removed), in terms of computational time, power, accuracy for estimating quantitative trait nucleotide positions and effects, false positive rate, false discovery rate, false negative rate, and F1 score. In the reanalysis of two large rice datasets, BLUPmrMLM significantly reduced the computational time and identified more previously reported genes, compared with the aforementioned methods. This study provides an excellent multilocus model method for the analysis of large-scale and multiomic datasets. The software mrMLM v5.1 is available at BioCode (https://ngdc.cncb.ac.cn/biocode/tool/BT007388) or GitHub (https://github.com/YuanmingZhang65/mrMLM).
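The single-marker scan described above rests on Wald tests computed from estimated marker effects and their variances. A minimal sketch of that arithmetic follows. It assumes the effects and variances are already available (the BLUP step itself is not shown), the function name `wald_tests` and the example numbers are hypothetical, and a plain loop stands in for the vectorized computation mentioned in the abstract.

```python
import math


def wald_tests(effects, variances):
    """Return a (statistic, p_value) pair for each marker.

    The Wald statistic is effect**2 / variance, compared against a
    chi-square distribution with 1 degree of freedom.
    """
    results = []
    for effect, variance in zip(effects, variances):
        stat = effect * effect / variance
        # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
        p_value = math.erfc(math.sqrt(stat / 2.0))
        results.append((stat, p_value))
    return results


# Hypothetical marker effects and variances, for illustration only.
scan = wald_tests([0.0, 1.96, 3.0], [1.0, 1.0, 1.0])
```

Because each marker's statistic depends only on its own effect and variance, the whole scan reduces to elementwise array operations, which is presumably what makes a vectorized implementation fast.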
Affiliation(s)
- Hong-Fu Li
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Jing-Tian Wang
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Qiong Zhao
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Yuan-Ming Zhang
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
5
Hong S, Choi YA, Joo DS, Gürsoy G. Privacy-preserving model evaluation for logistic and linear regression using homomorphically encrypted genotype data. J Biomed Inform 2024; 156:104678. [PMID: 38936565 PMCID: PMC11272436 DOI: 10.1016/j.jbi.2024.104678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/29/2024] [Accepted: 06/19/2024] [Indexed: 06/29/2024]
Abstract
OBJECTIVE Linear and logistic regression are widely used statistical techniques in population genetics for analyzing genetic data and uncovering patterns and associations in large genetic datasets, such as identifying genetic variations linked to specific diseases or traits. However, obtaining statistically significant results from these studies requires large amounts of sensitive genotype and phenotype information from thousands of patients, which raises privacy concerns. Although cryptographic techniques such as homomorphic encryption offer a potential solution to these privacy concerns because they allow computations on encrypted data, previous methods leveraging homomorphic encryption have not addressed the confidentiality of shared models, which can leak information about the training data. METHODS In this work, we present a secure model evaluation method for linear and logistic regression using homomorphic encryption for six prediction tasks, where input genotypes, output phenotypes, and model parameters are all encrypted. RESULTS Our method ensures no private information leakage during inference and achieves high accuracy (≥93% for all outcomes) with each inference taking less than ten seconds for ∼200 genomes. CONCLUSION Our study demonstrates that it is possible to perform linear and logistic regression model evaluation while protecting patient confidentiality with theoretical security guarantees. Our implementation and test data are available at https://github.com/G2Lab/privateML/.
Affiliation(s)
- Seungwan Hong
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA
- Yoolim A Choi
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA
- Daniel S Joo
- New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10032, USA
- Gamze Gürsoy
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10032, USA.
6
Ansari M, White AD. Learning peptide properties with positive examples only. Digital Discovery 2024; 3:977-986. [PMID: 38756224 PMCID: PMC11094695 DOI: 10.1039/d3dd00218g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 03/30/2024] [Indexed: 05/18/2024]
Abstract
Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and guide the design of molecules. However, a major barrier is the requirement for both positive and negative examples in classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we solely exploit the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use two learning strategies, adapting a base classifier and reliable negative identification, to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequence. We evaluate the predictive performance of our PU learning method and show that by only using the positive data, it can achieve competitive performance when compared with the classical positive-negative (PN) classification approach, where there is access to both positive and negative examples.
Affiliation(s)
- Mehrad Ansari
- Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
- Andrew D White
- Department of Chemical Engineering, University of Rochester, Rochester, NY 14627, USA
7
Soh Tsin Howe J. Discovering significant topics from legal decisions with selective inference. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2024; 382:20230147. [PMID: 38403064 PMCID: PMC10894691 DOI: 10.1098/rsta.2023.0147] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/09/2023] [Accepted: 12/19/2023] [Indexed: 02/27/2024]
Abstract
We propose and evaluate an automated pipeline for discovering significant topics from legal decision texts by passing features synthesized with topic models through penalized regressions and post-selection significance tests. The method identifies case topics significantly correlated with outcomes, topic-word distributions which can be manually interpreted to gain insights about significant topics, and case-topic weights which can be used to identify representative cases for each topic. We demonstrate the method on a new dataset of domain name disputes and a canonical dataset of European Court of Human Rights violation cases. Topic models based on latent semantic analysis as well as language model embeddings are evaluated. We show that topics derived by the pipeline are consistent with legal doctrines in both areas and can be useful in other related legal analysis tasks. This article is part of the theme issue 'A complexity science approach to law and governance'.
8
Chen J, Zhao L, Zhang L, Luo Y, Jiang Y, H P. The identification of signature genes and their relationship with immune cell infiltration in age-related macular degeneration. Mol Biol Rep 2024; 51:339. [PMID: 38393419 DOI: 10.1007/s11033-023-08969-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 11/26/2023] [Indexed: 02/25/2024]
Abstract
BACKGROUND Age-related macular degeneration (AMD) is a prevalent cause of visual impairment among the elderly, and its incidence has risen in tandem with increasing human longevity. Despite the progress made with anti-VEGF therapy, clinical outcomes have proven to be unsatisfactory. METHOD We obtained differentially expressed genes (DEGs) between AMD patients and healthy controls from the GEO database. GO and KEGG analyses were used for enrichment analysis of the DEGs. Weighted gene coexpression network analysis (WGCNA) was used to identify modules related to AMD. SVM, random forest, and least absolute shrinkage and selection operator (LASSO) were employed to screen hub genes. Gene set enrichment analysis (GSEA) was used to explore the pathways in which these hub genes were enriched. CIBERSORT was utilized to analyze the relationship between the hub genes and immune cell infiltration. Finally, Western blotting and RT‒PCR were used to explore the expression of hub genes in AMD mice. RESULTS We screened 1084 DEGs in GSE29801, of which 496 genes were upregulated. These 1084 DEGs were introduced into the WGCNA, and 94 genes related to AMD were obtained. Seventy-nine overlapping genes were identified with a Venn diagram. These 79 genes were introduced into three machine-learning methods to screen the hub genes, and the genes identified by all three methods were TNC, FAP, SREBF1, and TGF-β2. We verified their diagnostic function in the GSE29801 and GSE103060 datasets. Then, the hub gene co-enrichment pathways were obtained by GO and KEGG analyses. CIBERSORT analysis showed that these hub genes were associated with immune cell infiltration. Finally, we found increased expression of TNC, FAP, SREBF1, and TGF-β2 mRNA and protein in the retinas of AMD mice. CONCLUSION We found that four hub genes, namely, FAP, TGF-β2, SREBF1, and TNC, have diagnostic significance in patients with AMD and are related to immune cell infiltration. Finally, we determined that the mRNA and protein expression of these hub genes was upregulated in the retinas of AMD mice.
Affiliation(s)
- Jinquan Chen
- Department of Ophthalmology, The Tongnan District People's Hospital, Chongqing, China
- Long Zhao
- Department of Ophthalmology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Longbin Zhang
- Department of Ophthalmology, The Tongnan District People's Hospital, Chongqing, China
- Yiling Luo
- Department of Ophthalmology, The Tongnan District People's Hospital, Chongqing, China
- Yuling Jiang
- Department of Ophthalmology, The Tongnan District People's Hospital, Chongqing, China
- Peng H
- Department of Ophthalmology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
9
Choi Y, Cha J, Choi S. Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES). BMC Bioinformatics 2024; 25:56. [PMID: 38308205 PMCID: PMC10837879 DOI: 10.1186/s12859-024-05677-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 01/26/2024] [Indexed: 02/04/2024]
Abstract
BACKGROUND Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). RESULTS First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator, elastic net, smoothly clipped absolute deviation, support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's Kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to deal with imbalance problems. CONCLUSIONS Our results show that penalized methods exhibit better predictive performance for asthma than machine learning methods. On the other hand, in the oversampling study, random forest and boosting methods overall showed better prediction performance than penalized methods.
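Several of the threshold-based metrics listed above can be computed directly from a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name `binary_metrics` and the example counts are hypothetical, and the code assumes a non-degenerate matrix, meaning every row and column total is nonzero.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa: product of the marginal rates.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "error_rate": 1 - accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts: 60 cases and 40 controls, 50 predicted positive.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

Balanced accuracy, Cohen's kappa, and the Matthews correlation coefficient all correct for class proportions in different ways, which is relevant when cases and controls are imbalanced. The two area-under-curve metrics need ranked scores rather than a single confusion matrix and are not covered here.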
Affiliation(s)
- Yongjun Choi
- Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea
- Junho Cha
- Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea
- Sungkyoung Choi
- Department of Applied Artificial Intelligence, College of Computing, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.
- Department of Mathematical Data Science, College of Science and Convergence Technology, Hanyang University, 55 Hanyang-daehak-ro, Sangnok-gu, Ansan, 15588, South Korea.
| |
|
10
|
Luo Y, Zhu M, Wei X, Xu J, Pan S, Liu G, Song Y, Hu W, Dai Y, Wu G. Investigation of clear cell renal cell carcinoma grades using diffusion-relaxation correlation spectroscopic imaging with optimized spatial-spectrum analysis. Br J Radiol 2024; 97:135-141. [PMID: 38263829 PMCID: PMC11008501 DOI: 10.1093/bjr/tqad003] [Received: 06/12/2023] [Revised: 09/28/2023] [Accepted: 10/10/2023] [Indexed: 01/25/2024] Open
Abstract
OBJECTIVES: To differentiate high-grade from low-grade clear cell renal cell carcinoma (ccRCC) using diffusion-relaxation correlation spectroscopic imaging (DR-CSI) spectra in an equal separating analysis. METHODS: Eighty patients with 86 pathologically confirmed ccRCCs who underwent DR-CSI were enrolled. Two radiologists delineated the region of interest. The spectrum was derived based on DR-CSI and was further segmented into multiple equal subregions from 2*2 to 9*9. The agreement between the 2 radiologists was assessed by the intraclass correlation coefficient (ICC). Logistic regression was used to establish the regression model for differentiation, and 5-fold cross-validation was used to evaluate its accuracy. McNemar's test was used to compare the diagnostic performance between equipartition models and the traditional parameters, including the apparent diffusion coefficient (ADC) and T2 value. RESULTS: The inter-reader agreement decreased as the divisions in the equipartition model increased (overall ICC ranged from 0.859 to 0.920). The accuracy increased from the 2*2 to 9*9 equipartition model (0.68 for 2*2, 0.69 for 3*3 and 4*4, 0.70 for 5*5, 0.71 for 6*6, 0.78 for 7*7, and 0.75 for 8*8 and 9*9). The equipartition models with divisions >7*7 were significantly better than ADC and T2 (vs ADC: P = .002-.008; vs T2: P = .001-.004). CONCLUSIONS: The equipartition method has the potential to analyse the DR-CSI spectrum and discriminate between low-grade and high-grade ccRCC. ADVANCES IN KNOWLEDGE: The evaluation of DR-CSI relies on prior knowledge, and how to assess the spectrum derived from DR-CSI without prior knowledge has not been well studied.
Affiliation(s)
- Yuansheng Luo
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Mengying Zhu
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Xiaobin Wei
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Jianrong Xu
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Shihang Pan
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Guiqin Liu
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Yang Song
- MR Scientific Marketing, Siemens Healthineers Ltd., 200129 Shanghai, China
| | - Wentao Hu
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| | - Yongming Dai
- School of Biomedical Engineering, Shanghai Tech University, 201210 Shanghai, China
| | - Guangyu Wu
- Department of Radiology, School of Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
| |
|
11
|
Yu Y, Zhang Z, Xia F, Sun B, Liu S, Wang X, Zhou X, Zhao J. Exploration of the pathophysiology of high myopia via proteomic profiling of human corneal stromal lenticules. Exp Eye Res 2024; 238:109726. [PMID: 37979904 DOI: 10.1016/j.exer.2023.109726] [Received: 06/25/2023] [Revised: 11/08/2023] [Accepted: 11/10/2023] [Indexed: 11/20/2023]
Abstract
This study aimed to investigate the underlying pathophysiology of high myopia by analyzing the proteome of human corneal stromal lenticule samples obtained through small incision lenticule extraction (SMILE). A total of thirty-two patients who underwent SMILE were included in the study. Label-free quantitative proteomic analysis was performed on corneal stromal lenticule samples, equally representing high myopia (n = 10) and low myopia (n = 10) groups. The identified and profiled lenticule proteomes were analyzed using in silico tools to explore the biological characteristics of differentially expressed proteins (DEPs). Additionally, LASSO regression and a random forest model were employed to identify key proteins associated with the pathophysiology of high myopia. According to gene ontology analysis, the DEPs were closely linked to immune activation, extracellular matrix, and cell adhesion-related pathways. Specifically, decreased expression of COL1A1 and increased expression of CDH11 were associated with the pathogenesis of high myopia and were validated by western blotting (n = 6) and quantitative real-time polymerase chain reaction (n = 6). Overall, this study provides evidence that COL1A1 and CDH11 may contribute to the pathophysiology of high myopia based on comparative proteomic profiling of human corneal stromal lenticules obtained through SMILE.
Affiliation(s)
- Yanze Yu
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China; Fudan University Shanghai Medical College, Shanghai 200032, China
| | - Zhe Zhang
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
| | - Fei Xia
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
| | - Bingqing Sun
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
| | - Shengtao Liu
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
| | - Xiaoying Wang
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China
| | - Xingtao Zhou
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China.
| | - Jing Zhao
- Department of Ophthalmology and Vision Science, Eye and ENT Hospital, Fudan University, Shanghai, China; NHC Key Laboratory of Myopia (Fudan University), Shanghai, China; Laboratory of Myopia, Chinese Academy of Medical Sciences, Shanghai, China.
| |
|
12
|
Guo H, Li T, Wang Z. Pleiotropic genetic association analysis with multiple phenotypes using multivariate response best-subset selection. BMC Genomics 2023; 24:759. [PMID: 38082214 PMCID: PMC10712198 DOI: 10.1186/s12864-023-09820-5] [Received: 06/26/2023] [Accepted: 11/20/2023] [Indexed: 12/18/2023] Open
Abstract
Genetic pleiotropy refers to the simultaneous association of a gene with multiple phenotypes. It is widely distributed across the genome and can help to understand the common genetic mechanisms of diseases or traits. In this study, a pleiotropic association analysis method based on a multivariate response best-subset selection (MRBSS) model is proposed. Unlike the traditional genetic association model, the high-dimensional genotypic data are treated as response variables and the multiple phenotypic data as predictor variables. Moreover, the response best-subset selection procedure is converted into a 0-1 integer optimization problem by introducing a separation parameter and a tuning parameter. Furthermore, the model parameters are estimated by using the curve search under the modified Bayesian information criterion. Simulation experiments show that the proposed MRBSS method remarkably reduces the computational time, obtains higher statistical power under most of the considered scenarios, and controls the type I error rate at a low level. Application studies on datasets of maize yield traits and pig lipid traits further verify its effectiveness.
Affiliation(s)
- Hongping Guo
- School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China.
| | - Tong Li
- School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China
| | - Zixuan Wang
- School of Mathematics and Statistics, South-Central Minzu University, Wuhan, 430074, People's Republic of China
| |
|
13
|
Lombardi A, Arezzo F, Di Sciascio E, Ardito C, Mongelli M, Di Lillo N, Fascilla FD, Silvestris E, Kardhashi A, Putino C, Cazzolla A, Loizzi V, Cazzato G, Cormio G, Di Noia T. A human-interpretable machine learning pipeline based on ultrasound to support leiomyosarcoma diagnosis. Artif Intell Med 2023; 146:102697. [PMID: 38042596 DOI: 10.1016/j.artmed.2023.102697] [Received: 02/05/2023] [Revised: 10/08/2023] [Accepted: 10/29/2023] [Indexed: 12/04/2023]
Abstract
The preoperative evaluation of myometrial tumors is essential to avoid delayed treatment and to establish the appropriate surgical approach. Specifically, the differential diagnosis of leiomyosarcoma (LMS) is particularly challenging due to the overlapping of clinical, laboratory and ultrasound features between fibroids and LMS. In this work, we present a human-interpretable machine learning (ML) pipeline to support the preoperative differential diagnosis of LMS from leiomyomas, based on both clinical data and gynecological ultrasound assessment of 68 patients (8 with LMS diagnosis). The pipeline provides the following novel contributions: (i) end-users have been involved both in the definition of the ML tasks and in the evaluation of the overall approach; (ii) clinical specialists get a full understanding of both the decision-making mechanisms of the ML algorithms and the impact of the features on each automatic decision. Moreover, the proposed pipeline addresses some of the problems concerning both the imbalance of the two classes by analyzing and selecting the best combination of the synthetic oversampling strategy of the minority class and the classification algorithm among different choices, and the explainability of the features at global and local levels. The results show very high performance of the best strategy (AUC = 0.99, F1 = 0.87) and the strong and stable impact of two ultrasound-based features (i.e., tumor borders and consistency of the lesions). Furthermore, the SHAP algorithm was exploited to quantify the impact of the features at the local level and a specific module was developed to provide a template-based natural language (NL) translation of the explanations for enhancing their interpretability and fostering the use of ML in the clinical setting.
Affiliation(s)
- Angela Lombardi
- Department of Electrical and Information Engineering (DEI), Politecnico di Bari, Bari, Italy.
| | - Francesca Arezzo
- Gynecologic Oncology Unit, Interdisciplinar Department of Medicine, IRCCS Istituto Tumori "Giovanni Paolo II", Bari, Italy
| | - Eugenio Di Sciascio
- Department of Electrical and Information Engineering (DEI), Politecnico di Bari, Bari, Italy
| | - Carmelo Ardito
- Department of Engineering, LUM "Giuseppe Degennaro" University, Casamassima, Bari, Italy
| | - Michele Mongelli
- Obstetrics and Gynecology Unit, Department of Biomedical Sciences and Human Oncology, University of Bari "Aldo Moro", Bari, Italy
| | - Nicola Di Lillo
- Obstetrics and Gynecology Unit, Department of Biomedical Sciences and Human Oncology, University of Bari "Aldo Moro", Bari, Italy
| | | | - Erica Silvestris
- Gynecologic Oncology Unit, Interdisciplinar Department of Medicine, IRCCS Istituto Tumori "Giovanni Paolo II", Bari, Italy
| | - Anila Kardhashi
- Gynecologic Oncology Unit, Interdisciplinar Department of Medicine, IRCCS Istituto Tumori "Giovanni Paolo II", Bari, Italy
| | - Carmela Putino
- Obstetrics and Gynecology Unit, Department of Biomedical Sciences and Human Oncology, University of Bari "Aldo Moro", Bari, Italy
| | - Ambrogio Cazzolla
- Gynecologic Oncology Unit, Interdisciplinar Department of Medicine, IRCCS Istituto Tumori "Giovanni Paolo II", Bari, Italy
| | - Vera Loizzi
- Gynecologic Oncology Unit, Interdisciplinar Department of Medicine, IRCCS Istituto Tumori "Giovanni Paolo II", Bari, Italy; Interdisciplinar Department of Medicine, University of Bari "Aldo Moro", Bari, Italy
| | - Gerardo Cazzato
- Section of Pathology, Department of Emergency and Organ Transplantation (DETO), University of Bari "Aldo Moro", Bari, Italy
| | - Gennaro Cormio
- Gynecologic Oncology Unit, Interdisciplinar Department of Medicine, IRCCS Istituto Tumori "Giovanni Paolo II", Bari, Italy; Interdisciplinar Department of Medicine, University of Bari "Aldo Moro", Bari, Italy
| | - Tommaso Di Noia
- Department of Electrical and Information Engineering (DEI), Politecnico di Bari, Bari, Italy
| |
|
14
|
Wei M, Zhang Y, Zhao L, Zhao Z. Development and validation of a radiomics nomogram for diagnosis of malignant pleural effusion. Discov Oncol 2023; 14:213. [PMID: 37999794 PMCID: PMC10673775 DOI: 10.1007/s12672-023-00835-8] [Received: 09/13/2023] [Accepted: 11/21/2023] [Indexed: 11/25/2023] Open
Abstract
OBJECTIVE: We aimed to develop a radiomics nomogram based on computed tomography (CT) scan features and high-throughput radiomics features for diagnosis of malignant pleural effusion (MPE). METHODS: In this study, 507 eligible patients with PE (207 malignant and 300 benign) were collected retrospectively. Patients were divided into training (n = 355) and validation cohorts (n = 152). Radiomics features were extracted from initial unenhanced CT images. CT scan features of PE were also collected. We used the variance threshold algorithm and least absolute shrinkage and selection operator (LASSO) to select optimal features to build a radiomics model for predicting the nature of PE. Univariate and multivariable logistic regression analyses were used to identify significant independent factors associated with MPE, which were then included in the radiomics nomogram. RESULTS: A total of four CT features were retained as significant independent factors, including massive PE, obstructive atelectasis or pneumonia, pleural thickening > 10 mm, and pulmonary nodules and/or masses. The radiomics nomogram constructed from 13 radiomics parameters and four CT features showed good predictive efficacy in the training cohort [area under the curve (AUC) = 0.926, 95% CI 0.894, 0.951] and the validation cohort (AUC = 0.916, 95% CI 0.860, 0.955). The calibration curve and decision curve analysis showed that the nomogram helped differentiate MPE from benign pleural effusion (BPE) in clinical practice. CONCLUSION: This study presents a nomogram model incorporating CT scan features and radiomics features to help physicians differentiate MPE from BPE.
Affiliation(s)
- Mingzhu Wei
- Department of Radiology, Shaoxing People's Hospital, Shaoxing, Zhejiang, People's Republic of China.
- Department of Radiology, Shaoxing People's Hospital, No. 568, Zhongxing North Road, Yuecheng District, Shaoxing, 312000, Zhejiang, People's Republic of China.
| | - Yaping Zhang
- Department of Radiology, Shaoxing People's Hospital, Shaoxing, Zhejiang, People's Republic of China
| | - Li Zhao
- Department of Radiology, Shaoxing People's Hospital, Shaoxing, Zhejiang, People's Republic of China
| | - Zhenhua Zhao
- Department of Radiology, Shaoxing People's Hospital, Shaoxing, Zhejiang, People's Republic of China
| |
|
15
|
Xu T, Zhu C, Song F, Zhang W, Yuan M, Pan Z, Huang P. Immunological characteristics of immunogenic cell death genes and malignant progression driving roles of TLR4 in anaplastic thyroid carcinoma. BMC Cancer 2023; 23:1131. [PMID: 37990304 PMCID: PMC10664293 DOI: 10.1186/s12885-023-11647-y] [Received: 06/21/2023] [Accepted: 11/15/2023] [Indexed: 11/23/2023] Open
Abstract
Anaplastic thyroid carcinoma (ATC) is a rare malignancy characterized by a weak immunotherapeutic response. Disorders of immunogenic cell death genes (ICDGs) have been identified as driving factors in cancer progression, but their roles in ATC remain unclear. Dataset analysis showed that most ICDGs were highly expressed in ATC, and that DE-ICDGs were located in module c1_112, which was mainly enriched in Toll-like receptor signaling. Subsequently, an ICD score was established to classify ATC samples into high and low ICD score groups, and functional analysis indicated that a high ICD score was associated with immune characteristics. The high ICD score group had higher proportions of specific immune and stromal cells, as well as increased expression of immune checkpoints. Additionally, TLR4, ENTPD1, LY96, CASP1 and PDIA3 were identified as a dynamic signature in the malignant progression of ATC. Notably, TLR4 was significantly upregulated in ATC tissues and was associated with poor prognosis. Silencing of TLR4 inhibited the proliferation, metastasis and clone formation of ATC cells. Finally, silencing of TLR4 synergistically enhanced paclitaxel-induced proliferation inhibition, apoptosis, CALR exposure and ATP release. Our findings highlight that aberrant expression of TLR4 drives the malignant progression of ATC, which contributes to our understanding of the roles of ICDGs in ATC.
Affiliation(s)
- Tong Xu
- Center for Clinical Pharmacy, Cancer Center, Department of Pharmacy, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, 158 Shangtang Road, Xiacheng District, Hangzhou, Zhejiang, 310014, China
| | - Chaozhuang Zhu
- Zhejiang University of Technology, Hangzhou, Zhejiang, China
| | - Feifeng Song
- Center for Clinical Pharmacy, Cancer Center, Department of Pharmacy, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, 158 Shangtang Road, Xiacheng District, Hangzhou, Zhejiang, 310014, China
| | - Wanli Zhang
- Zhejiang University of Technology, Hangzhou, Zhejiang, China
| | - Mengnan Yuan
- Center for Clinical Pharmacy, Cancer Center, Department of Pharmacy, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, 158 Shangtang Road, Xiacheng District, Hangzhou, Zhejiang, 310014, China
| | - Zongfu Pan
- Center for Clinical Pharmacy, Cancer Center, Department of Pharmacy, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, 158 Shangtang Road, Xiacheng District, Hangzhou, Zhejiang, 310014, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, Zhejiang, 310014, China
| | - Ping Huang
- Center for Clinical Pharmacy, Cancer Center, Department of Pharmacy, Zhejiang Provincial People's Hospital (Affiliated People's Hospital), Hangzhou Medical College, 158 Shangtang Road, Xiacheng District, Hangzhou, Zhejiang, 310014, China.
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, Zhejiang, 310014, China.
| |
|
16
|
Hoffmann M, Poschenrieder JM, Incudini M, Baier S, Fitz A, Maier A, Hartung M, Hoffmann C, Trummer N, Adamowicz K, Picciani M, Scheibling E, Harl MV, Lesch I, Frey H, Kayser S, Wissenberg P, Schwartz L, Hafner L, Acharya A, Hackl L, Grabert G, Lee SG, Cho G, Cloward M, Jankowski J, Lee HK, Tsoy O, Wenke N, Pedersen AG, Bønnelykke K, Mandarino A, Melograna F, Schulz L, Climente-González H, Wilhelm M, Iapichino L, Wienbrandt L, Ellinghaus D, Van Steen K, Grossi M, Furth PA, Hennighausen L, Di Pierro A, Baumbach J, Kacprowski T, List M, Blumenthal DB. Network medicine-based epistasis detection in complex diseases: ready for quantum computing. medRxiv 2023:2023.11.07.23298205. [PMID: 38076997 PMCID: PMC10705612 DOI: 10.1101/2023.11.07.23298205] [Indexed: 12/18/2023]
Abstract
Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) [1-3]. Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.
Affiliation(s)
- Markus Hoffmann
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Julian M. Poschenrieder
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Massimiliano Incudini
- Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
| | - Sylvie Baier
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Amelie Fitz
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
- Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
| | - Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Michael Hartung
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Christian Hoffmann
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Nico Trummer
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Mario Picciani
- Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
| | - Evelyn Scheibling
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Maximilian V. Harl
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Ingmar Lesch
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Hunor Frey
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Simon Kayser
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Paul Wissenberg
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Leon Schwartz
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - Leon Hafner
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
| | - Aakriti Acharya
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
| | - Lena Hackl
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Gordon Grabert
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
| | - Sung-Gwon Lee
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, Korea
| | - Gyuhyeok Cho
- Department of Chemistry, Gwangju Institute of Science and Technology, Gwangju, Korea
| | - Matthew Cloward
- Department of Biology, Brigham Young University, Provo, UT, USA
| | - Jakub Jankowski
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Hye Kyung Lee
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Olga Tsoy
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Nina Wenke
- Institute for Computational Systems Biology, University of Hamburg, Germany
| | - Anders Gorm Pedersen
- Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
| | - Klaus Bønnelykke
- Copenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
| | - Antonio Mandarino
- International Centre for Theory of Quantum Technologies, University of Gdańsk, 80-309 Gdańsk, Poland
| | - Federico Melograna
- BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Laura Schulz
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
| | | | - Mathias Wilhelm
- Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
| | - Luigi Iapichino
- Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities (LRZ), Garching b. München, Germany
| | - Lars Wienbrandt
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
| | - David Ellinghaus
- Institute of Clinical Molecular Biology, Christian Albrechts University of Kiel, Kiel, Germany
| | - Kristel Van Steen
- BIO3 - Systems Genetics; GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- BIO3 - Systems Medicine; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Michele Grossi
- European Organization for Nuclear Research (CERN), Geneva 1211, Switzerland
| | - Priscilla A. Furth
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
- Departments of Oncology & Medicine, Georgetown University, Washington, DC, USA
| | - Lothar Hennighausen
- Institute for Advanced Study (Lichtenbergstrasse 2 a, D-85748 Garching, Germany), Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, United States of America
| | - Alessandra Di Pierro
- Dipartimento di Informatica, Università di Verona, Strada le Grazie 15 - 34137, Verona, Italy
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Germany
- Computational BioMedicine Lab, University of Southern Denmark, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, Technische Universität Braunschweig and Hannover Medical School, Rebenring 56, 38106 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Germany
| | - David B. Blumenthal
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
| |
|
17
|
Yang M, Wen Y, Zheng J, Zhang J, Zhao T, Feng J. Improving power of genome-wide association studies via transforming ordinal phenotypes into continuous phenotypes. Front Plant Sci 2023; 14:1247181. [PMID: 38023883 PMCID: PMC10652869 DOI: 10.3389/fpls.2023.1247181] [Received: 06/25/2023] [Accepted: 10/18/2023] [Indexed: 12/01/2023]
Abstract
Introduction: Ordinal traits are important complex traits in crops, and genome-wide association study (GWAS) is a widely used method for mining their genes. Presently, GWAS of continuous quantitative traits (C-GWAS) and single-locus association analysis of ordinal traits are the main methods used for ordinal traits. However, the detection power of these two methods is low. Methods: To address this issue, we proposed a new method, named MTOTC, in which hierarchical data of ordinal traits are transformed into continuous phenotypic data (CPData). Results: Then, FASTmrMLM, one C-GWAS method, was used to conduct GWAS for CPData. The results from the simulation studies showed that MTOTC+FASTmrMLM for ordinal traits was better than the classical methods when there were four or fewer hierarchical levels. In addition, when MTOTC was combined with FASTmrEMMA, mrMLM, ISIS EM-BLASSO, pLARmEB, and pKWmEB, relatively high power and a low false positive rate in QTN detection were observed as well. Subsequently, MTOTC was applied to analyze the hierarchical data of soybean salt-alkali tolerance. More significant QTNs were detected when MTOTC was combined with any of the above six C-GWAS methods. Discussion: Accordingly, the new method increases the choice of GWAS methods for ordinal traits and helps to mine the genes for ordinal traits in resource populations.
Affiliation(s)
- Ming Yang
- Key Laboratory of Biology and Genetics Improvement of Soybean, Ministry of Agriculture/Zhongshan Biological Breeding Laboratory (ZSBBL)/National Innovation Platform for Soybean Breeding and Industry-Education Integration/State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization/College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Yangjun Wen
- College of Science, Nanjing Agricultural University, Nanjing, China
| | - Jinchang Zheng
- Key Laboratory of Biology and Genetics Improvement of Soybean, Ministry of Agriculture/Zhongshan Biological Breeding Laboratory (ZSBBL)/National Innovation Platform for Soybean Breeding and Industry-Education Integration/State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization/College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Jin Zhang
- College of Science, Nanjing Agricultural University, Nanjing, China
| | - Tuanjie Zhao
- Key Laboratory of Biology and Genetics Improvement of Soybean, Ministry of Agriculture/Zhongshan Biological Breeding Laboratory (ZSBBL)/National Innovation Platform for Soybean Breeding and Industry-Education Integration/State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization/College of Agriculture, Nanjing Agricultural University, Nanjing, China
| | - Jianying Feng
- Key Laboratory of Biology and Genetics Improvement of Soybean, Ministry of Agriculture/Zhongshan Biological Breeding Laboratory (ZSBBL)/National Innovation Platform for Soybean Breeding and Industry-Education Integration/State Key Laboratory of Crop Genetics & Germplasm Enhancement and Utilization/College of Agriculture, Nanjing Agricultural University, Nanjing, China
| |
|
18
|
Tao W, Wang BY, Luo L, Li Q, Meng ZA, Xia TL, Deng WM, Yang M, Zhou J, Zhang X, Gao X, Li LY, He YD. A urine extracellular vesicle lncRNA classifier for high-grade prostate cancer and increased risk of progression: A multi-center study. Cell Rep Med 2023; 4:101240. [PMID: 37852185 PMCID: PMC10591064 DOI: 10.1016/j.xcrm.2023.101240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Revised: 07/03/2023] [Accepted: 09/21/2023] [Indexed: 10/20/2023]
Abstract
To construct a urine extracellular vesicle long non-coding RNA (lncRNA) classifier that can detect high-grade prostate cancer (PCa) of grade group 2 or greater and estimate the risk of progression during active surveillance, we identify high-grade PCa-specific lncRNAs by combined analyses of cohorts from TAHSY, TCGA, and the GEO database. We develop and validate a 3-lncRNA diagnostic model (Clnc, comprising AC015987.1, CTD-2589M5.4, and RP11-363E6.3) that can detect high-grade PCa. Clnc shows higher accuracy than prostate cancer antigen 3 (PCA3), multiparametric magnetic resonance imaging (mpMRI), and two risk calculators (Prostate Cancer Prevention Trial [PCPT]-RC 2.0 and European Randomized Study of Screening for Prostate Cancer [ERSPC]-RC) in the training cohort (n = 350), two independent cohorts (n = 232; n = 251), and the TCGA cohort (n = 499). In the prospective active surveillance cohort (n = 182), Clnc at diagnosis remains a powerful independent predictor of overall active surveillance progression. Thus, Clnc is a potential biomarker for high-grade PCa and can also serve as a biomarker for improved selection of candidates for active surveillance.
Affiliation(s)
- Wen Tao
- Department of Urology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Bang-Yu Wang
- Department of Anesthesiology, Shanghai General Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200080, China
| | - Liang Luo
- Department of Urology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Qing Li
- Food and Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin 999077, Hong Kong
| | - Zhan-Ao Meng
- Department of Radiology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Tao-Lin Xia
- Department of Urology, Foshan First Municipal People's Hospital, Sun Yat-sen University, Foshan 528000, China
| | - Wei-Ming Deng
- Department of Urology, The First Affiliated Hospital, University of South China, Hengyang 421000, China
| | - Ming Yang
- Department of Urology, Foshan Municipal Chinese Medicine Hospital, Foshan 528000, China
| | - Jing Zhou
- Department of Pathology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Xin Zhang
- Department of Pathology, Foshan First Municipal People's Hospital, Sun Yat-sen University, Foshan 528000, China
| | - Xin Gao
- Department of Urology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Liao-Yuan Li
- Department of Urology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| | - Ya-Di He
- Health Management Center, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630, China
| |
|
19
|
Kim K, Jun TH, Ha BK, Wang S, Sun H. New statistical selection method for pleiotropic variants associated with both quantitative and qualitative traits. BMC Bioinformatics 2023; 24:381. [PMID: 37817069 PMCID: PMC10563219 DOI: 10.1186/s12859-023-05505-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Accepted: 09/28/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND: Identification of pleiotropic variants associated with multiple phenotypic traits has received increasing attention in genetic association studies. Overlapping genetic associations from multiple traits help to detect weak genetic associations missed by single-trait analyses. Many statistical methods have been developed to identify pleiotropic variants, but most of them are limited to quantitative traits, even though pleiotropic effects on both quantitative and qualitative traits have been observed. This is a statistically challenging problem because no appropriate multivariate distribution exists to model quantitative and qualitative data together. Alternatively, meta-analysis methods can be applied; these integrate summary statistics of individual variants associated with either a quantitative or a qualitative trait, without accounting for correlations among genetic variants. RESULTS: We propose a new statistical selection method based on a unified selection score that quantifies how strongly a genetic variant is associated with both quantitative and qualitative traits, i.e., how strongly it behaves as a pleiotropic variant. In extensive simulation studies covering various types of pleiotropic effects on both quantitative and qualitative traits, we demonstrated that the proposed method outperforms the existing meta-analysis methods in terms of true positive selection. We also applied the proposed method to a peanut dataset with 6 quantitative and 2 qualitative traits, and to a cowpea dataset with 2 quantitative and 6 qualitative traits. In both analyses, we were able to detect some potentially pleiotropic variants missed by the existing methods. CONCLUSIONS: The proposed method is able to locate pleiotropic variants associated with both quantitative and qualitative traits. It has been implemented in an R package, 'UNISS', which can be downloaded from http://github.com/statpng/uniss.
Affiliation(s)
- Kipoong Kim
- Department of Statistics, Pusan National University, 46241, Busan, Korea
| | - Tae-Hwan Jun
- Department of Plant Bioscience, Pusan National University, 50463, Miryang, Korea
| | - Bo-Keun Ha
- Department of Applied Plant Science, Chonnam National University, 61186, Gwangju, Korea
| | - Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, 10032, USA
| | - Hokeun Sun
- Department of Statistics, Pusan National University, 46241, Busan, Korea
| |
|
20
|
Wen Z, Long J, Zhu L, Liu S, Zeng X, Huang D, Qiu X, Su L. Associations of dietary, sociodemographic, and anthropometric factors with anemia among the Zhuang ethnic adults: a cross-sectional study in Guangxi Zhuang Autonomous Region, China. BMC Public Health 2023; 23:1934. [PMID: 37803356 PMCID: PMC10557179 DOI: 10.1186/s12889-023-16697-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2022] [Accepted: 09/04/2023] [Indexed: 10/08/2023] Open
Abstract
BACKGROUND: After decades of rapid economic development, anemia remains a significant public health challenge globally. This study aimed to estimate the associations of sociodemographic, dietary, and body composition factors with anemia among the Zhuang in Guangxi Zhuang Autonomous Region, China. METHODS: Our study population, drawn from the baseline survey of the Guangxi ethnic minority Cohort Study of Chronic Diseases, consisted of 13,465 adults (6,779 women and 6,686 men) aged 24-82 years. A validated interviewer-administered laptop-based questionnaire system was used to collect information on participants' sociodemographic, lifestyle, and dietary factors. Each participant underwent a physical examination, and hematological indices were measured. Least absolute shrinkage and selection operator (LASSO) regression was used to select the variables, and logistic regression was applied to estimate the associations of independent risk factors with anemia. RESULTS: The overall prevalence of anemia was 9.63% (95% CI: 8.94-10.36%) in men and 18.33% (95% CI: 17.42-19.28%) in women. LASSO and logistic regression analyses showed that age was positively associated with anemia for both women and men. For diet in women, red meat consumption for 5-7 days/week (OR = 0.79, 95% CI: 0.65-0.98, p = 0.0290) and corn/sweet potato consumption for 5-7 days/week (OR = 0.73, 95% CI: 0.55-0.96, p = 0.0281) were negatively associated with anemia. For men, fruit consumption for 5-7 days/week (OR = 0.75, 95% CI: 0.60-0.94, p = 0.0130) and corn/sweet potato consumption for 5-7 days/week (OR = 0.66, 95% CI: 0.46-0.91, p = 0.0136) were negatively associated with anemia. Compared with a normal body water percentage (55-65%), a body water percentage below normal (< 55%) was negatively associated with anemia (OR = 0.68, 95% CI: 0.53-0.86, p = 0.0014). Conversely, a body water percentage above normal (> 65%) was positively associated with anemia in men (OR = 1.73, 95% CI: 1.38-2.17, p < 0.0001).
CONCLUSIONS: Anemia remains a moderate public health problem for premenopausal women and the elderly population in the Guangxi Zhuang minority region. The prevention of anemia at the population level requires multifaceted intervention measures according to sex and age, with a focus on dietary factors and the control of body composition.
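The odds ratios and 95% confidence intervals quoted in the results are the usual way of summarizing a fitted logistic regression coefficient. A minimal sketch of that conversion follows, assuming a Wald-type interval; the abstract does not state how the intervals were computed, and the function name `odds_ratio_ci` and the coefficient values are illustrative assumptions, not figures from the study.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient into an odds ratio with a Wald CI.

    beta: estimated log-odds coefficient (hypothetical input)
    se:   its standard error (hypothetical input)
    z:    normal quantile; 1.96 gives an approximate 95% interval
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Illustrative values only: a negative coefficient gives OR < 1,
# i.e. the exposure is associated with lower odds of the outcome.
or_, lower, upper = odds_ratio_ci(beta=-0.25, se=0.10)
print(f"OR = {or_:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

An interval that lies entirely below 1, as for the dietary factors above, corresponds to a negative association at the chosen confidence level.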
Affiliation(s)
- Zheng Wen
- Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning, 530021, Guangxi, China
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
| | - Jianxiong Long
- Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning, 530021, Guangxi, China
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
| | - Lulu Zhu
- Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning, 530021, Guangxi, China
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
| | - Shun Liu
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
- Department of Maternal, Child and Adolescent Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China
| | - Xiaoyun Zeng
- Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning, 530021, Guangxi, China
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
| | - Dongping Huang
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
- Department of Sanitary Chemistry, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China
| | - Xiaoqiang Qiu
- Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning, 530021, Guangxi, China
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
| | - Li Su
- Department of Epidemiology and Health Statistics, School of Public Health, Guangxi Medical University, 22 Shuangyong Road, Nanning, 530021, Guangxi, China
- Guangxi Colleges and Universities Key Laboratory of Prevention and Control of Highly Prevalent Diseases, Guangxi Medical University, Nanning, 530021, Guangxi, China
| |
|
21
|
Dang T, Fermin ASR, Machizawa MG. oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data. Front Neuroinform 2023; 17:1266713. [PMID: 37829329 PMCID: PMC10566623 DOI: 10.3389/fninf.2023.1266713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 09/08/2023] [Indexed: 10/14/2023] Open
Abstract
The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models because the number of features is often much larger than the number of observations. Feature selection is a crucial step for determining meaningful target features in decoding; however, optimizing feature selection on such high-dimensional neuroimaging data has been challenging with conventional ML models. Here, we introduce an efficient and high-performance decoding package incorporating a forward variable selection (FVS) algorithm and hyper-parameter optimization that automatically identifies the best feature pairs for both classification and regression models, where a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates the goodness-of-fit across different models using a k-fold cross-validation step that identifies the best subset of features based on a predefined criterion for each model. Next, the hyperparameters of each ML model are optimized at each forward iteration. The final outputs report, for each model, the optimized number of selected features (brain regions of interest) and the corresponding accuracy. Furthermore, the toolbox can be executed in a parallel environment for efficient computation on a typical personal computer. With the optimized forward variable selection decoder (oFVSD) pipeline, we verified the effectiveness of decoding sex classification and age range regression on 1,113 structural magnetic resonance imaging (MRI) datasets. We compared oFVSD with two counterparts: ML models without the FVS algorithm, and ML models using the Boruta algorithm for variable selection. oFVSD significantly outperformed both across all of the ML models: relative to models without FVS, by approximately 0.20 in correlation coefficient, r, for regression models and 8% for classification models on average; relative to models with Boruta variable selection, by approximately 0.07 for regression and 4% for classification models. Furthermore, we confirmed that parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, providing a use case example on MRI datasets. With its flexibility, oFVSD has potential for many other modalities in neuroimaging. As an open-source and freely available Python package, it is a valuable toolbox for research communities seeking improved decoding accuracy.
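The forward selection loop described in the abstract can be illustrated with a small, dependency-free sketch. This is not the oFVSD API: the function names (`forward_select`, `cv_score`, `nearest_centroid_accuracy`) are hypothetical, a nearest-centroid classifier stands in for the 18 ML models, and the per-iteration hyperparameter optimization is omitted. It only shows the greedy idea of adding, at each step, the feature that most improves a k-fold cross-validated score.

```python
def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`."""
    centroids = {}
    for label in set(y_train):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for row, truth in zip(X_test, y_test):
        def dist(label):
            return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))
        # Ties are broken in favor of the smallest label, for determinism.
        predicted = min(sorted(centroids), key=dist)
        correct += predicted == truth
    return correct / len(y_test)


def cv_score(X, y, features, k=5):
    """Mean accuracy over k interleaved folds (sample i goes to fold i % k)."""
    n = len(y)
    scores = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        scores.append(nearest_centroid_accuracy(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test], features))
    return sum(scores) / k


def forward_select(X, y, k=5, min_gain=1e-9):
    """Greedily add the feature with the best CV score until no feature helps."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(cv_score(X, y, selected + [f], k), f) for f in remaining]
        # Prefer the higher score; on ties, prefer the lower feature index.
        top_score, top_feature = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best + min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy data: feature 0 separates the classes, feature 1 is constant (uninformative).
X = [[float(i % 2), 0.5] for i in range(10)]
y = [i % 2 for i in range(10)]
print(forward_select(X, y))
```

On the toy data the loop keeps only the informative feature and stops, because adding the constant feature does not raise the cross-validated accuracy.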
Affiliation(s)
- Tung Dang
- Center for Brain, Mind, and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan
- Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
| | - Alan S. R. Fermin
- Center for Brain, Mind, and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan
| | - Maro G. Machizawa
- Center for Brain, Mind, and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan
| |
|
22
|
Lee E, Ibrahim JG, Zhu H. Bayesian bi-level variable selection for genome-wide survival study. Genomics Inform 2023; 21:e28. [PMID: 37813624 PMCID: PMC10584651 DOI: 10.5808/gi.23047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 06/26/2023] [Accepted: 06/27/2023] [Indexed: 10/11/2023] Open
Abstract
Mild cognitive impairment (MCI) is a clinical syndrome characterized by the onset and evolution of cognitive impairments, often considered a transitional stage to Alzheimer's disease (AD). The genetic traits of MCI patients who experience a rapid progression to AD can enhance early diagnosis capabilities and facilitate drug discovery for AD. While a genome-wide association study (GWAS) is a standard tool for identifying single nucleotide polymorphisms (SNPs) related to a disease, it fails to detect SNPs with small effect sizes due to stringent control for multiple testing. Additionally, the method does not consider the group structures of SNPs, such as genes or linkage disequilibrium blocks, which can provide valuable insights into the genetic architecture. To address the limitations, we propose a Bayesian bi-level variable selection method that detects SNPs associated with time of conversion from MCI to AD. Our approach integrates group inclusion indicators into an accelerated failure time model to identify important SNP groups. Additionally, we employ data augmentation techniques to impute censored time values using a predictive posterior. We adapt Dirichlet-Laplace shrinkage priors to incorporate the group structure for SNP-level variable selection. In the simulation study, our method outperformed other competing methods regarding variable selection. The analysis of Alzheimer's Disease Neuroimaging Initiative (ADNI) data revealed several genes directly or indirectly related to AD, whereas a classical GWAS did not identify any significant SNPs.
Affiliation(s)
- Eunjee Lee
- Department of Information and Statistics, Chungnam National University, Daejeon 34134, Korea
| | - Joseph G. Ibrahim
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
| |
|
23
|
Obry L, Dalmasso C. Weighted multiple testing procedures in genome-wide association studies. PeerJ 2023; 11:e15369. [PMID: 37337586 PMCID: PMC10276986 DOI: 10.7717/peerj.15369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 04/17/2023] [Indexed: 06/21/2023] Open
Abstract
Multiple testing procedures controlling the false discovery rate (FDR) are increasingly used in genome-wide association studies (GWAS), and weighted multiple testing procedures that incorporate covariate information can improve the power to detect associations. In this work, we evaluate some recent weighted multiple testing procedures in the specific context of GWAS through a simulation study. We also present a new efficient procedure called wBHa that prioritizes the detection of genetic variants with low minor allele frequencies while maximizing the overall detection power. The results indicate good performance of our procedure compared to other weighted multiple testing procedures. In particular, in all simulated settings, wBHa tends to outperform other procedures in detecting rare variants while maintaining good overall power. The use of the different procedures is illustrated with a real dataset.
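To make the idea of a weighted FDR procedure concrete, here is a minimal sketch of the generic weighted Benjamini-Hochberg step-up rule: each p-value is divided by a weight (rescaled so the weights average 1) before the usual BH thresholds are applied. This is not the wBHa implementation; the helper `maf_weights` and its exponent `a` are hypothetical, chosen only to show how a covariate such as minor allele frequency (MAF) could up-weight rare variants.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg; returns sorted indices of rejected hypotheses."""
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("pvalues and weights must be non-empty and equally long")
    mean_w = sum(weights) / m
    # Rescale weights to average 1 so the overall error budget is unchanged.
    scaled = [w / mean_w for w in weights]
    # A larger weight shrinks the weighted p-value, favoring that hypothesis.
    q = [p / w for p, w in zip(pvalues, scaled)]
    order = sorted(range(m), key=lambda i: q[i])
    # Step-up rule: largest rank r with q_(r) <= r * alpha / m.
    r_max = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * alpha / m:
            r_max = rank
    return sorted(order[:r_max])


def maf_weights(mafs, a=1.0):
    """Hypothetical weighting that favors rare variants (illustration only)."""
    return [1.0 / (maf ** a) for maf in mafs]


pvals = [0.03, 0.2, 0.3, 0.4]
mafs = [0.01, 0.3, 0.3, 0.3]  # the first variant is rare
print(weighted_bh(pvals, [1.0] * 4))          # unweighted BH
print(weighted_bh(pvals, maf_weights(mafs)))  # rare variant up-weighted
```

With equal weights the procedure reduces to ordinary BH; in the toy example the rare variant is rejected only once it is up-weighted.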
Affiliation(s)
- Ludivine Obry
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
| | - Cyril Dalmasso
- Université Paris-Saclay, CNRS, Univ Evry, Laboratoire de Mathématiques et Modélisation d’Evry, Evry-Courcouronnes, France
| |
|
24
|
Ansari M, White AD. Learning Peptide Properties with Positive Examples Only. bioRxiv: the preprint server for biology 2023:2023.06.01.543289. [PMID: 37333233 PMCID: PMC10274696 DOI: 10.1101/2023.06.01.543289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Deep learning can create accurate predictive models by exploiting existing large-scale experimental data, and can guide the design of molecules. However, a major barrier is the requirement for both positive and negative examples in classical supervised learning frameworks. Notably, most peptide databases come with missing information and a low number of observations on negative examples, as such sequences are hard to obtain using high-throughput screening methods. To address this challenge, we exploit only the limited known positive examples in a semi-supervised setting, and discover peptide sequences that are likely to map to certain antimicrobial properties via positive-unlabeled (PU) learning. In particular, we use two learning strategies, adapting the base classifier and reliable negative identification, to build deep learning models for inferring solubility, hemolysis, binding against SHP-2, and non-fouling activity of peptides, given their sequence. We evaluate the predictive performance of our PU learning method and show that, using only the positive data, it can achieve competitive performance compared with the classical positive-negative (PN) classification approach, where there is access to both positive and negative examples.
Affiliation(s)
- Mehrad Ansari
- Department of Chemical Engineering, University of Rochester, Rochester, NY, 14627, USA
| | - Andrew D. White
- Department of Chemical Engineering, University of Rochester, Rochester, NY, 14627, USA
| |
|
25
|
Jarquin D, Roy A, Clarke B, Ghosal S. Combining phenotypic and genomic data to improve prediction of binary traits. J Appl Stat 2023; 51:1497-1523. [PMID: 38863802 PMCID: PMC11164039 DOI: 10.1080/02664763.2023.2208773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Accepted: 04/22/2023] [Indexed: 06/13/2024]
Abstract
Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here 'main traits') of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or 'phenotypes') that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait in two steps, designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first, which keeps them from being overwhelmed by the numerically dominant genotypic variables. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.
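The first step, replacing a secondary trait by its residual after regression on the markers, can be sketched as follows. This is a simplified illustration, not the authors' procedure: ordinary least squares on a single marker stands in for the sparse regression on many markers, and the function name `residualize` and the toy data are assumptions.

```python
def residualize(trait, marker):
    """Return the residuals of `trait` after simple OLS regression on one marker.

    Plain single-predictor least squares is used here for illustration;
    the abstract describes a sparse regression on the full marker set.
    """
    n = len(trait)
    mean_x = sum(marker) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in marker)
    sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(marker, trait))
    slope = sxy / sxx if sxx else 0.0  # a constant marker explains nothing
    intercept = mean_y - slope * mean_x
    return [t - (intercept + slope * x) for x, t in zip(marker, trait)]


# Toy data: marker coded as 0/1/2 allele counts; the trait has a marker effect
# of 2 per allele plus a marker-independent component (the last term).
marker = [0, 1, 2, 0, 1, 2]
trait = [1.0 + 2.0 * g + e for g, e in zip(marker, [0.3, -0.3, 0.0, -0.3, 0.3, 0.0])]
adjusted = residualize(trait, marker)
print([round(r, 3) for r in adjusted])
```

The residuals carry only the part of the secondary trait that the marker does not explain, which is what the second-stage classifier would then be offered alongside the markers.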
Affiliation(s)
- D. Jarquin
- Agronomy, University of Florida, Gainesville, FL, USA
| | - A. Roy
- Biostatistics Department, University of Florida, Gainesville, FL, USA
| | - B. Clarke
- Statistics, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - S. Ghosal
- Statistics, North Carolina State University, Raleigh, NC, USA
| |
|
26
|
Chu BB, Ko S, Zhou JJ, Jensen A, Zhou H, Sinsheimer JS, Lange K. Multivariate genome-wide association analysis by iterative hard thresholding. Bioinformatics 2023; 39:btad193. [PMID: 37067496 PMCID: PMC10133532 DOI: 10.1093/bioinformatics/btad193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 04/07/2023] [Accepted: 04/13/2023] [Indexed: 04/18/2023] Open
Abstract
MOTIVATION: In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association study operate marker-by-marker and are computationally intensive. RESULTS: We present a sparsity constrained regression algorithm for multivariate genome-wide association study based on iterative hard thresholding and implement it in a convenient Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA's linear mixed models and mv-PLINK's canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits. AVAILABILITY AND IMPLEMENTATION: Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.
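Iterative hard thresholding itself is compact enough to sketch. The version below is a single-trait, fixed-step illustration written in Python, not the MendelIHT.jl implementation (which is in Julia and handles multiple traits); the function names `iht` and `hard_threshold`, the fixed step size, and the toy design matrix are assumptions. Each iteration takes a gradient step on the least-squares loss and then keeps only the k largest-magnitude coefficients.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and zero out the rest."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht(X, y, k, step, n_iter=100):
    """Approximately minimize ||y - X beta||^2 subject to at most k nonzeros.

    `step` should be small enough for the gradient step to be stable,
    e.g. at most 1 / (largest eigenvalue of X^T X).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # X^T residual is the negative gradient of the loss (up to a factor of 2).
        descent = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([beta[j] + step * descent[j] for j in range(p)], k)
    return beta


# Toy example with an identity design: two strong effects and two tiny ones.
X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
y = [3.0, 0.1, -2.0, 0.05]
print(iht(X, y, k=2, step=1.0))  # [3.0, 0.0, -2.0, 0.0]
```

The sparsity level k plays the role that a penalty strength plays in LASSO-type methods: it directly fixes how many predictors may enter the model.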
Affiliation(s)
- Benjamin B Chu
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
| | - Seyoon Ko
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
| | - Jin J Zhou
- Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
| | - Aubrey Jensen
- Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
| | - Hua Zhou
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
| | - Janet S Sinsheimer
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, CA 90095-1554, United States
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
| | - Kenneth Lange
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095-1554, United States
- Department of Statistics at UCLA, Los Angeles, CA 90095-1554, United States
| |
|
27
|
Manthena V, Jarquín D, Howard R. Integrating and optimizing genomic, weather, and secondary trait data for multiclass classification. Front Genet 2023; 13:1032691. [PMID: 37065625 PMCID: PMC10090538 DOI: 10.3389/fgene.2022.1032691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 12/22/2022] [Indexed: 04/18/2023] Open
Abstract
Modern plant breeding programs collect several data types such as weather, images, and secondary or associated traits besides the main trait (e.g., grain yield). Genomic data is high-dimensional and often crowds out smaller data types when naively combined to explain the response variable. There is a need to develop methods able to effectively combine data types of differing sizes to improve predictions. Additionally, in the face of changing climate conditions, there is a need to develop methods able to effectively combine weather information with genotype data to better predict the performance of lines. In this work, we develop a novel three-stage classifier to predict multi-class traits by combining three data types: genomic, weather, and secondary trait. The method addresses various challenges in this problem, such as confounding, differing sizes of data types, and threshold optimization. The method was examined in different settings, including binary and multi-class responses, various penalization schemes, and class balances. Our method was then compared to standard machine learning methods such as random forests and support vector machines using various classification accuracy metrics, with model size used to evaluate the sparsity of the model. The results showed that our method performed similarly to or better than machine learning methods across various settings. More importantly, the classifiers obtained were highly sparse, allowing for a straightforward interpretation of relationships between the response and the selected predictors.
Affiliation(s)
- Vamsi Manthena
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, United States
| | - Diego Jarquín
- Agronomy Department, University of Florida, Gainesville, FL, United States
| | - Reka Howard
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, United States
|
28
|
Xiong W, Pan H, Wang J, Tian M. An efficient model-free approach to interaction screening for high dimensional data. Stat Med 2023; 42:1583-1605. [PMID: 36857779 DOI: 10.1002/sim.9688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 12/02/2022] [Accepted: 02/06/2023] [Indexed: 03/03/2023]
Abstract
An innovative model-free interaction screening procedure called MCVIS is proposed for high-dimensional data analysis. Specifically, we adopt the introduced MCV index to quantify the importance of an interaction effect among predictors. Our proposed method is fully nonparametric and is capable of selecting interactions even if the signal of parental main effects is weak. The MCVIS procedure has many distinctive features: (i) it can work with discrete, categorical and continuous covariates; (ii) it can deal with both categorical and continuous responses, and can even handle missing responses; (iii) it is robust to heavy-tailed distributions, and thus accommodates the heterogeneity typically caused by high dimensionality; (iv) it enjoys the sure screening and ranking consistency properties, and therefore achieves dimension reduction without information loss. Computational feasibility is also a top concern in high-dimensional data analysis; by transforming our MCV into several variants, the MCVIS procedure is simple and fast to implement. Extensive numerical experiments and comparisons confirm the effectiveness and wide applicability of our MCVIS procedure. We further illustrate the proposed methodology with an empirical study of two real datasets. Supplementary materials are available online.
Affiliation(s)
- Wei Xiong
- School of Statistics, University of International Business and Economics, Beijing, China
| | - Han Pan
- School of Mathematical Sciences, Peking University, Beijing, China
| | - Jianrong Wang
- School of Statistics, University of International Business and Economics, Beijing, China
| | - Maozai Tian
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
|
29
|
Li K, Wang F, Yang L, Liu R. Deep Feature Screening: Feature Selection for Ultra High-Dimensional Data via Deep Neural Networks. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.03.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
|
30
|
Jung J, Ohk J, Kim H, Holt CE, Park HJ, Jung H. mRNA transport, translation, and decay in adult mammalian central nervous system axons. Neuron 2023; 111:650-668.e4. [PMID: 36584679 DOI: 10.1016/j.neuron.2022.12.015] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 04/22/2022] [Revised: 08/31/2022] [Accepted: 12/08/2022] [Indexed: 12/30/2022]
Abstract
Localized mRNA translation regulates synapse function and axon maintenance, but how compartment-specific mRNA repertoires are regulated is largely unknown. We developed an axonal transcriptome capture method that allows deep sequencing of metabolically labeled mRNAs from retinal ganglion cell axon terminals in mouse. Comparing axonal-to-somal transcriptomes and axonal translatome-to-transcriptome enables genome-wide visualization of mRNA transport and translation and unveils potential regulators tuned to each process. FMRP and TDP-43 stand out as key regulators of transport, and experiments in Fmr1 knockout mice validate FMRP's role in the axonal transportation of synapse-related mRNAs. Pulse-and-chase experiments enable genome-wide assessment of mRNA stability in axons and reveal a strong coupling between mRNA translation and decay. Measuring the absolute mRNA abundance per axon terminal shows that the adult axonal transcriptome is stably maintained by persistent transport. Our datasets provide a rich resource for unique insights into RNA-based mechanisms in maintaining presynaptic structure and function in vivo.
Affiliation(s)
- Jane Jung
- Department of Anatomy, Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
| | - Jiyeon Ohk
- Department of Anatomy, Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
| | - Hyeyoung Kim
- Department of Anatomy, Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul 03722, Republic of Korea
| | - Christine E Holt
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Hyun Jung Park
- Samsung Genome Institute, Samsung Medical Center, Seoul, Republic of Korea.
| | - Hosung Jung
- Department of Anatomy, Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul 03722, Republic of Korea.
|
31
|
Chen J, Li Q, Chen HY. Testing generalized linear models with high-dimensional nuisance parameter. Biometrika 2023; 110:83-99. [PMID: 36816791 PMCID: PMC9933885 DOI: 10.1093/biomet/asac021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Generalized linear models often have high-dimensional nuisance parameters, as seen in applications such as testing gene-environment interactions or gene-gene interactions. In these scenarios, it is essential to test the significance of a high-dimensional sub-vector of the model's coefficients. Although some existing methods can tackle this problem, they often rely on the bootstrap to approximate the asymptotic distribution of the test statistic, and thus are computationally expensive. Here, we propose a computationally efficient test with a closed-form limiting distribution, which allows the parameter being tested to be either sparse or dense. We show that under certain regularity conditions, the type I error of the proposed method is asymptotically correct, and we establish its power under high-dimensional alternatives. Extensive simulations demonstrate the good performance of the proposed test and its robustness when certain sparsity assumptions are violated. We also apply the proposed method to Chinese famine sample data in order to show its performance when testing the significance of gene-environment interactions.
Affiliation(s)
- Jinsong Chen
- College of Applied Health Sciences, University of Illinois at Chicago, 1919 W Taylor St, Chicago, Illinois 60612, U.S.A
| | - Quefeng Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Hua Yun Chen
- School of Public Health, University of Illinois at Chicago, 2121 W Taylor St, Chicago, Illinois 60612, U.S.A
|
32
|
Zhang L, Pan H, Liu Z, Gao J, Xu X, Wang L, Wang J, Tang Y, Cao X, Kan Y, Wen Z, Chen J, Huang D, Chen S, Li Y. Multicenter clinical radiomics-integrated model based on [18F]FDG PET and multi-modal MRI predict ATRX mutation status in IDH-mutant lower-grade gliomas. Eur Radiol 2023; 33:872-883. [PMID: 35984514 DOI: 10.1007/s00330-022-09043-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/28/2022] [Revised: 05/23/2022] [Accepted: 07/01/2022] [Indexed: 02/03/2023]
Abstract
OBJECTIVES To develop a clinical radiomics-integrated model based on 18F-fluorodeoxyglucose positron emission tomography ([18F]FDG PET) and multi-modal MRI for predicting alpha thalassemia/mental retardation X-linked (ATRX) mutation status of IDH-mutant lower-grade gliomas (LGGs). METHODS One hundred and two patients (47 ATRX mutant-type, 55 ATRX wild-type) diagnosed with IDH-mutant LGGs (CNS WHO grades 1 and 2) were retrospectively enrolled. A total of 5540 radiomics features were extracted from structural MR (sMR) images (contrast-enhanced T1-weighted imaging, CE-T1WI; T2-weighted imaging, T2WI), functional MR (fMR) images (apparent diffusion coefficient, ADC; cerebral blood volume, CBV), and metabolic PET images ([18F]FDG PET). The random forest algorithm was used to establish a clinical radiomics-integrated model, integrating the optimal multi-modal radiomics model with three clinical parameters. The predictive effectiveness of the models was evaluated by receiver operating characteristic (ROC) and decision curve analysis (DCA). RESULTS The optimal multi-modal model incorporated sMR (CE-T1WI), fMR (ADC), and metabolic ([18F]FDG) images ([18F]FDG PET+ADC+CE-T1WI), with areas under the curve (AUCs) in the training and test groups of 0.971 and 0.962, respectively. The clinical radiomics-integrated model, incorporating [18F]FDG PET+ADC+CE-T1WI and three clinical parameters (KPS, SFSD, and ATGR), showed the best predictive effectiveness in the training and test groups (0.987 and 0.975, respectively). CONCLUSIONS The clinical radiomics-integrated model with metabolic, structural, and functional information based on [18F]FDG PET and multi-modal MRI achieved promising performance for predicting the ATRX mutation status of IDH-mutant LGGs. KEY POINTS
• The clinical radiomics-integrated model based on [18F]FDG PET and multi-modal MRI achieved promising performance for predicting ATRX mutation status in LGGs.
• The study investigated the value of a multicenter clinical radiomics-integrated model based on [18F]FDG PET and multi-modal MRI in LGGs regarding ATRX mutation status prediction.
• The integrated model provided structural, functional, and metabolic information simultaneously and demonstrated satisfactory calibration and discrimination in the training and test groups (0.987 and 0.975, respectively).
Affiliation(s)
- Liqiang Zhang
- Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Hongyu Pan
- College of Computer & Information Science, Southwest University, Chongqing, 400715, China
| | - Zhi Liu
- Department of Radiology, Chongqing Hospital of Traditional Chinese Medicine, Chongqing, 400021, China
| | - Jueni Gao
- Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Xinyi Xu
- Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Linlin Wang
- Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Jie Wang
- Department of Nuclear Medicine, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China
| | - Yi Tang
- Molecular Medicine Diagnostic and Testing Center, Chongqing Medical University, Chongqing, China
| | - Xu Cao
- School of Medical and Life Sciences Chengdu University of Traditional Chinese Medicine, Chengdu, 610032, China
| | - Yubo Kan
- Department of Nuclear Medicine, United Medical Imaging Center, Chongqing, 400038, China
| | - Zhipeng Wen
- Department of Radiology, Sichuan Cancer Hospital, Chengdu, 610042, China
| | - Jianjun Chen
- Department of Nuclear Medicine, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing, 400038, China
| | - Dingde Huang
- Department of Nuclear Medicine, Southwest Hospital, Third Military Medical University (Army Medical University), Chongqing, 400038, China.
| | - Shanxiong Chen
- College of Computer & Information Science, Southwest University, Chongqing, 400715, China.
| | - Yongmei Li
- Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
|
33
|
Song M, Lee M, Park T, Park M. MP-LASSO chart: a multi-level polar chart for visualizing group LASSO analysis of genomic data. Genomics Inform 2022; 20:e48. [PMID: 36617655 PMCID: PMC9847381 DOI: 10.5808/gi.22075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 12/19/2022] [Indexed: 12/31/2022] Open
Abstract
Penalized regression has been widely used in genome-wide association studies for joint analyses to find genetic associations. Among penalized regression models, the least absolute shrinkage and selection operator (Lasso) method effectively removes some coefficients from the model by shrinking them to zero. To handle group structures, such as genes and pathways, several modified Lasso penalties have been proposed, including group Lasso and sparse group Lasso. Group Lasso ensures sparsity at the level of pre-defined groups, eliminating unimportant groups. Sparse group Lasso performs group selection as in group Lasso, but also performs individual selection as in Lasso. While these sparse methods are useful in high-dimensional genetic studies, interpreting the results with many groups and coefficients is not straightforward. Lasso's results are often expressed as trace plots of regression coefficients. However, few studies have explored the systematic visualization of group information. In this study, we propose a multi-level polar Lasso (MP-Lasso) chart, which can effectively represent the results from group Lasso and sparse group Lasso analyses. An R package to draw MP-Lasso charts was developed. Through a real-world genetic data application, we demonstrated that our MP-Lasso chart package effectively visualizes the results of Lasso, group Lasso, and sparse group Lasso.
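The three kinds of shrinkage described in this abstract can be illustrated with their standard proximal (thresholding) operators. The sketch below is only a minimal illustration under assumed penalty values; the function names are hypothetical and it is not the authors' R package.

```python
import math


def soft_threshold(beta, lam):
    """Lasso step: shrink one coefficient toward zero by lam (exactly zero if |beta| <= lam)."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


def group_soft_threshold(group, lam):
    """Group Lasso step: scale the whole group by its Euclidean norm, or drop it entirely."""
    norm = math.sqrt(sum(b * b for b in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * b for b in group]


def sparse_group_soft_threshold(group, lam_individual, lam_group):
    """Sparse group Lasso step: individual shrinkage followed by group shrinkage."""
    shrunk = [soft_threshold(b, lam_individual) for b in group]
    return group_soft_threshold(shrunk, lam_group)


if __name__ == "__main__":
    # Hypothetical coefficients for one gene-level group of three variants.
    print(sparse_group_soft_threshold([4.0, 5.0, 0.5], 1.0, 2.5))
```

In the sparse group case the small third coefficient is removed by the individual step, while the group step rescales the survivors, which mirrors the "group selection plus individual selection" behavior the abstract describes.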
Affiliation(s)
- Min Song
- Department of Statistics, Korea University, Seoul 02841, Korea
| | - Minhyuk Lee
- Department of Statistics, Korea University, Seoul 02841, Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
| | - Mira Park
- Department of Preventive Medicine, Eulji University, Daejeon 34824, Korea (corresponding author)
|
34
|
Wang X, Liu Y, Li J, Wang G. StackCirRNAPred: computational classification of long circRNA from other lncRNA based on stacking strategy. BMC Bioinformatics 2022; 23:563. [PMID: 36575368 PMCID: PMC9793644 DOI: 10.1186/s12859-022-05118-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND CircRNAs are essential for the regulation of post-transcriptional gene expression, including as miRNA sponges, and play an important role in disease development. Some computational tools have been proposed recently to predict circRNA, but because each uses only one classifier, there is still much room to improve performance. RESULTS We propose StackCirRNAPred, a computational method for classifying long circRNA from other lncRNA based on a stacking strategy. To cope with the potential problem that a single feature might not be able to distinguish circRNA well from other lncRNA, we first extracted features from different sources, including nucleic acid composition, sequence spatial features and physicochemical properties, Alu and tandem repeats. We then applied the stacking strategy to integrate the more advantageous classifiers RF, LightGBM, and XGBoost. This allows the model to incorporate these features more flexibly. StackCirRNAPred was found to be significantly better than other tools, with precision, accuracy, F1, recall and MCC of 0.843, 0.833, 0.831, 0.819 and 0.666, respectively. We also tested it directly on the mouse dataset, where StackCirRNAPred was still significantly better than other methods, with precision, accuracy, F1, recall and MCC of 0.837, 0.839, 0.839, 0.841 and 0.677, respectively. CONCLUSIONS We proposed StackCirRNAPred based on a stacking strategy to distinguish long circRNAs from other lncRNAs. With the test results demonstrating the validity and robustness of StackCirRNAPred, we hope StackCirRNAPred will complement existing circRNA prediction methods and be helpful in downstream research.
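For readers unfamiliar with the five metrics reported above, the sketch below shows how each is conventionally computed from a binary confusion matrix. The counts in the usage line are made up for illustration and are not taken from the paper; the function name is hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return precision, recall, accuracy, F1 and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts: 8 true positives, 2 false positives,
    # 7 true negatives, 3 false negatives.
    print(classification_metrics(8, 2, 7, 3))
```

MCC uses all four cells of the confusion matrix, which is why it can be noticeably lower than accuracy or F1 on the same predictions, as in the values quoted in the abstract.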
Affiliation(s)
- Xin Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID grid.19373.3f; ISNI 0000 0001 0193 3564)
| | - Yadong Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID grid.19373.3f; ISNI 0000 0001 0193 3564)
| | - Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID grid.19373.3f; ISNI 0000 0001 0193 3564)
| | - Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China (GRID grid.19373.3f; ISNI 0000 0001 0193 3564)
|
35
|
Fitzgerald T, Jones A, Engelhardt BE. A Poisson reduced-rank regression model for association mapping in sequencing data. BMC Bioinformatics 2022; 23:529. [PMID: 36482321 PMCID: PMC9733401 DOI: 10.1186/s12859-022-05054-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 11/14/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. RESULTS We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. CONCLUSION We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
Affiliation(s)
- Tiana Fitzgerald
- Department of Computer Science, Princeton University, Princeton, NJ USA
| | - Andrew Jones
- Department of Computer Science, Princeton University, Princeton, NJ USA
| | - Barbara E. Engelhardt
- Department of Computer Science, Princeton University, Princeton, NJ USA
- Data Science and Biotechnology Institute, Gladstone Institutes, San Francisco, CA USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA USA
|
36
|
Naz M, Benavides-Mendoza A, Tariq M, Zhou J, Wang J, Qi S, Dai Z, Du D. CRISPR/Cas9 technology as an innovative approach to enhancing the phytoremediation: Concepts and implications. Journal of Environmental Management 2022; 323:116296. [PMID: 36261968 DOI: 10.1016/j.jenvman.2022.116296] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/17/2022] [Revised: 09/03/2022] [Accepted: 09/13/2022] [Indexed: 06/16/2023]
Abstract
Phytoremediation is currently an active field of research focusing chiefly on identifying and characterizing novel super-accumulators with high chelation action. In the last few years, molecular tools have been widely exploited to better understand metal absorption, translocation, cation, and tolerance mechanisms in plants. More recently, the more advanced CRISPR-Cas9 genome engineering technology has also been employed to enhance detoxification efficiency. Further advances in molecular science will improve understanding of how to produce plants with adaptive phytoremediation ability under current global warming conditions. The enhanced abilities of nucleases for genome modification can improve plant repair capabilities by modifying the genome, thereby achieving a sustainable ecosystem. This manuscript focuses on the fundamental principles and applications of biotechnology for promoting climate-resistant metal plants, especially the CRISPR-Cas9 genome editing system for enhancing the phytoremediation of harmful contamination and pollutants.
Affiliation(s)
- Misbah Naz
- Institute of Environment and Ecology, School of the Environment and Safety Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 21201, Jiangsu Province, PR China
| | - Adalberto Benavides-Mendoza
- Department of Horticulture, Autonomous Agricultural University Antonio Narro, 1923 Saltillo, C.P. 25315, Mexico
| | - Muhammad Tariq
- Department of Pharmacology, Lahore Pharmacy College, 54000, Lahore, Pakistan
| | - Jianyu Zhou
- Institute of Environment and Ecology, School of the Environment and Safety Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 21201, Jiangsu Province, PR China
| | - Jiahao Wang
- Key Laboratory of Modern Agricultural Equipment and Technology, Ministry of Education, School of Agricultural Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 21201, Jiangsu Province, PR China
| | - Shanshan Qi
- Key Laboratory of Modern Agricultural Equipment and Technology, Ministry of Education, School of Agricultural Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 21201, Jiangsu Province, PR China
| | - Zhicong Dai
- Institute of Environment and Ecology, School of the Environment and Safety Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 21201, Jiangsu Province, PR China; Jiangsu Collaborative Innovation Center of Technology and Material of Water Treatment, Suzhou University of Science and Technology, 99 Xuefu Road, Suzhou, 215009, Jiangsu Province, PR China.
| | - Daolin Du
- Institute of Environment and Ecology, School of the Environment and Safety Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 21201, Jiangsu Province, PR China
|
37
|
Tuo S, Li C, Liu F, Zhu Y, Chen T, Feng Z, Liu H, Li A. A Novel Multitasking Ant Colony Optimization Method for Detecting Multiorder SNP Interactions. Interdiscip Sci 2022; 14:814-832. [PMID: 35788965 DOI: 10.1007/s12539-022-00530-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/10/2021] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
MOTIVATION Linear or nonlinear interactions of multiple single-nucleotide polymorphisms (SNPs) play an important role in understanding the genetic basis of complex human diseases. However, combinatorial analytics in high-dimensional space makes it extremely challenging to detect multiorder SNP interactions. Most classic approaches can only perform one task (for detecting k-order SNP interactions) in each run. Since prior knowledge of a complex disease is usually not available, it is difficult to determine the value of k for detecting k-order SNP interactions. METHODS A novel multitasking ant colony optimization algorithm (named MTACO-DMSI) is proposed to detect multiorder SNP interactions, and it is divided into two stages: searching and testing. In the searching stage, multiple multiorder SNP interaction detection tasks (from 2nd-order to kth-order) are executed in parallel, and two subpopulations that separately adopt the Bayesian network-based K2-score and Jensen-Shannon divergence (JS-score) as evaluation criteria are generated for each task to improve the global search capability and the discrimination ability for various disease models. In the testing stage, the G test statistical test is adopted to further verify the authenticity of candidate solutions to reduce the error rate. RESULT Three multiorder simulated disease models with different interaction effects and three real age-related macular degeneration (AMD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) datasets were used to investigate the performance of the proposed MTACO-DMSI. The experimental results show that the MTACO-DMSI has a faster search speed and higher discriminatory power for diverse simulation disease models than traditional single-task algorithms. The results on real AMD data and RA and T1D datasets indicate that MTACO-DMSI has the ability to detect multiorder SNP interactions at a genome-wide scale. Availability and implementation: https://github.com/shouhengtuo/MTACO-DMSI/.
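The testing stage mentioned above relies on the G test. As a rough illustration only, the sketch below computes the standard G statistic for a genotype-by-phenotype contingency table; the table layout and function name are assumptions, and this is not code from the MTACO-DMSI repository.

```python
import math


def g_statistic(table):
    """Return (G, degrees of freedom) for a contingency table given as a list of rows.

    G = 2 * sum(O * ln(O / E)), where E is the expected count under independence.
    Cells with an observed count of zero contribute nothing to the sum.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue
            expected = row_totals[i] * col_totals[j] / total
            g += observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, dof


if __name__ == "__main__":
    # Hypothetical table: rows are genotype combinations, columns are cases and controls.
    print(g_statistic([[30, 10], [20, 40]]))
```

The statistic is compared against a chi-squared distribution with the returned degrees of freedom; a candidate SNP combination whose G value is small would be rejected as a likely false positive.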
Affiliation(s)
- Shouheng Tuo
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
| | - Chao Li
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Fan Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - YanLing Zhu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - TianRui Chen
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - ZengYu Feng
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Haiyan Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Aimin Li
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, Shaanxi, China
|
38
|
Wang X, Wen Y. A penalized linear mixed model with generalized method of moments estimators for complex phenotype prediction. Bioinformatics 2022; 38:5222-5228. [PMID: 36205617 DOI: 10.1093/bioinformatics/btac659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 07/27/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Linear mixed models (LMMs) have long been the method of choice for risk prediction analysis on high-dimensional data. However, it remains computationally challenging to simultaneously model a large number of variants that can be noise or have predictive effects of complex forms. RESULTS In this work, we have developed a penalized LMM with generalized method of moments (pLMMGMM) estimators for prediction analysis. pLMMGMM is built within the LMM framework, where random effects are used to model the joint predictive effects from all variants within a region. Unlike existing methods that focus on linear relationships and use empirical criteria for variable screening, pLMMGMM can efficiently detect regions that harbor genetic variants with both linear and non-linear predictive effects. In addition, unlike existing LMMs that can only handle a very limited number of random effects, pLMMGMM is much less computationally demanding. It can jointly consider a large number of regions and accurately detect those that are predictive. Through theoretical investigations, we have shown that our method has selection consistency and asymptotic normality. Through extensive simulations and the analysis of PET-imaging outcomes, we have demonstrated that pLMMGMM outperformed existing models and that it can accurately detect regions that harbor risk factors with various forms of predictive effects. AVAILABILITY AND IMPLEMENTATION The R-package is available at https://github.com/XiaQiong/GMMLasso. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaqiong Wang
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
| | - Yalu Wen
- Department of Statistics, University of Auckland, Auckland 1010, New Zealand
|
39
|
Kim M, Lee JH, Joo L, Jeong B, Kim S, Ham S, Yun J, Kim N, Chung SR, Choi YJ, Baek JH, Lee JY, Kim JH. Development and Validation of a Model Using Radiomics Features from an Apparent Diffusion Coefficient Map to Diagnose Local Tumor Recurrence in Patients Treated for Head and Neck Squamous Cell Carcinoma. Korean J Radiol 2022; 23:1078-1088. [PMID: 36126954 PMCID: PMC9614290 DOI: 10.3348/kjr.2022.0299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 07/25/2022] [Accepted: 08/17/2022] [Indexed: 12/24/2022] Open
Abstract
OBJECTIVE: To develop and validate a model using radiomics features from an apparent diffusion coefficient (ADC) map to diagnose local tumor recurrence in head and neck squamous cell carcinoma (HNSCC). MATERIALS AND METHODS: This retrospective study included 285 patients (mean age ± standard deviation, 62 ± 12 years; 220 male, 77.2%) with newly developed contrast-enhancing lesions at the primary cancer site on surveillance MRI following definitive treatment of HNSCC between January 2014 and October 2019. Of these, 215 were used for training (n = 161) and internal validation (n = 54), and the other 70 for external validation. Of the 215 and 70 patients, 127 and 34, respectively, had local tumor recurrence. Radiomics models using radiomics scores were created separately for T2-weighted imaging (T2WI), contrast-enhanced T1-weighted imaging (CE-T1WI), and ADC maps using non-zero coefficients from the least absolute shrinkage and selection operator in the training set. Receiver operating characteristic (ROC) analysis was used to evaluate the diagnostic performance of each radiomics score and of known clinical parameters (age, sex, and clinical stage) in the internal and external validation sets. RESULTS: Five radiomics features from T2WI, six from CE-T1WI, and nine from ADC maps were selected and used to develop the respective radiomics models. The area under the ROC curve (AUROC) of the ADC radiomics score was 0.76 (95% confidence interval [CI], 0.62-0.89) and 0.77 (95% CI, 0.65-0.88) in the internal and external validation sets, respectively. These were significantly higher than the AUROC values of T2WI (0.53 [95% CI, 0.40-0.67], p = 0.006), CE-T1WI (0.53 [95% CI, 0.40-0.67], p = 0.012), and clinical parameters (0.53 [95% CI, 0.39-0.67], p = 0.021) in the external validation set.
CONCLUSION: The radiomics model using ADC maps exhibited higher diagnostic performance than the radiomics models using T2WI or CE-T1WI and than clinical parameters in the diagnosis of local tumor recurrence in HNSCC following definitive treatment.
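For readers unfamiliar with the AUROC values quoted above, the following is a minimal sketch of how an AUROC can be computed from scores and binary labels, using its interpretation as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name and the toy scores are illustrative assumptions, not data or code from the study.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up radiomics scores: 1 = recurrence, 0 = no recurrence (illustrative only).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auroc(scores, labels)  # 7 of 9 pairs are ranked correctly
```

This pairwise form is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; confidence intervals such as those reported in the abstract would require an additional step (for example, bootstrapping) that is not shown here.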
Affiliation(s)
- Minjae Kim
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea; Department of Radiology and Research Institute of Radiological Science and Center for Clinical Imaging Data Science, Yonsei University College of Medicine, Seoul, Korea
- Jeong Hyun Lee
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Leehi Joo
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Boryeong Jeong
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Seonok Kim
  - Department of Clinical Epidemiology and Biostatistics, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Sungwon Ham
  - Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Jihye Yun
  - Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- NamKug Kim
  - Department of Convergence Medicine, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Sae Rom Chung
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Young Jun Choi
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Jung Hwan Baek
  - Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Ji Ye Lee
  - Department of Radiology, Seoul National University Hospital, Seoul, Korea; Department of Radiology, Seoul National University College of Medicine, Seoul, Korea
- Ji-hoon Kim
  - Department of Radiology, Seoul National University Hospital, Seoul, Korea; Department of Radiology, Seoul National University College of Medicine, Seoul, Korea
40
Wang MJ, Song Y, Guo XQ, Wei D, Cao XT, Sun Y, Xu YG, Hu XM. The Construction of ITP Diagnostic Modeling Based on the Expressions of Hub Genes Associated with M1 Polarization of Macrophages. J Inflamm Res 2022; 15:5905-5915. [PMID: 36274827 PMCID: PMC9581081 DOI: 10.2147/jir.s364414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 09/06/2022] [Indexed: 11/07/2022] Open
Abstract
Purpose: Primary immune thrombocytopenia (ITP) is an immune disease that is diagnosed by exclusion, since no validated biomarkers have been identified. In this study, we explored biomarkers associated with the development of ITP from an immune perspective to inform the clinical diagnosis. Patients and Methods: Differentially expressed genes (DEGs) between normal and ITP samples were analyzed using the limma package. The random forest algorithm and LASSO regression were further used to screen for DEGs associated with ITP. The expression of these hub genes was validated by PCR. The relationship between DEGs and immunity was explored by enrichment analysis. Immune cell infiltration in ITP was analyzed by CIBERSORT and ssGSEA, and the relationship between DEGs and infiltrating immune cells was analyzed by Spearman’s rank correlation analysis. Finally, a diagnostic model based on the DEGs was constructed with a neural network, and its performance was assessed by the ROC curve. Results: After screening the GEO database and validation by PCR analysis, the expression of CTH and TAF8 was higher, and the expression of OSBP2 was lower, in ITP patients than in normal subjects (P<0.05). GO enrichment analysis showed that these DEGs were associated with inflammatory immune-related diseases, and KEGG analysis showed that they mainly regulated signaling pathways such as JAK-STAT. CIBERSORT and ssGSEA analyses showed that these DEGs were mainly associated with macrophage M1 polarization. The expression of CTH and TAF8 was positively correlated with M1 expression, while that of OSBP2 was negatively correlated with M1 expression. The ROC curve showed high accuracy of the neural network model [AUC = 0.939, 95% CI (0.8–1)]. Conclusion: Our results suggest that CTH, TAF8, and OSBP2 can be used as effective diagnostic biomarkers of ITP.
Affiliation(s)
- Ming-Jing Wang
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China; Graduate School, China Academy of Chinese Medical Sciences, Beijing, 100700, People’s Republic of China
- Ying Song
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China; Graduate School, China Academy of Chinese Medical Sciences, Beijing, 100700, People’s Republic of China
- Xiao-Qing Guo
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China
- Diu Wei
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China; Graduate School, China Academy of Chinese Medical Sciences, Beijing, 100700, People’s Republic of China
- Xin-Tian Cao
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China; Graduate School, Beijing University of Chinese Medicine, Beijing, 100029, People’s Republic of China
- Yan Sun
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China; Graduate School, Beijing University of Chinese Medicine, Beijing, 100029, People’s Republic of China
- Yong-Gang Xu
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China
- Xiao-Mei Hu
  - Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, People’s Republic of China
  - Correspondence: Xiao-Mei Hu; Yong-Gang Xu, Department of Hematology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, No. 1 Xiyuancaochang, Haidian District, Beijing, 100091, People’s Republic of China, Tel +86 010-6283-5361, Email ;
41
Identification and Validation of an Inflammatory Response-Related Polygenic Risk Score as a Prognostic Marker in Hepatocellular Carcinoma. DISEASE MARKERS 2022; 2022:1739995. [PMID: 36212175 PMCID: PMC9534708 DOI: 10.1155/2022/1739995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Accepted: 08/16/2022] [Indexed: 11/23/2022]
Abstract
Aims: We hypothesized that the expression patterns of inflammatory response-related genes may be a potential tool for hepatocellular carcinoma (HCC) risk scoring. Background: Inflammatory response plays a pivotal role in the pathogenesis of HCC. Objective: To establish and validate a hallmark inflammatory response gene-based polygenic risk score as a prognostic tool in HCC. Methods: We screened differentially expressed inflammatory response genes and established an inflammatory response-related polygenic risk score (IRPRS) in an HCC-related dataset. Patients with HCC were categorized into high- and low-risk groups according to the median IRPRS, and the overall survival between the two groups was compared. The IRPRS was validated in an independent external dataset. Tumor-infiltrating lymphocytes (TILs) in high- and low-risk groups were compared, and gene set enrichment analysis was performed to characterize high-risk HCC identified using this IRPRS. Results: Four differentially expressed hallmark inflammatory response genes (CD14, AQP9, SERPINE1, and ITGA5) were identified to construct the IRPRS. Patients in the high-risk group had significantly shorter overall survival than those in the low-risk group in both the training set and the test set. Furthermore, the IRPRS remained an independent prognostic factor compared to the routine clinicopathological characteristics. Many cancer-related hallmark gene sets and TILs were significantly enriched in the high-risk group. Conclusions: We established and validated a four-hallmark inflammatory response gene-based polygenic risk score, which could successfully divide patients with HCC into high-risk and low-risk groups. These two risk groups of HCC possess significantly distinct prognostic and biological characteristics.
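The median split described in the Methods can be illustrated with a minimal sketch. Only the four gene names come from the abstract; the coefficients, the weighted-sum form of the score, and the patient expression values below are hypothetical placeholders, since the abstract does not report how the IRPRS is computed.

```python
from statistics import median

# Hypothetical per-gene weights (assumption: the abstract reports no coefficients).
COEFS = {"CD14": -0.2, "AQP9": -0.1, "SERPINE1": 0.3, "ITGA5": 0.4}


def risk_score(expr):
    """Weighted sum of expression values over the four signature genes."""
    return sum(COEFS[gene] * expr[gene] for gene in COEFS)


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Made-up expression profiles for four illustrative patients.
patients = {
    "P1": {"CD14": 1.0, "AQP9": 2.0, "SERPINE1": 5.0, "ITGA5": 4.0},
    "P2": {"CD14": 3.0, "AQP9": 3.0, "SERPINE1": 1.0, "ITGA5": 1.0},
    "P3": {"CD14": 2.0, "AQP9": 1.0, "SERPINE1": 3.0, "ITGA5": 2.0},
    "P4": {"CD14": 4.0, "AQP9": 4.0, "SERPINE1": 2.0, "ITGA5": 1.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

With these toy values, P1 and P3 fall above the median and P2 and P4 below it. In a real analysis the two groups would then be compared with a survival method such as a log-rank test, which is outside the scope of this sketch.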
42
Ghosh A, Jaenada M, Pardo L. Classification of COVID19 Patients Using Robust Logistic Regression. JOURNAL OF STATISTICAL THEORY AND PRACTICE 2022; 16:67. [PMID: 36164412 PMCID: PMC9491676 DOI: 10.1007/s42519-022-00295-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/02/2022] [Indexed: 10/29/2022]
Abstract
Coronavirus disease 2019 (COVID-19) has triggered a global pandemic affecting millions of people. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19, is hypothesized to gain entry into humans via the airway epithelium, where it initiates a host response. The expression levels of genes in the upper airway that interact with SARS-CoV-2 could be a telltale sign of virus infection. However, gene expression data are suspected of containing contamination errors introduced by the techniques used to extract such information, and clinical diagnoses may contain labelling errors due to the limited specificity and sensitivity of diagnostic tests. We propose to fit a regularized logistic regression model as a classifier for COVID-19 diagnosis, which simultaneously identifies genes related to the disease and predicts COVID-19 cases based on the expression values of the selected genes. We apply robust estimation methods based on the density power divergence to obtain results that remain stable under contamination or labelling errors in the data, and compare their performance with that of the classical maximum likelihood estimator under different penalties, including the LASSO and the general adaptive LASSO penalties.
Affiliation(s)
- Abhik Ghosh
  - Indian Statistical Institute, Kolkata, India
- María Jaenada
  - Department of Statistics and O.R., Complutense University of Madrid, Madrid, Spain
- Leandro Pardo
  - Department of Statistics and O.R., Complutense University of Madrid, Madrid, Spain
43
Survival Analysis with High-Dimensional Omics Data Using a Threshold Gradient Descent Regularization-Based Neural Network Approach. Genes (Basel) 2022; 13:genes13091674. [PMID: 36140842 PMCID: PMC9498566 DOI: 10.3390/genes13091674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 09/13/2022] [Accepted: 09/16/2022] [Indexed: 11/17/2022] Open
Abstract
Analysis of data with a censored survival response and high-dimensional omics measurements is now common. Most of the existing analyses are based on specific (semi)parametric models, in particular the Cox model. Such analyses may be limited by not having sufficient flexibility, for example, in accommodating nonlinearity. For categorical and continuous responses, neural networks (NNs) have provided a highly competitive alternative. Comparatively, NNs for censored survival data remain limited. Omics measurements are usually high-dimensional, and only a small subset is expected to be survival-associated. As such, regularized estimation and selection are needed. In the existing NN studies, this is usually achieved via penalization. In this article, we propose adopting the threshold gradient descent regularization (TGDR) technique, which has competitive performance (for example, when compared to penalization) and unique advantages in regression analysis, but has not been adopted with NNs. The TGDR-based NN has a highly sensible formulation and an architecture different from the unregularized and penalization-based ones. Simulations show its satisfactory performance. Its practical effectiveness is further established via the analysis of two cancer omics datasets. Overall, this study can provide a practical and useful new way in the NN paradigm for survival analysis with high-dimensional omics measurements.
44
Yu L, Gong C. Pancancer analysis of a potential gene mutation model in the prediction of immunotherapy outcomes. Front Genet 2022; 13:917118. [PMID: 36092890 PMCID: PMC9459043 DOI: 10.3389/fgene.2022.917118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 07/27/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Immune checkpoint blockade (ICB) represents a promising treatment for cancer, but predictive biomarkers are needed. We aimed to develop a cost-effective signature to predict immunotherapy benefits across cancers. Methods: We proposed a study framework to construct the signature. Specifically, we built a multivariate Cox proportional hazards regression model with LASSO using 80% of an ICB-treated cohort (n = 1661) from MSKCC. The desired signature, named SIGP, was the risk score of the model and was validated in the remaining 20% of patients and an external ICB-treated cohort (n = 249) from DFCI. Results: SIGP was based on 18 candidate genes (NOTCH3, CREBBP, RNF43, PTPRD, FAM46C, SETD2, PTPRT, TERT, TET1, ROS1, NTRK3, PAK7, BRAF, LATS1, IL7R, VHL, TP53, and STK11), and we classified patients into SIGP high (SIGP-H), SIGP low (SIGP-L) and SIGP wild type (SIGP-WT) groups according to the SIGP score. A multicohort validation demonstrated that patients in SIGP-L had significantly longer overall survival (OS) in the context of ICB therapy than those in SIGP-WT and SIGP-H (44.00 months versus 13.00 months and 14.00 months, p < 0.001 in the test set). The survival of patients grouped by SIGP in non-ICB-treated cohorts was different, and SIGP-WT performed better than the other groups. In addition, patients with SIGP-L + TMB-L (approximately 15% of patients) had survival similar to that of patients with TMB-H, and patients with both SIGP-L and TMB-H had better survival. Further analysis of tumor-infiltrating lymphocytes demonstrated that the SIGP-L group had significantly increased abundances of CD8+ T cells. Conclusion: Our proposed model of the SIGP signature based on 18-gene mutations has good predictive value for the clinical benefit of ICB in pancancer patients. Additional patients without TMB-H were identified by SIGP as potential candidates for ICB, and the combination of both signatures showed better performance than either signature alone.
Affiliation(s)
- Lishan Yu
  - Yanqi Lake Beijing Institute Mathematical Sciences and Applications, Beijing, China
  - Yau Mathematical Sciences Center, Tsinghua University, Beijing, China
- Caifeng Gong
  - Department of Medical Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - *Correspondence: Caifeng Gong,
45
Feronato SG, Silva MLM, Izbicki R, Farias TDJ, Shigunov P, Dallagiovanna B, Passetti F, dos Santos HG. Selecting Genetic Variants and Interactions Associated with Amyotrophic Lateral Sclerosis: A Group LASSO Approach. J Pers Med 2022; 12:jpm12081330. [PMID: 36013279 PMCID: PMC9410070 DOI: 10.3390/jpm12081330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 08/10/2022] [Accepted: 08/12/2022] [Indexed: 11/16/2022] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a multi-system neurodegenerative disease that affects both upper and lower motor neurons, resulting from a combination of genetic, environmental, and lifestyle factors. Usually, the association between single-nucleotide polymorphisms (SNPs) and this disease is tested individually, which leads to the testing of multiple hypotheses. In addition, this classical approach does not support the detection of interaction-dependent SNPs. We applied a two-step procedure to select SNPs and pairwise interactions associated with ALS. SNP data from 276 ALS patients and 268 controls were analyzed by a two-step group LASSO in 2000 iterations. In the first step, we fitted a group LASSO model to a bootstrap sample and a random subset of predictors (25%) from the original data set aiming to screen for important SNPs and, in the second step, we fitted a hierarchical group LASSO model to evaluate pairwise interactions. An in silico analysis was performed on a set of variables, which were prioritized according to their bootstrap selection frequency. We identified seven SNPs (rs16984239, rs10459680, rs1436918, rs1037666, rs4552942, rs10773543, and rs2241493) and two pairwise interactions (rs16984239:rs2118657 and rs16984239:rs3172469) potentially involved in nervous system conservation and function. These results may contribute to the understanding of ALS pathogenesis, its diagnosis, and therapeutic strategy improvement.
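The first (screening) step described above, in which a model is fitted repeatedly to a bootstrap sample and a random 25% subset of predictors and variables are ranked by selection frequency, can be sketched as follows. This is an illustration under stated assumptions: the selector is a simple case/control mean-difference threshold standing in for group LASSO, the hierarchical interaction step is not shown, and all function names, thresholds, and data are hypothetical.

```python
import random


def select(X, y, cols, threshold=0.8):
    """Stand-in selector (NOT group LASSO): keep columns whose mean genotype
    differs between cases and controls by more than `threshold`."""
    cases = [row for row, lab in zip(X, y) if lab == 1]
    controls = [row for row, lab in zip(X, y) if lab == 0]
    if not cases or not controls:
        return None  # degenerate resample with a single class; caller skips it
    chosen = set()
    for j in cols:
        case_mean = sum(r[j] for r in cases) / len(cases)
        control_mean = sum(r[j] for r in controls) / len(controls)
        if abs(case_mean - control_mean) > threshold:
            chosen.add(j)
    return chosen


def selection_frequency(X, y, n_iter=200, frac=0.25, seed=0):
    """Per-predictor fraction of iterations in which it was selected,
    counted only over iterations where it was in the random subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, int(frac * p))
    drawn, picked = [0] * p, [0] * p
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        cols = rng.sample(range(p), k)              # random subset of predictors
        chosen = select([X[i] for i in idx], [y[i] for i in idx], cols)
        if chosen is None:
            continue
        for j in cols:
            drawn[j] += 1
            picked[j] += j in chosen
    return [picked[j] / drawn[j] if drawn[j] else 0.0 for j in range(p)]


# Toy data: predictor 0 tracks the label exactly; the other seven are random genotypes.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[2 * lab] + [data_rng.randint(0, 2) for _ in range(7)] for lab in y]
freq = selection_frequency(X, y)
```

Dividing by the number of times a predictor was drawn, rather than by the total number of iterations, keeps frequencies comparable across predictors that happened to be sampled more or less often.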
Affiliation(s)
- Rafael Izbicki
  - Department of Statistics, Universidade Federal de São Carlos, São Carlos 13565-905, Brazil
- Ticiana D. J. Farias
  - Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
  - Division of Biomedical Informatics, Department of Immunology and Microbiology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Patrícia Shigunov
  - Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
- Fabio Passetti
  - Instituto Carlos Chagas, Fundação Oswaldo Cruz, Curitiba 81310-020, Brazil
46
Wang X, Wen Y. A penalized linear mixed model with generalized method of moments for prediction analysis on high-dimensional multi-omics data. Brief Bioinform 2022; 23:bbac193. [PMID: 35649346 PMCID: PMC9310531 DOI: 10.1093/bib/bbac193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 04/27/2022] [Indexed: 11/13/2022] Open
Abstract
With advances in high-throughput biotechnologies, high-dimensional multi-layer omics data have become increasingly available. They can provide both confirmatory and complementary information on disease risk and thus offer unprecedented opportunities for risk prediction studies. However, the high dimensionality and complex inter/intra-relationships among multi-omics data have brought tremendous analytical challenges. Here we present a computationally efficient penalized linear mixed model with generalized method of moments estimator (MpLMMGMM) for prediction analysis on multi-omics data. Our method extends the widely used linear mixed model proposed for genomic risk predictions to model multi-omics data, where kernel functions are used to capture various types of predictive effects from different layers of omics data and penalty terms are introduced to reduce the impact of noise. Compared with existing penalized linear mixed models, the proposed method adopts the generalized method of moments estimator and is much more computationally efficient. Through extensive simulation studies and the analysis of positron emission tomography imaging outcomes, we have demonstrated that MpLMMGMM can simultaneously consider a large number of variables and efficiently select those that are predictive from the corresponding omics layers. It can capture both linear and nonlinear predictive effects and achieves better prediction performance than competing methods.
Affiliation(s)
- Xiaqiong Wang
  - Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
- Yalu Wen
  - Department of Statistics, University of Auckland, 38 Princes Street, 1010, Auckland, New Zealand
47
Accurate Evaluation of Feature Contributions for Sentinel Lymph Node Status Classification in Breast Cancer. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12147227] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Current guidelines recommend sentinel lymph node biopsy to evaluate lymph node involvement in breast cancer patients with clinically negative lymph nodes on clinical or radiological examination. Machine learning (ML) models have significantly improved the prediction of lymph node status based on clinical features, thus avoiding expensive, time-consuming and invasive procedures. However, the classification of sentinel lymph node status is a typical example of an unbalanced classification problem. In this work, we developed an ML framework to explore the effects of unbalanced populations on the performance and stability of feature ranking for sentinel lymph node status classification in breast cancer. Our results indicate state-of-the-art AUC (Area under the Receiver Operating Characteristic curve) values on a hold-out set (67%) while providing particularly stable features related to tumor size, histological subtype and estrogen receptor expression, which should therefore be considered as potential biomarkers.
48
Yoo JE, Rho M. Large-Scale Survey Data Analysis with Penalized Regression: A Monte Carlo Simulation on Missing Categorical Predictors. MULTIVARIATE BEHAVIORAL RESEARCH 2022; 57:642-657. [PMID: 33703972 DOI: 10.1080/00273171.2021.1891856] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
With the advent of the big data era, machine learning methods have evolved and proliferated. This study focused on penalized regression, a machine learning procedure that builds interpretive prediction models. In particular, penalized regression coupled with large-scale data can explore hundreds or thousands of variables in one statistical model without convergence problems and identify important predictors that have not yet been investigated. As one of the first Monte Carlo simulation studies to investigate predictive modeling with missing categorical predictors in the context of social science research, this study endeavored to emulate real large-scale social science data. Likert-scaled variables were simulated, as well as multiple-category and count variables. Because categorical predictors were included in the modeling, penalized regression methods that consider the grouping effect, such as group Mnet, were employed. We also examined the applicability of the simulation conditions with the real large-scale dataset that the simulation study referenced. In particular, the study presented selection counts of variables after multiple iterations of modeling in order to account for the bias resulting from data-splitting in model validation. Selection counts turned out to be a necessary tool when variable selection is of research interest. Efforts to utilize large-scale data to the fullest appear to offer a valid approach to mitigating the effect of nonignorable missingness. Overall, penalized regression, which assumes linearity, is a viable method for analyzing large-scale social science survey data.
Affiliation(s)
- Jin Eun Yoo
  - Department of Education, Korea National University of Education
- Minjeong Rho
  - Department of Education, Korea National University of Education
49
Chen Z, Lu Y, Cao B, Zhang W, Edwards A, Zhang K. Driver gene detection through Bayesian network integration of mutation and expression profiles. Bioinformatics 2022; 38:2781-2790. [PMID: 35561191 PMCID: PMC9113331 DOI: 10.1093/bioinformatics/btac203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 03/12/2022] [Accepted: 04/06/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: The identification of mutated driver genes and the corresponding pathways is one of the primary goals in understanding tumorigenesis at the patient level. Integration of multi-dimensional genomic data from existing repositories, e.g., The Cancer Genome Atlas (TCGA), offers an effective way to tackle this issue. In this study, we aimed to leverage the complementary genomic information of individuals and create an integrative framework to identify cancer-related driver genes. Specifically, based on pinpointed differentially expressed genes, variants in somatic mutations and a gene interaction network, we proposed an unsupervised Bayesian network integration (BNI) method to detect driver genes and estimate the disease propagation at the patient and/or cohort levels. This new method first captures inherent structural information to construct a functional gene mutation network and then extracts the driver genes and their controlled downstream modules using the minimum cover subset method. RESULTS: Using other credible sources (e.g., Cancer Gene Census and Network of Cancer Genes), we validated the driver genes predicted by the BNI method in three TCGA pan-cancer cohorts. The proposed method provides an effective approach to address tumor heterogeneity faced by personalized medicine. The pinpointed drivers warrant further wet laboratory validation. AVAILABILITY AND IMPLEMENTATION: The supplementary tables and source code can be obtained from https://xavieruniversityoflouisiana.sharefile.com/d-se6df2c8d0ebe4800a3030311efddafe5. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhong Chen
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- You Lu
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Bo Cao
  - Division of Basic and Pharmaceutical Sciences, College of Pharmacy, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Wensheng Zhang
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Andrea Edwards
  - Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Kun Zhang
  - To whom correspondence should be addressed
50
Xu T, Jin T, Lu X, Pan Z, Tan Z, Zheng C, Liu Y, Hu X, Ba L, Ren H, Chen J, Zhu C, Ge M, Huang P. A signature of circadian rhythm genes in driving anaplastic thyroid carcinoma malignant progression. Cell Signal 2022; 95:110332. [DOI: 10.1016/j.cellsig.2022.110332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Revised: 04/07/2022] [Accepted: 04/11/2022] [Indexed: 01/02/2023]