1
Gomez-Ochoa SA, Lanzer JD, Levinson RT. Disease Network-Based Approaches to Study Comorbidity in Heart Failure: Current State and Future Perspectives. Curr Heart Fail Rep 2024; 22:6. [PMID: 39725810 DOI: 10.1007/s11897-024-00693-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/26/2024] [Indexed: 12/28/2024]
Abstract
PURPOSE OF REVIEW Heart failure (HF) is often accompanied by a constellation of comorbidities, leading to diverse patient presentations and clinical trajectories. While traditional methods have provided valuable insights into our understanding of HF, network medicine approaches seek to leverage these complex relationships by analyzing disease at a systems level. This review introduces the concepts of network medicine and explores the use of comorbidity networks to study HF and heart disease. RECENT FINDINGS Comorbidity networks are used to understand disease trajectories, predict outcomes, and uncover potential molecular mechanisms through identification of genes and pathways relevant to comorbidity. These networks have shown the importance of non-cardiovascular comorbidities to the clinical journey of patients with HF. However, the community should be aware of important limitations in developing and implementing these methods. Network approaches hold promise for unraveling the impact of comorbidities in the complex presentation and genetics of HF. Methods that consider comorbidity presence and timing have the potential to help optimize management strategies and identify pathophysiological mechanisms.
Affiliation(s)
- Sergio Alejandro Gomez-Ochoa
  - Department of General Internal Medicine and Psychosomatics, Heidelberg University Hospital, Im Neuenheimer Feld 410, 69120, Heidelberg, Germany
- Jan D Lanzer
  - Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University, Heidelberg University Hospital, Heidelberg, Germany
- Rebecca T Levinson
  - Department of General Internal Medicine and Psychosomatics, Heidelberg University Hospital, Im Neuenheimer Feld 410, 69120, Heidelberg, Germany.
  - Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University, Heidelberg University Hospital, Heidelberg, Germany.
2
Moore MR, DeClouette B, Wolfe I, Kingery MT, Sandoval-Hernandez C, Isber R, Kirsch T, Strauss EJ. Levels of Synovial Fluid Inflammatory Biomarkers on Day of Arthroscopic Partial Meniscectomy Predict Long-Term Outcomes and Conversion to TKA: A 10-Year Mean Follow-up Study. J Bone Joint Surg Am 2024; 106:2330-2337. [PMID: 39264991 DOI: 10.2106/jbjs.23.01392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2024]
Abstract
BACKGROUND The purpose of the present study was to evaluate the relationships of the concentrations of pro- and anti-inflammatory biomarkers in the knee synovial fluid at the time of arthroscopic partial meniscectomy (APM) to long-term patient-reported outcomes (PROs) and conversion to total knee arthroplasty (TKA). METHODS A database of patients who underwent APM for isolated meniscal injury was analyzed. Synovial fluid had been aspirated from the operatively treated knee prior to the surgical incision, and concentrations of pro- and anti-inflammatory biomarkers (RANTES, IL-6, MCP-1, MIP-1β, VEGF, TIMP-1, TIMP-2, IL-1RA, MMP-3, and bFGF) were quantified. Prior to surgery and again at the time of final follow-up, patients were asked to complete a survey that included a visual analog scale (VAS) for pain and Lysholm, Tegner, and Knee injury and Osteoarthritis Outcome Score-Physical Function Short Form (KOOS-PS) questionnaires. Clustering analysis of the 10 biomarkers of interest was carried out with the k-means algorithm. RESULTS Of the 82 patients who met the inclusion criteria for the study, 59 had not undergone subsequent ipsilateral TKA or APM, and 43 (73%) of the 59 completed PRO questionnaires at long-term follow-up. The mean follow-up time was 10.6 ± 1.3 years (range, 8.7 to 12.4 years). Higher concentrations of individual pro-inflammatory biomarkers including MCP-1 (β = 13.672, p = 0.017) and MIP-1β (β = -0.385, p = 0.012) were associated with worse VAS pain and Tegner scores, respectively. K-means clustering analysis separated the cohort of 82 patients into 2 groups, one with exclusively higher levels of pro-inflammatory biomarkers than the second group. The "pro-inflammatory phenotype" cohort had a significantly higher VAS pain score (p = 0.024) and significantly lower Lysholm (p = 0.022), KOOS-PS (p = 0.047), and Tegner (p = 0.009) scores at the time of final follow-up compared with the "anti-inflammatory phenotype" cohort. 
The rate of conversion to TKA was higher in the pro-inflammatory cohort (29.4% versus 12.2%, p = 0.064). Logistic regression analysis demonstrated that the pro-inflammatory phenotype was significantly correlated with conversion to TKA (odds ratio = 7.220, 95% confidence interval = 1.028 to 50.720, p = 0.047). CONCLUSIONS The concentrations of synovial fluid biomarkers on the day of APM can be used to cluster patients into pro- and anti-inflammatory cohorts that are predictive of PROs and conversion to TKA at long-term follow-up. LEVEL OF EVIDENCE Prognostic Level III. See Instructions for Authors for a complete description of levels of evidence.
Affiliation(s)
- Michael R Moore
  - NYU Langone Orthopedic Hospital, NYU Langone Health, New York, NY
- Isabel Wolfe
  - NYU Langone Orthopedic Hospital, NYU Langone Health, New York, NY
- Ryan Isber
  - NYU Langone Orthopedic Hospital, NYU Langone Health, New York, NY
- Thorsten Kirsch
  - NYU Langone Orthopedic Hospital, NYU Langone Health, New York, NY
  - Department of Biomedical Engineering, NYU Tandon School of Engineering, New York, NY
- Eric J Strauss
  - NYU Langone Orthopedic Hospital, NYU Langone Health, New York, NY
3
Breimann S, Frishman D. AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales. Bioinformatics Advances 2024; 4:vbae165. [PMID: 39544628 PMCID: PMC11562964 DOI: 10.1093/bioadv/vbae165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/10/2024] [Accepted: 10/23/2024] [Indexed: 11/17/2024]
Abstract
Summary Amino acid scales are crucial for sequence-based protein prediction tasks, yet no gold standard scale set or simple scale selection methods exist. We developed AAclust, a wrapper for clustering models that require a pre-defined number of clusters k, such as k-means. AAclust obtains redundancy-reduced scale sets by clustering and selecting one representative scale per cluster, where k can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that top-performing scale sets were different for each benchmark dataset and significantly outperformed scale sets used in previous studies. Noteworthy is the strong dependence of the model performance on the scale set size. AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications. Availability and implementation The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which is documented and accessible at https://aaanalysis.readthedocs.io/en/latest and https://github.com/breimanntools/aaanalysis.
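The selection step described in this abstract (cluster the scales, then keep one representative per cluster) can be illustrated with a minimal sketch. It is not the AAclust implementation or the AAanalysis API: the function names, the toy scale values, and the deterministic first-k initialization are all assumptions made for illustration, and the optimization of k is not reproduced.

```python
from math import dist


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic start (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_representatives(names, points, k):
    """Return one name per cluster: the member closest to its centroid."""
    labels, centroids = kmeans(points, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: dist(points[i], centroids[c]))
            chosen.append(names[best])
    return chosen


# Hypothetical toy "scales": one value per residue, four residues only.
scales = {
    "hydrophobicity_a": [1.00, 0.90, 0.10, 0.00],
    "volume_a": [0.10, 0.00, 1.00, 0.90],
    "hydrophobicity_b": [0.90, 1.00, 0.00, 0.10],
    "volume_b": [0.00, 0.10, 0.90, 1.00],
    "hydrophobicity_c": [0.95, 0.95, 0.05, 0.05],
    "volume_c": [0.05, 0.05, 0.95, 0.95],
}
representatives = select_representatives(list(scales), list(scales.values()), k=2)
print(representatives)
```

With these toy values, the six scales reduce to one scale per group; a real scale set would use 20-dimensional vectors and a less naive initialization.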
Affiliation(s)
- Stephan Breimann
  - Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
  - Division of Metabolic Biochemistry, Biomedical Center (BMC), LMU Munich, Munich, 81377, Germany
  - Biochemistry of γ-Secretase, German Center for Neurodegenerative Diseases (DZNE), Munich, 81377, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
4
Mei H, Peng J, Wang T, Zhou T, Zhao H, Zhang T, Yang Z. Overcoming the Limits of Cross-Sensitivity: Pattern Recognition Methods for Chemiresistive Gas Sensor Array. Nano-Micro Letters 2024; 16:269. [PMID: 39141168 PMCID: PMC11324646 DOI: 10.1007/s40820-024-01489-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2024] [Accepted: 07/21/2024] [Indexed: 08/15/2024]
Abstract
As information acquisition terminals for artificial olfaction, chemiresistive gas sensors are often troubled by their cross-sensitivity, and reducing their cross-response to ambient gases has long been a difficult and important problem in gas sensing. Pattern recognition based on sensor arrays is the most prominent way to overcome the cross-sensitivity of gas sensors. Choosing an appropriate pattern recognition method is crucial for enhancing data analysis, reducing errors, improving system reliability, and obtaining better classification or gas concentration prediction results. In this review, we analyze the sensing mechanism of cross-sensitivity for chemiresistive gas sensors. We further examine the types, working principles, characteristics, and applicable gas detection range of pattern recognition algorithms utilized in gas-sensing arrays. Additionally, we report, summarize, and evaluate the outstanding and novel advancements in pattern recognition methods for gas identification. At the same time, this work showcases the recent advancements in utilizing these methods for gas identification, particularly within three crucial domains: ensuring food safety, monitoring the environment, and aiding in medical diagnosis. In conclusion, this study anticipates future research prospects by considering the existing landscape and challenges. It is hoped that this work will make a positive contribution towards mitigating cross-sensitivity in gas-sensitive devices and offer valuable insights for algorithm selection in gas recognition applications.
Affiliation(s)
- Haixia Mei
  - Key Lab Intelligent Rehabil & Barrier Free Disable (Ministry of Education), Changchun University, Changchun, 130022, People's Republic of China
- Jingyi Peng
  - Key Lab Intelligent Rehabil & Barrier Free Disable (Ministry of Education), Changchun University, Changchun, 130022, People's Republic of China
- Tao Wang
  - Shanghai Key Laboratory of Intelligent Sensing and Detection Technology, School of Mechanical and Power Engineering, East China University of Science and Technology, Shanghai, 200237, People's Republic of China.
- Tingting Zhou
  - State Key Laboratory of Integrated Optoelectronics, College of Electronic Science and Engineering, Jilin University, Changchun, 130012, People's Republic of China
- Hongran Zhao
  - State Key Laboratory of Integrated Optoelectronics, College of Electronic Science and Engineering, Jilin University, Changchun, 130012, People's Republic of China
- Tong Zhang
  - State Key Laboratory of Integrated Optoelectronics, College of Electronic Science and Engineering, Jilin University, Changchun, 130012, People's Republic of China.
- Zhi Yang
  - National Key Laboratory of Advanced Micro and Nano Manufacture Technology, Department of Micro/Nano Electronics, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, People's Republic of China.
5
Hsiang JC, Shen N, Soto F, Kerschensteiner D. Distributed feature representations of natural stimuli across parallel retinal pathways. Nat Commun 2024; 15:1920. [PMID: 38429280 PMCID: PMC10907388 DOI: 10.1038/s41467-024-46348-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 02/22/2024] [Indexed: 03/03/2024] Open
Abstract
How sensory systems extract salient features from natural environments and organize them across neural pathways is unclear. Combining single-cell and population two-photon calcium imaging in mice, we discover that retinal ON bipolar cells (second-order neurons of the visual system) are divided into two blocks of four types. The two blocks distribute temporal and spatial information encoding, respectively. ON bipolar cell axons co-stratify within each block, but separate laminarly between them (upper block: diverse temporal, uniform spatial tuning; lower block: diverse spatial, uniform temporal tuning). ON bipolar cells extract temporal and spatial features similarly from artificial and naturalistic stimuli. In addition, they differ in sensitivity to coherent motion in naturalistic movies. Motion information is distributed across ON bipolar cells in the upper and the lower blocks, multiplexed with temporal and spatial contrast, independent features of natural scenes. Comparing the responses of different boutons within the same arbor, we find that axons of all ON bipolar cell types function as computational units. Thus, our results provide insights into the visual feature extraction from naturalistic stimuli and reveal how structural and functional organization cooperate to generate parallel ON pathways for temporal and spatial information in the mammalian retina.
Affiliation(s)
- Jen-Chun Hsiang
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Ning Shen
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Florentina Soto
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Daniel Kerschensteiner
  - Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, MO, 63110, USA.
  - Department of Neuroscience, Washington University School of Medicine, St. Louis, MO, 63110, USA.
  - Department of Biomedical Engineering, Washington University School of Medicine, St. Louis, MO, 63110, USA.
6
Mancheno-Ferris A, Immarigeon C, Rivero A, Depierre D, Schickele N, Fosseprez O, Chanard N, Aughey G, Lhoumaud P, Anglade J, Southall T, Plaza S, Payre F, Cuvier O, Polesello C. Crosstalk between chromatin and Shavenbaby defines transcriptional output along the Drosophila intestinal stem cell lineage. iScience 2024; 27:108624. [PMID: 38174321 PMCID: PMC10762455 DOI: 10.1016/j.isci.2023.108624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 07/05/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
The transcription factor Shavenbaby (Svb), the only member of the OvoL family in Drosophila, controls the fate of various epithelial embryonic cells and adult stem cells. Post-translational modification of Svb produces two protein isoforms, Svb-ACT and Svb-REP, which promote adult intestinal stem cell renewal or differentiation, respectively. To define Svb's mode of action, we used engineered cell lines and developed an unbiased method to identify Svb target genes across different contexts. Within a given cell type, Svb-ACT and Svb-REP antagonistically regulate the expression of a set of target genes, binding specific enhancers whose accessibility is constrained by the chromatin landscape. Reciprocally, Svb-REP can influence local chromatin marks of active enhancers to help repress target genes. Along the intestinal lineage, the set of Svb target genes progressively changes, together with chromatin accessibility. We propose that the Svb-ACT-to-REP transition promotes enterocyte differentiation of intestinal stem cells through direct gene regulation and chromatin remodeling.
Affiliation(s)
- Alexandra Mancheno-Ferris
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
- Clément Immarigeon
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
- Alexia Rivero
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
- David Depierre
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Naomi Schickele
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Olivier Fosseprez
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Nicolas Chanard
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Gabriel Aughey
  - Imperial College London, Sir Ernst Chain Building, South Kensington Campus, London SW7 2AZ, UK
- Priscilla Lhoumaud
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
  - Institut Jacques Monod, Université Paris Cité/CNRS, 15 rue Hélène Brion, 75205 Paris Cedex 13, France
- Julien Anglade
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Tony Southall
  - Imperial College London, Sir Ernst Chain Building, South Kensington Campus, London SW7 2AZ, UK
- Serge Plaza
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Laboratoire de Recherche en Sciences Végétales, CNRS/UPS/INPT, 31320 Auzeville-Tolosane, France
- François Payre
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
- Olivier Cuvier
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Chromatin Dynamics and Cell Proliferation team, CBI, CNRS, UPS, 31062 Toulouse, France
- Cédric Polesello
  - Molecular, Cellular and Developmental biology department (MCD), Centre de Biologie Integrative (CBI), Université de Toulouse, CNRS, UPS, 31062 Toulouse, France
  - Control of cell shape remodeling team, CBI, CNRS, UPS, 31062 Toulouse, France
7
Chen P, Zhang S, Zhao K, Kang X, Rittman T, Liu Y. Robustly uncovering the heterogeneity of neurodegenerative disease by using data-driven subtyping in neuroimaging: A review. Brain Res 2024; 1823:148675. [PMID: 37979603 DOI: 10.1016/j.brainres.2023.148675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 10/19/2023] [Accepted: 11/07/2023] [Indexed: 11/20/2023]
Abstract
Neurodegenerative diseases are associated with heterogeneity in genetics, pathology, and clinical manifestation. Understanding this heterogeneity is particularly relevant for clinical prognosis and stratifying patients for disease modifying treatments. Recently, data-driven methods based on neuroimaging have been applied to investigate the subtyping of neurodegenerative disease, helping to disentangle this heterogeneity. We reviewed brain-based subtyping studies in aging and representative neurodegenerative diseases, including Alzheimer's disease, mild cognitive impairment, frontotemporal dementia, and Lewy body dementia, from January 2000 to November 2022. We summarized clustering methods, validation, robustness, reproducibility, and clinical relevance of 71 eligible studies in the present study. We found vast variations in approaches between studies, including ten neuroimaging modalities, 24 cluster algorithms, and 41 methods of cluster number determination. The clinical relevance of subtyping studies was evaluated by summarizing the analysis method of clinical measurements, showing a relatively low clinical utility in the current studies. Finally, we conclude that future studies of heterogeneity in neurodegenerative disease should focus on validation, comparison between subtyping approaches, and prioritise clinical utility.
Affiliation(s)
- Pindong Chen
  - Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Department of Clinical Neurosciences, University of Cambridge, Cambridge, Cambridgeshire, UK
- Shirui Zhang
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
- Kun Zhao
  - School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
- Xiaopeng Kang
  - Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Timothy Rittman
  - Department of Clinical Neurosciences, University of Cambridge, Cambridge, Cambridgeshire, UK
- Yong Liu
  - Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China.
8
Emanuel RH, Docherty PD, Lunt H, Murray R, Campbell RE. Clustering polycystic ovary syndrome laboratory results extracted from a large internet forum with machine learning. Intelligence-Based Medicine 2024; 9:100135. [DOI: 10.1016/j.ibmed.2024.100135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025]
9
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
Affiliation(s)
- María Rodríguez-López
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Nicola Bordin
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jon Lees
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
  - University of Bristol, Bristol, United Kingdom
- Harry Scholes
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Shaimaa Hassan
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
  - Helwan University, Faculty of Pharmacy, Cairo, Egypt
- Quentin Saintain
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Stephan Kamrad
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Christine Orengo
  - University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- Jürg Bähler
  - University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
10
Daams MN. Estimating the allocation of land to business. PLoS One 2023; 18:e0288647. [PMID: 37531343 PMCID: PMC10396024 DOI: 10.1371/journal.pone.0288647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 07/01/2023] [Indexed: 08/04/2023] Open
Abstract
This paper is uniquely focused on mapping business land in satellite imagery, with the aim to introduce a standardized approach to estimating how much land in an observed area is allocated to business. Business land and control categories of land are defined and operationalized in a straightforward setting of pixel-based classification. The resultant map as well as information from a sample-based quantification of the map's accuracy are used jointly to estimate business land's total area more precisely. In particular, areas where so-called errors of omission are possibly concentrated are accounted for by post-stratifying the map in an extension of recent advances in remote sensing. Specifically, a post-stratum is designed to enclose areas where business activity is co-located. This then enhances the area estimation in a spatially explicit way that is informed by urban and regional economic thought and observation. In demonstrating the methodology, a map for the San Francisco Bay Area metropolitan area is obtained at a producer's accuracy of 0.89 (F1-score = 0.84) or 0.82 to 0.94 when sub-selecting reference sample pixels by confidence in class assignment. Overall, the methodological approach is able to infer the allocation of land to business (in km² ± 95% C.I.) on a timely and accurate basis. This inter-disciplinary study may offer some fundamental ground for a potentially more refined assessment and understanding of the spatial distribution of production factors as well as the related structure and implications of land use.
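For readers unfamiliar with the accuracy terms in this abstract, the sketch below shows how producer's accuracy, user's accuracy, and the F1-score are conventionally computed from reference-sample counts for one map class. The function names and the counts are hypothetical and are not the study's reference sample; the post-stratified area estimation itself is not reproduced here.

```python
def producers_accuracy(tp, fn):
    """Share of reference pixels of the class that the map labels correctly.

    Equivalent to recall; 1 minus this value is the omission error.
    """
    return tp / (tp + fn)


def users_accuracy(tp, fp):
    """Share of pixels mapped as the class that truly belong to it.

    Equivalent to precision; 1 minus this value is the commission error.
    """
    return tp / (tp + fp)


def f1_score(tp, fp, fn):
    """Harmonic mean of producer's and user's accuracy."""
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts for a "business land" class (illustration only).
tp, fp, fn = 89, 23, 11
pa = producers_accuracy(tp, fn)
ua = users_accuracy(tp, fp)
f1 = f1_score(tp, fp, fn)
print(f"producer's accuracy={pa:.2f}, user's accuracy={ua:.2f}, F1={f1:.2f}")
```

Because errors of omission lower producer's accuracy directly, a design that targets areas where omissions concentrate, as the abstract describes, acts on the `fn` term in this arithmetic.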
Affiliation(s)
- Michiel N Daams
  - Department of Economic Geography, Faculty of Spatial Sciences, University of Groningen, Groningen, the Netherlands
  - Rudolf Agricola School for Sustainable Development, University of Groningen, Groningen, the Netherlands
11
Erdem C, Gross SM, Heiser LM, Birtwistle MR. MOBILE pipeline enables identification of context-specific networks and regulatory mechanisms. Nat Commun 2023; 14:3991. [PMID: 37414767 PMCID: PMC10326020 DOI: 10.1038/s41467-023-39729-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/27/2023] [Indexed: 07/08/2023] Open
Abstract
Robust identification of context-specific network features that control cellular phenotypes remains a challenge. We here introduce MOBILE (Multi-Omics Binary Integration via Lasso Ensembles) to nominate molecular features associated with cellular phenotypes and pathways. First, we use MOBILE to nominate mechanisms of interferon-γ (IFNγ) regulated PD-L1 expression. Our analyses suggest that IFNγ-controlled PD-L1 expression involves BST2, CLIC2, FAM83D, ACSL5, and HIST2H2AA3 genes, which were supported by prior literature. We also compare networks activated by related family members transforming growth factor-beta 1 (TGFβ1) and bone morphogenetic protein 2 (BMP2) and find that differences in ligand-induced changes in cell size and clustering properties are related to differences in laminin/collagen pathway activity. Finally, we demonstrate the broad applicability and adaptability of MOBILE by analyzing publicly available molecular datasets to investigate breast cancer subtype specific networks. Given the ever-growing availability of multi-omics datasets, we envision that MOBILE will be broadly useful for identification of context-specific molecular features and pathways.
Affiliation(s)
- Cemal Erdem
  - Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA
- Sean M Gross
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Laura M Heiser
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA.
- Marc R Birtwistle
  - Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC, USA.
  - Department of Bioengineering, Clemson University, Clemson, SC, USA.
12
Li A, Xiong S, Li J, Mallik S, Liu Y, Fei R, Zhou H, Liu G. AngClust: Angle Feature-Based Clustering for Short Time Series Gene Expression Profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1574-1580. [PMID: 35853049 DOI: 10.1109/tcbb.2022.3192306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
When clustering gene expression, it is expected that correlation coefficients of genes in the same clusters are high, and that gene ontology (GO) enrichment analysis of most clusters will be significant. However, existing short-term gene expression clustering algorithms have limitations. To address this problem, we proposed a novel clustering process based on angular features for short-term gene expression. Our method (named AngClust) uses angular features to indicate the change of trend in gene expression levels at two neighboring time points. The changes of angles at multiple time points reflect the change of trend of the overall expression levels. Such changes are used to measure whether the expression trends of different genes are similar. To obtain functionally significant clusters from the clustering results, we evaluated numbers of genes in clusters, average correlation coefficient, fluctuation, and their correlation with GO term enrichment. AngClust outperforms two other measures, Euclidean distance (ED) and dynamic time warping of correlation (DTW), on a dataset of yeast gene expression. The ratios of GO term- and pathway term-enriched clusters for AngClust are higher than or equal to those of STEM and TMixClust on human, mouse, and yeast time series of gene expression.
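The angle-feature idea in this abstract can be illustrated with a short sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the angle of each segment is taken as `atan2(difference, dt)` with a unit time step, and two profiles are compared by the mean absolute difference of their angle sequences. The function names `angle_features` and `angle_distance` are hypothetical and are not taken from the AngClust paper or software.

```python
import math


def angle_features(series, dt=1.0):
    """Return the angle (in radians) of each segment between neighboring time points.

    Assumption: the angle is atan2(y[t+1] - y[t], dt); the published method
    may define its angular feature differently.
    """
    return [math.atan2(b - a, dt) for a, b in zip(series, series[1:])]


def angle_distance(x, y, dt=1.0):
    """Mean absolute difference between the angle sequences of two profiles.

    Profiles with the same trend shape get a distance of 0 even when their
    absolute expression levels differ.
    """
    ax, ay = angle_features(x, dt), angle_features(y, dt)
    if len(ax) != len(ay) or not ax:
        raise ValueError("profiles must have the same length (at least 2 points)")
    return sum(abs(p - q) for p, q in zip(ax, ay)) / len(ax)


if __name__ == "__main__":
    rising = [0.0, 1.0, 2.0]
    shifted = [5.0, 6.0, 7.0]   # same trend, different level
    falling = [2.0, 1.0, 0.0]   # opposite trend
    print(angle_distance(rising, shifted))
    print(angle_distance(rising, falling))
```

A distance of this kind depends only on the direction of change between time points, which is why it suits short series where trend shape matters more than magnitude. A pairwise distance matrix built this way could then be passed to any standard clustering routine.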
|
13
|
Esnault C, Rollot M, Guilmin P, Zucker JD. Qluster: An easy-to-implement generic workflow for robust clustering of health data. Front Artif Intell 2023; 5:1055294. [PMID: 36814808 PMCID: PMC9939832 DOI: 10.3389/frai.2022.1055294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 12/22/2022] [Indexed: 02/08/2023] Open
Abstract
The exploration of health data by clustering algorithms makes it possible to better describe the populations of interest by seeking the sub-profiles that compose them. This therefore reinforces medical knowledge, whether it is about a disease or a targeted population in real life. Nevertheless, contrary to the so-called conventional biostatistical methods where numerous guidelines exist, the standardization of data science approaches in clinical research remains a little-discussed subject. This results in significant variability in the execution of data science projects, whether in terms of the algorithms used or the reliability and credibility of the designed approach. Taking the path of parsimonious and judicious choice of both algorithms and implementations at each stage, this article proposes Qluster, a practical workflow for performing clustering tasks. This workflow strikes a compromise between (1) genericity of applications (e.g. usable on small or big data, on continuous, categorical or mixed variables, on databases of high dimensionality or not), (2) ease of implementation (need for few packages, few algorithms, few parameters, ...), and (3) robustness (e.g. use of proven algorithms and robust packages, evaluation of the stability of clusters, management of noise and multicollinearity). This workflow can be easily automated and/or routinely applied on a wide range of clustering projects. It can be useful both for data scientists with little experience in the field to make data clustering easier and more robust, and for more experienced data scientists who are looking for a straightforward and reliable solution to routinely perform preliminary data mining. A synthesis of the literature on data clustering as well as the scientific rationale supporting the proposed workflow is also provided. Finally, a detailed application of the workflow on a concrete use case is provided, along with a practical discussion for data scientists.
An implementation on the Dataiku platform is available upon request to the authors.
Affiliation(s)
- Jean-Daniel Zucker
- Sorbonne University, IRD, UMMISCO, Bondy, France
- Sorbonne University, INSERM, NUTRIOMICS, Paris, France
|
14
|
Guan J, Li S, He X, Chen J. Clustering by fast detection of main density peaks within a peak digraph. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
|
15
|
Putri GH, Chung J, Edwards DN, Marsh-Wakefield F, Koprinska I, Dervish S, King NJC, Ashhurst TM, Read MN. TrackSOM: Mapping immune response dynamics through clustering of time-course cytometry data. Cytometry A 2023; 103:54-70. [PMID: 35758217 DOI: 10.1002/cyto.a.24668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Revised: 06/02/2022] [Accepted: 06/24/2022] [Indexed: 01/20/2023]
Abstract
Mapping the dynamics of immune cell populations over time or disease-course is key to understanding immunopathogenesis and devising putative interventions. We present TrackSOM, a novel method for delineating cellular populations and tracking their development over time- or disease-course cytometry datasets. We demonstrate TrackSOM-enabled elucidation of the immune response to West Nile Virus infection in mice, uncovering heterogeneous subpopulations of immune cells and relating their functional evolution to disease severity. TrackSOM is easy to use, encompasses few parameters, is quick to execute, and enables an integrative and dynamic overview of the immune system kinetics that underlie disease progression and/or resolution.
Affiliation(s)
- Givanna H Putri
- School of Computer Science, The University of Sydney, Sydney, New South Wales, Australia; Charles Perkins Centre, The University of Sydney, Sydney, New South Wales, Australia
| | - Jonathan Chung
- The Westmead Initiative, The University of Sydney, Sydney, New South Wales, Australia; Viral Immunopathology Laboratory, Discipline of Pathology, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
| | - Davis N Edwards
- The Westmead Initiative, The University of Sydney, Sydney, New South Wales, Australia; School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
| | - Felix Marsh-Wakefield
- The Westmead Initiative, The University of Sydney, Sydney, New South Wales, Australia; School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia; Vascular Immunology Unit, Department of Pathology, The University of Sydney, Sydney, New South Wales, Australia; Sydney Cytometry Core Research Facility, The University of Sydney and Centenary Institute, Sydney, New South Wales, Australia
| | - Irena Koprinska
- The Westmead Initiative, The University of Sydney, Sydney, New South Wales, Australia
| | - Suat Dervish
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
| | - Nicholas J C King
- The Westmead Initiative, The University of Sydney, Sydney, New South Wales, Australia; School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia; Sydney Institute for Infectious Diseases, The University of Sydney, Sydney, New South Wales, Australia; Sydney Nano, The University of Sydney, Sydney, New South Wales, Australia
| | - Thomas M Ashhurst
- The Westmead Initiative, The University of Sydney, Sydney, New South Wales, Australia; Vascular Immunology Unit, Department of Pathology, The University of Sydney, Sydney, New South Wales, Australia; Sydney Institute for Infectious Diseases, The University of Sydney, Sydney, New South Wales, Australia; Sydney Nano, The University of Sydney, Sydney, New South Wales, Australia
| | - Mark N Read
- Charles Perkins Centre, The University of Sydney, Sydney, New South Wales, Australia; The Westmead Initiative, The University of Sydney, Sydney, New South Wales, Australia; Viral Immunopathology Laboratory, Discipline of Pathology, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, New South Wales, Australia
|
16
|
Ren M, Zhang Q, Zhang S, Zhong T, Huang J, Ma S. Hierarchical cancer heterogeneity analysis based on histopathological imaging features. Biometrics 2022; 78:1579-1591. [PMID: 34390584 PMCID: PMC8995088 DOI: 10.1111/biom.13544] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/11/2021] [Revised: 08/01/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022]
Abstract
In cancer research, supervised heterogeneity analysis has important implications. Such analysis has been traditionally based on clinical/demographic/molecular variables. Recently, histopathological imaging features, which are generated as a byproduct of biopsy, have been shown to be effective for modeling cancer outcomes, and a handful of supervised heterogeneity analyses have been conducted based on such features. There are two types of histopathological imaging features: those extracted based on specific biological knowledge and those extracted using automated imaging processing software. Using both types of histopathological imaging features, our goal is to conduct the first supervised cancer heterogeneity analysis that satisfies a hierarchical structure. That is, the first type of imaging features defines a rough structure, and the second type defines a nested and more refined structure. A penalization approach is developed, which has been motivated by but differs significantly from penalized fusion and sparse group penalization. It has satisfactory statistical and numerical properties. In the analysis of lung adenocarcinoma data, it identifies a heterogeneity structure significantly different from the alternatives and has satisfactory prediction and stability performance.
Affiliation(s)
- Mingyang Ren
- School of Mathematics Sciences, University of Chinese Academy of Sciences, Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
| | - Qingzhao Zhang
- MOE Key Laboratory of Economics, Department of Statistics, School of Economics, The Wang Yanan Institute for Studies in Economics and Fujian Key Lab of Statistics, Xiamen University, Xiamen, China
| | - Sanguo Zhang
- School of Mathematics Sciences, University of Chinese Academy of Sciences, Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
| | - Tingyan Zhong
- SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Jian Huang
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA
| | - Shuangge Ma
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
|
17
|
Niu X, Taylor A, Shinohara RT, Kounios J, Zhang F. Multidimensional brain-age prediction reveals altered brain developmental trajectory in psychiatric disorders. Cereb Cortex 2022; 32:5036-5049. [PMID: 35094075 DOI: 10.1093/cercor/bhab530] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/23/2021] [Revised: 12/22/2021] [Accepted: 12/23/2021] [Indexed: 12/27/2022] Open
Abstract
Brain-age prediction has emerged as a novel approach for studying brain development. However, brain regions change in different ways and at different rates. Unitary brain-age indices represent developmental status averaged across the whole brain and therefore do not capture the divergent developmental trajectories of various brain structures. This staggered developmental unfolding, determined by genetics and postnatal experience, is implicated in the progression of psychiatric and neurological disorders. We propose a multidimensional brain-age index (MBAI) that provides regional age predictions. Using a database of 556 individuals, we identified clusters of imaging features with distinct developmental trajectories and built machine learning models to obtain brain-age predictions from each of the clusters. Our results show that the MBAI provides a flexible analysis of region-specific brain-age changes that are invisible to unidimensional brain-age. Importantly, brain-ages computed from region-specific feature clusters contain complementary information and demonstrate differential ability to distinguish disorder groups (e.g., depression and oppositional defiant disorder) from healthy controls. In summary, we show that MBAI is sensitive to alterations in brain structures and captures distinct regional change patterns that may serve as biomarkers that contribute to our understanding of healthy and pathological brain development and the characterization and diagnosis of psychiatric disorders.
Affiliation(s)
- Xin Niu
- Department of Psychology, Drexel University, Philadelphia, PA 19104, USA
| | - Alexei Taylor
- Department of Psychology, Drexel University, Philadelphia, PA 19104, USA
| | - Russell T Shinohara
- Perelman School of Medicine, Center for Biomedical Image Computation and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Biostatistics, Epidemiology and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - John Kounios
- Department of Psychology, Drexel University, Philadelphia, PA 19104, USA
| | - Fengqing Zhang
- Department of Psychology, Drexel University, Philadelphia, PA 19104, USA
|
18
|
Couckuyt A, Seurinck R, Emmaneel A, Quintelier K, Novak D, Van Gassen S, Saeys Y. Challenges in translational machine learning. Hum Genet 2022; 141:1451-1466. [PMID: 35246744 PMCID: PMC8896412 DOI: 10.1007/s00439-022-02439-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/09/2021] [Accepted: 02/08/2022] [Indexed: 11/25/2022]
Abstract
Machine learning (ML) algorithms are increasingly being used to help implement clinical decision support systems. In this new field, which we define as "translational machine learning", joint efforts and strong communication between data scientists and clinicians help to span the gap between ML and its adoption in the clinic. These collaborations also improve interpretability and trust in translational ML methods and ultimately aim to result in generalizable and reproducible models. To help clinicians and bioinformaticians refine their translational ML pipelines, we review the steps from model building to the use of ML in the clinic. We discuss experimental setup, computational analysis, interpretability and reproducibility, and emphasize the challenges involved. We strongly advise collaboration and data sharing between consortia and institutes to build multi-centric cohorts that facilitate ML methodologies that generalize across centers. In the end, we hope that this review provides a way to streamline translational ML and helps to tackle the challenges that come with it.
Affiliation(s)
- Artuur Couckuyt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Ruth Seurinck
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Annelies Emmaneel
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Katrien Quintelier
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Department of Pulmonary Diseases, Erasmus MC, Rotterdam, The Netherlands
| | - David Novak
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Sofie Van Gassen
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Yvan Saeys
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium.
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium.
|
19
|
A New Clustering Method Based on the Inversion Formula. MATHEMATICS 2022. [DOI: 10.3390/math10152559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Data clustering is an area of data mining that belongs to unsupervised learning. Cluster analysis divides data into different classes by discovering the internal structure of data set objects and their relationships. This paper presents a new density clustering method based on modified inversion formula density estimation. The new method is intended to improve on the performance and robustness of k-means, the Gaussian mixture model, and other methods. The proposed clustering algorithm consists of three main steps. First, we initialize parameters and generate a T matrix. Second, we estimate the densities of each point and cluster. Third, we update the mean, sigma, and phi matrices. The new method based on the inversion formula works quite well on different datasets compared with k-means, the Gaussian mixture model, and the Bayesian Gaussian mixture model. However, the method has a limitation: in its current state it cannot work with higher-dimensional data (d > 15). This will be addressed in future versions of the model, as detailed in the future work. Additionally, the results show that the MIDEv2 method works best on generated data with outliers in all datasets (0.5%, 1%, 2%, 4% outliers). Interestingly, the new method based on the inversion formula can also cluster data that do not have outliers, for example the popular Iris dataset.
|
20
|
Wang Z, Wang C, Xie Z, Huang X, ShangGuan H, Zhu W, Wang S. Echocardiographic phenotypes of Chinese patients with type 2 diabetes may indicate early diabetic myocardial disease. ESC Heart Fail 2022; 9:3327-3344. [PMID: 35831174 DOI: 10.1002/ehf2.14062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2022] [Revised: 05/19/2022] [Accepted: 06/21/2022] [Indexed: 11/06/2022] Open
Abstract
AIM Type 2 diabetes may impair cardiac structure and function at a very early stage, but other factors, such as obesity and hypertension, can induce the same abnormalities on their own. This study aimed to explore precise prevention and treatment of diabetic cardiomyopathy (DCM) by using cluster analysis of echocardiographic variables. METHODS AND RESULTS A total of 66 536 inpatients with diabetes from 2013 to 2018 were investigated, and 7112 patients were available for analysis after nadir. The cluster analysis was performed on echocardiographic variables to assess the clinical profiles and risk factors of clusters. Two clusters were identified. Cluster 1, with 3576 patients (50.3%; 62.5% female), had a hypertension prevalence of 62.4% and a lower rate of obesity (13.7%). Ultrasound findings showed that 79.9% of them had left ventricular diastolic dysfunction (LVDD), the most characteristic change in the early stages of DCM. Systolic blood pressure (SBP), uric acid and antithrombin III were independent risk factors for LVDD (P < 0.0001). In the second cluster, 64.0% of the 3536 patients were male, with a high prevalence of obesity (30.1%) and a higher prevalence of hypertension (79.5%). In particular, decreased systolic function and a high rate of LV hypertrophy (46.8%) represented the progressive phase of DCM (P < 0.0001). SBP, diastolic blood pressure, BMI and creatinine were independent correlates of LV mass index (P < 0.05). CONCLUSION The cluster analysis of echocardiographic variables may improve the identification of groups of patients with similar risks and different disease courses and will facilitate the achievement of targeted early prevention and treatment of DCM.
Affiliation(s)
- Zheng Wang
- Department of Endocrinology, The Affiliated ZhongDa Hospital of Southeast University, Nanjing, China; School of Medicine, Southeast University, Nanjing, China
| | - ChenChen Wang
- Department of Endocrinology, The Affiliated ZhongDa Hospital of Southeast University, Nanjing, China; School of Medicine, Southeast University, Nanjing, China
| | - ZuoLing Xie
- Department of Endocrinology, The Affiliated ZhongDa Hospital of Southeast University, Nanjing, China; School of Medicine, Southeast University, Nanjing, China
| | - Xi Huang
- Department of Endocrinology, The Affiliated ZhongDa Hospital of Southeast University, Nanjing, China
| | - HaiYan ShangGuan
- School of Medicine, Southeast University, Nanjing, China; Nanjing Central Hospital, Nanjing, China
| | - WenWen Zhu
- School of Medicine, Southeast University, Nanjing, China
| | - ShaoHua Wang
- Department of Endocrinology, The Affiliated ZhongDa Hospital of Southeast University, Nanjing, China
|
21
|
Guo W, Wang W, Zhao S, Niu Y, Zhang Z, Liu X. Density Peak Clustering with connectivity estimation. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108501] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
|
22
|
MoSBi: Automated signature mining for molecular stratification and subtyping. Proc Natl Acad Sci U S A 2022; 119:e2118210119. [PMID: 35412913 PMCID: PMC9169782 DOI: 10.1073/pnas.2118210119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
Molecular patient stratification and disease subtyping are ongoing and high-impact problems that rely on the identification of characteristic molecular signatures. Current computational methods show high sensitivity to custom parameterization, which leads to inconsistent performance on different molecular data. Our new method, MoSBi (molecular signature identification using biclustering), 1) enables so far unmatched high performance for stratification and subtyping across datasets of various different biomolecules, 2) provides a scalable solution for visualizing the results and their correspondence to clinical factors, and 3) has immediate practical relevance through its automatic workflow where individual selection, parameterization, screening, and visualization of biclustering algorithms is not required. MoSBi is a major step forward with a high impact for clinical and wet-lab researchers. The improving access to increasing amounts of biomedical data provides completely new chances for advanced patient stratification and disease subtyping strategies. This requires computational tools that produce uniformly robust results across highly heterogeneous molecular data. Unsupervised machine learning methodologies are able to discover de novo patterns in such data. Biclustering is especially suited by simultaneously identifying sample groups and corresponding feature sets across heterogeneous omics data. The performance of available biclustering algorithms heavily depends on individual parameterization and varies with their application. Here, we developed MoSBi (molecular signature identification using biclustering), an automated multialgorithm ensemble approach that integrates results utilizing an error model-supported similarity network. We systematically evaluated the performance of 11 available and established biclustering algorithms together with MoSBi. 
For this, we used transcriptomics, proteomics, and metabolomics data, as well as synthetic datasets covering various data properties. Profiting from multialgorithm integration, MoSBi identified robust group and disease-specific signatures across all scenarios, overcoming single algorithm specificities. Furthermore, we developed a scalable network-based visualization of bicluster communities that supports biological hypothesis generation. MoSBi is available as an R package and web service to make automated biclustering analysis accessible for application in molecular sample stratification.
|
23
|
Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022] Open
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) as well as more developing methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation - and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
| | - Layla Hosseini-Gerami
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
| | - Andreas Bender
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
|
24
|
Buechler E, Powell S, Sun T, Astier N, Zanocco C, Bolorinos J, Flora J, Boudet H, Rajagopal R. Global changes in electricity consumption during COVID-19. iScience 2022; 25:103568. [PMID: 34877481 PMCID: PMC8641442 DOI: 10.1016/j.isci.2021.103568] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/09/2021] [Revised: 09/08/2021] [Accepted: 11/30/2021] [Indexed: 01/19/2023] Open
Abstract
Understanding how the COVID-19 pandemic has altered electricity consumption can provide insights into society's responses to future shocks and other extreme events. We quantify changes in electricity consumption in 58 different countries/regions around the world from January-October 2020 and examine how those changes relate to government restrictions, health outcomes, GDP, mobility metrics, and electricity sector characteristics in different countries. We cluster the timeseries of electricity consumption changes to identify impact groupings that capture systematic differences in timing, depth of initial changes, and recovery rate, revealing substantial heterogeneity. Results show that stricter government restrictions and larger decreases in mobility (particularly retail and recreation) are most tightly linked to decreases in electricity consumption, although these relationships are strongest during the initial phase of the pandemic. We find indications that decreases in electricity consumption relate to pre-pandemic sensitivity to holidays, suggesting a new direction for future research.
Affiliation(s)
- Siobhan Powell
- Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
| | - Tao Sun
- Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA
| | - Chad Zanocco
- Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA
| | - Jose Bolorinos
- Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA
| | - June Flora
- Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA
| | - Hilary Boudet
- School of Public Policy, Oregon State University, Corvallis, OR 97331, USA
| | - Ram Rajagopal
- Civil and Environmental Engineering, Stanford University, Stanford, CA 94305, USA
|
25
|
Fratello M, Cattelani L, Federico A, Pavel A, Scala G, Serra A, Greco D. Unsupervised Algorithms for Microarray Sample Stratification. Methods Mol Biol 2022; 2401:121-146. [PMID: 34902126 DOI: 10.1007/978-1-0716-1839-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery.
Affiliation(s)
- Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Giovanni Scala
- Department of Biology, University of Naples Federico II, Naples, Italy
| | - Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland
| | - Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.
- BioMediTech Institute, Tampere University, Tampere, Finland.
- Finnish Hub for Development and Validation of Integrated Approaches (FHAIVE), Tampere University, Tampere, Finland.
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
|
26
|
Zhong W, Gu F. Predicting Local Protein 3D Structures Using Clustering Deep Recurrent Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:593-604. [PMID: 32750880 DOI: 10.1109/tcbb.2020.3005972] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Since protein 3D structure prediction is very important for biochemical study and drug design, researchers have developed many machine learning algorithms to predict protein 3D structures using the sequence information only. Understanding the sequence-to-structure relationship is key to successful structure prediction. Previous approaches, including the single shallow learning model, the single deep learning model, and clustering algorithms, all have disadvantages in capturing the precise sequence-to-structure relationship. In order to further improve the performance of local protein structure prediction, a novel deep learning model called Clustering Recurrent Neural Network (CRNN) is proposed. In this model, the whole protein dataset is divided into multiple cluster subtrees. An RNN is trained for each cluster in the subtrees so that each RNN learns the computationally simpler local sequence-to-structure relationship instead of attempting to capture the global sequence-to-structure relationship. After learning the local sequence-to-structure relationship using the RNNs, CRNN predicts distance matrices, torsion angles, and secondary structures for backbone α-carbon atoms of protein sequence segments. Our experimental analysis indicates that its 3D structure prediction accuracy is comparable to or better than that of other state-of-the-art approaches.
|
27
|
Madjar K, Zucknick M, Ickstadt K, Rahnenführer J. Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression. BMC Bioinformatics 2021; 22:586. [PMID: 34895139 PMCID: PMC8665528 DOI: 10.1186/s12859-021-04483-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/30/2021] [Accepted: 11/15/2021] [Indexed: 11/12/2022] Open
Abstract
Background Important objectives in cancer research are the prediction of a patient’s risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g. genes). In clinical practice, this is often challenging because patient cohorts are typically small and can be heterogeneous. In classical subgroup analysis, a separate prediction model is fitted using only the data of one specific cohort. However, this can lead to a loss of power when the sample size is small. Simple pooling of all cohorts, on the other hand, can lead to biased results, especially when the cohorts are heterogeneous. Results We propose a new Bayesian approach suitable for continuous molecular measurements and survival outcome that identifies the important predictors and provides a separate risk prediction model for each cohort. It allows sharing information between cohorts to increase power by assuming a graph linking predictors within and across different cohorts. The graph helps to identify pathways of functionally related genes and genes that are simultaneously prognostic in different cohorts. Conclusions Results demonstrate that our proposed approach is superior to the standard approaches in terms of prediction performance and increased power in variable selection when the sample size is small. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04483-z.
Affiliation(s)
- Katrin Madjar
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany.
| | - Manuela Zucknick
- Department of Biostatistics, Oslo Centre for Biostatistics and Epidemiology, University of Oslo, 0317, Oslo, Norway
| | - Katja Ickstadt
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany
| | - Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany
|
28
|
Madjar K, Rahnenführer J. Weighted Cox regression for the prediction of heterogeneous patient subgroups. BMC Med Inform Decis Mak 2021; 21:342. [PMID: 34876106 PMCID: PMC8650299 DOI: 10.1186/s12911-021-01698-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Accepted: 11/23/2021] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. Major objectives in modeling high-dimensional data are good prediction performance and feature selection to find a subset of predictors that are truly associated with a clinical outcome such as a time-to-event endpoint. In clinical practice, this task is challenging since patient cohorts are typically small and can be heterogeneous with regard to their relationship between predictors and outcome. When data of several subgroups of patients with the same or similar disease are available, it is tempting to combine them to increase sample size, such as in multicenter studies. However, heterogeneity between subgroups can lead to biased results and subgroup-specific effects may remain undetected. METHODS For this situation, we propose a penalized Cox regression model with a weighted version of the Cox partial likelihood that includes patients of all subgroups but assigns them individual weights based on their subgroup affiliation. The weights are estimated from the data such that patients who are likely to belong to the subgroup of interest obtain higher weights in the subgroup-specific model. RESULTS Our proposed approach is evaluated through simulations and application to real lung cancer cohorts, and compared to existing approaches. Simulation results demonstrate that our proposed model is superior to standard approaches in terms of prediction performance and variable selection accuracy when the sample size is small. CONCLUSIONS The results suggest that sharing information between subgroups by incorporating appropriate weights into the likelihood can increase power to identify the prognostic covariates and improve risk prediction.
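For orientation, a weighted Cox partial log-likelihood is commonly written in the form below. This is a sketch of the general idea under stated assumptions, not necessarily the authors' exact formulation: the symbols $w_i$ (weight of patient $i$ for the subgroup of interest), $\delta_i$ (event indicator), $x_i$ (covariate vector), $R(t_i)$ (risk set at event time $t_i$), and the generic penalty $P(\beta)$ with tuning parameter $\lambda$ are notation chosen here, and the abstract does not say which penalty is used.

```latex
% Weighted partial log-likelihood (assumed standard case-weight form)
\ell_w(\beta) \;=\; \sum_{i=1}^{n} w_i\,\delta_i
  \left[\, x_i^{\top}\beta \;-\; \log \sum_{j \in R(t_i)} w_j \exp\!\left(x_j^{\top}\beta\right) \right]

% Penalized estimate; P(\beta) is a placeholder, e.g. the lasso penalty \lVert\beta\rVert_1
\hat{\beta} \;=\; \arg\max_{\beta}\; \left\{\, \ell_w(\beta) \;-\; \lambda\, P(\beta) \,\right\}
```

With all $w_i = 1$ this reduces to the ordinary Cox partial log-likelihood on the pooled data, while setting $w_i = 0$ for patients outside the subgroup of interest recovers the classical subgroup-only analysis; intermediate weights sit between these two extremes.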
Affiliation(s)
- Katrin Madjar
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany.
| | - Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany
|
29
|
Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for the mapping of the epigenome and resulting data in cancer samples has provided the opportunity for gaining insights into and understanding the roles of epigenetic processes in cancer. However, the complexity, high-dimensionality, sparsity, and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited for epigenomic data analyses due to their flexibility and ability to learn underlying hidden structures. We will discuss four overlapping but distinct major categories under ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data with the hope to provide an overview of how ML approaches can be used to explore fundamental questions on the roles of epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America.
|
30
|
Distance-based clustering challenges for unbiased benchmarking studies. Sci Rep 2021; 11:18988. [PMID: 34556686 PMCID: PMC8460803 DOI: 10.1038/s41598-021-98126-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/05/2021] [Accepted: 09/02/2021] [Indexed: 02/08/2023] Open
Abstract
Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures. Clustering yields arbitrary labels and often depends on the trial, leading to varying results. Moreover, recent research indicated that all partition comparison measures can yield the same results for different clustering solutions. Consequently, algorithm selection and parameter optimization by unsupervised quality measures (QM) are always biased and misleading. Only if the predefined structures happen to meet the particular clustering criterion and QM, can the clusters be recovered. Results are presented based on 41 open-source algorithms which are particularly useful in biomedical scenarios. Furthermore, comparative analysis with mirrored density plots provides a significantly more detailed benchmark than that with the typically used box plots or violin plots.
|
31
|
Guan J, Li S, He X, Zhu J, Chen J. Fast hierarchical clustering of local density peaks via an association degree transfer method. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.05.071] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 02/08/2023]
|
32
|
Clinical and biological clusters of sepsis patients using hierarchical clustering. PLoS One 2021; 16:e0252793. [PMID: 34347776 PMCID: PMC8336799 DOI: 10.1371/journal.pone.0252793] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/08/2020] [Accepted: 05/24/2021] [Indexed: 11/24/2022] Open
Abstract
Background Heterogeneity in sepsis expression is multidimensional, including highly disparate data such as the underlying disorders, infection source, causative micro-organisms and organ failures. The aim of the study is to identify clusters of patients based on clinical and biological characteristics available at patients’ admission. Methods All patients included in a national prospective multicenter ICU cohort OUTCOMEREA and admitted for sepsis or septic shock (Sepsis 3.0 definition) were retrospectively analyzed. A hierarchical clustering was performed in a training set of patients to build clusters based on a comprehensive set of clinical and biological characteristics available at ICU admission. Clusters were described, and the 28-day, 90-day, and one-year mortality were compared with log-rank tests. Risks of mortality were also compared after adjustment for SOFA score and year of ICU admission. Results Of the 6,046 patients with sepsis in the cohort, 4,050 (67%) were randomly allocated to the training set. Six distinct clusters were identified: young patients without any comorbidities, admitted in ICU for community-acquired pneumonia (n = 1,603 (40%)); young patients without any comorbidities, admitted in ICU for meningitis or encephalitis (n = 149 (4%)); elderly patients with COPD, admitted in ICU for bronchial infection with few organ failures (n = 243 (6%)); elderly patients, with several comorbidities and organ failures (n = 1,094 (27%)); patients admitted after surgery, with a nosocomial infection (n = 623 (15%)); young patients with immunosuppressive conditions (e.g., AIDS, chronic steroid therapy or hematological malignancy) (n = 338 (8%)). Clusters differed significantly in early or late mortality (p < .001), even after adjustment for severity of organ dysfunctions (SOFA) and year of ICU admission.
Conclusions Clinical and biological features commonly available at ICU admission of patients with sepsis or septic shock made it possible to define six clusters of patients with very distinct outcomes. Considering these clusters may improve the care management and the homogeneity of patients in future studies.
|
33
|
Prakash J, Wang V, Quinn RE, Mitchell CS. Unsupervised Machine Learning to Identify Separable Clinical Alzheimer's Disease Sub-Populations. Brain Sci 2021; 11:977. [PMID: 34439596 PMCID: PMC8392842 DOI: 10.3390/brainsci11080977] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/07/2021] [Revised: 07/10/2021] [Accepted: 07/20/2021] [Indexed: 11/20/2022] Open
Abstract
Heterogeneity among Alzheimer's disease (AD) patients confounds clinical trial patient selection and therapeutic efficacy evaluation. This work defines separable AD clinical sub-populations using unsupervised machine learning. Clustering (t-SNE followed by k-means) of patient features and association rule mining (ARM) was performed on the ADNIMERGE dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Patient sociodemographics, brain imaging, biomarkers, cognitive tests, and medication usage were included for analysis. Four AD clinical sub-populations were identified using between-cluster mean fold changes [cognitive performance, brain volume]: cluster-1 represented least severe disease [+17.3, +13.3]; cluster-0 [-4.6, +3.8] and cluster-3 [+10.8, -4.9] represented mid-severity sub-populations; cluster-2 represented most severe disease [-18.4, -8.4]. ARM assessed frequently occurring pharmacologic substances within the 4 sub-populations. No drug class was associated with the least severe AD (cluster-1), likely due to lesser antecedent disease. Anti-hyperlipidemia drugs associated with cluster-0 (mid-severity, higher volume). Interestingly, antioxidants vitamin C and E associated with cluster-3 (mid-severity, higher cognition). Anti-depressants like Zoloft associated with most severe disease (cluster-2). Vitamin D is protective for AD, but ARM identified significant underutilization across all AD sub-populations. Identification and feature characterization of four distinct AD sub-population "clusters" using standard clinical features enhances future clinical trial selection criteria and cross-study comparative analysis.
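The association rule mining step mentioned above can be illustrated with the three standard rule statistics: support, confidence, and lift. The sketch below is a minimal illustration only. The abstract does not say which statistics or thresholds the authors used, and the function name `rule_metrics`, the item names, and the four toy medication sets are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(transactions)
    # Count transactions containing each side of the rule and both together.
    n_a = sum(antecedent <= t for t in transactions)
    n_c = sum(consequent <= t for t in transactions)
    n_ac = sum((antecedent | consequent) <= t for t in transactions)

    support = n_ac / n                           # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0      # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical medication sets for four patients (illustrative, not study data).
patients = [
    frozenset({"statin", "vitamin_c"}),
    frozenset({"statin"}),
    frozenset({"vitamin_c", "vitamin_e"}),
    frozenset({"statin", "vitamin_c", "vitamin_e"}),
]

metrics = rule_metrics(patients, frozenset({"vitamin_c"}), frozenset({"vitamin_e"}))
print(metrics)
```

A lift above 1 means the consequent appears more often alongside the antecedent than it does overall; in a study like this one, the "items" could equally be a cluster label paired with a drug class.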
Affiliation(s)
- Jayant Prakash
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology and Emory University School of Medicine, Atlanta, GA 30332, USA
- Department of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Velda Wang
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology and Emory University School of Medicine, Atlanta, GA 30332, USA
| | - Robert E. Quinn
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology and Emory University School of Medicine, Atlanta, GA 30332, USA
- Department of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Cassie S. Mitchell
- Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology and Emory University School of Medicine, Atlanta, GA 30332, USA
- Center for Machine Learning, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
34
|
Sheikhi G, Altınçay H. A novel dissimilarity metric based on feature‐to‐feature scatter frequencies for clustering‐based feature selection in biomedical data. Comput Intell 2021. [DOI: 10.1111/coin.12470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Ghazaal Sheikhi
- Department of Computer Engineering, Final International University, Kyrenia, North Cyprus, Turkey
| | - Hakan Altınçay
- Department of Computer Engineering, Eastern Mediterranean University, Famagusta, North Cyprus, Turkey
|
35
|
Hajian R, DeCastro J, Parkinson J, Kane A, Camelo AFR, Chou PP, Yang J, Wong N, Hernandez EDO, Goldsmith B, Conboy I, Aran K. Rapid and Electronic Identification and Quantification of Age-Specific Circulating Exosomes via Biologically Activated Graphene Transistors. Adv Biol (Weinh) 2021; 5:e2000594. [PMID: 33929095 DOI: 10.1002/adbi.202000594] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/01/2020] [Revised: 02/23/2021] [Indexed: 12/12/2022]
Abstract
Increasing access to modern clinical practices concomitantly extends lifespan, ironically revealing new classes of degenerative and inflammatory diseases of later years. Here, an electronic graphene field-effect transistor (gFET) is reported, termed EV-chip, for label-free, rapid identification and quantification of exosomes (EV) associated with aging through specific surface markers, CD63 and CD151. Studies suggest that blood-derived exosomes carry specific biomolecules that can be used toward diagnostic applications of age and health. However, to observe improvements in patient outcomes, earlier detection at the point-of-care (POC) is required. Unfortunately, conventional techniques and other electronic-based platforms for exosome sensing are burdensome and ill-suited to the POC distinction of aged blood factors. It is shown that EV-chip can quantitatively detect purified exosomes from plasma, with a limit of detection (LOD) of 2 × 10⁴ particles mL⁻¹ and a limit of quantification (LOQ) of 6 × 10⁴ particles mL⁻¹. The sensitivity and compact electronics of the EV-chip improve upon previously published electronic biosensors, making it ideal for a physician's office or a simple biological laboratory. The sensitivity, selectivity, and portability of the EV-chip demonstrate the potential of the biosensor as a powerful point-of-care diagnostic and prognostic tool for age-related diseases.
Affiliation(s)
- Reza Hajian
- Keck Graduate Institute, The Claremont Colleges, Claremont, CA, 91711, USA; Cardea Bio Inc., 8969 Kenamar Dr. Suite 104, San Diego, CA, 92121, USA
| | - Jonalyn DeCastro
- Keck Graduate Institute, The Claremont Colleges, Claremont, CA, 91711, USA
| | | | - Alex Kane
- Cardea Bio Inc., 8969 Kenamar Dr. Suite 104, San Diego, CA, 92121, USA
| | | | - Peichi Peggy Chou
- Keck Science Department, Pitzer College, The Claremont Colleges, Claremont, CA, 91711, USA
| | - Jielin Yang
- Keck Science Department, Claremont McKenna College, The Claremont Colleges, Claremont, CA, 91711, USA
| | - Nathan Wong
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, 94720, USA
| | | | - Brett Goldsmith
- Cardea Bio Inc., 8969 Kenamar Dr. Suite 104, San Diego, CA, 92121, USA
| | - Irina Conboy
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, 94720, USA
| | - Kiana Aran
- Keck Graduate Institute, The Claremont Colleges, Claremont, CA, 91711, USA; Cardea Bio Inc., 8969 Kenamar Dr. Suite 104, San Diego, CA, 92121, USA; Department of Bioengineering, University of California, Berkeley, Berkeley, CA, 94720, USA
|
36
|
Putri GH, Koprinska I, Ashhurst TM, King NJC, Read MN. Using single-cell cytometry to illustrate integrated multi-perspective evaluation of clustering algorithms using Pareto fronts. Bioinformatics 2021; 37:btab038. [PMID: 33508103 DOI: 10.1093/bioinformatics/btab038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/11/2020] [Revised: 01/14/2021] [Accepted: 01/18/2021] [Indexed: 12/21/2022] Open
Abstract
MOTIVATION Many 'automated gating' algorithms now exist to cluster cytometry and single cell sequencing data into discrete populations. Comparative algorithm evaluations on benchmark datasets rely either on a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasise different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding optimal clustering algorithms and undermines the translatability of results onto other non-benchmark datasets. RESULTS We propose the Pareto fronts framework as an integrative evaluation protocol, wherein individual metrics are instead leveraged as complementary perspectives. Judged superior are algorithms that provide the best trade-off between the multiple metrics considered simultaneously. This yields a more comprehensive and complete view of clustering performance. Moreover, by broadly and systematically sampling algorithm parameter values using the Latin Hypercube sampling method, our evaluation protocol minimises (un)fortunate parameter value selections as confounding factors. Furthermore, it reveals how meticulously each algorithm must be tuned in order to obtain good results, vital knowledge for users with novel data. We exemplify the protocol by conducting a comparative study between three clustering algorithms (ChronoClust, FlowSOM and Phenograph) using four common performance metrics applied across four cytometry benchmark datasets. To our knowledge, this is the first time Pareto fronts have been used to evaluate the performance of clustering algorithms in any application domain. AVAILABILITY Implementation of our Pareto front methodology and all scripts to reproduce this article are available at https://github.com/ghar1821/ParetoBench.
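The idea of judging clustering runs by their trade-off across several metrics can be sketched as a Pareto-front filter. The code below is a minimal illustration written for this listing, not the authors' ParetoBench implementation. It assumes every metric is "higher is better", and the function names, run names, and scores are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Names of the runs that no other run dominates (input order is preserved)."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


# Hypothetical (metric_1, metric_2) scores for four clustering runs.
runs = {
    "run_a": (0.90, 0.60),
    "run_b": (0.80, 0.80),
    "run_c": (0.70, 0.70),  # worse than run_b on both metrics, so dominated
    "run_d": (0.60, 0.95),
}

front = pareto_front(runs)
print(front)
```

Here `run_c` drops out because `run_b` beats it on both metrics, while the other three each win on at least one metric and so remain on the front. A metric where lower is better would need its sign flipped before being passed in.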
Affiliation(s)
- Givanna H Putri
- School of Computer Science, The University of Sydney, Sydney, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, 2006, Australia
| | - Irena Koprinska
- School of Computer Science, The University of Sydney, Sydney, 2006, Australia
| | - Thomas M Ashhurst
- Sydney Cytometry Facility, The University of Sydney and Centenary Institute, Sydney, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, 2006, Australia
| | - Nicholas J C King
- Sydney Cytometry Facility, The University of Sydney and Centenary Institute, Sydney, 2006, Australia
- Discipline of Pathology, The University of Sydney, Sydney, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, 2006, Australia
| | - Mark N Read
- School of Computer Science, The University of Sydney, Sydney, 2006, Australia
- Westmead Initiative, The University of Sydney, Sydney, 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, 2006, Australia
|
37
|
Identification of significantly mutated subnetworks in the breast cancer genome. Sci Rep 2021; 11:642. [PMID: 33436820 PMCID: PMC7804148 DOI: 10.1038/s41598-020-80204-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2018] [Accepted: 12/17/2020] [Indexed: 11/24/2022] Open
Abstract
Recent studies showed that somatic cancer mutations target genes that are in specific signaling and cellular pathways. However, in each patient only a few of the pathway genes are mutated. Current approaches consider only existing pathways and ignore the topology of the pathways. For this reason, new efforts have been focused on identifying significantly mutated subnetworks and associating them with cancer characteristics. We applied two well-established network analysis approaches to identify significantly mutated subnetworks in the breast cancer genome. We took network topology into account for measuring the mutation similarity of a gene-pair to allow us to infer the significantly mutated subnetworks. Our goals are to evaluate whether the identified subnetworks can be used as biomarkers for predicting breast cancer patient survival and provide the potential mechanisms of the pathways enriched in the subnetworks, with the aim of improving breast cancer treatment. Using the copy number alteration (CNA) datasets from the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) study, we identified a significantly mutated yet clinically and functionally relevant subnetwork using two graph-based clustering algorithms. The mutational pattern of the subnetwork is significantly associated with breast cancer survival. The genes in the subnetwork are significantly enriched in retinol metabolism KEGG pathway. Our results show that breast cancer treatment with retinoids may be a potential personalized therapy for breast cancer patients since the CNA patterns of the breast cancer patients can imply whether the retinoids pathway is altered. We also showed that applying multiple bioinformatics algorithms at the same time has the potential to identify new network-based biomarkers, which may be useful for stratifying cancer patients for choosing optimal treatments.
|
38
|
García-García JC, García-Ródenas R. A methodology for automatic parameter-tuning and center selection in density-peak clustering methods. Soft comput 2021. [DOI: 10.1007/s00500-020-05244-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
|
39
|
Ganesh S, Hu T, Woods E, Allam M, Cai S, Henderson W, Coskun AF. Spatially resolved 3D metabolomic profiling in tissues. Science Advances 2021; 7:eabd0957. [PMID: 33571119 PMCID: PMC7840140 DOI: 10.1126/sciadv.abd0957] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 06/04/2020] [Accepted: 12/04/2020] [Indexed: 05/02/2023]
Abstract
Spatially resolved RNA and protein molecular analyses have revealed unexpected heterogeneity of cells. Metabolic analysis of individual cells complements these single-cell studies. Here, we present a three-dimensional spatially resolved metabolomic profiling framework (3D-SMF) to map out the spatial organization of metabolic fragments and protein signatures in immune cells of human tonsils. In this method, 3D metabolic profiles were acquired by time-of-flight secondary ion mass spectrometry to profile up to 189 compounds. Ion beams were used to measure sub-5-nanometer layers of tissue across 150 sections of a tonsil. To incorporate cell specificity, tonsil tissues were labeled by an isotope-tagged antibody library. To explore relations of metabolic and cellular features, we carried out data reduction, 3D spatial correlations and classifications, unsupervised K-means clustering, and network analyses. Immune cells exhibited spatially distinct lipidomic fragment distributions in lymphatic tissue. The 3D-SMF pipeline can aid the study of immune cells in health and disease.
Affiliation(s)
- Shambavi Ganesh
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Electrical and Computer Engineering Department, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Thomas Hu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Electrical and Computer Engineering Department, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Eric Woods
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Mayar Allam
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Shuangyi Cai
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Walter Henderson
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Institute for Electronics and Nanotechnology, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Ahmet F Coskun
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA.
|
40
|
Dimitrov D, Gu Q. BingleSeq: a user-friendly R package for bulk and single-cell RNA-Seq data analysis. PeerJ 2020; 8:e10469. [PMID: 33391870 PMCID: PMC7761193 DOI: 10.7717/peerj.10469] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/05/2020] [Accepted: 11/11/2020] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND RNA sequencing is an indispensable research tool used in a broad range of transcriptome analysis studies. The most common application of RNA Sequencing is differential expression analysis and it is used to determine genetic loci with distinct expression across different conditions. An emerging field called single-cell RNA sequencing is used for transcriptome profiling at the individual cell level. The standard protocols for both of these approaches include the processing of sequencing libraries and result in the generation of count matrices. An obstacle to these analyses and the acquisition of meaningful results is that they require programing expertise. Although some effort has been directed toward the development of user-friendly RNA-Seq analysis analysis tools, few have the flexibility to explore both Bulk and single-cell RNA sequencing. IMPLEMENTATION BingleSeq was developed as an intuitive application that provides a user-friendly solution for the analysis of count matrices produced by both Bulk and Single-cell RNA-Seq experiments. This was achieved by building an interactive dashboard-like user interface which incorporates three state-of-the-art software packages for each type of the aforementioned analyses. Furthermore, BingleSeq includes additional features such as visualization techniques, extensive functional annotation analysis and rank-based consensus for differential gene analysis results. As a result, BingleSeq puts some of the best reviewed and most widely used packages and tools for RNA-Seq analyses at the fingertips of biologists with no programing experience. AVAILABILITY BingleSeq is as an easy-to-install R package available on GitHub at https://github.com/dbdimitrov/BingleSeq/.
Affiliation(s)
- Daniel Dimitrov
  - MRC-University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, UK
- Quan Gu
  - MRC-University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, UK
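The rank-based consensus mentioned in the BingleSeq abstract can be illustrated with a small sketch. It assumes mean-rank aggregation, which is only one possible consensus rule; the abstract does not say which rule BingleSeq applies. The function name `rank_consensus`, the method labels and the gene identifiers are all hypothetical.

```python
def rank_consensus(rankings):
    """Order genes by their mean rank across several ranked lists.

    Assumption: every list ranks the same set of genes, best first.
    This is a generic mean-rank scheme, not BingleSeq's own code.
    """
    genes = set(rankings[0])
    mean_rank = {
        g: sum(r.index(g) + 1 for r in rankings) / len(rankings)
        for g in genes
    }
    # Ties in mean rank are broken alphabetically for determinism.
    return sorted(genes, key=lambda g: (mean_rank[g], g))


# Hypothetical ranked outputs of three differential expression methods.
method_a = ["G1", "G2", "G3", "G4"]
method_b = ["G2", "G1", "G3", "G4"]
method_c = ["G1", "G3", "G2", "G4"]

consensus = rank_consensus([method_a, method_b, method_c])
print(consensus)
```

Averaging ranks rather than p-values keeps the methods comparable even when their test statistics are on different scales.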
|
41
|
Nguyen QH, Le DH. Improving existing analysis pipeline to identify and analyze cancer driver genes using multi-omics data. Sci Rep 2020; 10:20521. [PMID: 33239644 PMCID: PMC7688645 DOI: 10.1038/s41598-020-77318-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/19/2020] [Accepted: 10/26/2020] [Indexed: 12/18/2022]
Abstract
The accumulation of genes carrying mutations is vital for the establishment and development of cancer. However, research on driver gene discovery has selected and used analysis tools and models unsystematically and in isolation. Also, previous studies may have neglected low-frequency drivers and seldom predicted subgroup specificities of identified driver genes. In this study, we present an improved driver gene identification and analysis pipeline that comprises the four most widely used analyses for driver genes: enrichment analysis, clinical feature association with expression profiles of identified driver genes as well as with their functional modules, and patient stratification by existing advanced computational tools integrating multi-omics data. The improved pipeline's general usability was demonstrated for breast cancer and validated against independent databases. Accordingly, 31 validated driver genes, including four novel ones, were discovered. Subsequently, we detected significantly enriched cancer-related gene ontology terms and pathways, probable drug targets, two co-expressed modules associated significantly with several clinical features, such as number of positive lymph nodes, Nottingham prognostic index, and tumor stage, and two biologically distinct groups of BRCA patients. Data and source code of the case study can be downloaded at https://github.com/hauldhut/drivergene.
Affiliation(s)
- Quang-Huy Nguyen
  - Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
  - Faculty of Pharmacy, Dainam University, Hanoi, Vietnam
- Duc-Hau Le
  - Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
  - College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam
|
42
|
Leonavicius K, Royer C, Miranda AMA, Tyser RCV, Kip A, Srinivas S. Spatial protein analysis in developing tissues: a sampling-based image processing approach. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190560. [PMID: 32829691 PMCID: PMC7482225 DOI: 10.1098/rstb.2019.0560] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Accepted: 05/22/2020] [Indexed: 11/19/2022]
Abstract
Advances in fluorescence microscopy approaches have made it relatively easy to generate multi-dimensional image volumes and have highlighted the need for flexible image analysis tools for the extraction of quantitative information from such data. Here we demonstrate that by focusing on simplified feature-based nuclear segmentation and probabilistic cytoplasmic detection we can create a tool that is able to extract geometry-based information from diverse mammalian tissue images. Our open-source image analysis platform, called 'SilentMark', can cope with three-dimensional noisy images and with crowded fields of cells to quantify signal intensity in different cellular compartments. Additionally, it provides tissue geometry related information, which allows one to quantify protein distribution with respect to marked regions of interest. The lightweight SilentMark algorithms have the advantage of not requiring multiple processors, graphics cards or training datasets and can be run even with just several hundred megabytes of memory. This makes it possible to use the method as a Web application, effectively eliminating setup hurdles and compatibility issues with operating systems. We test this platform on mouse pre-implantation embryos, embryonic stem cell-derived embryoid bodies and mouse embryonic heart, and relate protein localization to tissue geometry. This article is part of a discussion meeting issue 'Contemporary morphogenesis'.
|
43
|
Blumenberg L, Ruggles KV. Hypercluster: a flexible tool for parallelized unsupervised clustering optimization. BMC Bioinformatics 2020; 21:428. [PMID: 32993491 PMCID: PMC7525959 DOI: 10.1186/s12859-020-03774-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/19/2020] [Accepted: 09/22/2020] [Indexed: 12/24/2022]
Abstract
BACKGROUND Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. RESULTS We present hypercluster, a Python package and Snakemake pipeline for flexible and parallelized clustering evaluation and selection. Users can efficiently evaluate a huge range of clustering results from multiple models and hyperparameters to identify an optimal model. CONCLUSIONS Hypercluster improves ease of use, robustness and reproducibility for unsupervised clustering applications in high-throughput biology. Hypercluster is available on pip and bioconda; installation, documentation and example workflows can be found at https://github.com/ruggleslab/hypercluster.
Affiliation(s)
- Lili Blumenberg
  - Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
  - Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
- Kelly V. Ruggles
  - Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016 USA
  - Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016 USA
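The idea of obtaining clustering results over a range of hyperparameters and then selecting one by an evaluation metric can be sketched as follows. This is a toy, standard-library-only illustration and does not use the hypercluster API: it sweeps only the number of clusters for a plain k-means, scores each result with the mean silhouette, and keeps the best. The function names, the deterministic initialisation and the toy data are assumptions made for the example.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from its nearest existing centre.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette score; singleton clusters score 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

# Sweep the hyperparameter k and keep the best-scoring configuration.
scores = {k: silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

A fuller sweep would also vary the algorithm and use more than one metric, since a single metric can favour a particular cluster shape.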
|
44
|
Chen R, Yang L, Goodison S, Sun Y. Deep-learning approach to identifying cancer subtypes using high-dimensional genomic data. Bioinformatics 2020; 36:1476-1483. [PMID: 31603461 DOI: 10.1093/bioinformatics/btz769] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 01/16/2019] [Revised: 08/24/2019] [Accepted: 10/08/2019] [Indexed: 12/20/2022]
Abstract
MOTIVATION Cancer subtype classification has the potential to significantly improve disease prognosis and develop individualized patient management. Existing methods are limited by their ability to handle extremely high-dimensional data and by the influence of misleading, irrelevant factors, resulting in ambiguous and overlapping subtypes. RESULTS To address the above issues, we proposed a novel approach to disentangling and eliminating irrelevant factors by leveraging the power of deep learning. Specifically, we designed a deep-learning framework, referred to as DeepType, that performs joint supervised classification, unsupervised clustering and dimensionality reduction to learn cancer-relevant data representation with cluster structure. We applied DeepType to the METABRIC breast cancer dataset and compared its performance to state-of-the-art methods. DeepType significantly outperformed the existing methods, identifying more robust subtypes while using fewer genes. The new approach provides a framework for the derivation of more accurate and robust molecular cancer subtypes by using increasingly complex, multi-source data. AVAILABILITY AND IMPLEMENTATION An open-source software package for the proposed method is freely available at http://www.acsu.buffalo.edu/~yijunsun/lab/DeepType.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Runpu Chen
  - Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Le Yang
  - Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
- Steve Goodison
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL 32224, USA
- Yijun Sun
  - Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
  - Department of Microbiology and Immunology
  - Department of Biostatistics, University at Buffalo, The State University of New York, Buffalo, NY 14214, USA
|
45
|
Cheng KS, Pan R, Pan H, Li B, Meena SS, Xing H, Ng YJ, Qin K, Liao X, Kosgei BK, Wang Z, Han RP. ALICE: a hybrid AI paradigm with enhanced connectivity and cybersecurity for a serendipitous encounter with circulating hybrid cells. Theranostics 2020; 10:11026-11048. [PMID: 33042268 PMCID: PMC7532685 DOI: 10.7150/thno.44053] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/17/2020] [Accepted: 05/11/2020] [Indexed: 12/12/2022]
Abstract
A fully automated and accurate assay of rare cell phenotypes in densely-packed fluorescently-labeled liquid biopsy images remains elusive. Methods: Employing a hybrid artificial intelligence (AI) paradigm that combines traditional rule-based morphological manipulations with modern statistical machine learning, we deployed a next generation software, ALICE (Automated Liquid Biopsy Cell Enumerator) to identify and enumerate minute amounts of tumor cell phenotypes bestrewed in massive populations of leukocytes. As a code designed for futurity, ALICE is armed with internet of things (IOT) connectivity to promote pedagogy and continuing education and also, an advanced cybersecurity system to safeguard against digital attacks from malicious data tampering. Results: By combining robust principal component analysis, random forest classifier and cubic support vector machine, ALICE was able to detect synthetic, anomalous and tampered input images with an average recall and precision of 0.840 and 0.752, respectively. In terms of phenotyping enumeration, ALICE was able to enumerate various circulating tumor cell (CTC) phenotypes with a reliability ranging from 0.725 (substantial agreement) to 0.961 (almost perfect) as compared to human analysts. Further, two subpopulations of circulating hybrid cells (CHCs) were serendipitously discovered and labeled as CHC-1 (DAPI+/CD45+/E-cadherin+/vimentin-) and CHC-2 (DAPI+/CD45+/E-cadherin+/vimentin+) in the peripheral blood of pancreatic cancer patients. CHC-1 was found to correlate with nodal staging and was able to classify lymph node metastasis with a sensitivity of 0.615 (95% CI: 0.374-0.898) and specificity of 1.000 (95% CI: 1.000-1.000). Conclusion: This study presented a machine-learning-augmented rule-based hybrid AI algorithm with enhanced cybersecurity and connectivity for the automatic and flexibly-adapting enumeration of cellular liquid biopsies. ALICE has the potential to be used in a clinical setting for an accurate and reliable enumeration of CTC phenotypes.
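For readers less familiar with the metrics quoted in the ALICE abstract, the sketch below shows how sensitivity and specificity follow from a confusion matrix. The counts are illustrative assumptions only: the abstract does not report the underlying table, and these values were picked simply because they give ratios of the same size as those quoted.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives correctly ruled out."""
    return tn / (tn + fp)


# Hypothetical counts, NOT taken from the study.
tp, fn = 8, 5    # node-positive cases flagged / missed
tn, fp = 10, 0   # node-negative cases correctly cleared / falsely flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(round(sens, 3), round(spec, 3))
```

A specificity of 1.000 means no false positives occurred in the sample, which is why a confidence interval computed from a small cohort can collapse to a single point.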
|
46
|
Bing X, Bunea F, Ning Y, Wegkamp M. Adaptive estimation in structured factor models with applications to overlapping clustering. Ann Stat 2020. [DOI: 10.1214/19-aos1877] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
|
47
|
Sabbah MS, Fayyaz AU, de Denus S, Felker GM, Borlaug BA, Dasari S, Carter RE, Redfield MM. Obese-Inflammatory Phenotypes in Heart Failure With Preserved Ejection Fraction. Circ Heart Fail 2020; 13:e006414. [PMID: 32809874 DOI: 10.1161/circheartfailure.119.006414] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 12/17/2022]
Abstract
BACKGROUND Comorbidity-driven microvascular inflammation is posited as a unifying pathophysiologic mechanism for heart failure with preserved ejection fraction (HFpEF). Obesity is proinflammatory and common in HFpEF. We hypothesized that unique obesity-inflammation HFpEF phenotypes exist and are associated with differences in clinical features, fibrosis biomarkers, and functional performance. METHODS Patients (n=301) from 3 HFpEF clinical trials were studied. Unsupervised machine learning (hierarchical clustering) with obese status and 13 inflammatory biomarkers as input variables was performed. Associations of clusters with HFpEF severity and fibrosis biomarkers (PIIINP [procollagen III N-terminal peptide], CITP [C-telopeptide for type I collagen], IGFBP7 [insulin-like growth factor-binding protein-7], and GAL-3 [galectin-3]) were assessed. RESULTS Hierarchical clustering revealed 3 phenotypes: pan-inflammatory (n=129; 64% obese), noninflammatory (n=83; 55% obese), and obese high CRP (C-reactive protein; n=89; 98% obese). The pan-inflammatory phenotype had more comorbidities and heart failure hospitalizations; higher left atrial volume, NT-proBNP (N-terminal pro-B-type natriuretic peptide), and fibrosis biomarkers; and lower glomerular filtration rate, peak oxygen consumption, 6-minute walk distance, and active hours/day (P<0.05 for all). The noninflammatory phenotype had the most favorable values for all measures. The obese high CRP phenotype resembled the noninflammatory phenotype except for isolated elevation of CRP and lower functional performance. Hierarchical cluster assignment was independent of CRP genotype combinations that alter CRP levels and more biologically plausible than other clustering approaches. Multiple traditional analytic techniques confirmed and extended the hierarchical clustering findings. CONCLUSIONS Unique obesity-inflammation phenotypes exist in HFpEF and are associated with differences in comorbidity burden, HFpEF severity, and fibrosis. These data support comorbidity-driven microvascular inflammation as a pathophysiologic mechanism for many but not all HFpEF patients.
Affiliation(s)
- Michael S Sabbah
  - Department of Cardiovascular Disease (M.S.S., A.U.F., B.A.B., M.M.R.), Mayo Clinic, Rochester, MN
  - Center for Regenerative Medicine (M.S.S.), Mayo Clinic, Rochester, MN
- Ahmed U Fayyaz
  - Department of Cardiovascular Disease (M.S.S., A.U.F., B.A.B., M.M.R.), Mayo Clinic, Rochester, MN
- Simon de Denus
  - Research Centre, Montreal Heart Institute, QC, Canada (S.d.D.)
  - Université de Montréal Beaulieu-Saucier Pharmacogenomics Center, QC, Canada (S.d.D.)
  - Department of Pharmacy, Université de Montréal, QC, Canada (S.d.D.)
- G Michael Felker
  - Duke Clinical Research Institute, Duke University, Durham, NC (G.M.F.)
- Barry A Borlaug
  - Department of Cardiovascular Disease (M.S.S., A.U.F., B.A.B., M.M.R.), Mayo Clinic, Rochester, MN
- Surendra Dasari
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL (S.D., R.E.C.)
- Rickey E Carter
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL (S.D., R.E.C.)
- Margaret M Redfield
  - Department of Cardiovascular Disease (M.S.S., A.U.F., B.A.B., M.M.R.), Mayo Clinic, Rochester, MN
|
48
|
Marques JC, Orger MB. Clusterdv: a simple density-based clustering method that is robust, general and automatic. Bioinformatics 2019; 35:2125-2132. [PMID: 30407500 PMCID: PMC6581440 DOI: 10.1093/bioinformatics/bty932] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/23/2018] [Revised: 10/15/2018] [Accepted: 11/07/2018] [Indexed: 12/14/2022]
Abstract
MOTIVATION How to partition a dataset into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data vary widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on a search for density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic and fails on some simple data distributions. RESULTS We propose an alternative approach, clusterdv, which estimates density dips between points and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method is able to solve a range of synthetic and experimental datasets where the underlying structure is known, and identifies consistent and meaningful clusters in new behavioral data. AVAILABILITY AND IMPLEMENTATION clusterdv is implemented in MATLAB. Its source code, together with example datasets, is available at https://github.com/jcbmarques/clusterdv. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- João C Marques
  - Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, Doca de Pedrouços, Lisboa, Portugal
  - Rowland Institute at Harvard, 100 Edwin H. Land Boulevard, Cambridge, MA, USA
- Michael B Orger
  - Champalimaud Research, Champalimaud Centre for the Unknown, Avenida Brasília, Doca de Pedrouços, Lisboa, Portugal
|
49
|
Magland J, Jun JJ, Lovero E, Morley AJ, Hurwitz CL, Buccino AP, Garcia S, Barnett AH. SpikeForest, reproducible web-facing ground-truth validation of automated neural spike sorters. eLife 2020; 9:e55167. [PMID: 32427564 PMCID: PMC7237210 DOI: 10.7554/elife.55167] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 01/14/2020] [Accepted: 05/05/2020] [Indexed: 01/03/2023]
Abstract
Spike sorting is a crucial step in electrophysiological studies of neuronal activity. While many spike sorting packages are available, there is little consensus about which are most accurate under different experimental conditions. SpikeForest is an open-source and reproducible software suite that benchmarks the performance of automated spike sorting algorithms across an extensive, curated database of ground-truth electrophysiological recordings, displaying results interactively on a continuously-updating website. With contributions from eleven laboratories, our database currently comprises 650 recordings (1.3 TB total size) with around 35,000 ground-truth units. These data include paired intracellular/extracellular recordings and state-of-the-art simulated recordings. Ten of the most popular spike sorting codes are wrapped in a Python package and evaluated on a compute cluster using an automated pipeline. SpikeForest documents community progress in automated spike sorting, and guides neuroscientists to an optimal choice of sorter and parameters for a wide range of probes and brain regions.
Affiliation(s)
- Jeremy Magland
  - Center for Computational Mathematics, Flatiron Institute, New York, United States
- James J Jun
  - Center for Computational Mathematics, Flatiron Institute, New York, United States
- Elizabeth Lovero
  - Scientific Computing Core, Flatiron Institute, New York, United States
- Alexander J Morley
  - Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
- Cole Lincoln Hurwitz
  - Institute for Adaptive and Neural Computation Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Samuel Garcia
  - Centre de Recherche en Neuroscience de Lyon, Université de Lyon, Lyon, France
- Alex H Barnett
  - Center for Computational Mathematics, Flatiron Institute, New York, United States
|
50
|
Park S, Smith J, Dunkle RE, Ingersoll-Dayton B, Antonucci TC. Health and Social-Physical Environment Profiles Among Older Adults Living Alone: Associations With Depressive Symptoms. J Gerontol B Psychol Sci Soc Sci 2020. [PMID: 28637214 DOI: 10.1093/geronb/gbx003] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/20/2022]
Abstract
OBJECTIVES We examined differences in depressive symptoms among people 65 and older who live alone, exploring whether these differences are associated with both health and environmental contexts. METHOD Data are from the 2006 wave of the Health and Retirement Study (N = 2,956, age range: 65-104). We used a two-step cluster analytical approach to identify subgroups of health-limitation profiles and environmental profiles. Logistic regression models determined associations between subgroups and depressive symptoms. RESULTS Cluster analysis identified four health-profile subgroups (sensory-cognitively impaired, physically impaired, multiply impaired, and healthy) and three different physical-social environmental-profile subgroups (physically average/socially unsupported, physically unsupported/socially supported, and physically supported/socially above average). Compared to members of healthier groups, members of the multiply impaired group were the oldest and were more likely both to live in senior housing and to have depressive symptoms if they lived in a physically average/socially unsupported environment. Members of the sensory-cognitively impaired group were more likely to have depressive symptoms when they lived in a physically unsupported/socially supported environment. DISCUSSION Findings regarding the range of both health and social-physical environmental profiles, as well as the associations between person-environment profile combinations (fit) and depressive symptomatology, have important policy and intervention implications.
Affiliation(s)
- Sojung Park
  - Brown School of Social Work, Washington University in Saint Louis, Missouri
- Jacqui Smith
  - Department of Psychology and Institute for Social Research, Ann Arbor
- Ruth E Dunkle
  - School of Social Work, University of Michigan, Ann Arbor
- Toni C Antonucci
  - Department of Psychology and Institute for Social Research, Ann Arbor
|