1. Shamir I, Assaf Y, Shamir R. Clustering the cortical laminae: in vivo parcellation. Brain Struct Funct 2024; 229:443-458. [PMID: 38193916 PMCID: PMC10917860 DOI: 10.1007/s00429-023-02748-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024]
Abstract
The laminar microstructure of the cerebral cortex has distinct anatomical characteristics of the development, function, connectivity, and even various pathologies of the brain. In recent years, multiple neuroimaging studies have utilized magnetic resonance imaging (MRI) relaxometry to visualize and explore this intricate microstructure, successfully delineating the cortical laminar components. Despite this progress, T1 is still primarily considered a direct measure of myeloarchitecture (myelin content), rather than a probe of tissue cytoarchitecture (cellular composition). This study aims to offer a robust, whole-brain validation of T1 imaging as a practical and effective tool for exploring the laminar composition of the cortex. To do so, we cluster complex microstructural cortical datasets of both human (N = 30) and macaque (N = 1) brains using an adaptation of an algorithm for clustering cell omics profiles. The resulting cluster patterns are then compared to established atlases of cytoarchitectonic features, exhibiting significant correspondence in both species. Lastly, we demonstrate the expanded applicability of T1 imaging by exploring some of the cytoarchitectonic features behind various unique skillsets, such as musicality and athleticism.
Affiliation(s)
- Ittai Shamir
  - Department of Neurobiology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Yaniv Assaf
  - Department of Neurobiology, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
  - Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
2. Barrett C, Bura AC, He Q, Huang FW, Li TJX, Reidys CM. Motifs in SARS-CoV-2 evolution. RNA (NEW YORK, N.Y.) 2023; 30:1-15. [PMID: 37903545 PMCID: PMC10726165 DOI: 10.1261/rna.079557.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 09/20/2023] [Indexed: 11/01/2023]
Abstract
We present a novel framework that enhances the prediction of whether a novel lineage poses the threat of eventually dominating the viral population. The framework is based purely on genomic sequence data, without requiring previously established biological analysis. Its building blocks are sets of coevolving sites in the alignment (motifs), identified via coevolutionary signals. The collection of such motifs forms a relational structure over the polymorphic sites. Motifs are constructed using distances quantifying the coevolutionary coupling of pairs and manifest as coevolving clusters of sites. We present an approach to genomic surveillance based on this notion of relational structure. Our system will issue an alert regarding a lineage, based on its contribution to drastic changes in the relational structure. We then conduct a comprehensive retrospective analysis of the COVID-19 pandemic based on SARS-CoV-2 genomic sequence data in GISAID from October 2020 to September 2022, across 21 lineages and 27 countries with weekly resolution. We investigate the performance of this surveillance system in terms of its accuracy, timeliness, and robustness. Lastly, we study how well each lineage is classified by such a system.
Affiliation(s)
- Christopher Barrett
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, Virginia 22904, USA
  - Department of Computer Science, University of Virginia, Charlottesville, Virginia 22904, USA
- Andrei C Bura
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, Virginia 22904, USA
- Qijun He
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, Virginia 22904, USA
- Fenix W Huang
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, Virginia 22904, USA
- Thomas J X Li
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, Virginia 22904, USA
- Christian M Reidys
  - Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, Virginia 22904, USA
  - Department of Mathematics, University of Virginia, Charlottesville, Virginia 22904, USA
3. Zhu W, Shenoy A, Kundrotas P, Elofsson A. Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes. Bioinformatics 2023; 39:btad424. [PMID: 37405868 PMCID: PMC10348836 DOI: 10.1093/bioinformatics/btad424] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 12/06/2022] [Revised: 05/25/2023] [Accepted: 07/04/2023] [Indexed: 07/07/2023]
Abstract
Motivation: Despite near-experimental accuracy on single-chain predictions, there is still scope for improvement among multimeric predictions. Methods like AlphaFold-Multimer and FoldDock can accurately model dimers. However, how well these methods fare on larger complexes is still unclear. Further, evaluation methods of the quality of multimeric complexes are not well established. Results: We analysed the performance of AlphaFold-Multimer on a homology-reduced dataset of homo- and heteromeric protein complexes. We highlight the differences between the pairwise and multi-interface evaluation of chains within a multimer. We describe why certain complexes perform well on one metric (e.g. TM-score) but poorly on another (e.g. DockQ). We propose a new score, Predicted DockQ version 2 (pDockQ2), to estimate the quality of each interface in a multimer. Finally, we modelled protein complexes (from CORUM) and identified two highly confident structures that do not have sequence homology to any existing structures. Availability and implementation: All scripts, models, and data used to perform the analysis in this study are freely available at https://gitlab.com/ElofssonLab/afm-benchmark.
Affiliation(s)
- Wensi Zhu
  - Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna 171 21, Sweden
- Aditi Shenoy
  - Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna 171 21, Sweden
- Petras Kundrotas
  - Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna 171 21, Sweden
  - Center for Computational Biology, The University of Kansas, Lawrence, KS 66047, United States
- Arne Elofsson
  - Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna 171 21, Sweden
4. Differential evolution-based transfer rough clustering algorithm. COMPLEX INTELL SYST 2023. [DOI: 10.1007/s40747-023-00987-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2023]
Abstract
Because they handle uncertainty in data well, rough clustering methods have been successfully applied in many fields. However, when the available data are limited or disturbed by noise, rough clustering algorithms often cannot effectively explore the structure of the data. Furthermore, rough clustering algorithms are usually sensitive to the initial cluster centers and prone to falling into local optima. To resolve these problems, a novel differential evolution-based transfer rough clustering (DE-TRC) algorithm is proposed in this paper. First, a transfer learning mechanism is introduced into rough clustering and a transfer rough clustering framework is designed, which uses knowledge from a related domain to assist the clustering task. Then, the objective function of the transfer rough clustering algorithm is optimized using the differential evolution algorithm to enhance the robustness of the algorithm. It can overcome the sensitivity to initial cluster centers while achieving the globally optimal clustering. The proposed algorithm is validated on different synthetic and real-world datasets. Experimental results demonstrate the effectiveness of the proposed algorithm in comparison with both traditional rough clustering algorithms and other state-of-the-art clustering algorithms.
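As an illustration of how differential evolution can optimize a clustering objective, the following is a minimal sketch and not the DE-TRC algorithm itself. It assumes a plain k-means-style objective (sum of squared distances to the nearest center) on one-dimensional toy data in place of the transfer rough clustering objective, and uses the common DE/rand/1/bin scheme. The function names, parameter values, and data are hypothetical.

```python
import random


def sse(centers, points):
    """Sum of squared distances from each 1-D point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def differential_evolution(objective, dim, bounds, pop_size=20,
                           generations=100, f=0.8, cr=0.9, seed=0):
    """Minimize `objective` with DE/rand/1/bin (requires pop_size >= 4).

    Returns (best_vector, best_score, best_score_of_initial_population).
    """
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    initial_best = min(scores)

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = [
                pop[a][d] + f * (pop[b][d] - pop[c][d])
                if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], initial_best


# Hypothetical toy data with two well-separated groups.
points = [0.0, 0.2, -0.2, 10.0, 10.2, 9.8]
centers, score, start_score = differential_evolution(
    lambda c: sse(c, points), dim=2, bounds=(min(points), max(points)))
```

Because the whole population searches the space of center positions, the result depends less on any single initialization than a one-start local method does, which is the motivation the abstract gives for using differential evolution.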
5. Wang HY, Wang JS, Wang G. A Survey of Fuzzy Clustering Validity Evaluation Methods. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
6. DOCKGROUND membrane protein-protein set. PLoS One 2022; 17:e0267531. [PMID: 35580077 PMCID: PMC9113569 DOI: 10.1371/journal.pone.0267531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 04/10/2022] [Indexed: 11/19/2022]
Abstract
Membrane proteins are significantly underrepresented in Protein Data Bank despite their essential role in cellular mechanisms and the major progress in experimental protein structure determination. Thus, computational approaches are especially valuable in the case of membrane proteins and their assemblies. The main focus in developing structure prediction techniques has been on soluble proteins, in part due to much greater availability of the structural data. Currently, structure prediction of protein complexes (protein docking) is a well-developed field of study. However, the generic protein docking approaches are not optimal for the membrane proteins because of the differences in physicochemical environment and the spatial constraints imposed by the membranes. Thus, docking of the membrane proteins requires specialized computational methods. Development and benchmarking of the membrane protein docking approaches has to be based on high-quality sets of membrane protein complexes. In this study we present a new dataset of 456 non-redundant alpha helical binary interfaces. The set is significantly larger and more representative than the previously developed sets. In the future, it will become the basis for the development of docking and scoring benchmarks, similar to the ones for soluble proteins in the Dockground resource http://dockground.compbio.ku.edu.
7. Tian F, Hu J, Yang W. GEOMScope: Large Field-of-view 3D Lensless Microscopy with Low Computational Complexity. LASER & PHOTONICS REVIEWS 2021; 15:2100072. [PMID: 34539926 PMCID: PMC8445384 DOI: 10.1002/lpor.202100072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/08/2021] [Indexed: 05/12/2023]
Abstract
Imaging systems with a miniaturized device footprint, real-time processing speed, and high-resolution three-dimensional (3D) visualization are critical to broad biomedical applications such as endoscopy. Most existing imaging systems rely on bulky lenses and mechanical refocusing to perform 3D imaging. Here, we demonstrate GEOMScope, a lensless single-shot 3D microscope that forms an image through a single layer of thin microlens array and reconstructs objects through an innovative algorithm combining geometrical-optics-based pixel back projection and background suppression. We verify the effectiveness of GEOMScope on a resolution target, fluorescent particles, and volumetric objects. Compared to other widefield lensless imaging devices, we significantly reduce the required computational resources and increase the reconstruction speed by orders of magnitude. This enables us to image and recover large-volume 3D objects in high resolution with near real-time processing speed. Such low computational complexity is attributed to the joint design of imaging optics and reconstruction algorithms, and a joint application of geometrical optics and machine learning in the 3D reconstruction. More broadly, the excellent performance of GEOMScope in imaging resolution, volume, and reconstruction speed implies that geometrical optics could greatly benefit and play an important role in computational imaging.
Affiliation(s)
- Feng Tian
  - Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA
- Junjie Hu
  - Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA
- Weijian Yang
  - Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA
8. Automatic Unsupervised Texture Recognition Framework Using Anisotropic Diffusion-Based Multi-Scale Analysis and Weight-Connected Graph Clustering. Symmetry (Basel) 2021. [DOI: 10.3390/sym13060925] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
A novel unsupervised texture classification technique is proposed in this work. The method automatically clusters the textures of an image collection into similarity classes whose number is not known a priori. A nonlinear diffusion-based multi-scale texture analysis approach is introduced first. It creates an effective scale-space using a well-posed anisotropic diffusion filtering model that is proposed and numerically approximated here. A feature extraction process using a bank of circularly symmetric 2D filters is applied at each scale, and a rotation-invariant texture feature vector is then obtained for the current image by combining the feature vectors computed at all scales. Next, a weighted similarity graph is created, whose vertices correspond to the texture feature vectors and whose edge weights are obtained from the distances computed between these vectors. A novel weighted graph clustering technique is then applied to this similarity graph to determine the texture classes. Numerical simulations and method comparisons illustrating the effectiveness of the described framework are also discussed.
9. Omranian S, Angeleska A, Nikoloski Z. PC2P: Parameter-free network-based prediction of protein complexes. Bioinformatics 2021; 37:73-81. [PMID: 33416831 PMCID: PMC8034538 DOI: 10.1093/bioinformatics/btaa1089] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/06/2020] [Revised: 12/17/2020] [Accepted: 12/30/2020] [Indexed: 11/12/2022]
Abstract
Motivation: Prediction of protein complexes from protein–protein interaction (PPI) networks is an important problem in systems biology, as they control different cellular functions. The existing solutions employ algorithms for network community detection that identify dense subgraphs in PPI networks. However, gold standards in yeast and human indicate that protein complexes can also induce sparse subgraphs, introducing further challenges in protein complex prediction. Results: To address this issue, we formalize protein complexes as biclique spanned subgraphs, which include both sparse and dense subgraphs. We then cast the problem of protein complex prediction as a network partitioning into biclique spanned subgraphs with removal of minimum number of edges, called coherent partition. Since finding a coherent partition is a computationally intractable problem, we devise a parameter-free greedy approximation algorithm, termed Protein Complexes from Coherent Partition (PC2P), based on key properties of biclique spanned subgraphs. Through comparison with nine contenders, we demonstrate that PC2P: (i) successfully identifies modular structure in networks, as a prerequisite for protein complex prediction, (ii) outperforms the existing solutions with respect to a composite score of five performance measures on 75% and 100% of the analyzed PPI networks and gold standards in yeast and human, respectively, and (iii,iv) does not compromise GO semantic similarity and enrichment score of the predicted protein complexes. Therefore, our study demonstrates that clustering of networks in terms of biclique spanned subgraphs is a promising framework for detection of complexes in PPI networks. Availability and implementation: https://github.com/SaraOmranian/PC2P. Supplementary information: Supplementary data are available at Bioinformatics online.
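To make the notion of a biclique spanned subgraph concrete, here is a small sketch that only tests the property; it is not the PC2P algorithm. It assumes the usual definition, which the abstract does not spell out: a graph is biclique spanned if its vertices can be split into two non-empty sets such that every vertex in one set is adjacent to every vertex in the other. Under that assumption the property holds exactly when the complement graph is disconnected, which is what the code checks. The function name and inputs are hypothetical.

```python
def is_biclique_spanned(nodes, edges):
    """Return True if the graph has a spanning biclique.

    Assumed definition: the vertex set can be split into two non-empty
    parts with every cross-part pair adjacent. This is equivalent to the
    complement graph being disconnected, which is tested here.
    """
    nodes = list(nodes)
    if len(nodes) < 2:
        return False

    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)

    # Depth-first search in the complement graph from an arbitrary node.
    all_nodes = set(nodes)
    start = nodes[0]
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        # Complement neighbours of u: every other node NOT adjacent to u.
        for w in all_nodes - adjacency[u] - {u}:
            if w not in seen:
                seen.add(w)
                stack.append(w)

    # If some node was not reached, the complement is disconnected.
    return len(seen) < len(all_nodes)


# A sparse star and a dense triangle both qualify; a 4-node path does not.
star = is_biclique_spanned("cabd", [("c", "a"), ("c", "b"), ("c", "d")])
triangle = is_biclique_spanned("abc", [("a", "b"), ("b", "c"), ("a", "c")])
path = is_biclique_spanned("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
```

The star and the triangle show why the abstract says the class covers both sparse and dense subgraphs: a hub with its neighbours and a clique each contain a spanning biclique.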
Affiliation(s)
- Sara Omranian
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany
  - Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476, Potsdam, Germany
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, 14476, Potsdam, Germany
  - Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, 14476, Potsdam, Germany
  - Centre of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
10. Vassaux M, Gopalakrishnan K, Sinclair RC, Richardson RA, Coveney PV. Accelerating Heterogeneous Multiscale Simulations of Advanced Materials Properties with Graph-Based Clustering. ADVANCED THEORY AND SIMULATIONS 2020. [DOI: 10.1002/adts.202000234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Maxime Vassaux
  - Centre for Computational Sciences, University College London, 20 Gordon Street, London WC1H 0AJ, UK
- Robert C. Sinclair
  - Centre for Computational Sciences, University College London, 20 Gordon Street, London WC1H 0AJ, UK
- Robin A. Richardson
  - Centre for Computational Sciences, University College London, 20 Gordon Street, London WC1H 0AJ, UK
  - Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, The Netherlands
- Peter V. Coveney
  - Centre for Computational Sciences, University College London, 20 Gordon Street, London WC1H 0AJ, UK
11. Zhang P, Bhaskarabhatla S. How advocacy affects Twitter migraine conversations: A pilot cross-sectional survey of Northeast American “migraine” landscape on Twitter from May to June 2020. CEPHALALGIA REPORTS 2020. [DOI: 10.1177/2515816320972085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Background: Twitter is a leading microblogging platform, with over 126 million daily active users as of 2019, which allows for large-scale analysis of tweets related to migraine. June 2020 encompassed the National Migraine and Headache Awareness Month in the United States and the American Headache Society’s virtual annual conference, which offer opportunities for us to study online migraine advocacy. Objective: We aim to study the content of individual tweets about migraine, as well as study patterns of other topics that were discussed in those tweets. In addition, we aim to study the sources of information that people reference within their tweets. Thirdly, we want to study how online awareness and advocacy movements shape these conversations about migraine. Methods: We designed a Twitter robot that records all unique public tweets containing the word “migraine” from May 8th, 2020 to June 23rd, 2020, within a 400 km radius of New Brunswick, New Jersey, United States. We built two network analysis models, one for the months of May 2020 and June 2020. The model for the month of May served as a control group for the model for the month of June, the Migraine Awareness Month. Our network model was developed with the following rule: if two hashtag topics co-exist in a single tweet, they are considered nodes connected by an edge in our network model. We then determine the top 30 most important hashtags in the month of May and June through applications of degree, between-ness, and closeness centrality. We also generated highly connected subgraphs (HCS) to categorize clusters of conversations within each of our models. Finally, we tally the websites referenced by these tweets during each month and categorized these websites according to the HCS subgroups. Results: Migraine advocacy related tweets are more popular in June when compared to May as judged by degree and closeness centrality measurements. They remained unchanged when judged by between-ness centralities. 
The HCS algorithm categorizes the hashtags into a large single dominant conversation in both months. In each month, advocacy-related hashtags are a part of the dominant conversation. There are more hashtag topics as well as more unique websites referenced in the dominant conversation in June than in May. In addition, there are many smaller subgroups of migraine-related hashtags, and in each of these subgroups, there are a maximum of two websites referenced. Conclusion: We find a network analysis approach to be fruitful in the area of migraine social media research. Migraine advocacy tweets on Twitter not only rise in popularity during migraine awareness month but also may potentially bring in more diverse sources of online references into the Twitter migraine conversation. The smaller subgroups we identified suggest that there are marginalized conversations referencing a limited number of websites, creating a possibility of an “echo chamber” phenomenon. These subgroups provide an opportunity for targeted migraine advocacy. Our study therefore highlights the success as well as potential opportunities for social media advocacy on Twitter.
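The network rule described in the Methods (two hashtags that co-exist in a single tweet become nodes joined by an edge) can be sketched as follows. This is an illustrative example and not the authors' code: the tweets are made-up, the function names are hypothetical, and only degree and closeness centrality are shown (betweenness and the HCS clustering step are omitted for brevity). Closeness is computed here as the number of reachable nodes divided by the sum of shortest-path distances to them, which is one of several common conventions.

```python
from collections import deque
from itertools import combinations


def build_cooccurrence_graph(tweets):
    """Map each hashtag to the set of hashtags it co-occurs with in a tweet."""
    graph = {}
    for hashtags in tweets:
        unique = sorted({tag.lower() for tag in hashtags})
        for tag in unique:
            graph.setdefault(tag, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by total shortest-path distance (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[source] = reachable / total if total else 0.0
    return result


# Hypothetical hashtag lists, one per tweet.
tweets = [
    ["migraine", "MHAM"],
    ["migraine", "headache"],
    ["migraine", "MHAM", "advocacy"],
    ["headache", "pain"],
]
graph = build_cooccurrence_graph(tweets)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
top_hashtag = max(degree, key=degree.get)
```

Ranking hashtags by such scores, separately for each month's graph, is the kind of comparison the Results paragraph reports between May and June.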
Affiliation(s)
- Pengfei Zhang
  - Department of Neurology, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
- Santosh Bhaskarabhatla
  - Department of Neurology, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
12. Lamsal R, Katiyar S. cs-means: Determining optimal number of clusters based on a level-of-similarity. SN APPLIED SCIENCES 2020. [DOI: 10.1007/s42452-020-03582-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
13. Brooks BD, Closmore A, Yang J, Holland M, Cairns T, Cohen GH, Bailey-Kellogg C. Characterizing Epitope Binding Regions of Entire Antibody Panels by Combining Experimental and Computational Analysis of Antibody: Antigen Binding Competition. Molecules 2020; 25:molecules25163659. [PMID: 32796656 PMCID: PMC7464469 DOI: 10.3390/molecules25163659] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/23/2020] [Revised: 07/27/2020] [Accepted: 07/28/2020] [Indexed: 11/16/2022]
Abstract
Vaccines and immunotherapies depend on the ability of antibodies to sensitively and specifically recognize particular antigens and specific epitopes on those antigens. As such, detailed characterization of antibody-antigen binding provides important information to guide development. Due to the time and expense required, high-resolution structural characterization techniques are typically used sparingly and late in a development process. Here, we show that antibody-antigen binding can be characterized early in a process for whole panels of antibodies by combining experimental and computational analyses of competition between monoclonal antibodies for binding to an antigen. Experimental "epitope binning" of monoclonal antibodies uses high-throughput surface plasmon resonance to reveal which antibodies compete, while a new complementary computational analysis that we call "dock binning" evaluates antibody-antigen docking models to identify why and where they might compete, in terms of possible binding sites on the antigen. Experimental and computational characterization of the identified antigenic hotspots then enables the refinement of the competitors and their associated epitope binding regions on the antigen. While not performed at atomic resolution, this approach allows for the group-level identification of functionally related monoclonal antibodies (i.e., communities) and identification of their general binding regions on the antigen. By leveraging extensive epitope characterization data that can be readily generated both experimentally and computationally, researchers can gain broad insights into the basis for antibody-antigen recognition in wide-ranging vaccine and immunotherapy discovery and development programs.
Affiliation(s)
- Benjamin D. Brooks
  - Department of Biomedical Sciences, Rocky Vista University, Ivins, UT 84738, USA
  - Inovan Inc., Fargo, ND 58102, USA
  - Department of Microbiology, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Correspondence: ; Tel.: +1-435-222-1403
- Adam Closmore
  - Department of Pharmacy, North Dakota State University, Fargo, ND 58102, USA
- Juechen Yang
  - Department of Biomedical Engineering, North Dakota State University, Fargo, ND 58102, USA
- Michael Holland
  - Department of Biomedical Engineering, North Dakota State University, Fargo, ND 58102, USA
- Tina Cairns
  - Department of Microbiology, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Gary H. Cohen
  - Department of Microbiology, School of Dental Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
14. Tran D, Nguyen H, Le U, Bebis G, Luu HN, Nguyen T. A Novel Method for Cancer Subtyping and Risk Prediction Using Consensus Factor Analysis. Front Oncol 2020; 10:1052. [PMID: 32714868 PMCID: PMC7344292 DOI: 10.3389/fonc.2020.01052] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/31/2020] [Accepted: 05/27/2020] [Indexed: 01/04/2023]
Abstract
Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. One critical unmet challenge is that molecular disease subtypes characterized by relevant clinical differences, such as survival, are difficult to differentiate. With the advancement of multi-omics technologies, subtyping methods have shifted toward data integration in order to differentiate among subtypes from a holistic perspective that takes into consideration phenomena at multiple levels. However, these integrative methods are still limited by their statistical assumption and their sensitivity to noise. In addition, they are unable to predict the risk scores of patients using multi-omics data. Here, we present a novel approach named Subtyping via Consensus Factor Analysis (SCFA) that can efficiently remove noisy signals from consistent molecular patterns in order to reliably identify cancer subtypes and accurately predict risk scores of patients. In an extensive analysis of 7,973 samples related to 30 cancers that are available at The Cancer Genome Atlas (TCGA), we demonstrate that SCFA outperforms state-of-the-art approaches in discovering novel subtypes with significantly different survival profiles. We also demonstrate that SCFA is able to predict risk scores that are highly correlated with true patient survival and vital status. More importantly, the accuracy of subtype discovery and risk prediction improves when more data types are integrated into the analysis. The SCFA software and TCGA data packages will be available on Bioconductor.
Affiliation(s)
- Duc Tran
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
- Hung Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
- Uyen Le
  - NTT Hi-Tech Institute, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam
- George Bebis
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
- Hung N. Luu
  - Division of Cancer Control and Population Sciences, Hillman Cancer Center, University of Pittsburgh Medical Center, Pittsburgh, PA, United States
  - Department of Epidemiology, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA, United States
- Tin Nguyen
  - Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
15. Xu M, Jog V, Loh PL. Optimal rates for community estimation in the weighted stochastic block model. Ann Stat 2020. [DOI: 10.1214/18-aos1797] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/19/2022]
16. Rzymski C, Tresoldi T, Greenhill SJ, Wu MS, Schweikhard NE, Koptjevskaja-Tamm M, Gast V, Bodt TA, Hantgan A, Kaiping GA, Chang S, Lai Y, Morozova N, Arjava H, Hübler N, Koile E, Pepper S, Proos M, Van Epps B, Blanco I, Hundt C, Monakhov S, Pianykh K, Ramesh S, Gray RD, Forkel R, List JM. The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies. Sci Data 2020; 7:13. [PMID: 31932593 PMCID: PMC6957499 DOI: 10.1038/s41597-019-0341-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/30/2019] [Accepted: 11/29/2019] [Indexed: 11/09/2022]
Abstract
Advances in computer-assisted linguistic research have been greatly influential in reshaping linguistic research. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, are bringing high requirements in terms of rigorousness for preparing and curating datasets. Here we present CLICS, a Database of Cross-Linguistic Colexifications (CLICS). CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world's languages, and show-cases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version with CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.
Collapse
Affiliation(s)
- Christoph Rzymski
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany.
| | - Tiago Tresoldi
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany.
| | - Simon J Greenhill
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany.,ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, Australia
| | - Mei-Shin Wu
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Nathanael E Schweikhard
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | | | - Volker Gast
- Friedrich Schiller University, Jena, Germany
| | | | | | | | - Sophie Chang
- Independent English-Chinese Translator and linguistic researcher, Taipei, Taiwan
| | - Yunfan Lai
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Natalia Morozova
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | | | - Nataliia Hübler
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Ezequiel Koile
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | | | | | | | | | | | | | | | | | - Russell D Gray
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Robert Forkel
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany
| | - Johann-Mattis List
- Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany.
|
17
|
Affiliation(s)
- Xinfeng Yang
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
| | - Xiaodong Yan
- School of Economics, Shandong University, Jinan, People's Republic of China
|
18
|
Lensen A, Xue B, Zhang M. Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis. EVOLUTIONARY COMPUTATION 2019; 28:531-561. [PMID: 31599651 DOI: 10.1162/evco_a_00264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
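The graph-based use of a similarity function described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `similarity` function is hand-written rather than evolved by genetic programming, the choice of features 0 and 2 stands in for a selected feature subset, and the threshold of 0.5 is hypothetical. Points are linked when their similarity reaches the threshold, and connected components are taken as clusters.

```python
from itertools import combinations


def similarity(a, b):
    """Hand-written stand-in for an evolved similarity function.

    Uses only features 0 and 2 (a hypothetical selected subset) and
    combines their absolute differences into a score in (0, 1].
    """
    d0 = abs(a[0] - b[0])
    d2 = abs(a[2] - b[2])
    return 1.0 / (1.0 + d0 + 0.5 * d2)


def graph_clusters(points, sim, threshold):
    """Link points whose similarity is >= threshold; return connected components."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if sim(points[i], points[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Feature 1 is deliberately noisy and is ignored by the similarity function.
points = [
    (0.0, 9.0, 0.1),
    (0.2, -4.0, 0.0),
    (5.0, 9.1, 5.2),
    (5.1, -3.9, 5.0),
]
clusters = graph_clusters(points, similarity, threshold=0.5)
print(clusters)
```

Because the similarity function ignores the noisy middle feature, points 0 and 1 end up together even though they differ strongly on it, which is the kind of dataset-specific behaviour a fixed Euclidean distance would not give.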
Affiliation(s)
- Andrew Lensen
- Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand
| | - Bing Xue
- Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand
| | - Mengjie Zhang
- Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand
|
19
|
Chen W, Li W, Huang G, Flavel M. The Applications of Clustering Methods in Predicting Protein Functions. CURR PROTEOMICS 2019. [DOI: 10.2174/1570164616666181212114612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background: The understanding of protein function is essential to the study of biological processes. However, the prediction of protein function has been a difficult task for bioinformatics to overcome. This has resulted in many scholars focusing on the development of computational methods to address this problem.
Objective: In this review, we introduce the recently developed computational methods of protein function prediction and assess the validity of these methods. We then introduce the applications of clustering methods in predicting protein functions.
Affiliation(s)
- Weiyang Chen
- College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
| | - Weiwei Li
- College of Information, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
| | - Guohua Huang
- College of Information Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
| | - Matthew Flavel
- School of Life Sciences, La Trobe University, Bundoora, Vic 3083, Australia
|
20
|
A social interaction field model accurately identifies static and dynamic social groupings. Nat Hum Behav 2019; 3:847-855. [DOI: 10.1038/s41562-019-0618-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2018] [Accepted: 04/24/2019] [Indexed: 11/08/2022]
|
21
|
Guan J, Hsieh F, Koehl P. DCG++: A data-driven metric for geometric pattern recognition. PLoS One 2019; 14:e0217838. [PMID: 31170208 PMCID: PMC6553753 DOI: 10.1371/journal.pone.0217838] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2018] [Accepted: 05/20/2019] [Indexed: 11/19/2022] Open
Abstract
Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of this challenge comes from the difficulty in defining a similarity measure between the data points that captures the underlying geometry of those data points. In this paper, we propose an algorithm, DCG++ that generates such a similarity measure that is data-driven and ultrametric. DCG++ uses Markov Chain Random Walks to capture the intrinsic geometry of data, scans possible scales, and combines all this information using a simple procedure that is shown to generate an ultrametric. We validate the effectiveness of this similarity measure within the context of clustering on synthetic data with complex geometry, on a real-world data set containing segmented audio records of frog calls described by mel-frequency cepstral coefficients, as well as on an image segmentation problem. The experimental results show a significant improvement on performance with the DCG-based ultrametric compared to using an empirical distance measure.
Affiliation(s)
- Jiahui Guan
- Department of Statistics, University of California Davis, Davis, CA, United States of America
| | - Fushing Hsieh
- Department of Statistics, University of California Davis, Davis, CA, United States of America
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California Davis, Davis, CA, United States of America
|
22
|
Abstract
Cluster analysis has been widely applied by researchers from several scientific fields over the last decades. Advances in knowledge of biological phenomena have revived interest in cluster analysis, due in part to the large amount of microarray data. Traditional clustering algorithms, apart from requiring user-defined parameters, show clear limitations in handling microarray data owing to its inherent characteristics: it is high-dimensional with a low sample size, highly redundant, and noisy. This has motivated the study of clustering algorithms tailored to the analysis of microarray data, which continue to be developed and adapted. The present chapter reviews clustering methods based on different cluster analysis approaches in the challenging context of microarray data. Furthermore, the validation of clustering results is briefly discussed by means of validity indexes used to assess the goodness of the number of clusters and the induced cluster assignments.
Affiliation(s)
| | - Juana-María Vivo
- Department of Statistics and Operations Research, University of Murcia, Murcia, Spain.
|
23
|
|
24
|
Abstract
New reconstruction techniques are generating connectomes of unprecedented size. These must be analyzed to generate human comprehensible results. The analyses being used fall into three general categories. The first is interactive tools used during reconstruction, to help guide the effort, look for possible errors, identify potential cell classes, and answer other preliminary questions. The second type of analysis is support for formal documents such as papers and theses. Scientific norms here require that the data be archived and accessible, and the analysis reproducible. In contrast to some other “omic” fields such as genomics, where a few specific analyses dominate usage, connectomics is rapidly evolving and the analyses used are often specific to the connectome being analyzed. These analyses are typically performed in a variety of conventional programming languages, such as Matlab, R, Python, or C++, and read the connectomic data either from a file or through database queries, neither of which is standardized. In the short term we see no alternative to the use of specific analyses, so the best that can be done is to publish the analysis code, and the interface by which it reads connectomic data. A similar situation exists for archiving connectome data. Each group independently makes their data available, but there is no standardized format and long-term accessibility is neither enforced nor funded. In the long term, as connectomics becomes more common, a natural evolution would be a central facility for storing and querying connectomic data, playing a role similar to the National Center for Biotechnology Information for genomes. The final form of analysis is the import of connectome data into downstream tools such as neural simulation or machine learning. In this process, there are two main problems that need to be addressed. First, the reconstructed circuits contain huge amounts of detail, which must be intelligently reduced to a form the downstream tools can use. Second, much of the data needed for these downstream operations must be obtained by other methods (such as genetic or optical) and must be merged with the extracted connectome.
|
25
|
Fuzzy Rough C-Mean Based Unsupervised CNN Clustering for Large-Scale Image Data. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8101869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Deep learning has been well known for several years and shows great potential for unsupervised representation learning combined with clustering algorithms. Convolutional Neural Networks (CNNs) are now state of the art for many recognition and clustering tasks. However, with the continual growth in the number of digital images, there are more and more redundant, irrelevant, and noisy samples, which gradually degrade the CNN's running performance and, concurrently, its clustering accuracy. To address these issues, we propose an effective clustering method for large-scale image datasets that combines a CNN with a Fuzzy-Rough C-Mean (FRCM) clustering algorithm. The main idea is that a high-level representation, learned by multiple CNN layers with one clustering layer, first produces the initial cluster centers; then, during training, the image clusters and representations are updated jointly. FRCM is used to update the cluster centers in the forward pass, while the parameters of the proposed CNN are updated in the backward pass by Stochastic Gradient Descent (SGD). The rough-set concepts of lower and boundary approximations deal with uncertainty, vagueness, and incompleteness in cluster definition, and fuzzy sets enable efficient handling of overlapping partitions in a noisy environment. The experimental results show that the proposed FRCM-based unsupervised CNN clustering method outperforms standard K-Mean, Fuzzy C-Mean, FRCM, and other deep-learning-based clustering algorithms on large-scale image data.
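For readers unfamiliar with the fuzzy side of this method, the sketch below shows one iteration of standard Fuzzy C-Means, which is one of the baselines named in the abstract. It is not the FRCM algorithm and involves no CNN or rough-set approximations; the function name `fcm_step`, the fuzzifier `m = 2`, and the toy data are assumptions chosen for illustration.

```python
import math


def fcm_step(points, centers, m=2.0):
    """One standard fuzzy c-means iteration: update memberships, then centers."""
    eps = 1e-12  # guards against division by zero when a point sits on a center
    memberships = []
    for x in points:
        dists = [max(math.dist(x, c), eps) for c in centers]
        row = []
        for d_i in dists:
            # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
            denom = sum((d_i / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
            row.append(1.0 / denom)
        memberships.append(row)

    dim = len(points[0])
    new_centers = []
    for i in range(len(centers)):
        # Each center is the mean of all points weighted by membership^m.
        weights = [memberships[j][i] ** m for j in range(len(points))]
        total = sum(weights)
        new_centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return memberships, new_centers


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centers = [(0.0, 0.5), (5.0, 5.5)]
for _ in range(10):
    memberships, centers = fcm_step(points, centers)

for point, row in zip(points, memberships):
    print(point, [round(u, 3) for u in row])
```

Each point receives a membership in every cluster rather than a single label, which is what lets fuzzy methods represent overlapping partitions.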
|
26
|
Ostaszewski M, Kieffer E, Danoy G, Schneider R, Bouvry P. Clustering approaches for visual knowledge exploration in molecular interaction networks. BMC Bioinformatics 2018; 19:308. [PMID: 30157777 PMCID: PMC6116538 DOI: 10.1186/s12859-018-2314-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2017] [Accepted: 08/14/2018] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Biomedical knowledge grows in complexity, and becomes encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations and established ontologies. Combining these structured information sources is an important computational challenge, as large graphs are difficult to analyze visually.
RESULTS We investigate knowledge discovery in manually curated and annotated molecular interaction diagrams. To evaluate similarity of content we use: i) Euclidean distance in expert-drawn diagrams, ii) shortest path distance using the underlying network and iii) ontology-based distance. We employ clustering with these metrics used separately and in pairwise combinations. We propose a novel bi-level optimization approach together with an evolutionary algorithm for informative combination of distance metrics. We compare the enrichment of the obtained clusters between the solutions and with expert knowledge. We calculate the number of Gene and Disease Ontology terms discovered by different solutions as a measure of cluster quality. Our results show that combining distance metrics can improve clustering accuracy, based on the comparison with expert-provided clusters. Also, the performance of specific combinations of distance functions depends on the clustering depth (number of clusters). By employing a bi-level optimization approach we evaluated the relative importance of distance functions and found that the order in which they are combined indeed affects clustering performance. Next, with the enrichment analysis of clustering results we found that both hierarchical and bi-level clustering schemes discovered more Gene and Disease Ontology terms than expert-provided clusters for the same knowledge repository. Moreover, bi-level clustering found more enriched terms than the best hierarchical clustering solution for three distinct distance metric combinations in three different instances of disease maps.
CONCLUSIONS In this work we examined the impact of different distance functions on clustering of a visual biomedical knowledge repository. We found that combining distance functions may be beneficial for clustering, and improve exploration of such repositories. We proposed bi-level optimization to evaluate the importance of order by which the distance functions are combined. Both combination and order of these functions affected clustering quality and knowledge recognition in the considered benchmarks. We propose that multiple dimensions can be utilized simultaneously for visual knowledge exploration.
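The idea of clustering on a combination of distance functions can be sketched as follows. This is a simplified illustration, not the bi-level optimization the abstract proposes: it assumes a plain weighted sum of normalized distance matrices followed by single-linkage agglomerative clustering, and the two toy matrices (`layout` and `network`), the weights, and all function names are hypothetical.

```python
def normalize(matrix):
    """Scale a distance matrix to [0, 1] by its largest entry."""
    peak = max(max(row) for row in matrix)
    return [[v / peak for v in row] for row in matrix] if peak else matrix


def combine(matrices, weights):
    """Weighted sum of normalized distance matrices of equal size."""
    mats = [normalize(m) for m in matrices]
    n = len(mats[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(n)]
        for i in range(n)
    ]


def single_linkage(dist, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Toy data: the diagram layout groups {0, 1} and {2, 3},
# while the network distance groups {0, 2} and {1, 3}.
layout = [[0, 1, 8, 9], [1, 0, 7, 8], [8, 7, 0, 1], [9, 8, 1, 0]]
network = [[0, 3, 1, 3], [3, 0, 3, 1], [1, 3, 0, 3], [3, 1, 3, 0]]

layout_heavy = single_linkage(combine([layout, network], [0.8, 0.2]), k=2)
network_heavy = single_linkage(combine([layout, network], [0.2, 0.8]), k=2)
print(layout_heavy, network_heavy)
```

Shifting the weights changes which grouping wins, which is why the relative importance of each distance function has to be evaluated rather than fixed in advance.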
Affiliation(s)
- Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-Belval, Luxembourg
| | - Emmanuel Kieffer
- Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg, 6, Avenue de la Fonte, Esch-Belval, Luxembourg
| | - Grégoire Danoy
- Computer Science and Communications Research Unit, University of Luxembourg, 6, Avenue de la Fonte, Esch-Belval, Luxembourg
| | - Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-Belval, Luxembourg
| | - Pascal Bouvry
- Computer Science and Communications Research Unit, University of Luxembourg, 6, Avenue de la Fonte, Esch-Belval, Luxembourg
|
27
|
Dörpinghaus J, Schaaf S, Jacobs M. Soft document clustering using a novel graph covering approach. BioData Min 2018; 11:11. [PMID: 30026812 PMCID: PMC6047369 DOI: 10.1186/s13040-018-0172-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2017] [Accepted: 05/27/2018] [Indexed: 11/26/2022] Open
Abstract
Background: In text mining, document clustering describes the efforts to assign unstructured documents to clusters, which in turn usually refer to topics. Clustering is widely used in science for data retrieval and organisation. Results: In this paper we present and discuss a novel graph-theoretical approach for document clustering and its application on a real-world data set. We will show that the well-known graph partition to stable sets or cliques can be generalized to pseudostable sets or pseudocliques. This allows both soft and hard clustering to be performed. The software is freely available on GitHub. Conclusions: The presented integer linear programming as well as the greedy approach for this $\mathcal{NP}$-complete problem lead to valuable results on random instances and some real-world data for different similarity measures. We could show that PS-Document Clustering is a remarkable approach to document clustering and opens the complete toolbox of graph theory to this field. Electronic supplementary material: The online version of this article (10.1186/s13040-018-0172-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jens Dörpinghaus
- Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Schloss Birlinghoven, Sankt Augustin, Germany
| | - Sebastian Schaaf
- Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Schloss Birlinghoven, Sankt Augustin, Germany
| | - Marc Jacobs
- Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Schloss Birlinghoven, Sankt Augustin, Germany
|
28
|
Barradas-Bautista D, Rosell M, Pallara C, Fernández-Recio J. Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems. PROTEIN-PROTEIN INTERACTIONS IN HUMAN DISEASE, PART A 2018; 110:203-249. [DOI: 10.1016/bs.apcsb.2017.06.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
|
29
|
Nguyen T, Tagett R, Diaz D, Draghici S. A novel approach for data integration and disease subtyping. Genome Res 2017; 27:2025-2039. [PMID: 29066617 PMCID: PMC5741060 DOI: 10.1101/gr.215129.116] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2016] [Accepted: 09/13/2017] [Indexed: 11/24/2022]
Abstract
Advances in high-throughput technologies allow for measurements of many types of omics data, yet the meaningful integration of several different data types remains a significant challenge. Another important and difficult problem is the discovery of molecular disease subtypes characterized by relevant clinical differences, such as survival. Here we present a novel approach, called perturbation clustering for data integration and disease subtyping (PINS), which is able to address both challenges. The framework has been validated on thousands of cancer samples, using gene expression, DNA methylation, noncoding microRNA, and copy number variation data available from the Gene Expression Omnibus, the Broad Institute, The Cancer Genome Atlas (TCGA), and the European Genome-Phenome Archive. This simultaneous subtyping approach accurately identifies known cancer subtypes and novel subgroups of patients with significantly different survival profiles. The results were obtained from genome-scale molecular data without any other type of prior knowledge. The approach is sufficiently general to replace existing unsupervised clustering approaches outside the scope of bio-medical research, with the additional ability to integrate multiple types of data.
Affiliation(s)
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, Nevada 89557, USA
| | - Rebecca Tagett
- Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA
| | - Diana Diaz
- Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA
| | - Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA; Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan 48201, USA
|
30
|
Simplified Swarm Optimization-Based Function Module Detection in Protein–Protein Interaction Networks. APPLIED SCIENCES-BASEL 2017. [DOI: 10.3390/app7040412] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
31
|
Raghu Kisore N, B Koteswaraiah CH. Improving ATM coverage area using density based clustering algorithm and voronoi diagrams. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2016.09.058] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
32
|
Poirion OB, Zhu X, Ching T, Garmire L. Single-Cell Transcriptomics Bioinformatics and Computational Challenges. Front Genet 2016; 7:163. [PMID: 27708664 PMCID: PMC5030210 DOI: 10.3389/fgene.2016.00163] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2016] [Accepted: 09/02/2016] [Indexed: 12/21/2022] Open
Abstract
The emerging single-cell RNA-Seq (scRNA-Seq) technology holds the promise to revolutionize our understanding of diseases and associated biological processes at an unprecedented resolution. It opens the door to reveal intercellular heterogeneity and has been employed to a variety of applications, ranging from characterizing cancer cells subpopulations to elucidating tumor resistance mechanisms. Parallel to improving experimental protocols to deal with technological issues, deriving new analytical methods to interpret the complexity in scRNA-Seq data is just as challenging. Here, we review current state-of-the-art bioinformatics tools and methods for scRNA-Seq analysis, as well as addressing some critical analytical challenges that the field faces.
Affiliation(s)
- Olivier B Poirion
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Xun Zhu
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Travers Ching
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA
| | - Lana Garmire
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
|
33
|
Thomas J, Seo D, Sael L. Review on Graph Clustering and Subgraph Similarity Based Analysis of Neurological Disorders. Int J Mol Sci 2016; 17:ijms17060862. [PMID: 27258269 PMCID: PMC4926396 DOI: 10.3390/ijms17060862] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2016] [Revised: 05/10/2016] [Accepted: 05/24/2016] [Indexed: 01/03/2023] Open
Abstract
How can complex relationships among molecular or clinico-pathological entities of neurological disorders be represented and analyzed? Graphs seem to be the current answer to the question no matter the type of information: molecular data, brain images or neural signals. We review a wide spectrum of graph representation and graph analysis methods and their application in the study of both the genomic level and the phenotypic level of the neurological disorder. We find numerous research works that create, process and analyze graphs formed from one or a few data types to gain an understanding of specific aspects of the neurological disorders. Furthermore, with the increasing number of data of various types becoming available for neurological disorders, we find that integrative analysis approaches that combine several types of data are being recognized as a way to gain a global understanding of the diseases. Although there are still not many integrative analyses of graphs due to the complexity in analysis, multi-layer graph analysis is a promising framework that can incorporate various data types. We describe and discuss the benefits of the multi-layer graph framework for studies of neurological disease.
Affiliation(s)
- Jaya Thomas
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
- Department of Computer Science, State University New York Korea, Incheon 406-840, Korea.
| | - Dongmin Seo
- Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea.
| | - Lee Sael
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
- Department of Computer Science, State University New York Korea, Incheon 406-840, Korea.
|
34
|
Paul S, Vera J. Rough hypercuboid based supervised clustering of miRNAs. MOLECULAR BIOSYSTEMS 2016; 11:2068-81. [PMID: 25996345 DOI: 10.1039/c5mb00213c] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
The microRNAs are small, endogenous non-coding RNAs found in plants, animals, and some viruses, which function in RNA silencing and post-transcriptional regulation of gene expression. It is suggested by various genome-wide studies that a substantial fraction of miRNA genes is likely to form clusters. The coherent expression of the miRNA clusters can then be used to classify samples according to the clinical outcome. In this regard, a new clustering algorithm, termed as rough hypercuboid based supervised attribute clustering (RH-SAC), is proposed to find such groups of miRNAs. The proposed algorithm is based on the theory of rough set, which directly incorporates the information of sample categories into the miRNA clustering process, generating a supervised clustering algorithm for miRNAs. The effectiveness of the new approach is demonstrated on several publicly available miRNA expression data sets using support vector machine. The so-called B.632+ bootstrap error estimate is used to minimize the variability and biasedness of the derived results. The association of the miRNA clusters to various biological pathways is also shown by doing pathway enrichment analysis.
Affiliation(s)
- Sushmita Paul
- Laboratory of Systems Tumor Immunology, Department of Dermatology, University of Erlangen-Nürnberg, Hartmannstr. 14, 91052 Erlangen, Germany.
|
35
|
|
36
|
Carson CG, Levine JS. The finite body triangulation: algorithms, subgraphs, homogeneity estimation and application. J Microsc 2016; 263:268-79. [PMID: 26917441 DOI: 10.1111/jmi.12388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2015] [Accepted: 01/24/2016] [Indexed: 11/27/2022]
Abstract
The concept of a finite body Dirichlet tessellation has been extended to that of a finite body Delaunay 'triangulation' to provide a more meaningful description of the spatial distribution of nonspherical secondary phase bodies in 2- and 3-dimensional images. A finite body triangulation (FBT) consists of a network of minimum edge-to-edge distances between adjacent objects in a microstructure. From this is also obtained the characteristic object chords formed by the intersection of the object boundary with the finite body tessellation. These two sets of distances form the basis of a parsimonious homogeneity estimation. The characteristics of the spatial distribution are then evaluated with respect to the distances between objects and the distances within them. Quantitative analysis shows that more physically representative distributions can be obtained by selecting subgraphs, such as the relative neighbourhood graph and the minimum spanning tree, from the finite body tessellation. To demonstrate their potential, we apply these methods to 3-dimensional X-ray computed tomographic images of foamed cement and their 2-dimensional cross sections. The Python computer code used to estimate the FBT is made available. Other applications for the algorithm - such as porous media transport and crack-tip propagation - are also discussed.
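As a rough illustration of the edge-to-edge distances and the minimum spanning tree subgraph mentioned in this abstract, the sketch below treats each body as a circle given by `(x, y, radius)`. This is an assumption made for simplicity: the bodies in the paper are nonspherical, and the sketch searches the complete graph rather than a finite body triangulation. The function names and sample bodies are hypothetical, and this is not the authors' published Python code.

```python
import math


def edge_gap(a, b):
    """Minimum edge-to-edge distance between two circles (0 if they overlap)."""
    (ax, ay, ar), (bx, by, br) = a, b
    return max(0.0, math.hypot(ax - bx, ay - by) - ar - br)


def minimum_spanning_tree(bodies):
    """Prim's algorithm over all pairwise edge-to-edge gaps.

    Returns a list of (i, j, gap) edges, in the order they were added.
    """
    n = len(bodies)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        gap, i, j = min(
            (edge_gap(bodies[i], bodies[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((i, j, gap))
    return edges


bodies = [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0), (4.0, 5.0, 2.0), (10.0, 0.0, 0.5)]
tree = minimum_spanning_tree(bodies)
for i, j, gap in tree:
    print(f"body {i} to body {j}: gap {gap:.2f}")
```

Using surface-to-surface gaps instead of center-to-center distances means that a large body counts as close to its neighbours even when its center is far away, which is the motivation for a finite body description.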
Affiliation(s)
- Cantwell G Carson
- National Energy Technology Laboratory, Pittsburgh, Pennsylvania, U.S.A
| | - Jonathan S Levine
- National Energy Technology Laboratory, Pittsburgh, Pennsylvania, U.S.A
|
37
|
He Y, Chen H, Huang Y, Wu D, Chen S. Parameter Self-Optimizing Clustering for Autonomous Extraction of the Weld Seam Based on Orientation Saliency in Robotic MAG Welding. J INTELL ROBOT SYST 2016. [DOI: 10.1007/s10846-015-0331-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
38
|
Yu J, Kim SB. A density-based noisy graph partitioning algorithm. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.10.085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
39
|
Acharya S, Saha S. Importance of proximity measures in clustering of cancer and miRNA datasets: proposal of an automated framework. MOLECULAR BIOSYSTEMS 2016; 12:3478-3501. [DOI: 10.1039/c6mb00609d] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Abstract
Distance plays an important role in the clustering process for allocating data points to different clusters.
Affiliation(s)
- Sudipta Acharya
- Department of Computer Science and Engineering
- Indian Institute of Technology Patna
- India
| | - Sriparna Saha
- Department of Computer Science and Engineering
- Indian Institute of Technology Patna
- India
| |
|
40
|
Abstract
BACKGROUND It is well understood that distinct communities of bacteria are present at different sites of the body, and that changes in the structure of these communities have strong implications for human health. Yet, challenges remain in understanding the complex interconnections between the bacterial taxa within these microbial communities and how they change during the progression of diseases. Many recent studies attempt to analyze the human microbiome using traditional ecological measures and cataloging differences in bacterial community membership. In this paper, we show how to push metagenomic analyses beyond mundane questions related to the bacterial taxonomic profiles that differentiate one sample from another. METHODS We develop tools and techniques that help us to investigate the nature of social interactions in microbial communities, and demonstrate ways of compactly capturing extensive information about these networks and visually conveying them in an effective manner. We define the concept of bacterial "social clubs", which are groups of taxa that tend to appear together in many samples. More importantly, we define the concept of "rival clubs", entire groups that tend to avoid occurring together in many samples. We show how to efficiently compute social clubs and rival clubs and demonstrate their utility with the help of examples including a smokers' dataset and a dataset from the Human Microbiome Project (HMP). RESULTS The tools developed provide a framework for analyzing relationships between bacterial taxa modeled as bacterial co-occurrence networks. The computational techniques also provide a framework for identifying clubs and rival clubs and for studying differences in the microbiomes (and their interactions) of two or more collections of samples. CONCLUSIONS Microbial relationships are similar to those found in social networks. 
In this work, we assume that strong (positive or negative) tendencies to co-occur or co-infect are likely to have biological, physiological, or ecological significance, possibly as a result of cooperation or competition. As a consequence of the analysis, a variety of biological interpretations are conjectured. In the human microbiome context, the pattern of strength of interactions between bacterial taxa is unique to body site.
Affiliation(s)
- Mitch Fernandez
- Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
- Dept. of Computational Medicine and Bioinformatics, College of Medicine, University of Michigan, 48109 Ann Arbor, MI, USA
| | - Juan D Riveros
- Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
| | - Michael Campos
- Pulmonary & Critical Care Medicine, Miller School of Medicine, University of Miami, 33136 Miami, FL, USA
| | - Kalai Mathee
- Human and Molecular Genetics, Herbert Wertheim College of Medicine, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
| | - Giri Narasimhan
- Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
| |
|
41
|
Gallagher SR, Goldberg DS. Characterization of known protein complexes using k-connectivity and other topological measures. F1000Res 2015; 2:172. [PMID: 26913183 PMCID: PMC4743144 DOI: 10.12688/f1000research.2-172.v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/03/2015] [Indexed: 11/20/2022]
Abstract
Many protein complexes are densely packed, so proteins within complexes often interact with several other proteins in the complex. Steric constraints prevent most proteins from simultaneously binding more than a handful of other proteins, regardless of the number of proteins in the complex. Because of this, as complex size increases, several measures of the complex decrease within protein-protein interaction networks. However, k-connectivity, the number of vertices or edges that need to be removed in order to disconnect a graph, may be consistently high for protein complexes. The property of k-connectivity has been little used previously in the investigation of protein-protein interactions. To understand the discriminative power of k-connectivity and other topological measures for identifying unknown protein complexes, we characterized these properties in known Saccharomyces cerevisiae protein complexes in networks generated both from highly accurate X-ray crystallography experiments, which give an accurate model of each complex, and from high-throughput yeast 2-hybrid studies, in which new complexes may be discovered. We also computed these properties for appropriate random subgraphs. We found that clustering coefficient, mutual clustering coefficient, and k-connectivity are better indicators of known protein complexes than edge density, degree, or betweenness. This suggests new directions for future protein complex-finding algorithms.
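The notion of k-connectivity described in the abstract above can be made concrete with a small sketch. The Python snippet below is an illustration only, not the authors' code: it computes the vertex connectivity of a tiny undirected graph by brute force, that is, the smallest number of vertices whose removal disconnects it. The function names (`is_connected`, `vertex_connectivity`) and the adjacency-dictionary input format are assumptions made for this example, and the exhaustive search is practical only for very small graphs.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """Return True if the graph stays connected after deleting `removed` vertices."""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:  # depth-first search over the surviving vertices
        u = stack.pop()
        for w in adj[u]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def vertex_connectivity(adj):
    """Smallest number of vertices whose removal disconnects the graph.

    Hypothetical helper for illustration. By the usual convention, a complete
    graph on n vertices is assigned connectivity n - 1.
    """
    n = len(adj)
    for k in range(n - 1):
        for cut in combinations(adj, k):
            if not is_connected(adj, frozenset(cut)):
                return k
    return n - 1


# Toy graphs (invented for this example): a 4-cycle and a complete graph K4.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
k4 = {v: [w for w in range(4) if w != v] for v in range(4)}
print(vertex_connectivity(cycle4), vertex_connectivity(k4))
```

A densely packed complex behaves more like the complete graph here than like the cycle, which is one way to read the abstract's point that k-connectivity may stay high as complex size grows.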
Affiliation(s)
- Suzanne R Gallagher
- Department of Computer Science, University of Colorado, Boulder CO, 80302, USA
| | - Debra S Goldberg
- Department of Computer Science, University of Colorado, Boulder CO, 80302, USA
| |
|
42
|
Jafari M, Mirzaie M, Sadeghi M. Interlog protein network: an evolutionary benchmark of protein interaction networks for the evaluation of clustering algorithms. BMC Bioinformatics 2015; 16:319. [PMID: 26437714 PMCID: PMC4595048 DOI: 10.1186/s12859-015-0755-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/16/2015] [Accepted: 09/29/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND In the field of network science, exploring principal and crucial modules or communities is critical in the deduction of relationships and organization of complex networks. This approach expands an arena, and thus allows further study of biological functions in the field of network biology. As the clustering algorithms that are currently employed in finding modules have innate uncertainties, external and internal validations are necessary. METHODS Sequence and network structure alignment has been used to define the Interlog Protein Network (IPN). This network is an evolutionarily conserved network with communal nodes and fewer false-positive links. In the current study, the IPN is employed as an evolution-based benchmark in the validation of module-finding methods. The clustering results of five algorithms, namely Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Cartographic Representation (CR), Laplacian Dynamics (LD), and Genetic Algorithm to find communities in Protein-Protein Interaction networks (GAPPI), are assessed by the IPN in four distinct Protein-Protein Interaction Networks (PPINs). RESULTS MCL emerges as the more accurate algorithm under this evolutionary benchmarking approach. Also, the biological relevance of proteins in the IPN modules generated by MCL is compatible with biological standard databases such as Gene Ontology, KEGG and Reactome. CONCLUSION In this study, the IPN shows its potential for validation of clustering algorithms due to its biological logic and straightforward implementation.
Affiliation(s)
- Mohieddin Jafari
- Drug Design and Bioinformatics Unit, Medical Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, 69 Pasteur St, PO Box 13164, Tehran, Iran.
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Shahid Lavasani St, PO Box 19395-5746, Tehran, Iran.
| | - Mehdi Mirzaie
- Department of Computational Biology, Faculty of High Technologies, Tarbiat Modares University, Jalal Ale Ahmad Highway, PO Box 14115-111, Tehran, Iran.
| | - Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology (NIGEB), Pajoohesh Blvd, 17 Km Tehran-Karaj Highway, PO Box 161-14965, Tehran, Iran.
| |
|
43
|
|
44
|
Wen J, Leucci E, Vendramin R, Kauppinen S, Lund AH, Krogh A, Parker BJ. Transcriptome dynamics of the microRNA inhibition response. Nucleic Acids Res 2015; 43:6207-21. [PMID: 26089393 PMCID: PMC4513874 DOI: 10.1093/nar/gkv603] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
We report a high-resolution time series study of transcriptome dynamics following antimiR-mediated inhibition of miR-9 in a Hodgkin lymphoma cell line (the first such dynamic study of the microRNA inhibition response), revealing both general and specific aspects of the physiological response. We show miR-9 inhibition inducing a multiphasic transcriptome response, with a direct target perturbation before 4 h, earlier than previously reported, amplified by a downstream peak at ∼32 h consistent with an indirect response due to secondary coherent regulation. Predictive modelling indicates a major role for miR-9 in post-transcriptional control of RNA processing and RNA binding protein regulation. Cluster analysis identifies multiple co-regulated gene regulatory modules. Functionally, we observe a shift over time from mRNA processing at early time points to translation at later time points. We validate the key observations with independent time series qPCR and we experimentally validate key predicted miR-9 targets. Methodologically, we developed sensitive functional data analytic predictive methods to analyse the weak response inherent in microRNA inhibition experiments. The methods of this study will be applicable to similar high-resolution time series transcriptome analyses and provide the context for more accurate experimental design and interpretation of future microRNA inhibition studies.
Affiliation(s)
- Jiayu Wen
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark; Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark
| | - Eleonora Leucci
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark; Laboratory for Molecular Cancer Biology, Center for the Biology of Disease, VIB, 3000 Leuven, Belgium; Laboratory for Molecular Cancer Biology, Center of Human Genetics, VIB, 3000 Leuven, Belgium
| | - Roberto Vendramin
- Laboratory for Molecular Cancer Biology, Center for the Biology of Disease, VIB, 3000 Leuven, Belgium; Laboratory for Molecular Cancer Biology, Center of Human Genetics, VIB, 3000 Leuven, Belgium
| | - Sakari Kauppinen
- Department of Haematology, Aalborg University Hospital, A.C. Meyers Vænge 15, 2450 Copenhagen SV, Denmark
| | - Anders H Lund
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark
| | - Anders Krogh
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark
| | - Brian J Parker
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark; Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis street, #07-01, Singapore 138671
| |
|
45
|
Anishchenko I, Kundrotas PJ, Tuzikov AV, Vakser IA. Structural templates for comparative protein docking. Proteins 2015; 83:1563-70. [PMID: 25488330 DOI: 10.1002/prot.24736] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 09/23/2014] [Revised: 11/15/2014] [Accepted: 11/26/2014] [Indexed: 11/07/2022]
Abstract
Structural characterization of protein-protein interactions is important for understanding life processes. Because of the inherent limitations of experimental techniques, such characterization requires computational approaches. Along with the traditional protein-protein docking (free search for a match between two proteins), comparative (template-based) modeling of protein-protein complexes has been gaining popularity. Its development puts an emphasis on full and partial structural similarity between the target protein monomers and the protein-protein complexes previously determined by experimental techniques (templates). The template-based docking relies on the quality and diversity of the template set. We present a carefully curated, nonredundant library of templates containing 4950 full structures of binary complexes and 5936 protein-protein interfaces extracted from the full structures at 12 Å distance cut-off. Redundancy in the libraries was removed by clustering the PDB structures based on structural similarity. The value of the clustering threshold was determined from the analysis of the clusters and the docking performance on a benchmark set. High structural quality of the interfaces in the template and validation sets was achieved by automated procedures and manual curation. The library is included in the Dockground resource for molecular recognition studies at http://dockground.bioinformatics.ku.edu.
Affiliation(s)
- Ivan Anishchenko
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047; United Institute of Informatics Problems, National Academy of Sciences, Minsk, 220012, Belarus
| | - Petras J Kundrotas
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047
| | - Alexander V Tuzikov
- United Institute of Informatics Problems, National Academy of Sciences, Minsk, 220012, Belarus
| | - Ilya A Vakser
- Center for Bioinformatics, The University of Kansas, Lawrence, Kansas, 66047; Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66045
| |
|
46
|
Ibrahim ZM, Ngom A. The relative vertex clustering value--a new criterion for the fast discovery of functional modules in protein interaction networks. BMC Bioinformatics 2015; 16 Suppl 4:S3. [PMID: 25734691 PMCID: PMC4347617 DOI: 10.1186/1471-2105-16-s4-s3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Background Cellular processes are known to be modular and are realized by groups of proteins implicated in common biological functions. Such groups of proteins are called functional modules, and many community detection methods have been devised for their discovery from protein interaction network (PIN) data. In current agglomerative clustering approaches, vertices with very few neighbors are often classified as separate clusters, which does not make sense biologically. Also, a major limitation of agglomerative techniques is that their computational efficiency does not scale well to large PINs. Finally, PIN data obtained from large-scale experiments generally contain many false positives, and this makes it hard for agglomerative clustering methods to find the correct clusters, since they are known to be sensitive to noisy data. Results We propose a local similarity premetric, the relative vertex clustering value, as a new criterion for deciding when a node can be added to a given node's cluster, which addresses the above three issues. Based on this criterion, we introduce a novel and very fast agglomerative clustering technique, FAC-PIN, for discovering functional modules and protein complexes from PIN data. Conclusions Our proposed FAC-PIN algorithm is applied to nine PIN datasets from eight different species including the yeast PIN, and the identified functional modules are validated using Gene Ontology (GO) annotations from DAVID Bioinformatics Resources. Identified protein complexes are also validated using experimentally verified complexes. Computational results show that FAC-PIN can discover functional modules or protein complexes from PINs more accurately and more efficiently than HC-PIN and CNM, the current state-of-the-art approaches for clustering PINs in an agglomerative manner.
|
47
|
Xu C, Su Z. Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics 2015; 31:1974-80. [PMID: 25805722 DOI: 10.1093/bioinformatics/btv088] [Citation(s) in RCA: 320] [Impact Index Per Article: 35.6] [Received: 10/13/2014] [Accepted: 02/08/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION The recent advance of single-cell technologies has brought new insights into complex biological phenomena. In particular, genome-wide single-cell measurements such as transcriptome sequencing enable the characterization of cellular composition as well as functional variation in homogenic cell populations. An important step in the single-cell transcriptome analysis is to group cells that belong to the same cell types based on gene expression patterns. The corresponding computational problem is to cluster a noisy high dimensional dataset with substantially fewer objects (cells) than the number of variables (genes). RESULTS In this article, we describe a novel algorithm named shared nearest neighbor (SNN)-Cliq that clusters single-cell transcriptomes. SNN-Cliq utilizes the concept of shared nearest neighbor that shows advantages in handling high-dimensional data. When evaluated on a variety of synthetic and real experimental datasets, SNN-Cliq outperformed the state-of-the-art methods tested. More importantly, the clustering results of SNN-Cliq reflect the cell types or origins with high accuracy. AVAILABILITY AND IMPLEMENTATION The algorithm is implemented in MATLAB and Python. The source code can be downloaded at http://bioinfo.uncc.edu/SNNCliq.
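The shared-nearest-neighbor idea named in the abstract above can be sketched in a few lines. The snippet below is a generic illustration under stated assumptions, not the SNN-Cliq implementation: it scores each pair of points by how many of their k nearest neighbors (by Euclidean distance) they have in common. The function names, the plain overlap count used as the similarity, and the toy data are all assumptions of this example; the published method may weight shared neighbors differently.

```python
import math


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(points, k):
    """Map each pair (i, j), i < j, to the number of k-nearest neighbors they share."""
    nn = knn_sets(points, k)
    return {
        (i, j): len(nn[i] & nn[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    }


# Two well-separated toy groups of three points each (invented data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sim = snn_similarity(points, k=2)
print(sim[(0, 1)], sim[(0, 3)])
```

Pairs within the same group share neighbors while pairs across groups share none, so thresholding this score yields a graph whose dense regions correspond to candidate clusters.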
Affiliation(s)
- Chen Xu
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Zhengchang Su
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
|
48
|
Efficiency Sustainability Resource Visual Simulator for Clustered Desktop Virtualization Based on Cloud Infrastructure. SUSTAINABILITY 2014. [DOI: 10.3390/su6118079] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
|
49
|
|
50
|
Hüffner F, Komusiewicz C, Liebtrau A, Niedermeier R. Partitioning Biological Networks into Highly Connected Clusters with Maximum Edge Coverage. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:455-467. [PMID: 26356014 DOI: 10.1109/tcbb.2013.177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
A popular clustering algorithm for biological networks which was proposed by Hartuv and Shamir identifies nonoverlapping highly connected components. We extend the approach taken by this algorithm by introducing the combinatorial optimization problem Highly Connected Deletion, which asks for removing as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that Highly Connected Deletion is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies, based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 75 percent of the edges that are deleted for an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 6,000 vertices and 13,500 edges within five hours. Additionally, we present a new heuristic that finds more clusters than the method by Hartuv and Shamir.
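To make "highly connected" concrete, the sketch below assumes the definition used by Hartuv and Shamir, which the abstract does not restate: a graph on n vertices is highly connected if more than n/2 edges must be removed to disconnect it. The Python code checks this by brute force over edge subsets and is an illustration only; the function names and the edge-list input format are assumptions of this example, and neither the fixed-parameter algorithm nor the column generation method from the paper is reproduced here.

```python
from itertools import combinations


def connected(nodes, edges):
    """Return True if the undirected graph (nodes, edges) is connected."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(nodes)


def edge_connectivity(nodes, edges):
    """Smallest number of edges whose removal disconnects the graph (brute force)."""
    edges = list(edges)
    for k in range(len(edges) + 1):
        for cut in combinations(range(len(edges)), k):
            kept = [e for i, e in enumerate(edges) if i not in cut]
            if not connected(nodes, kept):
                return k
    return len(edges)  # reached only when no edge removal can disconnect the graph


def is_highly_connected(nodes, edges):
    """Assumed criterion: edge connectivity strictly greater than n / 2."""
    return edge_connectivity(nodes, edges) > len(nodes) / 2


# Toy graphs (invented): the complete graph K4 qualifies, the 4-cycle does not.
k4_edges = list(combinations(range(4), 2))
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_highly_connected(range(4), k4_edges), is_highly_connected(range(4), cycle_edges))
```

The exhaustive search is exponential in the number of edges, so it is suitable only for checking small candidate clusters by hand.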
|