Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Huttenhower C, Flamholz AI, Landis JN, Sahi S, Myers CL, Olszewski KL, Hibbs MA, Siemers NO, Troyanskaya OG, Coller HA. Nearest Neighbor Networks: clustering expression data based on gene neighborhoods. BMC Bioinformatics 2007;8:250. [PMID: 17626636 DOI: 10.1186/1471-2105-8-250] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 11/03/2006] [Accepted: 07/12/2007] [Indexed: 11/23/2022]


Cited by Other Article(s)

Chen SL, Chin SC, Chan KC, Ho CY. A Machine Learning Approach to Assess Patients with Deep Neck Infection Progression to Descending Mediastinitis: Preliminary Results. Diagnostics (Basel) 2023;13:2736. [PMID: 37685275 PMCID: PMC10486957 DOI: 10.3390/diagnostics13172736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 07/25/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023]

Szabo PM, Vajdi A, Kumar N, Tolstorukov MY, Chen BJ, Edwards R, Ligon KL, Chasalow SD, Chow KH, Shetty A, Bolisetty M, Holloway JL, Golhar R, Kidd BA, Hull PA, Houser J, Vlach L, Siemers NO, Saha S. Cancer-associated fibroblasts are the main contributors to epithelial-to-mesenchymal signatures in the tumor microenvironment. Sci Rep 2023;13:3051. [PMID: 36810872 PMCID: PMC9944255 DOI: 10.1038/s41598-023-28480-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/05/2022] [Accepted: 01/19/2023] [Indexed: 02/24/2023]

Affiliation(s)

Peter M. Szabo Bristol Myers Squibb, Princeton, NJ, USA; Present Address: Fate Therapeutics, San Diego, CA, USA
Amir Vajdi Dana-Farber Cancer Institute, Boston, MA, USA; Present Address: Merck & Co., Inc., Kenilworth, NJ, USA
Namit Kumar Bristol Myers Squibb, San Diego, CA, USA
Michael Y. Tolstorukov Dana-Farber Cancer Institute, Boston, MA, USA
Benjamin J. Chen Bristol Myers Squibb, Cambridge, MA, USA
Robin Edwards Bristol Myers Squibb, Princeton, NJ, USA; Present Address: Daiichi Sankyo, Inc., Princeton, NJ, USA
Keith L. Ligon Dana-Farber Cancer Institute, Boston, MA, USA
Scott D. Chasalow Bristol Myers Squibb, Princeton, NJ, USA
Kin-Hoe Chow Dana-Farber Cancer Institute, Boston, MA, USA
Aniket Shetty Dana-Farber Cancer Institute, Boston, MA, USA
Mohan Bolisetty Bristol Myers Squibb, Princeton, NJ, USA
James L. Holloway Bristol Myers Squibb, Seattle, WA, USA
Ryan Golhar Bristol Myers Squibb, Princeton, NJ, USA
Brian A. Kidd Bristol Myers Squibb, Redwood City, CA, USA
Philip Ansumana Hull Bristol Myers Squibb, Redwood City, CA, USA
Jeff Houser Bristol Myers Squibb, Redwood City, CA, USA
Logan Vlach Bristol Myers Squibb, Redwood City, CA, USA; Present Address: Vanderbilt University, Nashville, TN, USA
Nathan O. Siemers Bristol Myers Squibb, Princeton, NJ, USA; Present Address: Fiveprime Group, Monterey, CA, USA
Saurabh Saha Bristol Myers Squibb, Princeton, NJ, USA; Present Address: Centessa Pharmaceuticals, Cambridge, MA, USA

Lee AJ, Reiter T, Doing G, Oh J, Hogan DA, Greene CS. Using genome-wide expression compendia to study microorganisms. Comput Struct Biotechnol J 2022;20:4315-4324. [PMID: 36016717 PMCID: PMC9396250 DOI: 10.1016/j.csbj.2022.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/07/2022] [Accepted: 08/07/2022] [Indexed: 11/30/2022]

Park Y, Heider D, Hauschild AC. Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence. Cancers (Basel) 2021;13:3148. [PMID: 34202427 PMCID: PMC8269018 DOI: 10.3390/cancers13133148] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/20/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 12/18/2022]

Liu X, Shang H, Li B, Zhao L, Hua Y, Wu K, Hu M, Fan T. Exploration and validation of hub genes and pathways in the progression of hypoplastic left heart syndrome via weighted gene co-expression network analysis. BMC Cardiovasc Disord 2021;21:300. [PMID: 34130651 PMCID: PMC8204459 DOI: 10.1186/s12872-021-02108-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/12/2020] [Accepted: 06/08/2021] [Indexed: 12/18/2022]

Abstract

Background

Despite significant progress in surgical treatment of hypoplastic left heart syndrome (HLHS), its mortality and morbidity are still high. Little is known about the molecular abnormalities of the syndrome. In this study, we aimed to probe into hub genes and key pathways in the progression of the syndrome.

Methods

Differentially expressed genes (DEGs) were identified in left ventricle (LV) or right ventricle (RV) tissues between HLHS and controls using the GSE77798 dataset. Then, weighted gene co-expression network analysis (WGCNA) was performed and key modules were identified for HLHS. Based on the genes in the key modules, protein–protein interaction networks were constructed, and hub genes and key pathways were screened. Finally, the GSE23959 dataset was used to validate hub genes between HLHS and controls.

Results

We identified 88 and 41 DEGs in LV and RV tissues between HLHS and controls, respectively. DEGs in LV tissues of HLHS were distinctly involved in heart development, the apoptotic signaling pathway and ECM receptor interaction. DEGs in RV tissues of HLHS were mainly enriched in the BMP signaling pathway, regulation of cell development and regulation of blood pressure. A total of 16 co-expression networks were constructed. Among them, the black module (r = 0.79 and p value = 2e−04) and the pink module (r = 0.84 and p value = 4e−05) had the most significant correlation with HLHS, indicating that the two modules could be the most relevant for HLHS progression. We identified five hub genes in the black module (Fbn1, Itga8, Itga11, Itgb5 and Thbs2) and five hub genes in the pink module (Cblb, Ccl2, Edn1, Itgb3 and Map2k1) for HLHS. Their abnormal expression was verified in the GSE23959 dataset.

Conclusions

Our findings revealed hub genes and key pathways for HLHS through WGCNA, which could play key roles in the molecular mechanism of HLHS.
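
The module-to-trait correlations reported in the abstract above (for example r = 0.79 for the black module) are Pearson correlations between a module summary profile and a disease indicator. The following is a minimal sketch of that single step, assuming the module has already been summarized as one value per sample; the function name `pearson_r` and the sample values are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical module summary values, one per sample (illustrative only).
    module_profile = [0.42, 0.35, 0.51, -0.30, -0.44, -0.38]
    # Hypothetical trait coding: 1 = HLHS sample, 0 = control sample.
    trait = [1, 1, 1, 0, 0, 0]
    print(f"module-trait correlation: {pearson_r(module_profile, trait):.2f}")
```

In a full WGCNA workflow the module summary is typically the module eigengene and a p value accompanies each correlation; both are omitted here to keep the sketch short.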

Affiliation(s)

Xuelan Liu Department of Children's Heart Center, Henan Provincial People's Hospital, Department of Children's Heart Center of Fuwai Central China Cardiovascular Hospital, Central China Fuwai Hospital of Zhengzhou University, Zhengzhou, 450003, Henan, China
Honglei Shang Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China
Bin Li Department of Children's Heart Center, Henan Provincial People's Hospital, Department of Children's Heart Center of Fuwai Central China Cardiovascular Hospital, Central China Fuwai Hospital of Zhengzhou University, Zhengzhou, 450003, Henan, China
Liyun Zhao Department of Children's Heart Center, Henan Provincial People's Hospital, Department of Children's Heart Center of Fuwai Central China Cardiovascular Hospital, Central China Fuwai Hospital of Zhengzhou University, Zhengzhou, 450003, Henan, China
Ying Hua Department of Children's Heart Center, Henan Provincial People's Hospital, Department of Children's Heart Center of Fuwai Central China Cardiovascular Hospital, Central China Fuwai Hospital of Zhengzhou University, Zhengzhou, 450003, Henan, China
Kaiyuan Wu Department of Children's Heart Center, Henan Provincial People's Hospital, Department of Children's Heart Center of Fuwai Central China Cardiovascular Hospital, Central China Fuwai Hospital of Zhengzhou University, Zhengzhou, 450003, Henan, China
Manman Hu Department of Children's Heart Center, Henan Provincial People's Hospital, Department of Children's Heart Center of Fuwai Central China Cardiovascular Hospital, Central China Fuwai Hospital of Zhengzhou University, Zhengzhou, 450003, Henan, China
Taibing Fan Department of Children's Heart Center, Henan Provincial People's Hospital, Department of Children's Heart Center of Fuwai Central China Cardiovascular Hospital, Central China Fuwai Hospital of Zhengzhou University, Zhengzhou, 450003, Henan, China.

Lu Y, Phillips CA, Langston MA. A robustness metric for biological data clustering algorithms. BMC Bioinformatics 2019;20:503. [PMID: 31874625 PMCID: PMC6929270 DOI: 10.1186/s12859-019-3089-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/23/2019] [Accepted: 09/10/2019] [Indexed: 02/05/2023]

Chen LP, Yi GY, Zhang Q, He W. Multiclass analysis and prediction with network structured covariates. Journal of Statistical Distributions and Applications 2019. [DOI: 10.1186/s40488-019-0094-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]

Analysis of SAP Log Data Based on Network Community Decomposition. Information 2019. [DOI: 10.3390/info10030092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]

Catanese HN, Brayton KA, Gebremedhin AH. A nearest-neighbors network model for sequence data reveals new insight into genotype distribution of a pathogen. BMC Bioinformatics 2018;19:475. [PMID: 30541438 PMCID: PMC6291930 DOI: 10.1186/s12859-018-2453-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/06/2018] [Accepted: 10/31/2018] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Sequence similarity networks are useful for classifying and characterizing biologically important proteins. Threshold-based approaches to similarity network construction using exact distance measures are prohibitively slow to compute and rely on the difficult task of selecting an appropriate threshold, while similarity networks based on approximate distance calculations compromise useful structural information.

RESULTS

We present an alternative network representation for a set of sequence data that overcomes these drawbacks. In our model, called the Directed Weighted All Nearest Neighbors (DiWANN) network, each sequence is represented by a node and is connected via a directed edge to only the closest sequence, or sequences in the case of ties, in the dataset. Our contributions span several aspects. Specifically, we: (i) Apply an all nearest neighbors network model to protein sequence data from three different applications and examine the structural properties of the networks; (ii) Compare the model against threshold-based networks to validate their semantic equivalence, and demonstrate the relative advantages the model offers; (iii) Demonstrate the model's resilience to missing sequences; and (iv) Develop an efficient algorithm for constructing a DiWANN network from a set of sequences. We find that the DiWANN network representation attains similar semantic properties to threshold-based graphs, while avoiding weaknesses of both high and low threshold graphs. Additionally, we find that approximate distance networks, using BLAST bitscores in place of exact edit distances, can cause significant loss of structural information. We show that the proposed DiWANN network construction algorithm provides a fourfold speedup over a standard threshold based approach to network construction. We also identify a relationship between the centrality of a sequence in a similarity network of an Anaplasma marginale short sequence repeat dataset and how broadly that sequence is dispersed geographically.

CONCLUSION

We demonstrate that using approximate distance measures to rapidly construct similarity networks may lead to significant deficiencies in the structure of that network in terms of centrality and clustering analyses. We present a new network representation that maintains the structural semantics of threshold-based networks while increasing connectedness, and an algorithm for constructing the network using exact distance measures in a fraction of the time it would take to build a threshold-based equivalent.
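
The network definition in the abstract above (each sequence is linked by a directed edge to its closest sequence, or to all closest sequences when there are ties) can be illustrated with a short sketch. This is a brute-force illustration under stated assumptions, not the efficient construction algorithm the abstract refers to: it compares every pair of sequences, uses plain Levenshtein edit distance as the exact distance, and the function names and example sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def all_nearest_neighbors(seqs: List[str]) -> Dict[int, List[Tuple[int, int]]]:
    """Map each sequence index to its (neighbor index, distance) edges.

    Every node gets a directed edge to each sequence at minimum distance,
    so ties produce several outgoing edges.
    """
    edges: Dict[int, List[Tuple[int, int]]] = {}
    for i, s in enumerate(seqs):
        dists = [(edit_distance(s, t), j) for j, t in enumerate(seqs) if j != i]
        if not dists:
            edges[i] = []
            continue
        best = min(d for d, _ in dists)
        edges[i] = [(j, d) for d, j in dists if d == best]
    return edges


if __name__ == "__main__":
    example = ["AAAA", "AAAT", "AATT", "GGGG"]  # illustrative sequences only
    for node, out_edges in all_nearest_neighbors(example).items():
        print(node, out_edges)
```

Because each node keeps only its minimum-distance edges, no distance threshold has to be chosen, which is the property the abstract contrasts with threshold-based networks.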

Li Z, Nie F, Chang X, Nie L, Zhang H, Yang Y. Rank-Constrained Spectral Clustering With Flexible Embedding. IEEE Transactions on Neural Networks and Learning Systems 2018;29:6073-6082. [PMID: 29993916 DOI: 10.1109/tnnls.2018.2817538] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 06/08/2023]

Genome-wide association analysis identifies genetic correlates of immune infiltrates in solid tumors. PLoS One 2017;12:e0179726. [PMID: 28749946 PMCID: PMC5531551 DOI: 10.1371/journal.pone.0179726] [Citation(s) in RCA: 153] [Impact Index Per Article: 21.9] [Received: 09/18/2016] [Accepted: 06/02/2017] [Indexed: 12/27/2022]

Abstract

Therapeutic options for the treatment of an increasing variety of cancers have been expanded by the introduction of a new class of drugs, commonly referred to as checkpoint blocking agents, that target the host immune system to positively modulate anti-tumor immune response. Although efficacy of these agents has been linked to a pre-existing level of tumor immune infiltrate, it remains unclear why some patients exhibit deep and durable responses to these agents while others do not benefit. To examine the influence of tumor genetics on tumor immune state, we interrogated the relationship between somatic mutation and copy number alteration with infiltration levels of 7 immune cell types across 40 tumor cohorts in The Cancer Genome Atlas. Levels of cytotoxic T, regulatory T, total T, natural killer, and B cells, as well as monocytes and M2 macrophages, were estimated using a novel set of transcriptional signatures that were designed to resist interference from the cellular heterogeneity of tumors. Tumor mutational load and estimates of tumor purity were included in our association models to adjust for biases in multi-modal genomic data. Copy number alterations, mutations summarized at the gene level, and position-specific mutations were evaluated for association with tumor immune infiltration. We observed a strong relationship between copy number loss of a large region of chromosome 9p and decreased lymphocyte estimates in melanoma, pancreatic, and head/neck cancers. Mutations in the oncogenes PIK3CA, FGFR3, and RAS/RAF family members, as well as the tumor suppressor TP53, were linked to changes in immune infiltration, usually in restricted tumor types. Associations of specific WNT/beta-catenin pathway genetic changes with immune state were limited, but we noted a link between 9p loss and the expression of the WNT receptor FZD3, suggesting that there are interactions between 9p alteration and WNT pathways. 
Finally, two different cell death regulators, CASP8 and DIDO1, were often mutated in head/neck tumors that had higher lymphocyte infiltrates. In summary, our study supports the relevance of tumor genetics to questions of efficacy and resistance in checkpoint blockade therapies. It also highlights the need to assess genome-wide influences during exploration of any specific tumor pathway hypothesized to be relevant to therapeutic response. Some of the observed genetic links to immune state, like 9p loss, may influence response to cancer immune therapies. Others, like mutations in cell death pathways, may help guide combination therapeutic approaches.

Yalcin D, Hakguder ZM, Otu HH. Bioinformatics approaches to single-cell analysis in developmental biology. Mol Hum Reprod 2015;22:182-92. [PMID: 26358759 DOI: 10.1093/molehr/gav050] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/12/2015] [Accepted: 09/04/2015] [Indexed: 12/17/2022]

Abstract

Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology.

Imangaliyev S, Keijser B, Crielaard W, Tsivtsivadze E. Personalized microbial network inference via co-regularized spectral clustering. Methods 2015;83:28-35. [PMID: 25842007 DOI: 10.1016/j.ymeth.2015.03.017] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/29/2015] [Revised: 03/19/2015] [Accepted: 03/24/2015] [Indexed: 01/23/2023]

Park H, Niida A, Miyano S, Imoto S. Sparse Overlapping Group Lasso for Integrative Multi-Omics Analysis. J Comput Biol 2015;22:73-84. [PMID: 25629319 DOI: 10.1089/cmb.2014.0197] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 01/08/2023]

A new system for comparative functional genomics of Saccharomyces yeasts. Genetics 2013;195:275-87. [PMID: 23852385 PMCID: PMC3761308 DOI: 10.1534/genetics.113.152918] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 11/21/2022]

Pirim H, Ekşioğlu B, Perkins A, Yüceer Ç. Clustering of High Throughput Gene Expression Data. Computers & Operations Research 2012;39:3046-3061. [PMID: 23144527 PMCID: PMC3491664 DOI: 10.1016/j.cor.2012.03.008] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 05/06/2023]

Bello-Orgaz G, Menéndez HD, Camacho D. Adaptive k-means algorithm for overlapped graph clustering. Int J Neural Syst 2012;22:1250018. [DOI: 10.1142/s0129065712500189] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022]

Arefin AS, Riveros C, Berretta R, Moscato P. GPU-FS-kNN: a software tool for fast and scalable kNN computation using GPUs. PLoS One 2012;7:e44000. [PMID: 22937144 PMCID: PMC3429408 DOI: 10.1371/journal.pone.0044000] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 01/30/2012] [Accepted: 07/27/2012] [Indexed: 12/05/2022]

Abstract

Background

The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The exploding volumes of biological data demand extreme computational power and special computing facilities (i.e. super-computers). An inexpensive solution, such as General Purpose computation based on Graphics Processing Units (GPGPU), can be adapted to tackle this challenge, but the limited internal memory of the device can pose a new problem of scalability. Efficient data and computational parallelism with partitioning is required to provide a fast and scalable solution to this problem.

Results

We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Although kNN search is simple and straightforward, its performance degrades dramatically for large data sets, since the task is computationally intensive. The proposed approach is not only fast but also scalable to large-scale instances. Based on our approach, we implemented a software tool GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour) for CUDA enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. We observed speed-ups of 50–60 times compared with CPU implementation on a well-known breast microarray study and its associated data sets.

Conclusion

Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest neighbour computation in large-scale networks. Source code and the software tool are available under GNU Public License (GPL) at https://sourceforge.net/p/gpufsknn/.
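
To make the kNN search problem in the abstract above concrete, the following is a minimal CPU-only sketch of brute-force kNN search in which the query points are processed in fixed-size chunks. It is not the GPU-FS-kNN tool and does not use CUDA; the chunking only loosely mirrors the idea of partitioning the work so that one block is handled at a time. The function name, the `chunk_size` parameter and the example points are assumptions made for illustration.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_brute_force(points: List[Point], k: int,
                    chunk_size: int = 2) -> List[List[Tuple[float, int]]]:
    """Return, for every point, its k nearest other points as (distance, index).

    Query points are handled chunk by chunk; each chunk is independent of
    the others, so the chunks could be dispatched to separate workers.
    """
    n = len(points)
    neighbours: List[List[Tuple[float, int]]] = [[] for _ in range(n)]
    for start in range(0, n, chunk_size):
        for i in range(start, min(start + chunk_size, n)):
            # Euclidean distance from point i to every other point.
            candidates = ((math.dist(points[i], points[j]), j)
                          for j in range(n) if j != i)
            # Keep only the k smallest distances, sorted ascending.
            neighbours[i] = heapq.nsmallest(k, candidates)
    return neighbours


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]  # illustrative data
    for idx, nbrs in enumerate(knn_brute_force(pts, k=1)):
        print(idx, nbrs)
```

The cost grows quadratically with the number of points, which is why the abstract describes the search as computationally intensive for large data sets.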

A systematic comparison of genome-scale clustering algorithms. BMC Bioinformatics 2012;13 Suppl 10:S7. [PMID: 22759431 PMCID: PMC3382433 DOI: 10.1186/1471-2105-13-s10-s7] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 11/22/2022]

Abstract

Background

A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparison of these methods to evaluate their relative effectiveness provides guidance to algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices that have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae.

Methods

For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the top five scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method.

Results

Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.

Conclusions

Validation of clusters against known gene classifications demonstrates that for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
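
The scoring procedure in the Methods of the abstract above (best Jaccard match per cluster, then an average of the top five scores within each size bin) can be sketched as follows. This is an illustration under stated assumptions: the abstract does not give the size cut-offs for the small, medium and large bins, so `small_max` and `medium_max` are hypothetical values, and the sketch scores a single clustering rather than taking the best result over all tested parameter settings.

```python
from typing import Dict, Iterable, List, Optional, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def best_annotation_score(cluster: Set[str],
                          annotation_sets: List[Set[str]]) -> float:
    """Highest Jaccard score of a cluster against any annotation set."""
    return max((jaccard(cluster, ann) for ann in annotation_sets), default=0.0)


def top5_average_by_bin(clusters: List[Set[str]],
                        annotation_sets: List[Set[str]],
                        small_max: int = 10,      # hypothetical cut-off
                        medium_max: int = 100     # hypothetical cut-off
                        ) -> Dict[str, Optional[float]]:
    """Average the five best cluster scores within each size bin."""
    bins: Dict[str, List[float]] = {"small": [], "medium": [], "large": []}
    for cluster in clusters:
        size = len(cluster)
        if size <= small_max:
            name = "small"
        elif size <= medium_max:
            name = "medium"
        else:
            name = "large"
        bins[name].append(best_annotation_score(cluster, annotation_sets))
    result: Dict[str, Optional[float]] = {}
    for name, scores in bins.items():
        top = sorted(scores, reverse=True)[:5]
        # A bin with no clusters has no score.
        result[name] = sum(top) / len(top) if top else None
    return result


if __name__ == "__main__":
    clusters = [{"a", "b"}, {"c", "d", "e"}]       # illustrative gene sets
    annotations = [{"a", "b"}, {"c", "d"}]         # illustrative annotations
    print(top5_average_by_bin(clusters, annotations))
```

When a bin holds fewer than five clusters, the sketch averages whatever scores are present; how the original study handled that case is not stated in the abstract.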

Zhou F, Ma Q, Li G, Xu Y. QServer: a biclustering server for prediction and assessment of co-expressed gene clusters. PLoS One 2012;7:e32660. [PMID: 22403692 PMCID: PMC3293860 DOI: 10.1371/journal.pone.0032660] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 08/31/2011] [Accepted: 01/30/2012] [Indexed: 01/31/2023]

Judson RS, Mortensen HM, Shah I, Knudsen TB, Elloumi F. Using pathway modules as targets for assay development in xenobiotic screening. Mol Biosyst 2012;8:531-42. [DOI: 10.1039/c1mb05303e] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022]

Comparative microbial modules resource: generation and visualization of multi-species biclusters. PLoS Comput Biol 2011;7:e1002228. [PMID: 22144874 PMCID: PMC3228777 DOI: 10.1371/journal.pcbi.1002228] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/05/2011] [Accepted: 08/29/2011] [Indexed: 11/24/2022]

Abstract

The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation.

Advancing high-throughput experimental technologies are providing access to genome-wide measurements for multiple related species on multiple information levels (e.g. mRNA, protein, interactions, functional assays, etc.). We present a biclustering algorithm and an associated visualization system for generating and exploring regulatory modules derived from analysis of integrated multi-species genomics datasets. We use multi-species-cMonkey, an algorithm of our own construction that can integrate diverse systems-biology datatypes from multiple species to form biclusters, or condition-dependent regulatory modules, that are conserved across both the multiple species analyzed and biclusters that are specific to subsets of the processed species. Our resource is an integrated web and java based system that allows biologists to explore both conserved and species-specific biclusters in the context of the data, associated networks for both species, and existing annotations for both species. Our focus in this work is on the use of the integrated system with examples drawn from exploring modules associated with nitrogen metabolism in two Gram-negative bacteria, E. coli and S. Typhimurium.

Dost B, Wu C, Su A, Bafna V. TCLUST: a fast method for clustering genome-scale expression data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011;8:808-818. [PMID: 20479508 DOI: 10.1109/tcbb.2010.34] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/29/2023]

A graph clustering algorithm based on a clustering coefficient for weighted graphs. Journal of the Brazilian Computer Society 2010. [DOI: 10.1007/s13173-010-0027-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]

Gene expression profiling: classification of mice with left ventricle systolic dysfunction using microarray analysis. Crit Care Med 2010;38:25-31. [PMID: 19770745 DOI: 10.1097/ccm.0b013e3181b427e8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022]

Celton M, Malpertuy A, Lelandais G, de Brevern AG. Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments. BMC Genomics 2010;11:15. [PMID: 20056002 PMCID: PMC2827407 DOI: 10.1186/1471-2164-11-15] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 09/02/2009] [Accepted: 01/07/2010] [Indexed: 11/17/2022]

Abstract

Background

Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour approach for restoring missing gene expression values, and its positive impact on gene clustering by hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) for microarray data. In this study, we evaluated twelve different usable methods and their influence on the quality of gene clustering. We used several datasets, from both kinetic and non-kinetic experiments in yeast and human.

Results

We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially the one based on expectation maximization (EM_array). These improvements were also observed for the imputation of extreme values, the most difficult values to predict. We showed that the imputed MVs still have important effects on the stability of the gene clusters. The improvement in the clustering obtained by hierarchical clustering remains limited and is not sufficient to completely restore the correct gene associations. However, a common tendency can be found between the quality of the imputation method and the gene cluster stability. Even though the comparison between clustering algorithms is a complex task, we observed that the k-means approach is more efficient at conserving gene associations.

Conclusions

More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have thus been made since our last study. The EM_array approach constitutes one efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor in gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and thus for dedicated benchmarks. A noticeable point is the specific influence of some biological datasets.
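The kind of evaluation this abstract describes (hide known values, impute them, measure the error) can be illustrated with a minimal k-nearest-neighbour sketch. This is not the implementation used in the study, and it does not reproduce EM_array or any of the twelve methods as benchmarked. The function names `knn_impute` and `rmse`, the toy matrix, and the choice to borrow only from genes with complete profiles are all assumptions made for illustration.

```python
import numpy as np

def knn_impute(data, k=3):
    """Fill NaNs in a genes x conditions matrix from the k most similar genes.

    Illustrative assumption: only genes with complete profiles act as neighbours.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    rows_with_gaps = np.argwhere(np.isnan(data).any(axis=1)).ravel()
    for i in rows_with_gaps:
        missing = np.isnan(data[i])
        # Candidate neighbours: other genes with no missing values at all.
        candidates = [j for j in range(data.shape[0])
                      if j != i and not np.isnan(data[j]).any()]
        if not candidates:
            continue  # nothing to borrow from; leave the NaNs in place
        # Euclidean distance over the conditions that gene i does have.
        dists = np.array([np.linalg.norm(data[j, ~missing] - data[i, ~missing])
                          for j in candidates])
        nearest = np.array(candidates)[np.argsort(dists, kind="stable")[:k]]
        # Impute each missing condition as the neighbours' mean at that condition.
        filled[i, missing] = data[nearest][:, missing].mean(axis=0)
    return filled

def rmse(truth, estimate, mask):
    """Root mean squared error restricted to the artificially hidden entries."""
    return float(np.sqrt(np.mean((truth[mask] - estimate[mask]) ** 2)))

# Toy benchmark: hide one known value, impute it, then score the estimate.
truth = np.array([[1.0, 2.0, 3.0],
                  [1.1, 2.1, 3.1],
                  [5.0, 6.0, 7.0],
                  [0.9, 1.9, 2.9]])
mask = np.zeros_like(truth, dtype=bool)
mask[0, 2] = True
observed = truth.copy()
observed[mask] = np.nan
imputed = knn_impute(observed, k=2)
error = rmse(truth, imputed, mask)
```

Repeating the hide-impute-score loop over many random masks and missing-value rates would give a benchmark in the spirit of the simulations summarized above, although the datasets and error measures of the study are not reproduced here.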

Mutwil M, Usadel B, Schütte M, Loraine A, Ebenhöh O, Persson S. Assembly of an interactive correlation network for the Arabidopsis genome using a novel heuristic clustering algorithm. Plant Physiology 2010;152:29-43. [PMID: 19889879 PMCID: PMC2799344 DOI: 10.1104/pp.109.145318] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.7] [Indexed: 05/17/2023]

Zhang KX, Ouellette BFF. Pandora, a pathway and network discovery approach based on common biological evidence. Bioinformatics 2009;26:529-35. [PMID: 20031970 PMCID: PMC2820679 DOI: 10.1093/bioinformatics/btp701] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]

Selga E, Oleaga C, Ramírez S, de Almagro MC, Noé V, Ciudad CJ. Networking of differentially expressed genes in human cancer cells resistant to methotrexate. Genome Med 2009;1:83. [PMID: 19732436 PMCID: PMC2768990 DOI: 10.1186/gm83] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 05/22/2009] [Revised: 07/31/2009] [Accepted: 09/04/2009] [Indexed: 12/14/2022]

Abstract

Background

The need for an integrated view of data obtained from high-throughput technologies gave rise to network analyses. These are especially useful to rationalize how external perturbations propagate through the expression of genes. To address this issue in the case of drug resistance, we constructed biological association networks of genes differentially expressed in cell lines resistant to methotrexate (MTX).

Methods

Seven cell lines representative of different types of cancer, including colon cancer (HT29 and Caco2), breast cancer (MCF-7 and MDA-MB-468), pancreatic cancer (MIA PaCa-2), erythroblastic leukemia (K562) and osteosarcoma (Saos-2), were used. The differential expression pattern between sensitive and MTX-resistant cells was determined by whole human genome microarrays and analyzed with the GeneSpring GX software package. Genes deregulated in common between the different cancer cell lines served to generate biological association networks using the Pathway Architect software.

Results

Dickkopf homolog 1 (DKK1) is a highly interconnected node in the network generated with genes in common between the two colon cancer cell lines, and functional validation of this target using small interfering RNAs (siRNAs) showed chemosensitization toward MTX. Members of the UDP-glucuronosyltransferase 1A (UGT1A) family formed a network of genes differentially expressed in the two breast cancer cell lines. siRNA treatment against UGT1A also increased MTX sensitivity. Eukaryotic translation elongation factor 1 alpha 1 (EEF1A1) was overexpressed among the pancreatic cancer, leukemia and osteosarcoma cell lines, and siRNA treatment against EEF1A1 produced chemosensitization toward MTX.

Conclusions

Biological association networks identified DKK1, UGT1As and EEF1A1 as important gene nodes in MTX resistance. Treatments using siRNA technology against these three genes showed chemosensitization toward MTX.

Li G, Ma Q, Tang H, Paterson AH, Xu Y. QUBIC: a qualitative biclustering algorithm for analyses of gene expression data. Nucleic Acids Res 2009;37:e101. [PMID: 19509312 PMCID: PMC2731891 DOI: 10.1093/nar/gkp491] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 01/04/2023]

Reverse engineering the genotype-phenotype map with natural genetic variation. Nature 2008;456:738-44. [PMID: 19079051 DOI: 10.1038/nature07633] [Citation(s) in RCA: 162] [Impact Index Per Article: 10.1] [Indexed: 01/30/2023]

Zhu Y, Li H, Miller DJ, Wang Z, Xuan J, Clarke R, Hoffman EP, Wang Y. caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data. BMC Bioinformatics 2008;9:383. [PMID: 18801195 PMCID: PMC2566986 DOI: 10.1186/1471-2105-9-383] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/16/2008] [Accepted: 09/18/2008] [Indexed: 12/31/2022]

Abstract

Background

The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables.

Results

In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample clustering, and phenotype clustering (wherein phenotype labels for samples are known), albeit with minor algorithm modifications customized to each of these tasks.

Conclusion

VISDA achieved robust and superior clustering accuracy, compared with several benchmark clustering schemes. The model order selection scheme in VISDA was shown to be effective for high dimensional genomic data clustering. On muscular dystrophy data and muscle regeneration data, VISDA identified biologically relevant co-expressed gene clusters. VISDA also captured the pathological relationships among different phenotypes revealed at the molecular level, through phenotype clustering on muscular dystrophy data and multi-category cancer data.

Chen X, Liang S, Zheng W, Liao Z, Shang T, Ma W. Meta-analysis of nasopharyngeal carcinoma microarray data explores mechanism of EBV-regulated neoplastic transformation. BMC Genomics 2008;9:322. [PMID: 18605998 PMCID: PMC2491640 DOI: 10.1186/1471-2164-9-322] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 02/16/2008] [Accepted: 07/07/2008] [Indexed: 11/10/2022]

Swindell WR. Genes regulated by caloric restriction have unique roles within transcriptional networks. Mech Ageing Dev 2008;129:580-92. [PMID: 18634819 DOI: 10.1016/j.mad.2008.06.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 03/12/2008] [Revised: 06/09/2008] [Accepted: 06/15/2008] [Indexed: 02/06/2023]

Chen G, Larsen P, Almasri E, Dai Y. Rank-based edge reconstruction for scale-free genetic regulatory networks. BMC Bioinformatics 2008;9:75. [PMID: 18237422 PMCID: PMC2275249 DOI: 10.1186/1471-2105-9-75] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/19/2007] [Accepted: 01/31/2008] [Indexed: 11/12/2022]