Reference Citation Analysis

For: Carmona-Saez P, Pascual-Marqui RD, Tirado F, Carazo JM, Pascual-Montano A. Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization. BMC Bioinformatics 2006;7:78. [PMID: 16503973 PMCID: PMC1434777 DOI: 10.1186/1471-2105-7-78] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 07/15/2005] [Accepted: 02/17/2006] [Indexed: 12/01/2022]

Cited by Other Article(s)

Rasmussen M, Fredsøe J, Salachan PV, Blanke MPL, Larsen SH, Ulhøi BP, Jensen JB, Borre M, Sørensen KD. Stroma-specific gene expression signature identifies prostate cancer subtype with high recurrence risk. NPJ Precis Oncol 2024;8:48. [PMID: 38395986 PMCID: PMC10891092 DOI: 10.1038/s41698-024-00540-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 02/02/2024] [Indexed: 02/25/2024]

Abstract

Current prognostic tools cannot clearly distinguish indolent from aggressive prostate cancer (PC). We hypothesized that analyzing the individual contributions of epithelial and stromal components in localized PC (LPC) could improve risk stratification, as stromal subtypes may have been overlooked due to the emphasis on malignant epithelial cells. Hence, we derived molecular subtypes of PC using gene expression analysis of LPC samples from prostatectomy patients (cohort 1, n = 127) and validated these subtypes in two independent prostatectomy cohorts (cohort 2, n = 406; cohort 3, n = 126). Stroma- and epithelium-specific signatures were established from laser-capture microdissection data, and non-negative matrix factorization was used to identify subtypes based on these signatures. Subtypes were functionally characterized by gene set and cell type enrichment analyses, and survival analysis was conducted. Three epithelial (E1-E3) and three stromal (S1-S3) PC subtypes were identified. While subtyping based on epithelial signatures showed inconsistent associations with biochemical recurrence (BCR), subtyping by stromal signatures was significantly associated with BCR in all three cohorts, with subtype S3 indicating high BCR risk. Subtype S3 exhibited distinct features, including significantly decreased cell polarity and myogenesis and significantly increased infiltration of M2-polarized macrophages and CD8+ T-cells, compared with subtype S1. For patients clinically classified as CAPRA-S intermediate risk, S3 improved prediction of BCR. This study demonstrates the potential of stromal signatures for identifying clinically relevant PC subtypes, and further indicates that stromal characterization may enhance risk stratification in LPC and may be particularly promising in cases with high prognostic ambiguity based on clinical parameters.
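
The abstract says only that non-negative matrix factorization was applied to the stromal and epithelial signatures to define subtypes; it gives no algorithmic detail. As a hedged illustration of how NMF-based subtyping commonly works (not the authors' pipeline), the sketch below factors a non-negative genes x samples matrix V into W @ H using the generic Lee-Seung multiplicative updates and then labels each sample by the factor with its largest coefficient in H. The function names, the rank k, the iteration count and the toy matrix are all hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate a non-negative matrix V (genes x samples) as W @ H.

    Generic Lee-Seung multiplicative updates for the squared Frobenius
    loss ||V - WH||^2; an illustrative sketch, not the cited study's method.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps    # metagene loadings (genes x k)
    H = rng.random((k, n_samples)) + eps  # sample coefficients (k x samples)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_subtypes(H):
    """Label each sample with the factor holding its largest coefficient."""
    return np.argmax(H, axis=0)


# Hypothetical toy data: genes 0-9 are high in samples 0-4 and
# genes 10-19 are high in samples 5-9, on top of small uniform noise.
rng = np.random.default_rng(1)
V = rng.random((20, 10)) * 0.1
V[:10, :5] += 5.0
V[10:, 5:] += 5.0

W, H = nmf_multiplicative(V, k=2)
labels = assign_subtypes(H)  # one subtype label per sample
print(labels)
```

Which block receives label 0 depends on the random initialization; in practice the rank k is usually chosen with a stability criterion such as consensus clustering, which this sketch omits.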

Zhao L, Cunningham CM, Andruska AM, Schimmel K, Ali MK, Kim D, Gu S, Chang JL, Spiekerkoetter E, Nicolls MR. Rat microbial biogeography and age-dependent lactic acid bacteria in healthy lungs. Lab Anim (NY) 2024;53:43-55. [PMID: 38297075 PMCID: PMC10834367 DOI: 10.1038/s41684-023-01322-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 12/21/2023] [Indexed: 02/02/2024]

Abstract

The laboratory rat has emerged as a useful tool for studying the interaction between the host and its microbiome. To advance principles relevant to the human microbiome, we systematically investigated and defined the multitissue microbial biogeography of healthy Fischer 344 rats across their lifespan. Microbial community profiling data were extracted and integrated with host transcriptomic data from the Sequencing Quality Control consortium. Unsupervised machine learning, correlation, taxonomic diversity and abundance analyses were performed to determine and characterize the rat microbial biogeography and to identify four intertissue microbial heterogeneity patterns (P1-P4). We found that the 11 body habitats harbored a greater diversity of microbes than previously suspected. Lactic acid bacteria (LAB) abundance in the lungs progressively declined from the breastfed newborn stage to adolescence/adulthood and was below detectable levels in elderly rats. Bioinformatics analyses indicate that the abundance of LAB may be modulated by the lung-immune axis. The presence and levels of LAB in the lungs were further evaluated by PCR in two validation datasets. The lung, testes, thymus, kidney, adrenal and muscle niches were found to have age-dependent alterations in microbial abundance. The 357 microbial signatures were positively correlated with host genes involved in cell proliferation (P1), DNA damage repair (P2) and DNA transcription (P3). Our study established a link between the metabolic properties of LAB and lung microbiota maturation and development. Breastfeeding and environmental exposure influence microbiome composition and host health and longevity. The inferred rat microbial biogeography and pattern-specific microbial signatures could be useful for microbiome therapeutic approaches to improving human health and quality of life.

Affiliation(s)

Lan Zhao Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. VA Palo Alto Health Care System, Palo Alto, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Christine M Cunningham Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. VA Palo Alto Health Care System, Palo Alto, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Adam M Andruska Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Katharina Schimmel Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Md Khadem Ali Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Dongeon Kim Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. VA Palo Alto Health Care System, Palo Alto, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Shenbiao Gu Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. VA Palo Alto Health Care System, Palo Alto, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Jason L Chang Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. VA Palo Alto Health Care System, Palo Alto, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Edda Spiekerkoetter Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.
Mark R Nicolls Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Stanford, CA, USA. VA Palo Alto Health Care System, Palo Alto, CA, USA. Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford, CA, USA.

Zhao L, Cunningham CM, Andruska AM, Schimmel K, Ali MK, Kim D, Gu S, Chang JL, Spiekerkoetter E, Nicolls MR. Rat microbial biogeography and age-dependent lactic acid bacteria in healthy lungs. bioRxiv 2023:2023.05.19.541527. [PMID: 37293045 PMCID: PMC10245737 DOI: 10.1101/2023.05.19.541527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]

Abstract

The laboratory rat has emerged as a useful tool for studying the interaction between the host and its microbiome. To advance principles relevant to the human microbiome, we systematically investigated and defined a multi-tissue, full-lifespan microbial biogeography for healthy Fischer 344 rats. Microbial community profiling data were extracted and integrated with host transcriptomic data from the Sequencing Quality Control (SEQC) consortium. Unsupervised machine learning, Spearman's correlation, taxonomic diversity, and abundance analyses were performed to determine and characterize the rat microbial biogeography and to identify four inter-tissue microbial heterogeneity patterns (P1-P4). The 11 body habitats harbor a greater diversity of microbes than previously suspected. Lactic acid bacteria (LAB) abundances in the lungs progressively declined from the breastfed newborn stage to adolescence/adulthood and were below detectable levels in elderly rats. The presence and levels of LAB in the lungs were further evaluated by PCR in the two validation datasets. The lung, testes, thymus, kidney, adrenal, and muscle niches were found to have age-dependent alterations in microbial abundance. P1 is dominated by lung samples. P2 contains the largest number of samples and is enriched for environmental species. Liver and muscle samples were mostly classified into P3. Archaea species were exclusively enriched in P4. The 357 pattern-specific microbial signatures were positively correlated with host genes involved in cell migration and proliferation (P1), DNA damage repair and synaptic transmission (P2), and DNA transcription and cell cycle (P3). Our study established a link between the metabolic properties of LAB and lung microbiota maturation and development. Breastfeeding and environmental exposure influence microbiome composition and host health and longevity. The inferred rat microbial biogeography and pattern-specific microbial signatures would be useful for microbiome therapeutic approaches to human health and good quality of life.

Seyler LM, Kraus EA, McLean C, Spear JR, Templeton AS, Schrenk MO. An untargeted exometabolomics approach to characterize dissolved organic matter in groundwater of the Samail Ophiolite. Front Microbiol 2023;14:1093372. [PMID: 36970670 PMCID: PMC10033605 DOI: 10.3389/fmicb.2023.1093372] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/08/2022] [Accepted: 01/23/2023] [Indexed: 03/11/2023]

Liu X, Yu T, Zhao X, Long C, Han R, Su Z, Li G. ARBic: an all-round biclustering algorithm for analyzing gene expression data. NAR Genom Bioinform 2023;5:lqad009. [PMID: 36733402 PMCID: PMC9887595 DOI: 10.1093/nargab/lqad009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 01/09/2023] [Accepted: 01/17/2023] [Indexed: 02/04/2023]

Robust semi-supervised data representation and imputation by correntropy based constraint nonnegative matrix factorization. Appl Intell 2022. [DOI: 10.1007/s10489-022-03884-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]

Ko YJ, Kim S, Pan CH, Park K. Identification of Functional Microbial Modules Through Network-Based Analysis of Meta-Microbial Features Using Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2851-2862. [PMID: 34329170 DOI: 10.1109/tcbb.2021.3100893] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]

Barkley D, Moncada R, Pour M, Liberman DA, Dryg I, Werba G, Wang W, Baron M, Rao A, Xia B, França GS, Weil A, Delair DF, Hajdu C, Lund AW, Osman I, Yanai I. Cancer cell states recur across tumor types and form specific interactions with the tumor microenvironment. Nat Genet 2022;54:1192-1201. [PMID: 35931863 PMCID: PMC9886402 DOI: 10.1038/s41588-022-01141-9] [Citation(s) in RCA: 112] [Impact Index Per Article: 56.0] [Received: 05/20/2021] [Accepted: 06/22/2022] [Indexed: 02/01/2023]

Castanho EN, Aidos H, Madeira SC. Biclustering fMRI time series: a comparative study. BMC Bioinformatics 2022;23:192. [PMID: 35606701 PMCID: PMC9126639 DOI: 10.1186/s12859-022-04733-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/27/2021] [Accepted: 05/13/2022] [Indexed: 12/12/2022]

Karagiannaki I, Gourlia K, Lagani V, Pantazis Y, Tsamardinos I. Learning biologically-interpretable latent representations for gene expression data: Pathway Activity Score Learning Algorithm. Mach Learn 2022;112:4257-4287. [PMID: 37900054 PMCID: PMC10600308 DOI: 10.1007/s10994-022-06158-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2021] [Revised: 11/12/2021] [Accepted: 02/19/2022] [Indexed: 11/24/2022]

Zhao L, Cho WC, Luo JL. Exploring the patient-microbiome interaction patterns for pan-cancer. Comput Struct Biotechnol J 2022;20:3068-3079. [PMID: 35782745 PMCID: PMC9233187 DOI: 10.1016/j.csbj.2022.06.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/04/2022] [Revised: 06/06/2022] [Accepted: 06/06/2022] [Indexed: 11/03/2022]

Abstract

• Each cancer subtype carries a specific set of microbiomes, reflecting pan-cancer heterogeneity at the microbial level.
• Approximately 60% of the untreated cancer patients have experienced microbial composition changes in their tumor tissues.
• Colorectal cancer (CRC) was largely composed of two subtypes (S4 and S6) driven by different microbial profiles.
• The seven identified pan-cancer subtypes, with 424 subtype-specific microbial signatures, will help us find new therapeutic targets and better treatment strategies for cancer patients.

Microbes play important roles in human health and disease. Immunocompromised cancer patients are more vulnerable to microbial infections. Hypoxic regions and the acidic tumor microenvironment shape microbial community diversity and abundance. Each cancer has its own microbiome, giving rise to cancer-specific sets of microbiomes. High-throughput profiling technologies provide a culture-free approach for microbial profiling in tumor samples. Microbial compositional data were extracted and examined from the TCGA unmapped transcriptome data. Biclustering, correlation, and statistical analyses were performed to determine the seven patient-microbe interaction patterns. Each of these two-dimensional patterns consists of a group of microbial species that is significantly over-represented in one of the seven pan-cancer subtypes (S1-S7). Approximately 60% of the untreated cancer patients showed changes in tissue microbial composition and function between subtypes and normal controls. Among these changes, subtype S5 showed loss of microbial diversity as well as impaired immune functions. S1, S2, and S3 were enriched with microbial signatures derived from Gammaproteobacteria, Actinobacteria, and Betaproteobacteria, respectively. Colorectal cancer (CRC) was largely composed of two subtypes, namely S4 and S6, driven by different microbial profiles. S4 patients had increased microbial load and were enriched for CRC-related oncogenic pathways. Almost 40% of all cases, including S6 CRC and patients with other cancers, were classified into the S6 subtype, which not only resembled the microbiota of normal controls but also retained their original “normal-like” functions. Lastly, S7 was a rare and understudied subtype. Our study investigated pan-cancer heterogeneity at the microbial level. The seven identified pan-cancer subtypes, with 424 subtype-specific microbial signatures, will help us find new therapeutic targets and better treatment strategies for cancer patients.

Liu R, Liu L, Zhou Y. m6Adecom: analysis of m6A profile matrix based on graph regularized non-negative matrix factorization. Methods 2022;203:322-327. [DOI: 10.1016/j.ymeth.2022.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Revised: 01/12/2022] [Accepted: 01/21/2022] [Indexed: 01/07/2023]

Gene Expression Analysis through Parallel Non-Negative Matrix Factorization. Computation 2021. [DOI: 10.3390/computation9100106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

Zhang S, Li X, Lin Q, Wong KC. Nature-Inspired Compressed Sensing for Transcriptomic Profiling From Random Composite Measurements. IEEE Trans Cybern 2021;51:4476-4487. [PMID: 31751263 DOI: 10.1109/tcyb.2019.2951402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]

Chung Y, Lee H. Correlation between Alzheimer's disease and type 2 diabetes using non-negative matrix factorization. Sci Rep 2021;11:15265. [PMID: 34315930 PMCID: PMC8316581 DOI: 10.1038/s41598-021-94048-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/20/2021] [Accepted: 06/24/2021] [Indexed: 02/07/2023]

Abstract

Alzheimer's disease (AD) is a complex and heterogeneous disease that can be affected by various genetic factors. Although the cause of AD is not yet known and there is no cure, its progression can be delayed. AD has recently been recognized as a brain-specific type of diabetes called type 3 diabetes. Several studies have shown that people with type 2 diabetes (T2D) have a higher risk of developing AD. Therefore, it is important to identify subgroups of patients with AD that may be more likely to be associated with T2D. Here we describe a new approach to identify the correlation between AD and T2D at the genetic level. Subgroups of AD and T2D were each generated using a non-negative matrix factorization (NMF) approach, which generated clusters containing subsets of genes and samples. Within each gene cluster generated by the conventional NMF gene-clustering method, we selected genes that differed significantly in the corresponding sample cluster using the Kruskal-Wallis test and Dunn's test. Subsequently, we extracted differentially expressed gene (DEG) subgroups, and candidate genes with the same regulation direction were extracted at the intersection of the DEG subgroups of the two diseases. Finally, we identified 241 candidate genes that represent common features related to both AD and T2D, and based on pathway analysis we propose that these genes play a role in the common pathological features of AD and T2D. Moreover, in predicting AD using logistic regression analysis with an independent AD dataset, the candidate genes achieved better prediction performance than DEGs. In conclusion, our study revealed a subgroup of patients with AD that is associated with T2D, as well as candidate genes linking AD and T2D, which can help in providing personalized and suitable treatments.
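
The abstract names the Kruskal-Wallis test as the filter applied to members of each NMF gene cluster but gives no implementation details. The sketch below is a generic, hedged illustration of that single step only: it computes the tie-corrected Kruskal-Wallis H statistic for one gene across NMF sample clusters and converts it to a chi-square p-value with (number of clusters - 1) degrees of freedom. It omits Dunn's post hoc test and any multiple-testing correction, and the helper names and toy values are hypothetical. In practice, `scipy.stats.kruskal` provides the same test.

```python
import math

import numpy as np


def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1..j+1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square upper-tail probability for an integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its chi-square p-value."""
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = data.size
    ranks = average_ranks(data)
    h, start = 0.0, 0
    for g in groups:
        size = len(g)
        h += ranks[start:start + size].sum() ** 2 / size  # R_i^2 / n_i
        start += size
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    _, ties = np.unique(data, return_counts=True)
    correction = 1.0 - float((ties ** 3 - ties).sum()) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical expression of one gene, split by NMF sample-cluster labels.
expression = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
clusters = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
groups = [expression[clusters == c] for c in np.unique(clusters)]
h, p = kruskal_wallis(*groups)
print(h, p)
```

Running this per gene and keeping genes below a chosen p-value threshold would mirror the filtering step described above; the closed-form chi-square tail avoids a SciPy dependency but only handles integer degrees of freedom.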

Liu Z, Lu T, Wang L, Liu L, Li L, Han X. Comprehensive Molecular Analyses of a Novel Mutational Signature Classification System with Regard to Prognosis, Genomic Alterations, and Immune Landscape in Glioma. Front Mol Biosci 2021;8:682084. [PMID: 34307451 PMCID: PMC8293748 DOI: 10.3389/fmolb.2021.682084] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/17/2021] [Accepted: 06/28/2021] [Indexed: 11/26/2022]

Zhang C, Zhang S. Bayesian Joint Matrix Decomposition for Data Integration with Heterogeneous Noise. IEEE Trans Pattern Anal Mach Intell 2021;43:1184-1196. [PMID: 31603812 DOI: 10.1109/tpami.2019.2946370] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]

Lee Y, Bogdanoff D, Wang Y, Hartoularos GC, Woo JM, Mowery CT, Nisonoff HM, Lee DS, Sun Y, Lee J, Mehdizadeh S, Cantlon J, Shifrut E, Ngyuen DN, Roth TL, Song YS, Marson A, Chow ED, Ye CJ. XYZeq: Spatially resolved single-cell RNA sequencing reveals expression heterogeneity in the tumor microenvironment. Sci Adv 2021;7:eabg4755. [PMID: 33883145 PMCID: PMC8059935 DOI: 10.1126/sciadv.abg4755] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 01/08/2021] [Accepted: 03/04/2021] [Indexed: 05/07/2023]

Affiliation(s)

Youjin Lee Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA. Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA. Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. J. David Gladstone Institutes, San Francisco, CA 94158, USA.
Derek Bogdanoff Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA. Center for Advanced Technology, University of California, San Francisco, San Francisco, CA 94158, USA.
Yutong Wang Graduate Group in Biostatistics, University of California, Berkeley, CA 94720, USA. Center for Computational Biology, University of California, Berkeley, CA 94720, USA.
George C Hartoularos Graduate Program in Biological and Medical Informatics, University of California, San Francisco, San Francisco, CA 94158, USA. Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA 94143, USA. Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA.
Jonathan M Woo Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA. Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA. Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. J. David Gladstone Institutes, San Francisco, CA 94158, USA.
Cody T Mowery Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA. Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA. Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. J. David Gladstone Institutes, San Francisco, CA 94158, USA. Medical Scientist Training Program, University of California, San Francisco, San Francisco, CA 94143, USA. Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA 94143, USA.
Hunter M Nisonoff Center for Computational Biology, University of California, Berkeley, CA 94720, USA.
David S Lee Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA 94143, USA. Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA.
Yang Sun Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA 94143, USA. Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA.
James Lee Division of Hematology and Oncology, University of California, San Francisco, San Francisco, CA 94143, USA.
Sadaf Mehdizadeh Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA.
Joshua Cantlon Scienion AG, Volmerstrasse 7b, 12489 Berlin, Germany.
Eric Shifrut Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA. Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA. Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. J. David Gladstone Institutes, San Francisco, CA 94158, USA.
David N Ngyuen Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA. Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA. Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. J. David Gladstone Institutes, San Francisco, CA 94158, USA. Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA.
Theodore L Roth Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA. Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA. Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. Medical Scientist Training Program, University of California, San Francisco, San Francisco, CA 94143, USA. Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA 94143, USA.
Yun S Song Computer Science Division, University of California, Berkeley, CA 94720, USA. Department of Statistics, University of California, Berkeley, CA 94720, USA. Chan Zuckerberg Biohub, San Francisco, CA 94158, USA.
Alexander Marson Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA 94143, USA. Diabetes Center, University of California, San Francisco, San Francisco, CA 94143, USA. Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA. J. David Gladstone Institutes, San Francisco, CA 94158, USA. Department of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA. Chan Zuckerberg Biohub, San Francisco, CA 94158, USA. UCSF Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, USA. Parker Institute for Cancer Immunotherapy, University of California, San Francisco, San Francisco, CA 94129, USA. Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA. Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA 94158, USA.
Eric D Chow Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94158, USA. Center for Advanced Technology, University of California, San Francisco, San Francisco, CA 94158, USA.
Chun Jimmie Ye Division of Rheumatology, Department of Medicine, University of California, San Francisco, CA 94143, USA. Chan Zuckerberg Biohub, San Francisco, CA 94158, USA. Parker Institute for Cancer Immunotherapy, University of California, San Francisco, San Francisco, CA 94129, USA. Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA. Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA 94158, USA. Institute of Computational Health Sciences, University of California, San Francisco, San Francisco, CA 94143, USA. Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA.

Liu X, Li D, Liu J, Su Z, Li G. RecBic: a fast and accurate algorithm recognizing trend-preserving biclusters. Bioinformatics 2021;36:5054-5060. [PMID: 32653907 DOI: 10.1093/bioinformatics/btaa630] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/19/2019] [Revised: 06/24/2020] [Accepted: 07/06/2020] [Indexed: 01/09/2023]

Lemsara A, Ouadfel S, Fröhlich H. PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data. BMC Bioinformatics 2020;21:146. [PMID: 32299344 PMCID: PMC7161108 DOI: 10.1186/s12859-020-3465-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/13/2019] [Accepted: 03/23/2020] [Indexed: 02/08/2023]

Abstract

Background

Recent years have witnessed increasing interest in multi-omics data, because these data allow a better understanding of complex diseases such as cancer at the molecular system level. In addition, multi-omics data increase the chance of robustly identifying molecular patient sub-groups and hence open the door towards better personalized treatment of diseases. Several methods have been proposed for unsupervised clustering of multi-omics data. However, a number of challenges remain, such as the large number of features and the large difference in dimensionality across different omics data sources.

Results

We propose a multi-modal sparse denoising autoencoder framework coupled with sparse non-negative matrix factorization to robustly cluster patients based on multi-omics data. The proposed model specifically leverages pathway information to effectively reduce the dimensionality of omics data into a pathway- and patient-specific score profile. Consequently, our method reveals which pathways characterize which patient clusters. Moreover, recently proposed machine learning techniques allow us to disentangle the specific impact of each individual omics feature on a pathway score. We applied our method to cluster patients in several cancer datasets using gene expression, miRNA expression, DNA methylation and CNVs, demonstrating that it can yield biologically plausible disease subtypes characterized by specific molecular features. Comparison against several competing methods showed competitive clustering performance. In addition, post-hoc analysis of somatic mutations and clinical data provided supporting evidence for, and interpretation of, the identified clusters.

Conclusions

Our suggested multi-modal sparse denoising autoencoder approach allows for an effective and interpretable integration of multi-omics data at the pathway level while addressing the high-dimensional character of omics data. Patient-specific pathway score profiles derived from our model allow for robust identification of disease subgroups.


Moncada R, Barkley D, Wagner F, Chiodin M, Devlin JC, Baron M, Hajdu CH, Simeone DM, Yanai I. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat Biotechnol 2020;38:333-342. [PMID: 31932730 DOI: 10.1038/s41587-019-0392-8] [Citation(s) in RCA: 426] [Impact Index Per Article: 106.5] [Received: 04/07/2019] [Accepted: 12/11/2019] [Indexed: 12/12/2022]

Appice A, Tsoumakas G, Manolopoulos Y, Matwin S. Pathway Activity Score Learning for Dimensionality Reduction of Gene Expression Data. DISCOVERY SCIENCE 2020. [PMCID: PMC7556388 DOI: 10.1007/978-3-030-61527-7_17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]

Wu MJ, Gao YL, Liu JX, Zhu R, Wang J. Principal Component Analysis Based on Graph Laplacian and Double Sparse Constraints for Feature Selection and Sample Clustering on Multi-View Data. Hum Hered 2019;84:47-58. [PMID: 31466072 DOI: 10.1159/000501653] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/18/2019] [Accepted: 06/23/2019] [Indexed: 11/19/2022]

Woo J, Winterhoff BJ, Starr TK, Aliferis C, Wang J. De novo prediction of cell-type complexity in single-cell RNA-seq and tumor microenvironments. Life Sci Alliance 2019;2:2/4/e201900443. [PMID: 31266885 PMCID: PMC6607449 DOI: 10.26508/lsa.201900443] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/24/2019] [Accepted: 06/24/2019] [Indexed: 12/30/2022]

Winham SJ, Larson NB, Armasu SM, Fogarty ZC, Larson MC, McCauley BM, Wang C, Lawrenson K, Gayther S, Cunningham JM, Fridley BL, Goode EL. Molecular signatures of X chromosome inactivation and associations with clinical outcomes in epithelial ovarian cancer. Hum Mol Genet 2019;28:1331-1342. [PMID: 30576442 DOI: 10.1093/hmg/ddy444] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/12/2018] [Revised: 10/12/2018] [Accepted: 12/14/2018] [Indexed: 12/19/2022]

Esposito F, Gillis N, Del Buono N. Orthogonal joint sparse NMF for microarray data analysis. J Math Biol 2019;79:223-247. [PMID: 31004215 DOI: 10.1007/s00285-019-01355-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2018] [Revised: 03/29/2019] [Indexed: 12/20/2022]

Laplacian regularized low-rank representation for cancer samples clustering. Comput Biol Chem 2018;78:504-509. [PMID: 30528509 DOI: 10.1016/j.compbiolchem.2018.11.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/25/2018] [Accepted: 11/07/2018] [Indexed: 12/18/2022]

Carmona-Sáez P, Varela N, Luque MJ, Toro-Domínguez D, Martorell-Marugan J, Alarcón-Riquelme ME, Marañón C. Metagene projection characterizes GEN2.2 and CAL-1 as relevant human plasmacytoid dendritic cell models. Bioinformatics 2018;33:3691-3695. [PMID: 28961902 DOI: 10.1093/bioinformatics/btx502] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/13/2016] [Accepted: 08/06/2017] [Indexed: 12/24/2022]

Liu JX, Wang D, Gao YL, Zheng CH, Xu Y, Yu J. Regularized Non-Negative Matrix Factorization for Identifying Differentially Expressed Genes and Clustering Samples: A Survey. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:974-987. [PMID: 28186906 DOI: 10.1109/tcbb.2017.2665557] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 06/06/2023]

Connectedness-based subspace clustering. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1181-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/17/2022]

Gu Q, Veselkov K. Bi-clustering of metabolic data using matrix factorization tools. Methods 2018;151:12-20. [PMID: 29438828 PMCID: PMC6297113 DOI: 10.1016/j.ymeth.2018.02.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/17/2018] [Revised: 02/04/2018] [Accepted: 02/06/2018] [Indexed: 01/08/2023]

Abd Elaziz ME. Simultaneous feature extraction and selection of microarray data using fuzzy-rough based multiobjective nonnegative matrix factorization. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2017. [DOI: 10.3233/jifs-17954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Scheltens NME, Tijms BM, Koene T, Barkhof F, Teunissen CE, Wolfsgruber S, Wagner M, Kornhuber J, Peters O, Cohn-Sheehy BI, Rabinovici GD, Miller BL, Kramer JH, Scheltens P, van der Flier WM. Cognitive subtypes of probable Alzheimer's disease robustly identified in four cohorts. Alzheimers Dement 2017;13:1226-1236. [PMID: 28427934 PMCID: PMC5857387 DOI: 10.1016/j.jalz.2017.03.002] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 11/02/2016] [Revised: 03/09/2017] [Accepted: 03/09/2017] [Indexed: 01/25/2023]

Affiliation(s)

Nienke M. E. Scheltens Department of Neurology, Alzheimer Center, Amsterdam Neuroscience, VU University Medical Center, Amsterdam, The Netherlands
Betty M. Tijms Department of Neurology, Alzheimer Center, Amsterdam Neuroscience, VU University Medical Center, Amsterdam, The Netherlands
Teddy Koene Department of Medical Psychology, VU University Medical Center, Amsterdam, The Netherlands
Frederik Barkhof Department of Radiology and Nuclear Medicine, Amsterdam Neuroscience, VU University Medical Center, Amsterdam, The Netherlands; Institute of Neurology, University College London, London, UK; Institute of Healthcare Engineering, University College London, London, UK
Charlotte E. Teunissen Neurochemistry Laboratory and Biobank, Department of Clinical Chemistry, Amsterdam Neuroscience, VU University Medical Centre, Amsterdam, The Netherlands
Steffen Wolfsgruber Department of Psychiatry, University of Bonn, Bonn, Germany; German Center for Neurodegenerative Diseases, Bonn, Germany
Michael Wagner Department of Psychiatry, University of Bonn, Bonn, Germany; German Center for Neurodegenerative Diseases, Bonn, Germany
Johannes Kornhuber Department of Psychiatry, Friedrich-Alexander-University Erlangen, Erlangen, Germany
Oliver Peters Department of Psychiatry, Charité Berlin, Campus Benjamin Franklin, Berlin, Germany
Brendan I. Cohn-Sheehy Memory and Aging Center, Department of Neurology, University of California San Francisco, San Francisco, CA, USA
Gil D. Rabinovici Memory and Aging Center, Department of Neurology, University of California San Francisco, San Francisco, CA, USA
Bruce L. Miller Memory and Aging Center, Department of Neurology, University of California San Francisco, San Francisco, CA, USA
Joel H. Kramer Memory and Aging Center, Department of Neurology, University of California San Francisco, San Francisco, CA, USA
Philip Scheltens Department of Neurology, Alzheimer Center, Amsterdam Neuroscience, VU University Medical Center, Amsterdam, The Netherlands
Wiesje M. van der Flier Department of Neurology, Alzheimer Center, Amsterdam Neuroscience, VU University Medical Center, Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands
Alzheimer’s Disease Neuroimaging Initiative
German Dementia Competence Network, University of California San Francisco Memory and Aging Center, and Amsterdam Dementia Cohort


Li X, Ma S, Wong KC. Evolving Spatial Clusters of Genomic Regions From High-Throughput Chromatin Conformation Capture Data. IEEE Trans Nanobioscience 2017;16:400-407. [DOI: 10.1109/tnb.2017.2725991] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]

Ray B, Liu W, Fenyö D. Adaptive Multiview Nonnegative Matrix Factorization Algorithm for Integration of Multimodal Biomedical Data. Cancer Inform 2017;16:1176935117725727. [PMID: 28835735 PMCID: PMC5564898 DOI: 10.1177/1176935117725727] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/23/2017] [Accepted: 07/08/2017] [Indexed: 11/16/2022]

Abstract

The amounts and types of available multimodal tumor data are rapidly increasing, and their integration is critical for fully understanding the underlying cancer biology and personalizing treatment. However, the development of methods for effectively integrating multimodal data in a principled manner is lagging behind our ability to generate the data. In this article, we introduce an extension to a multiview nonnegative matrix factorization (NNMF) algorithm for dimensionality reduction and integration of heterogeneous data types, and compare the predictive modeling performance of the method on unimodal and multimodal data. We also present a comparative evaluation of our novel multiview approach and current data integration methods. Our work provides an efficient method to extend an existing dimensionality reduction method. We report rigorous evaluation of the method on large-scale quantitative protein and phosphoprotein tumor data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), acquired using state-of-the-art liquid chromatography mass spectrometry. Exome sequencing and RNA-Seq data were also available from The Cancer Genome Atlas for the same tumors. For unimodal data, in the case of breast cancer, transcript levels were most predictive of estrogen and progesterone receptor status, and copy number variation was most predictive of human epidermal growth factor receptor 2 status. For ovarian and colon cancers, phosphoprotein and protein levels were most predictive of tumor grade and stage and residual tumor, respectively. When multiview NNMF was applied to multimodal data to predict outcomes, the overall improvement in performance over unimodal data was not statistically significant, suggesting that proteomics data may contain more predictive information regarding tumor phenotypes than transcript levels, probably because proteins are the functional gene products and therefore a more direct measurement of the functional state of the tumor.
Here, we have applied our proposed approach to multimodal molecular data for tumors, but it is generally applicable to dimensionality reduction and joint analysis of any type of multimodal data.
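The core idea of multiview NMF can be sketched as follows. Every data type shares one sample factor H and keeps its own feature factor W_v. This is a generic joint NMF that minimises the sum of per-view squared errors, which makes it equivalent to ordinary NMF on the row-stacked views. It is not the adaptive algorithm of Ray et al. The view names, the function `multiview_nmf` and the toy data are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sq_error(X, W, H):
    """Squared Frobenius norm ||X - W H||^2."""
    return sum((x - y) ** 2 for xr, yr in zip(X, matmul(W, H)) for x, y in zip(xr, yr))


def multiview_nmf(views, k, iters=300, seed=0, eps=1e-9):
    """Joint NMF X_v ~ W_v H with one coefficient matrix H shared by all views.

    Each view is a nonnegative (features_v x samples) matrix; all views must
    list the same samples in the same column order. Minimises
    sum_v ||X_v - W_v H||_F^2 with Lee-Seung multiplicative updates.
    """
    rng = random.Random(seed)
    n = len(views[0][0])
    Ws = [[[rng.random() for _ in range(k)] for _ in X] for X in views]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    losses = [sum(sq_error(X, W, H) for X, W in zip(views, Ws))]
    for _ in range(iters):
        # Shared H: numerator and denominator are summed over all views.
        num = [[0.0] * n for _ in range(k)]
        den = [[0.0] * n for _ in range(k)]
        for X, W in zip(views, Ws):
            Wt = transpose(W)
            WtX, WtWH = matmul(Wt, X), matmul(matmul(Wt, W), H)
            for i in range(k):
                for j in range(n):
                    num[i][j] += WtX[i][j]
                    den[i][j] += WtWH[i][j]
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # View-specific W_v: an ordinary NMF update against the shared H.
        Ht = transpose(H)
        HHt = matmul(H, Ht)
        Ws = [
            [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, matmul(X, Ht), matmul(W, HHt))]
            for X, W in zip(views, Ws)
        ]
        losses.append(sum(sq_error(X, W, H) for X, W in zip(views, Ws)))
    return Ws, H, losses


# Hypothetical views of the same 6 tumours: 3 "protein" and 2 "transcript" features.
protein = [[4, 5, 4, 0, 0, 1], [5, 4, 5, 1, 0, 0], [0, 0, 1, 5, 5, 4]]
transcript = [[3, 3, 4, 0, 1, 0], [0, 1, 0, 4, 3, 4]]
Ws, H, losses = multiview_nmf([protein, transcript], k=2)
# Scale-invariant cluster call: weight rows of H by each factor's total mass across views.
mass = [sum(row[i] for W in Ws for row in W) for i in range(2)]
labels = [max(range(2), key=lambda i: H[i][j] * mass[i]) for j in range(6)]
print(labels, losses[0], losses[-1])
```

Sharing H forces both data types to explain each tumour with the same low-dimensional profile. That shared profile is what a downstream classifier or clustering step would consume.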


Yu G, Yu X, Wang J. Network-aided Bi-Clustering for discovering cancer subtypes. Sci Rep 2017;7:1046. [PMID: 28432308 PMCID: PMC5430742 DOI: 10.1038/s41598-017-01064-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/27/2016] [Accepted: 03/28/2017] [Indexed: 12/18/2022]

Shao C, Höfer T. Robust classification of single-cell transcriptome data by nonnegative matrix factorization. Bioinformatics 2016;33:235-242. [DOI: 10.1093/bioinformatics/btw607] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Received: 03/16/2016] [Revised: 09/15/2016] [Accepted: 09/16/2016] [Indexed: 11/14/2022]

Stražar M, Žitnik M, Zupan B, Ule J, Curk T. Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins. Bioinformatics 2016;32:1527-35. [PMID: 26787667 PMCID: PMC4894278 DOI: 10.1093/bioinformatics/btw003] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 07/02/2015] [Accepted: 01/01/2016] [Indexed: 12/15/2022]

Abstract

Motivation: RNA binding proteins (RBPs) play important roles in post-transcriptional control of gene expression, including splicing, transport, polyadenylation and RNA stability. To model protein–RNA interactions by considering all available sources of information, it is necessary to integrate the rapidly growing RBP experimental data with the latest genome annotation, gene function, RNA sequence and structure. Such integration is possible by matrix factorization, where current approaches have an undesired tendency to identify only a small number of the strongest patterns with overlapping features. Because protein–RNA interactions are orchestrated by multiple factors, methods that identify discriminative patterns of varying strengths are needed.

Results: We have developed an integrative orthogonality-regularized nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. The orthogonality constraint halves the effective size of the factor model and outperforms other NMF models in predicting RBP interaction sites on RNA. We have integrated the largest data compendium to date, which includes 31 CLIP experiments on 19 RBPs involved in splicing (such as hnRNPs, U2AF2, ELAVL1, TDP-43 and FUS) and processing of 3’UTR (Ago, IGF2BP). We show that the integration of multiple data sources improves the predictive accuracy of retrieval of RNA binding sites. In our study the key predictive factors of protein–RNA interactions were the position of RNA structure and sequence motifs, RBP co-binding and gene region type. We report on a number of protein-specific patterns, many of which are consistent with experimentally determined properties of RBPs.

Availability and implementation: The iONMF implementation and example datasets are available at https://github.com/mstrazar/ionmf.

Contact: tomaz.curk@fri.uni-lj.si

Supplementary information: Supplementary data are available at Bioinformatics online.


Pontes B, Giráldez R, Aguilar-Ruiz JS. Biclustering on expression data: A review. J Biomed Inform 2015;57:163-80. [PMID: 26160444 DOI: 10.1016/j.jbi.2015.06.028] [Citation(s) in RCA: 165] [Impact Index Per Article: 18.3] [Received: 01/22/2015] [Revised: 06/22/2015] [Accepted: 06/30/2015] [Indexed: 11/28/2022]

Mejía-Roa E, Tabas-Madrid D, Setoain J, García C, Tirado F, Pascual-Montano A. NMF-mGPU: non-negative matrix factorization on multi-GPU systems. BMC Bioinformatics 2015;16:43. [PMID: 25887585 PMCID: PMC4339678 DOI: 10.1186/s12859-015-0485-4] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 09/16/2014] [Accepted: 01/30/2015] [Indexed: 01/11/2023]

Abstract

BACKGROUND

In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained considerable interest in the bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessor cluster. In this paper, we present NMF-mGPU, an efficient and easy-to-use implementation of the NMF algorithm that takes advantage of the high computing performance delivered by Graphics-Processing Units (GPUs). Driven by the ever-growing demands of the video-game industry, graphics cards usually provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations. However, these devices may have a limited amount of on-board memory, which other GPU implementations of NMF do not take into account.

RESULTS

NMF-mGPU is based on CUDA (Compute Unified Device Architecture), NVIDIA's framework for GPU computing. On devices with little available memory, large input matrices are transferred blockwise from the system's main memory to the GPU's memory and processed accordingly. In addition, NMF-mGPU has been explicitly optimized for the different CUDA architectures. Finally, platforms with multiple GPUs can be synchronized through MPI (Message Passing Interface). In a four-GPU system, this implementation is about 120 times faster than a single conventional processor, and more than four times faster than a single GPU device (i.e., a super-linear speedup).

CONCLUSIONS

Applications of GPUs in bioinformatics are attracting increasing attention due to their outstanding performance compared with traditional processors. In addition, their relatively low price makes them a highly cost-effective alternative to conventional clusters. In the life sciences, this creates an excellent opportunity to facilitate the daily work of bioinformaticians who are trying to extract biological meaning from hundreds of gigabytes of experimental information. NMF-mGPU can be used "out of the box" by researchers with little or no expertise in GPU programming on a variety of platforms, such as PCs, laptops, or high-end GPU clusters. NMF-mGPU is freely available at https://github.com/bioinfo-cnb/bionmf-gpu.
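The update rules of Frobenius-norm NMF show why blockwise transfer can work. Each column of H is updated from the matching column of X alone. The products X H^T and H H^T are sums over columns, so they can be accumulated block by block. The sketch below is plain Python on the CPU, not the CUDA code of NMF-mGPU, whose cost function and exact blocking scheme may differ. It checks that one iteration processed in column blocks matches the in-memory result. The function names, block sizes and random data are illustrative assumptions.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mu_ratio(M, num, den, eps=1e-9):
    """Elementwise multiplicative update M * num / (den + eps)."""
    return [[m * a / (b + eps) for m, a, b in zip(rm, ra, rb)]
            for rm, ra, rb in zip(M, num, den)]


def mu_step_full(X, W, H):
    """One Frobenius-NMF iteration with the whole of X in memory."""
    Wt = transpose(W)
    H = mu_ratio(H, matmul(Wt, X), matmul(matmul(Wt, W), H))
    Ht = transpose(H)
    return mu_ratio(W, matmul(X, Ht), matmul(W, matmul(H, Ht))), H


def mu_step_blocked(X, W, H, block):
    """The same iteration, touching X only one column block at a time."""
    m, n, k = len(X), len(X[0]), len(W[0])
    Wt = transpose(W)
    WtW = matmul(Wt, W)                   # k x k, small enough to keep resident
    H_new = [[0.0] * n for _ in range(k)]
    XHt = [[0.0] * k for _ in range(m)]   # accumulates X H^T block by block
    HHt = [[0.0] * k for _ in range(k)]   # accumulates H H^T block by block
    for start in range(0, n, block):
        cols = range(start, min(start + block, n))
        Xb = [[row[j] for j in cols] for row in X]   # the block "sent to the device"
        Hb = [[row[j] for j in cols] for row in H]
        # Columns of H update independently, so a block needs only its slice of X.
        Hb = mu_ratio(Hb, matmul(Wt, Xb), matmul(WtW, Hb))
        for i in range(k):
            for c, j in enumerate(cols):
                H_new[i][j] = Hb[i][c]
        Hbt = transpose(Hb)
        XHt = madd(XHt, matmul(Xb, Hbt))
        HHt = madd(HHt, matmul(Hb, Hbt))
    return mu_ratio(W, XHt, matmul(W, HHt)), H_new


rng = random.Random(1)
X = [[rng.random() * 5 for _ in range(7)] for _ in range(5)]
W0 = [[rng.random() for _ in range(2)] for _ in range(5)]
H0 = [[rng.random() for _ in range(7)] for _ in range(2)]
Wf, Hf = mu_step_full(X, W0, H0)
Wb, Hb = mu_step_blocked(X, W0, H0, block=3)
diff = max(abs(a - b) for A, B in ((Wf, Wb), (Hf, Hb))
           for ra, rb in zip(A, B) for a, b in zip(ra, rb))
print(diff)  # only floating-point rounding differences remain
```

Only the small k × k products and the m × k accumulator need to stay resident. X itself can be streamed, which is the property a memory-limited device exploits.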


Li Y, Ngom A. Versatile sparse matrix factorization: Theory and applications. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.05.076] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]

NMF versus ICA for blind source separation. ADV DATA ANAL CLASSI 2014. [DOI: 10.1007/s11634-014-0192-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/08/2023]

Li X, Ye Y, Ng M, Wu Q. MultiFacTV: module detection from higher-order time series biological data. BMC Genomics 2013;14 Suppl 4:S2. [PMID: 24268038 PMCID: PMC3856496 DOI: 10.1186/1471-2164-14-s4-s2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/29/2023]

Abstract

BACKGROUND

Identifying modules from time series biological data helps us understand the biological functions of groups of interacting proteins/genes and how the responses of these proteins/genes change dynamically over time. With the rapid acquisition of time series biological data from different laboratories and databases, the identification task faces new challenges, and powerful methods capable of detecting modules through integrative analysis are urgently needed. To accomplish such integrative analysis, we assemble multiple time series biological data into a higher-order form, e.g., a gene × condition × time tensor. Methods that identify modules from such a tensor are therefore of considerable interest.

RESULTS

In this paper, we present MultiFacTV, a new method to find modules in higher-order time series biological data. This method employs a tensor factorization objective function that incorporates a time-related total variation regularization term. Based on the factorization results, MultiFacTV extracts modules composed of subsets of genes, conditions and time points. We have applied MultiFacTV to synthetic datasets, and the results show that it outperforms the existing methods EDISA and Metafac. Moreover, we have applied MultiFacTV to an Arabidopsis thaliana root (shoot) tissue dataset represented as a gene × condition × time tensor of size 2395 × 9 × 6 (3454 × 8 × 6), and to yeast and Homo sapiens datasets represented as tensors of sizes 4425 × 6 × 6 and 2920 × 14 × 9, respectively. The results show that MultiFacTV identifies interesting modules in these datasets, which have been validated and explained by Gene Ontology analysis with DAVID or by other analyses.

CONCLUSION

Experimental results on both synthetic and real datasets show that the proposed MultiFacTV is effective in identifying modules in higher-order time series biological data. Compared with traditional non-integrative analysis methods, it provides a more comprehensive view of biological processes, since modules composed of more than two types of biological variables can be identified and analyzed.
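The abstract does not spell out the MultiFacTV objective. As an assumption, the sketch below evaluates a generic version of it. The model is a rank-R CP (CANDECOMP/PARAFAC) factorization of a gene × condition × time tensor. It is scored by squared reconstruction error plus λ times the total variation of the time-mode factor, a term that penalizes abrupt changes between consecutive time points. The sketch only evaluates the objective and does not optimize it. The exact model and penalty used by MultiFacTV may differ, and the function names and toy factors are hypothetical.

```python
def cp_reconstruct(A, B, C):
    """Rank-R CP model: Xhat[i][j][t] = sum_r A[i][r] * B[j][r] * C[t][r]."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[t][r] for r in range(R))
              for t in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def total_variation(C):
    """Sum over components r and time steps t of |C[t+1][r] - C[t][r]|."""
    R = len(C[0])
    return sum(abs(C[t + 1][r] - C[t][r]) for r in range(R) for t in range(len(C) - 1))


def tv_cp_objective(X, A, B, C, lam):
    """Squared error of a gene x condition x time tensor plus lam * TV(time factor)."""
    Xhat = cp_reconstruct(A, B, C)
    err = sum((x - y) ** 2
              for Xi, Yi in zip(X, Xhat)
              for xj, yj in zip(Xi, Yi)
              for x, y in zip(xj, yj))
    return err + lam * total_variation(C)


# Hypothetical rank-1 example: 2 genes x 3 conditions x 3 time points.
A = [[1.0], [2.0]]
B = [[1.0], [1.0], [3.0]]
C = [[1.0], [3.0], [2.0]]
X = cp_reconstruct(A, B, C)
print(tv_cp_objective(X, A, B, C, lam=0.5))  # 1.5: zero fit error, TV = |3-1| + |2-3| = 3
```

Larger values of λ favour time profiles that change smoothly or in a few steps. This is the usual motivation for a total variation term on a temporal mode.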


Liao R, Zhang Y, Guan J, Zhou S. CloudNMF: a MapReduce implementation of nonnegative matrix factorization for large-scale biological datasets. GENOMICS PROTEOMICS & BIOINFORMATICS 2013;12:48-51. [PMID: 23933456 PMCID: PMC4411332 DOI: 10.1016/j.gpb.2013.06.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 05/23/2013] [Revised: 06/21/2013] [Accepted: 06/26/2013] [Indexed: 12/03/2022]

Chen HC, Zou W, Tien YJ, Chen JJ. Identification of bicluster regions in a binary matrix and its applications. PLoS One 2013;8:e71680. [PMID: 23940779 PMCID: PMC3733970 DOI: 10.1371/journal.pone.0071680] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/09/2012] [Accepted: 07/09/2013] [Indexed: 11/18/2022]

Lai Y, Hayashida M, Akutsu T. Survival analysis by penalized regression and matrix factorization. ScientificWorldJournal 2013;2013:632030. [PMID: 23737722 PMCID: PMC3655687 DOI: 10.1155/2013/632030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/17/2013] [Accepted: 04/03/2013] [Indexed: 11/18/2022]

Li Y, Ngom A. The non-negative matrix factorization toolbox for biological data mining. SOURCE CODE FOR BIOLOGY AND MEDICINE 2013;8:10. [PMID: 23591137 PMCID: PMC3736608 DOI: 10.1186/1751-0473-8-10] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 11/30/2012] [Accepted: 04/10/2013] [Indexed: 01/06/2023]

Wang JJY, Wang X, Gao X. Non-negative matrix factorization by maximizing correntropy for cancer clustering. BMC Bioinformatics 2013;14:107. [PMID: 23522344 PMCID: PMC3659102 DOI: 10.1186/1471-2105-14-107] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 02/16/2012] [Accepted: 03/08/2013] [Indexed: 11/11/2022]

Wang YK, Print CG, Crampin EJ. Biclustering reveals breast cancer tumour subgroups with common clinical features and improves prediction of disease recurrence. BMC Genomics 2013;14:102. [PMID: 23405961 PMCID: PMC3598775 DOI: 10.1186/1471-2164-14-102] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 06/06/2012] [Accepted: 02/05/2013] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Many studies have revealed correlations between breast tumour phenotypes, variations in gene expression, and patient survival outcomes. The molecular heterogeneity between breast tumours revealed by these studies has allowed prediction of prognosis and has underpinned stratified therapy, where groups of patients with particular tumour types receive specific treatments. The molecular tests used to predict prognosis and stratify treatment usually utilise fixed sets of genomic biomarkers, with the same biomarker sets being used to test all patients. In this paper we suggest that instead of fixed sets of genomic biomarkers, it may be more effective to use a stratified biomarker approach, where optimal biomarker sets are automatically chosen for particular patient groups, analogous to the choice of optimal treatments for groups of similar patients in stratified therapy. We illustrate the effectiveness of a biclustering approach to select optimal gene sets for determining the prognosis of specific strata of patients, based on potentially overlapping, non-discrete molecular characteristics of tumours.

RESULTS

Biclustering identified tightly co-expressed gene sets in the tumours of restricted subgroups of breast cancer patients. The co-expressed genes in these biclusters were significantly enriched for particular biological annotations and gene regulatory modules associated with breast cancer biology. Tumours identified within the same bicluster were more likely to present with similar clinical features. Bicluster membership combined with clinical information could predict patient prognosis in conditional inference tree and ridge regression class prediction models.

CONCLUSIONS

The increasing clinical use of genomic profiling demands identification of more effective methods to segregate patients into prognostic and treatment groups. We have shown that biclustering can be used to select optimal gene sets for determining the prognosis of specific strata of patients.
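The abstract does not state how the coherence of a bicluster was scored. One standard measure is the Cheng-Church mean squared residue, which is zero when every gene in the bicluster follows the same additive pattern across the selected tumours. It is shown here as an illustration, not as the method of Wang et al. The function name and the expression matrix are hypothetical.

```python
def mean_squared_residue(M, rows, cols):
    """Cheng-Church mean squared residue of the submatrix M[rows][cols].

    0 means a perfectly additive (coherent) bicluster: M[i][j] = r_i + c_j.
    """
    sub = [[M[i][j] for j in cols] for i in rows]
    nr, nc = len(rows), len(cols)
    row_mean = [sum(r) / nc for r in sub]
    col_mean = [sum(sub[i][j] for i in range(nr)) / nr for j in range(nc)]
    all_mean = sum(row_mean) / nr
    return sum((sub[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
               for i in range(nr) for j in range(nc)) / (nr * nc)


# Hypothetical expression matrix (genes x tumours).
M = [
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [5, 6, 7, 4],
]
print(mean_squared_residue(M, [0, 1, 2], [0, 1, 2]))     # ~0 up to rounding: additive pattern
print(mean_squared_residue(M, [0, 1, 2], [0, 1, 2, 3]))  # larger: tumour 3 breaks coherence
```

Biclustering algorithms in this family typically grow or shrink the row and column sets while keeping this score below a threshold.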


Chen HC, Tsong Y, Chen JJ. Data Mining for Signal Detection of Adverse Event Safety Data. J Biopharm Stat 2013;23:146-60. [DOI: 10.1080/10543406.2013.735780] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/27/2022]