1
Jain S, Pei L, Spraggins JM, Angelo M, Carson JP, Gehlenborg N, Ginty F, Gonçalves JP, Hagood JS, Hickey JW, Kelleher NL, Laurent LC, Lin S, Lin Y, Liu H, Naba A, Nakayasu ES, Qian WJ, Radtke A, Robson P, Stockwell BR, Van de Plas R, Vlachos IS, Zhou M, Börner K, Snyder MP. Author Correction: Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nat Cell Biol 2024; 26:839. [PMID: 38429479 DOI: 10.1038/s41556-024-01384-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2024]
Affiliation(s)
- Sanjay Jain
  - Department of Medicine, Washington University School of Medicine, St Louis, MO, USA.
  - Department of Pediatrics, Washington University School of Medicine, St Louis, MO, USA.
  - Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA.
- Liming Pei
  - Center for Mitochondrial and Epigenomic Medicine, Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
- Jeffrey M Spraggins
  - Department of Cell and Developmental Biology and the Mass Spectrometry Research Center, Vanderbilt University School of Medicine, Nashville, TN, USA.
- Michael Angelo
  - Department of Pathology, Stanford School of Medicine, Stanford, CA, USA
- James P Carson
  - Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, USA
- Nils Gehlenborg
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Joana P Gonçalves
  - Department of Intelligent Systems, Delft University of Technology, Delft, Netherlands
- James S Hagood
  - Department of Pediatrics (Pulmonology) and Program for Rare and Interstitial Lung Disease, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- John W Hickey
  - Department of Microbiology and Immunology, Stanford University, Stanford, CA, USA
- Neil L Kelleher
  - Departments of Medicine, Chemistry and Molecular Biosciences, Northwestern University, Evanston, IL, USA
- Louise C Laurent
  - Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA
- Shin Lin
  - Division of Cardiology, University of Washington School of Medicine, Seattle, WA, USA
- Yiing Lin
  - Department of Surgery, Washington University School of Medicine, St Louis, MO, USA
- Huiping Liu
  - Departments of Pharmacology, Medicine (Hematology and Oncology), Lurie Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Alexandra Naba
  - Department of Physiology and Biophysics, University of Illinois at Chicago, Chicago, IL, USA
- Ernesto S Nakayasu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Wei-Jun Qian
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Andrea Radtke
  - Lymphocyte Biology Section and Center for Advanced Tissue Imaging, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
- Paul Robson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Brent R Stockwell
  - Department of Biological Sciences and Department of Chemistry, Columbia University, New York, NY, USA
- Raf Van de Plas
  - Delft Center for Systems and Control, Delft University of Technology, Delft, Netherlands
- Ioannis S Vlachos
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Spatial Technologies Unit, Harvard Medical School Initiative for RNA Medicine, Department of Pathology, Beth Israel Deaconess Medical Center, and Harvard Medical School, Boston, MA, USA
- Mowei Zhou
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Katy Börner
  - Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN, USA.
- Michael P Snyder
  - Department of Genetics, Stanford School of Medicine, Stanford, CA, USA.
2
Tepeli YI, Seale C, Gonçalves JP. ELISL: early-late integrated synthetic lethality prediction in cancer. Bioinformatics 2024; 40:btad764. [PMID: 38113447 DOI: 10.1093/bioinformatics/btad764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/06/2023] [Accepted: 12/18/2023] [Indexed: 12/21/2023]
Abstract
MOTIVATION Anti-cancer therapies based on synthetic lethality (SL) exploit tumour vulnerabilities for treatment with reduced side effects, by targeting a gene that is jointly essential with another whose function is lost. Computational prediction is key to expedite SL screening, yet existing methods are vulnerable to prevalent selection bias in SL data and reliant on cancer or tissue type-specific omics, which can be scarce. Notably, sequence similarity remains underexplored as a proxy for related gene function and joint essentiality. RESULTS We propose ELISL, Early-Late Integrated SL prediction with forest ensembles, using context-free protein sequence embeddings and context-specific omics from cell lines and tissue. Across eight cancer types, ELISL showed superior robustness to selection bias and recovery of known SL genes, as well as promising cross-cancer predictions. Co-occurring mutations in a BRCA gene and ELISL-predicted pairs from the HH, FGF, WNT, or NEIL gene families were associated with longer patient survival times, revealing therapeutic potential. AVAILABILITY AND IMPLEMENTATION Data: 10.6084/m9.figshare.23607558 & Code: github.com/joanagoncalveslab/ELISL.
Affiliation(s)
- Yasin I Tepeli
  - Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft, The Netherlands
- Colm Seale
  - Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft, The Netherlands
  - Holland Proton Therapy Center (HollandPTC), Delft, The Netherlands
- Joana P Gonçalves
  - Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft, The Netherlands
3
Jain S, Pei L, Spraggins JM, Angelo M, Carson JP, Gehlenborg N, Ginty F, Gonçalves JP, Hagood JS, Hickey JW, Kelleher NL, Laurent LC, Lin S, Lin Y, Liu H, Naba A, Nakayasu ES, Qian WJ, Radtke A, Robson P, Stockwell BR, Van de Plas R, Vlachos IS, Zhou M, Börner K, Snyder MP. Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nat Cell Biol 2023; 25:1089-1100. [PMID: 37468756 PMCID: PMC10681365 DOI: 10.1038/s41556-023-01194-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 03/06/2023] [Accepted: 06/22/2023] [Indexed: 07/21/2023]
Abstract
The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.
4
Oliveira AS, Nunes GT, Sousajr IP, Gonçalves JP, Lopes JIF, Silva CACE, Silva JA, Amorim L, Paula VS. Molecular and serological prevalence of human cytomegalovirus in blood donors from Rio de Janeiro [Prevalência molecular e sorológica do citomegalovírus humano em doadores de sangue do Rio de Janeiro]. Hematol Transfus Cell Ther 2022. [DOI: 10.1016/j.htct.2022.09.1112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
5
Seale C, Tepeli Y, Gonçalves JP. Overcoming selection bias in synthetic lethality prediction. Bioinformatics 2022; 38:4360-4368. [PMID: 35876858 PMCID: PMC9477536 DOI: 10.1093/bioinformatics/btac523] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Revised: 07/13/2022] [Accepted: 07/22/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION Synthetic lethality (SL) between two genes occurs when simultaneous loss of function leads to cell death. This holds great promise for developing anti-cancer therapeutics that target synthetic lethal pairs of endogenously disrupted genes. Identifying novel SL relationships through exhaustive experimental screens is challenging, due to the vast number of candidate pairs. Computational SL prediction is therefore sought to identify promising SL gene pairs for further experimentation. However, current SL prediction methods lack consideration for generalizability in the presence of selection bias in SL data. RESULTS We show that SL data exhibit considerable gene selection bias. Our experiments designed to assess the robustness of SL prediction reveal that models driven by the topology of known SL interactions (e.g. graph, matrix factorization) are especially sensitive to selection bias. We introduce selection bias-resilient synthetic lethality (SBSL) prediction using regularized logistic regression or random forests. Each gene pair is described by 27 molecular features derived from cancer cell line, cancer patient tissue and healthy donor tissue samples. SBSL models are built and tested using approximately 8000 experimentally derived SL pairs across breast, colon, lung and ovarian cancers. Compared to other SL prediction methods, SBSL showed higher predictive performance, better generalizability and robustness to selection bias. Gene dependency, quantifying the essentiality of a gene for cell survival, contributed most to SBSL predictions. Random forests were superior to linear models in the absence of dependency features, highlighting the relevance of mutual exclusivity of somatic mutations, co-expression in healthy tissue and differential expression in tumour samples. AVAILABILITY AND IMPLEMENTATION https://github.com/joanagoncalveslab/sbsl. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Colm Seale
  - Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands
  - Holland Proton Therapy Center (HollandPTC), Delft 2600 AC, The Netherlands
- Yasin Tepeli
  - Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands
- Joana P Gonçalves
  - Pattern Recognition & Bioinformatics, Department of Intelligent Systems, Faculty EEMCS, Delft University of Technology, Delft 2628 XE, The Netherlands
6
Bellan DL, Mazepa E, Biscaia SMP, Gonçalves JP, Oliveira CC, Rossi GR, Ferreira LG, Noseda MD, Trindade ES, Duarte MER, Franco CRC. Non-Cytotoxic Sulfated Heterorhamnan from Gayralia brasiliensis Green Seaweed Reduces Driver Features of Melanoma Metastatic Progression. Mar Biotechnol (NY) 2020; 22:194-206. [PMID: 31970542 DOI: 10.1007/s10126-020-09944-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/02/2019] [Accepted: 01/02/2020] [Indexed: 06/10/2023]
Abstract
Melanoma is a form of skin cancer with high mortality owing to its fast progression and metastatic capacity. Currently available treatments are only palliative in advanced stages of the disease. Thus, alternative therapies for cancer treatment are in demand, and molecules from natural sources, such as polysaccharides, could represent new therapeutic approaches. Polysaccharides from freshwater and marine algae with biological activities, such as antitumor properties, are widely reported in the scientific literature. In the present study, a sulfated heterorhamnan obtained from the green seaweed Gayralia brasiliensis (Gb1 fraction) was chemically characterized and its biological activities in the B16-F10 murine melanoma cell line were evaluated. At the concentrations tested, the Gb1 polysaccharide fraction showed low or no cytotoxicity to B16-F10 cells, and neither cell proliferation nor the cell cycle was altered. Interestingly, Gb1 treatment decreased B16-F10 cell migration and invasion capabilities and CD44 labeling, making it a promising compound for further in vitro and in vivo antitumor studies.
Affiliation(s)
- D L Bellan
  - Department of Cellular Biology, Federal University of Paraná, Curitiba, Paraná, Brazil.
- E Mazepa
  - Department of Biochemistry, Federal University of Paraná, Curitiba, Paraná, Brazil
- S M P Biscaia
  - Department of Cellular Biology, Federal University of Paraná, Curitiba, Paraná, Brazil
- J P Gonçalves
  - Department of Cellular Biology, Federal University of Paraná, Curitiba, Paraná, Brazil
- C C Oliveira
  - Department of Cellular Biology, Federal University of Paraná, Curitiba, Paraná, Brazil
- G R Rossi
  - Department of Cellular Biology, Federal University of Paraná, Curitiba, Paraná, Brazil
- L G Ferreira
  - Department of Biochemistry, Federal University of Paraná, Curitiba, Paraná, Brazil
- M D Noseda
  - Department of Biochemistry, Federal University of Paraná, Curitiba, Paraná, Brazil
- E S Trindade
  - Department of Cellular Biology, Federal University of Paraná, Curitiba, Paraná, Brazil
- M E R Duarte
  - Department of Biochemistry, Federal University of Paraná, Curitiba, Paraná, Brazil.
- C R C Franco
  - Department of Cellular Biology, Federal University of Paraná, Curitiba, Paraná, Brazil.
7
Gisler S, Gonçalves JP, Akhtar W, de Jong J, Pindyurin AV, Wessels LFA, van Lohuizen M. Multiplexed Cas9 targeting reveals genomic location effects and gRNA-based staggered breaks influencing mutation efficiency. Nat Commun 2019; 10:1598. [PMID: 30962441 PMCID: PMC6453899 DOI: 10.1038/s41467-019-09551-w] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 05/02/2018] [Accepted: 03/14/2019] [Indexed: 12/16/2022]
Abstract
Understanding the impact of guide RNA (gRNA) and genomic locus on CRISPR-Cas9 activity is crucial to design effective gene editing assays. However, it is challenging to profile Cas9 activity in the endogenous cellular environment. Here we leverage our TRIP technology to integrate ~ 1k barcoded reporter genes in the genomes of mouse embryonic stem cells. We target the integrated reporters (IRs) using RNA-guided Cas9 and characterize induced mutations by sequencing. We report that gRNA-sequence and IR locus explain most variation in mutation efficiency. Predominant insertions of a gRNA-specific nucleotide are consistent with template-dependent repair of staggered DNA ends with 1-bp 5' overhangs. We confirm that such staggered ends are induced by Cas9 in mouse pre-B cells. To explain observed insertions, we propose a model generating primarily blunt and occasionally staggered DNA ends. Mutation patterns indicate that gRNA-sequence controls the fraction of staggered ends, which could be used to optimize Cas9-based insertion efficiency.
Affiliation(s)
- Santiago Gisler
  - Division of Molecular Genetics, Oncode and The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands
- Joana P Gonçalves
  - Department of Intelligent Systems, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE, The Netherlands
  - Division of Molecular Carcinogenesis, Oncode and The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands
- Waseem Akhtar
  - Division of Molecular Genetics, Oncode and The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands
- Johann de Jong
  - Division of Molecular Carcinogenesis, Oncode and The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands
  - Data & Translational Sciences Group, UCB Biosciences GmbH, Alfred-Nobel-Straße 10, Monheim am Rhein, 40789, Germany
- Alexey V Pindyurin
  - Institute of Molecular and Cellular Biology, Siberian Branch of Russian Academy of Sciences, Acad. Lavrentiev Ave. 8, Novosibirsk, 630090, Russia
  - Division of Gene Regulation, Oncode and The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands
- Lodewyk F A Wessels
  - Department of Intelligent Systems, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE, The Netherlands.
  - Division of Molecular Carcinogenesis, Oncode and The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands.
- Maarten van Lohuizen
  - Division of Molecular Genetics, Oncode and The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam, 1066 CX, The Netherlands.
8
Sun K, Gonçalves JP, Larminie C, Przulj N. Predicting disease associations via biological network analysis. BMC Bioinformatics 2014; 15:304. [PMID: 25228247 PMCID: PMC4174675 DOI: 10.1186/1471-2105-15-304] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 12/09/2013] [Accepted: 08/19/2014] [Indexed: 12/11/2022]
Abstract
Background Understanding the relationship between diseases based on the underlying biological mechanisms is one of the greatest challenges in modern biology and medicine. Exploring disease-disease associations by using system-level biological data is expected to improve our current knowledge of disease relationships, which may lead to further improvements in disease diagnosis, prognosis and treatment. Results We took advantage of diverse biological data including disease-gene associations and a large-scale molecular network to gain novel insights into disease relationships. We analysed and compared four publicly available disease-gene association datasets, then applied three disease similarity measures, namely an annotation-based measure, a function-based measure and a topology-based measure, to estimate the similarity scores between diseases. We systematically evaluated disease associations obtained by these measures against a statistical measure of comorbidity which was derived from a large number of medical patient records. Our results show that the correlation between our similarity measures and comorbidity scores is substantially higher than expected at random, confirming that our similarity measures are able to recover comorbidity associations. We also demonstrated that our predicted disease associations correlated with disease associations generated from genome-wide association studies significantly more strongly than expected at random. Furthermore, we evaluated our predicted disease associations by mining the literature on PubMed, and presented case studies to demonstrate how these novel disease associations can be used to enhance our current knowledge of disease relationships. Conclusions We present three similarity measures for predicting disease associations. The strong correlation between our predictions and known disease associations demonstrates the ability of our measures to provide novel insights into disease relationships.
Electronic supplementary material The online version of this article (doi:10.1186/1471-2105-15-304) contains supplementary material, which is available to authorized users.
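The abstract above names an annotation-based similarity measure but does not give its formula. As a rough illustration only, the sketch below scores disease pairs by the Jaccard overlap of their associated gene sets. Jaccard is an assumed stand-in, not necessarily the measure used in the article, and the disease and gene identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Jaccard overlap of two gene sets (assumed stand-in for an annotation-based measure)."""
    a, b = set(genes_a), set(genes_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty annotations
    return len(a & b) / len(a | b)


def pairwise_similarity(disease_genes):
    """Score every unordered disease pair; keys are sorted (disease, disease) tuples."""
    return {
        (d1, d2): jaccard(disease_genes[d1], disease_genes[d2])
        for d1, d2 in combinations(sorted(disease_genes), 2)
    }


# Hypothetical disease-gene associations, for illustration only.
disease_genes = {
    "disease_X": {"G1", "G2", "G3"},
    "disease_Y": {"G2", "G3", "G4"},
    "disease_Z": {"G5"},
}
similarity = pairwise_similarity(disease_genes)
```

In a study like the one described, scores of this kind would then be compared against an external reference such as comorbidity; that evaluation step is not shown here.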
Affiliation(s)
- Nataša Przulj
  - Department of Computing, Imperial College London, London, SW7 2AZ, UK.
9
Gonçalves JP, Madeira SC. LateBiclustering: Efficient Heuristic Algorithm for Time-Lagged Bicluster Identification. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:801-813. [PMID: 26356854 DOI: 10.1109/tcbb.2014.2312007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Identifying patterns in temporal data is key to uncover meaningful relationships in diverse domains, from stock trading to social interactions. Also of great interest are clinical and biological applications, namely monitoring patient response to treatment or characterizing activity at the molecular level. In biology, researchers seek to gain insight into gene functions and dynamics of biological processes, as well as potential perturbations of these leading to disease, through the study of patterns emerging from gene expression time series. Clustering can group genes exhibiting similar expression profiles, but focuses on global patterns denoting rather broad, unspecific responses. Biclustering reveals local patterns, which more naturally capture the intricate collaboration between biological players, particularly under a temporal setting. Despite the general biclustering formulation being NP-hard, considering specific properties of time series has led to efficient solutions for the discovery of temporally aligned patterns. Notably, the identification of biclusters with time-lagged patterns, suggestive of transcriptional cascades, remains a challenge due to the combinatorial explosion of delayed occurrences. Herein, we propose LateBiclustering, a sensible heuristic algorithm enabling a polynomial rather than exponential time solution for the problem. We show that it identifies meaningful time-lagged biclusters relevant to the response of Saccharomyces cerevisiae to heat stress.
10
Abstract
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches suffer from at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering the full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive.
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
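The abstract does not spell out how heat diffusion ranking is computed. The sketch below is one common way to illustrate the idea, under stated assumptions: known disease genes start with one unit of heat, and heat spreads along weighted edges according to the graph Laplacian, approximated here with explicit Euler steps. The function name, gene names, edge weights, diffusion time `t` and step count are all hypothetical choices for illustration, not values from the article.

```python
def heat_diffusion_scores(edges, seeds, t=0.5, steps=50):
    """Approximate heat-kernel diffusion on an undirected weighted graph.

    edges: iterable of (gene_a, gene_b, confidence_weight)
    seeds: set of known disease genes that start with heat 1.0
    Uses explicit Euler steps; dt times the largest weighted degree
    should stay well below 1 for the iteration to remain stable.
    """
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    adj = {n: {} for n in nodes}
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    heat = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    dt = t / steps
    for _ in range(steps):
        # Each node gains heat from hotter neighbours and loses it to cooler ones.
        heat = {
            n: heat[n] + dt * sum(w * (heat[m] - heat[n]) for m, w in adj[n].items())
            for n in nodes
        }
    return heat


# Hypothetical toy network: A is a known disease gene.
edges = [("A", "B", 0.9), ("B", "C", 0.9), ("A", "D", 0.1)]
scores = heat_diffusion_scores(edges, seeds={"A"})
ranking = sorted((g for g in scores if g != "A"), key=scores.get, reverse=True)
```

Unlike a direct-neighbour score, gene C receives a non-zero score here although it has no direct link to the seed, which is the kind of indirect-path information the abstract argues should not be ignored.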
Affiliation(s)
- Joana P. Gonçalves
  - Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
  - Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- Alexandre P. Francisco
  - Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
  - Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- Yves Moreau
  - Electrical Engineering Department, Katholieke Universiteit Leuven, Leuven, Belgium
- Sara C. Madeira
  - Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
  - Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
11
Gonçalves JP, Oliveira A, Severo M, Santos AC, Lopes C. Cross-sectional and longitudinal associations between serum uric acid and metabolic syndrome. Endocrine 2012; 41:450-7. [PMID: 22350659 DOI: 10.1007/s12020-012-9629-8] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Received: 12/07/2011] [Accepted: 02/03/2012] [Indexed: 10/28/2022]
Abstract
Research on the importance of serum uric acid (SUA) as a contributing metabolic factor to cardiovascular diseases has led to conflicting results, with most studies assuming a cross-sectional design. The aim of this study was to evaluate the association of SUA with metabolic syndrome (MetS) and its features. A representative sample of 2,485 individuals aged ≥18 years was randomly selected from the non-institutionalized resident population of Porto, Portugal. A total of 1,054 eligible subjects were included for the longitudinal analyses. Hyperuricemia was defined as SUA ≥70 mg/L in men and ≥60 mg/L in women. MetS was defined according to the Joint Interim (2009) criteria. Associations were estimated using Poisson regression and binomial models. In the cross-sectional analysis, subjects with hyperuricemia had a 2.10-fold increased risk of MetS as compared with normouricemic subjects (PR = 2.10, 95% CI: 1.68-2.63). Among MetS features, high triglycerides presented the strongest association with hyperuricemia (PR = 2.32, 95% CI: 1.84-2.91). The MetS crude incidence rate was 4.5/100 person-years (95% CI: 3.9-5.2) in normouricemic and 13.0/100 person-years (95% CI: 8.5-20.0) in hyperuricemic participants. Using a multivariate longitudinal approach, hyperuricemia was positively associated with MetS incidence (IRR = 1.73, 95% CI: 1.08-2.76). One standard deviation increase of SUA concentration was associated with a 1.22-fold increase in MetS risk (IRR = 1.22, 95% CI: 1.05-1.42). Elevated SUA presented the strongest association with high triglyceride concentration (IRR = 1.44, 95% CI: 1.22-1.71) and waist circumference (IRR = 1.25, 95% CI: 1.05-1.49). The independent positive association between SUA and MetS suggested by this longitudinal study supports the view that SUA might be a risk factor for MetS.
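To make the arithmetic in this abstract concrete, the sketch below encodes the stated hyperuricemia cut-offs and computes the crude ratio of the two reported incidence rates. The function names and the `"male"`/`"female"` labels are hypothetical. The crude ratio is plain division of the published rates and is not the adjusted IRR of 1.73, which comes from the multivariate model.

```python
# Cut-offs quoted in the abstract: SUA >= 70 mg/L in men, >= 60 mg/L in women.
SUA_THRESHOLD_MG_PER_L = {"male": 70.0, "female": 60.0}


def is_hyperuricemic(sua_mg_per_l, sex):
    """Return True when serum uric acid meets the sex-specific cut-off."""
    return sua_mg_per_l >= SUA_THRESHOLD_MG_PER_L[sex]


def crude_rate_ratio(rate_exposed, rate_unexposed):
    """Ratio of two incidence rates expressed in the same units."""
    return rate_exposed / rate_unexposed


# Rates reported in the abstract, per 100 person-years.
ratio = crude_rate_ratio(13.0, 4.5)
print(f"crude rate ratio: {ratio:.2f}")
```

The crude ratio of about 2.89 is larger than the adjusted IRR, which would be consistent with part of the crude difference being explained by the covariates in the multivariate model; the abstract does not list those covariates.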
Affiliation(s)
- J P Gonçalves
  - Department of Clinical Epidemiology, Predictive Medicine and Public Health and Cardiovascular Research & Development Unit, University of Porto Medical School, Alameda Professor Hernâni Monteiro, 4200-319, Porto, Portugal.
12
|
Gonçalves JP, Moreau Y, Madeira SC. AliBiMotif: Integrating alignment and biclustering to unravel transcription factor binding sites in DNA sequences. INT J DATA MIN BIOIN 2012; 6:196-215. [DOI: 10.1504/ijdmb.2012.048198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
|
13
|
Gonçalves JP, Francisco AP, Mira NP, Teixeira MC, Sá-Correia I, Oliveira AL, Madeira SC. TFRank: network-based prioritization of regulatory associations underlying transcriptional responses. Bioinformatics 2011; 27:3149-57. [PMID: 21965816 DOI: 10.1093/bioinformatics/btr546] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/15/2022]
Abstract
MOTIVATION: Uncovering mechanisms underlying gene expression control is crucial to understanding complex cellular responses. Studies in gene regulation often aim to identify regulatory players involved in a biological process of interest, either transcription factors coregulating a set of target genes or genes eventually controlled by a set of regulators. These are frequently prioritized with respect to a context-specific relevance score. Current approaches rely on relevance measures accounting exclusively for direct transcription factor-target interactions, namely overrepresentation of binding sites or target ratios. Gene regulation has, however, intricate behavior with overlapping, indirect effects that should not be neglected. In addition, the rapid accumulation of regulatory data already enables the prediction of large-scale networks suitable for higher-level exploration by methods based on graph theory. A paradigm shift is thus emerging, where isolated and constrained analyses will likely be replaced by whole-network, systemic-aware strategies. RESULTS: We present TFRank, a graph-based framework to prioritize regulatory players involved in transcriptional responses within the regulatory network of an organism, whereby every regulatory path containing genes of interest is explored and incorporated into the analysis. TFRank selected important regulators of yeast adaptation to stress induced by quinine and acetic acid, which were missed by a direct effect approach. Notably, they reportedly confer resistance toward the chemicals. In a preliminary study in human, TFRank unveiled regulators involved in breast tumor growth and metastasis when applied to genes whose expression signatures correlated with short interval to metastasis.
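The idea of crediting regulators through indirect as well as direct regulatory paths can be illustrated with a small Python sketch. This is not the TFRank algorithm, whose details the abstract does not give; it is a hypothetical scoring scheme in which each walk of length k from a regulator to a gene of interest contributes decay**k. The function name, the `decay` and `max_depth` parameters, and the toy gene names are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_regulators(edges, genes_of_interest, decay=0.5, max_depth=3):
    """Score regulators by decayed counts of walks reaching genes of interest.

    edges: iterable of (regulator, target) pairs in a directed network.
    A walk of length k from a regulator to a gene of interest adds decay**k.
    """
    regulators_of = defaultdict(set)
    for regulator, target in edges:
        regulators_of[target].add(regulator)

    scores = defaultdict(float)
    # frontier[g] = number of walks of the current length from g to a gene of interest
    frontier = {gene: 1 for gene in genes_of_interest}
    for depth in range(1, max_depth + 1):
        next_frontier = defaultdict(int)
        for gene, n_walks in frontier.items():
            for regulator in regulators_of.get(gene, ()):
                next_frontier[regulator] += n_walks
        for regulator, n_walks in next_frontier.items():
            scores[regulator] += n_walks * decay ** depth
        frontier = next_frontier

    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy network (hypothetical names): TF_A acts only indirectly, through TF_B.
toy_edges = [("TF_A", "TF_B"), ("TF_B", "g1"), ("TF_B", "g2"), ("TF_C", "g1")]
ranking = rank_regulators(toy_edges, {"g1", "g2"})
print(ranking)
```

In this toy network TF_A has no direct target among the genes of interest, so a method restricted to direct interactions would not score it at all, whereas the path-based score ranks it level with the direct regulator TF_C.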
|
14
|
Nitsch D, Tranchevent LC, Gonçalves JP, Vogt JK, Madeira SC, Moreau Y. PINTA: a web server for network-based gene prioritization from expression data. Nucleic Acids Res 2011; 39:W334-8. [PMID: 21602267 PMCID: PMC3125740 DOI: 10.1093/nar/gkr289] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Indexed: 01/06/2023] Open
Abstract
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming to identify novel disease genes using disease-specific expression data. PINTA supports both candidate gene prioritization (starting from a user-defined set of candidate genes) and genome-wide gene prioritization, and is available for five species (human, mouse, rat, worm and yeast). As input, PINTA requires only disease-specific expression data, and various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.
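Ranking candidates by the differential expression of their network neighborhood can be sketched in a few lines of Python. The sketch below is a simplification and not PINTA's actual scoring: it assumes the score of a candidate is the mean absolute differential expression of its direct interaction partners. The function name, data layout and gene identifiers are hypothetical.

```python
def neighborhood_scores(interactions, diff_expr, candidates):
    """Rank candidates by mean absolute differential expression of direct neighbors.

    interactions: iterable of undirected (gene_a, gene_b) pairs.
    diff_expr: dict mapping gene -> differential expression value (e.g. log fold change).
    """
    neighbors = {}
    for gene_a, gene_b in interactions:
        neighbors.setdefault(gene_a, set()).add(gene_b)
        neighbors.setdefault(gene_b, set()).add(gene_a)

    scores = {}
    for gene in candidates:
        values = [abs(diff_expr[n]) for n in neighbors.get(gene, ()) if n in diff_expr]
        # Candidates with no measured neighbors receive a neutral score of 0.
        scores[gene] = sum(values) / len(values) if values else 0.0

    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data (hypothetical): c2 is strongly deregulated itself, but c1 has the more
# strongly deregulated neighborhood.
toy_interactions = [("c1", "n1"), ("c1", "n2"), ("c2", "n3")]
toy_diff_expr = {"n1": 2.0, "n2": -1.0, "n3": 0.5, "c1": 0.1, "c2": 3.0}
gene_ranking = neighborhood_scores(toy_interactions, toy_diff_expr, ["c1", "c2"])
print(gene_ranking)  # [('c1', 1.5), ('c2', 0.5)]
```

The candidate's own expression value is deliberately ignored here, which is what distinguishes a neighborhood-based score from ranking genes by their own differential expression.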
Affiliation(s)
- Daniela Nitsch
- Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, 3001 Leuven, Belgium
|
15
|
Nitsch D, Gonçalves JP, Ojeda F, de Moor B, Moreau Y. Candidate gene prioritization by network analysis of differential expression using machine learning approaches. BMC Bioinformatics 2010; 11:460. [PMID: 20840752 PMCID: PMC2945940 DOI: 10.1186/1471-2105-11-460] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Received: 05/27/2010] [Accepted: 09/14/2010] [Indexed: 02/02/2023] Open
Abstract
Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge (such as known disease genes or disease-related pathways) is available. Genetic studies frequently result in large lists of candidate genes, of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results: We have proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.
Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
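The reported error reduction can be reproduced from the two AUC values, assuming that "error" means 1 − AUC. The abstract does not state this definition; it is an assumption that happens to recover the quoted figure. The function name is illustrative.

```python
def error_reduction(auc_baseline: float, auc_method: float) -> float:
    """Relative reduction in (1 - AUC) when moving from the baseline to the method."""
    baseline_error = 1.0 - auc_baseline
    method_error = 1.0 - auc_method
    return (baseline_error - method_error) / baseline_error


# AUC values quoted in the abstract: Simple Expression Ranking vs Heat Kernel Diffusion Ranking.
reduction = error_reduction(auc_baseline=0.837, auc_method=0.923)
print(f"error reduction = {reduction:.1%}")  # prints "error reduction = 52.8%"
```

Written out, the calculation is (0.163 − 0.077) / 0.163 ≈ 0.528.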
Affiliation(s)
- Daniela Nitsch
- Department of Electrical Engineering (ESAT-SCD) Katholieke Universiteit Leuven, 3001 Leuven, Belgium.
|
16
|
Gonçalves JP, Madeira SC, Oliveira AL. BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data. BMC Res Notes 2009; 2:124. [PMID: 19583847 PMCID: PMC2720980 DOI: 10.1186/1756-0500-2-124] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 03/24/2009] [Accepted: 07/07/2009] [Indexed: 11/10/2022] Open
Abstract
Background: The ability to monitor changes in expression patterns over time, and to observe the emergence of coherent temporal responses using expression time series, is critical to advance our understanding of complex biological processes. Biclustering has been recognized as an effective method for discovering local temporal expression patterns and unraveling potential regulatory mechanisms. The general biclustering problem is NP-hard. In the case of time series this problem is tractable, and efficient algorithms can be used. However, there is still a need for specialized applications able to take advantage of the temporal properties inherent to expression time series, both from a computational and a biological perspective. Findings: BiGGEsTS makes available state-of-the-art biclustering algorithms for analyzing expression time series. Gene Ontology (GO) annotations are used to assess the biological relevance of the biclusters. Methods for preprocessing expression time series and post-processing results are also included. The analysis is additionally supported by a visualization module capable of displaying informative representations of the data, including heatmaps, dendrograms, expression charts and graphs of enriched GO terms. Conclusion: BiGGEsTS is a free open source graphical software tool for revealing local coexpression of genes in specific intervals of time, while integrating meaningful information on gene annotations. It is freely available at: . We present a case study on the discovery of transcriptional regulatory modules in the response of Saccharomyces cerevisiae to heat stress.
Affiliation(s)
- Joana P Gonçalves
- Knowledge Discovery and Bioinformatics (KDBIO) group, INESC-ID, Rua Alves Redol, Apartado 13069, 1000-029 Lisboa, Portugal.
|
17
|
Gonçalves JP, Grãos M, Valente AXCN. POLAR MAPPER: a computational tool for integrated visualization of protein interaction networks and mRNA expression data. J R Soc Interface 2008; 6:881-96. [PMID: 19091689 PMCID: PMC2684442 DOI: 10.1098/rsif.2008.0407] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022] Open
Abstract
Polar Mapper is a computational application for exposing the architecture of protein interaction networks. It facilitates the system-level analysis of mRNA expression data in the context of the underlying protein interaction network. Preliminary analysis of a human protein interaction network and comparison of the yeast oxidative stress and heat shock gene expression responses are addressed as case studies.
|