1
Karolak A, Urbaniak K, Monastyrskyi A, Duckett DR, Branciamore S, Stewart PA. Structure-independent machine-learning predictions of the CDK12 interactome. Biophys J 2024; 123:2910-2920. [PMID: 38762754 PMCID: PMC11393676 DOI: 10.1016/j.bpj.2024.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 04/24/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024]
Abstract
Cyclin-dependent kinase 12 (CDK12) is a critical regulatory protein involved in transcription and DNA repair processes. Dysregulation of CDK12 has been implicated in various diseases, including cancer. Understanding the CDK12 interactome is pivotal for elucidating its functional roles and potential therapeutic targets. Traditional methods for interactome prediction often rely on protein structure information, which limits their applicability to CDK12, whose C-terminal region is partly disordered. In this study, we present a structure-independent machine-learning model that utilizes proteins' sequence and functional data to predict the CDK12 interactome. This approach is motivated by the disordered character of the CDK12 C-terminal region, which hinders a structure-driven search for binding partners. Our approach incorporates multiple data sources, including protein-protein interaction networks, functional annotations, and sequence-based features, to construct a comprehensive CDK12 interactome prediction model. The ability to predict CDK12 interactions without relying on structural information is a significant advancement, as many potential interaction partners may lack crystallographic data. In conclusion, our structure-independent machine-learning model presents a powerful tool for predicting the CDK12 interactome and holds promise in advancing our understanding of CDK12 biology, identifying potential therapeutic targets, and facilitating precision-medicine approaches for CDK12-associated diseases.
Affiliation(s)
- Konstancja Urbaniak
- Department of Computational and Quantitative Medicine, City of Hope, Duarte, California
- Derek R Duckett
- Department of Drug Discovery, Moffitt Cancer Center, Tampa, Florida
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, City of Hope, Duarte, California
- Paul A Stewart
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida
2
Branciamore S, Gogoshin G, Rodin AS, Myers AJ. Changes in expression of VGF, SPECC1L, HLA-DRA and RANBP3L act with APOE E4 to alter risk for late onset Alzheimer's disease. Sci Rep 2024; 14:14954. [PMID: 38942763 PMCID: PMC11213882 DOI: 10.1038/s41598-024-65010-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 06/16/2024] [Indexed: 06/30/2024]
Abstract
While there are currently over 40 replicated genes with mapped risk alleles for Late Onset Alzheimer's disease (LOAD), the Apolipoprotein E locus E4 haplotype is still the biggest driver of risk, with odds ratios for neuropathologically confirmed E44 carriers exceeding 30 (95% confidence interval 16.59-58.75). We sought to address whether the APOE E4 haplotype modifies expression globally through networks of expression to increase LOAD risk. We have used the Human Brainome data to build expression networks comparing APOE E4 carriers to non-carriers using scalable mixed-datatypes Bayesian network (BN) modeling. We have found that VGF had the greatest explanatory weight. High expression of VGF is a protective signal, even on the background of APOE E4 alleles. LOAD risk signals, considering an APOE background, include high levels of SPECC1L, HLA-DRA and RANBP3L. Our findings nominate several new transcripts, taking a combined approach to network building including known LOAD risk loci.
Affiliation(s)
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA, 91010, USA
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA, 91010, USA
- Andrei S Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA, 91010, USA
- Amanda J Myers
- Department of Cell Biology, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- Institute for Data Science and Computing, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- Interdepartmental Program in Neuroscience, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
- Interdepartmental Program in Human Genetics and Genomics, University of Miami Miller School of Medicine, Miami, FL, 33136, USA
3
Mukhaleva E, Ma N, van der Velden WJC, Gogoshin G, Branciamore S, Bhattacharya S, Rodin AS, Vaidehi N. Bayesian network models identify cooperative GPCR:G protein interactions that contribute to G protein coupling. J Biol Chem 2024; 300:107362. [PMID: 38735478 PMCID: PMC11176750 DOI: 10.1016/j.jbc.2024.107362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 05/03/2024] [Accepted: 05/04/2024] [Indexed: 05/14/2024]
Abstract
Cooperative interactions in protein-protein interfaces demonstrate the interdependency or the linked network-like behavior and their effect on the coupling of proteins. Cooperative interactions also could cause ripple or allosteric effects at a distance in protein-protein interfaces. Although they are critically important in protein-protein interfaces, it is challenging to determine which amino acid pair interactions are cooperative. In this work, we have used Bayesian network modeling, an interpretable machine learning method, combined with molecular dynamics trajectories to identify the residue pairs that show high cooperativity and their allosteric effect in the interface of G protein-coupled receptor (GPCR) complexes with Gα subunits. Our results reveal six GPCR:Gα contacts that are common to the different Gα subtypes and show strong cooperativity in the formation of the interface. Both the C terminus helix5 and the core of the G protein are codependent entities and play an important role in GPCR coupling. We show that a promiscuous GPCR coupling to different Gα subtypes makes all the GPCR:Gα contacts that are specific to each Gα subtype (Gαs, Gαi, and Gαq). This work underscores the potential of data-driven Bayesian network modeling in elucidating the intricate dependencies and selectivity determinants in GPCR:G protein complexes, offering valuable insights into the dynamic nature of these essential cellular signaling components.
Affiliation(s)
- Elizaveta Mukhaleva
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA; Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Ning Ma
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Wijnand J C van der Velden
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA; Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Supriyo Bhattacharya
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Andrei S Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA; Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Nagarajan Vaidehi
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA; Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, California, USA
4
Lefevre TJ, Wei W, Mukhaleva E, Meda Venkata SP, Chandan NR, Abraham S, Li Y, Dessauer CW, Vaidehi N, Smrcka AV. Stabilization of interdomain interactions in G protein α subunits as a determinant of Gαi subtype signaling specificity. J Biol Chem 2024; 300:107211. [PMID: 38522511 PMCID: PMC11066577 DOI: 10.1016/j.jbc.2024.107211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 03/07/2024] [Accepted: 03/12/2024] [Indexed: 03/26/2024]
Abstract
Highly homologous members of the Gαi family, Gαi1-3, have distinct tissue distributions and physiological functions, yet their biochemical and functional properties are very similar. We recently identified PDZ-RhoGEF (PRG) as a novel Gαi1 effector that is poorly activated by Gαi2. In a proteomic proximity labeling screen, we observed a strong preference for Gαi1 relative to Gαi2 with respect to engagement of a broad range of potential targets. We investigated the mechanistic basis for this selectivity using PRG as a representative target. Substitution of either the helical domain (HD) from Gαi1 into Gαi2 or substitution of a single amino acid, A230 in Gαi2 with the corresponding D in Gαi1, largely rescues PRG activation and interactions with other potential Gαi targets. Molecular dynamics simulations combined with Bayesian network models revealed that in the GTP bound state, separation at the HD-Ras-like domain (RLD) interface is more pronounced in Gαi2 than Gαi1. Mutation of A230 to D in Gαi2 stabilizes HD-RLD interactions via ionic interactions with R145 in the HD, which in turn modify the conformation of Switch III. These data support a model where D229 in Gαi1 interacts with R144 and stabilizes a network of interactions between HD and RLD to promote protein target recognition. The corresponding A230 in Gαi2 is unable to stabilize this network, leading to an overall lower efficacy with respect to target interactions. This study reveals distinct mechanistic properties that could underlie differential biological and physiological consequences of activation of Gαi1 or Gαi2 by G protein-coupled receptors.
Affiliation(s)
- Tyler J Lefevre
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, Michigan, USA; Program in Chemical Biology, University of Michigan, Ann Arbor, Michigan, USA
- Wenyuan Wei
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Elizaveta Mukhaleva
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Naincy R Chandan
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, Michigan, USA; Genentech, South San Francisco, California, USA
- Saji Abraham
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Yong Li
- Department of Integrative Biology and Pharmacology McGovern Medical School, UTHealth, Houston, Texas, USA
- Carmen W Dessauer
- Department of Integrative Biology and Pharmacology McGovern Medical School, UTHealth, Houston, Texas, USA
- Nagarajan Vaidehi
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, California, USA
- Alan V Smrcka
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, Michigan, USA
5
Abeywardana T, Wu X, Huang ST, Aldana Masangkay G, Rodin AS, Branciamore S, Gogoshin G, Li A, Du L, Tharuka N, Tomaino R, Chen Y. Regulation of Enhancers by SUMOylation Through TFAP2C Binding and Recruitment of HDAC Complex to the Chromatin. RESEARCH SQUARE 2024:rs.3.rs-4201913. [PMID: 38645262 PMCID: PMC11030540 DOI: 10.21203/rs.3.rs-4201913/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Enhancers are fundamental to gene regulation. Post-translational modifications by the small ubiquitin-like modifiers (SUMO) modify chromatin regulation enzymes, including histone acetylases and deacetylases. However, it remains unclear whether SUMOylation regulates enhancer marks, acetylation at the 27th lysine residue of the histone H3 protein (H3K27Ac). To investigate whether SUMOylation regulates H3K27Ac, we performed genome-wide ChIP-seq analyses and discovered that knockdown (KD) of the SUMO activating enzyme catalytic subunit UBA2 reduced H3K27Ac at most enhancers. Bioinformatic analysis revealed that TFAP2C-binding sites are enriched in enhancers whose H3K27Ac was reduced by UBA2 KD. ChIP-seq analysis in combination with molecular biological methods showed that TFAP2C binding to enhancers increased upon UBA2 KD or inhibition of SUMOylation by a small molecule SUMOylation inhibitor. However, this is not due to the SUMOylation of TFAP2C itself. Proteomics analysis of the TFAP2C interactome on the chromatin identified histone deacetylase (HDAC) and RNA splicing machineries that contain many SUMOylation targets. TFAP2C KD reduced HDAC1 binding to chromatin and increased H3K27Ac marks at enhancer regions, suggesting that TFAP2C is important in recruiting HDAC machinery. Taken together, our findings provide insights into the regulation of enhancer marks by SUMOylation and TFAP2C and suggest that SUMOylation of proteins in the HDAC machinery regulates their recruitment to enhancers.
Affiliation(s)
- Xiwei Wu
- Toni Stephenson Lymphoma Center Beckman Research Institute, City of Hope
- Andrei S Rodin
- Toni Stephenson Lymphoma Center Beckman Research Institute, City of Hope
- Sergio Branciamore
- Toni Stephenson Lymphoma Center Beckman Research Institute, City of Hope
- Grigoriy Gogoshin
- Toni Stephenson Lymphoma Center Beckman Research Institute, City of Hope
- Arthur Li
- Toni Stephenson Lymphoma Center Beckman Research Institute, City of Hope
- Li Du
- Toni Stephenson Lymphoma Center Beckman Research Institute, City of Hope
- Ross Tomaino
- Harvard Medical School Taplin Mass Spectrometry Facility
6
Gogoshin G, Rodin AS. Graph Neural Networks in Cancer and Oncology Research: Emerging and Future Trends. Cancers (Basel) 2023; 15:5858. [PMID: 38136405 PMCID: PMC10742144 DOI: 10.3390/cancers15245858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/09/2023] [Accepted: 12/14/2023] [Indexed: 12/24/2023]
Abstract
Next-generation cancer and oncology research needs to take full advantage of the multimodal structured, or graph, information, with the graph data types ranging from molecular structures to spatially resolved imaging and digital pathology, biological networks, and knowledge graphs. Graph Neural Networks (GNNs) efficiently combine the graph structure representations with the high predictive performance of deep learning, especially on large multimodal datasets. In this review article, we survey the landscape of recent (2020-present) GNN applications in the context of cancer and oncology research, and delineate six currently predominant research areas. We then identify the most promising directions for future research. We compare GNNs with graphical models and "non-structured" deep learning, and devise guidelines for cancer and oncology researchers or physician-scientists, asking the question of whether they should adopt the GNN methodology in their research pipelines.
Affiliation(s)
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Andrei S. Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
7
Branciamore S, Gogoshin G, Rodin AS, Myers AJ. The Human Brainome: changes in expression of VGF, SPECC1L, HLA-DRA and RANBP3L act with APOE E4 to alter risk for late onset Alzheimer's disease. RESEARCH SQUARE 2023:rs.3.rs-3678057. [PMID: 38168398 PMCID: PMC10760217 DOI: 10.21203/rs.3.rs-3678057/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
While there are currently over 40 replicated genes with mapped risk alleles for Late Onset Alzheimer's disease (LOAD), the Apolipoprotein E locus E4 haplotype is still the biggest driver of risk, with odds ratios for neuropathologically confirmed E44 carriers exceeding 30 (95% confidence interval 16.59-58.75). We sought to address whether the APOE E4 haplotype modifies expression globally through networks of expression to increase LOAD risk. We have used the Human Brainome data to build expression networks comparing APOE E4 carriers to non-carriers using scalable mixed-datatypes Bayesian network (BN) modeling. We have found that VGF had the greatest explanatory weight. High expression of VGF is a protective signal, even on the background of APOE E4 alleles. LOAD risk signals, considering an APOE background, include high levels of SPECC1L, HLA-DRA and RANBP3L. Our findings nominate several new transcripts, taking a combined approach to network building including known LOAD risk loci.
8
Mukhaleva E, Ma N, van der Velden WJC, Gogoshin G, Branciamore S, Bhattacharya S, Rodin AS, Vaidehi N. Bayesian network models identify co-operative GPCR:G protein interactions that contribute to G protein coupling. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.09.561618. [PMID: 37873104 PMCID: PMC10592737 DOI: 10.1101/2023.10.09.561618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Cooperative interactions in protein-protein interfaces demonstrate the interdependency or the linked network-like behavior of interface interactions and their effect on the coupling of proteins. Cooperative interactions also could cause ripple or allosteric effects at a distance in protein-protein interfaces. Although they are critically important in protein-protein interfaces, it is challenging to determine which amino acid pair interactions are cooperative. In this work, we have used Bayesian network modeling, an interpretable machine learning method, combined with molecular dynamics trajectories to identify the residue pairs that show high cooperativity and their allosteric effect in the interface of G protein-coupled receptor (GPCR) complexes with G proteins. Our results reveal a strong co-dependency in the formation of interface GPCR:G protein contacts. This observation indicates that cooperativity of GPCR:G protein interactions is necessary for the coupling and selectivity of G proteins and is thus critical for receptor function. We have identified subnetworks containing polar and hydrophobic interactions that are common among multiple GPCRs coupling to different G protein subtypes (Gs, Gi and Gq). These common subnetworks, along with G protein-specific subnetworks, together confer selectivity to the G protein coupling. This work underscores the potential of data-driven Bayesian network modeling in elucidating the intricate dependencies and selectivity determinants in GPCR:G protein complexes, offering valuable insights into the dynamic nature of these essential cellular signaling components.
Affiliation(s)
- Elizaveta Mukhaleva
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Ning Ma
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Wijnand J. C. van der Velden
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Supriyo Bhattacharya
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Andrei S. Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Nagarajan Vaidehi
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA 91010
- Irell and Manella Graduate School of Biological Sciences, Beckman Research Institute of the City of Hope, Duarte, CA 91010
9
Lefevre TJ, Wei W, Mukhaleva E, Venkata SPM, Chandan NR, Abraham S, Li Y, Dessauer CW, Vaidehi N, Smrcka AV. Stabilization of Interdomain Interactions in G protein αi Subunits Determines Gαi Subtype Signaling Specificity. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.10.532072. [PMID: 37066214 PMCID: PMC10103935 DOI: 10.1101/2023.03.10.532072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Highly homologous members of the Gαi family, Gαi1-3, have distinct tissue distributions and physiological functions, yet the functional properties of these proteins with respect to GDP/GTP binding and regulation of adenylate cyclase are very similar. We recently identified PDZ-RhoGEF (PRG) as a novel Gαi1 effector; however, it is poorly activated by Gαi2. Here, in a proteomic proximity labeling screen, we observed a strong preference for Gαi1 relative to Gαi2 with respect to engagement of a broad range of potential targets. We investigated the mechanistic basis for this selectivity using PRG as a representative target. Substitution of either the helical domain (HD) from Gαi1 into Gαi2 or substitution of a single amino acid, A230 in Gαi2 to the corresponding D in Gαi1, largely rescues PRG activation and interactions with other Gαi targets. Molecular dynamics simulations combined with Bayesian network models revealed that in the GTP bound state, dynamic separation at the HD-Ras-like domain (RLD) interface is prevalent in Gαi2 relative to Gαi1 and that mutation of A230 (s4h3.3) to D in Gαi2 stabilizes HD-RLD interactions through formation of an ionic interaction with R145 (HD.11) in the HD. These interactions in turn modify the conformation of Switch III. These data support a model where D229 (s4h3.3) in Gαi1 interacts with R144 (HD.11) and stabilizes a network of interactions between HD and RLD to promote protein target recognition. The corresponding A230 in Gαi2 is unable to form the "ionic lock" to stabilize this network, leading to an overall lower efficacy with respect to target interactions. This study reveals distinct mechanistic properties that could underlie differential biological and physiological consequences of activation of Gαi1 or Gαi2 by GPCRs.
Affiliation(s)
- Tyler J. Lefevre
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, MI
- Program in Chemical Biology, University of Michigan, Ann Arbor, MI
- Wenyuan Wei
- Department of Integrative Biology and Pharmacology McGovern Medical School, UTHealth, Houston, TX
- Elizaveta Mukhaleva
- Department of Integrative Biology and Pharmacology McGovern Medical School, UTHealth, Houston, TX
- Naincy R. Chandan
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, MI
- Genentech, South San Francisco, CA
- Saji Abraham
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, MI
- Yong Li
- Department of Integrative Biology and Pharmacology McGovern Medical School, UTHealth, Houston, TX
- Carmen W. Dessauer
- Department of Integrative Biology and Pharmacology McGovern Medical School, UTHealth, Houston, TX
- Nagarajan Vaidehi
- Department of Computational and Quantitative Medicine, Beckman Research Institute of the City of Hope, Duarte, CA
- Alan V. Smrcka
- Department of Pharmacology, University of Michigan Medical School, Ann Arbor, MI
10
Hilliard S, Mosoyan K, Branciamore S, Gogoshin G, Zhang A, Simons DL, Rockne RC, Lee PP, Rodin AS. Bow-tie architectures in biological and artificial neural networks: Implications for network evolution and assay design. iScience 2023; 26:106041. [PMID: 36818303 PMCID: PMC9929672 DOI: 10.1016/j.isci.2023.106041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 01/09/2023] [Accepted: 01/19/2023] [Indexed: 01/26/2023]
Abstract
Modern artificial neural networks (ANNs) have long been designed on foundations of mathematics as opposed to their original foundations of biomimicry. However, the structure and function of these modern ANNs are often analogous to real-life biological networks. We propose that the ubiquitous information-theoretic principles underlying the development of ANNs are similar to the principles guiding the macro-evolution of biological networks and that insights gained from one field can be applied to the other. We generate hypotheses on the bow-tie network structure of the Janus kinase - signal transducers and activators of transcription (JAK-STAT) pathway, additionally informed by the evolutionary considerations, and carry out ANN simulation experiments to demonstrate that an increase in the network's input and output complexity does not necessarily require a more complex intermediate layer. This observation should guide novel biomarker discovery: namely, to prioritize sections of the biological networks in which information is most compressed as opposed to biomarkers representing the periphery of the network.
Affiliation(s)
- Seth Hilliard
- Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Karen Mosoyan
- Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Alvin Zhang
- Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Diana L. Simons
- Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Russell C. Rockne
- Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Peter P. Lee
- Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
- Andrei S. Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope National Medical Center, 1500 East Duarte Road, Duarte, CA 91010, USA
11
Ma Y, Fa B, Yuan X, Zhang Y, Yu Z. STS-BN: An efficient Bayesian network method for detecting causal SNPs. Front Genet 2022; 13:942464. [PMID: 36186431 PMCID: PMC9520706 DOI: 10.3389/fgene.2022.942464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 08/16/2022] [Indexed: 11/16/2022]
Abstract
Background: The identification of the causal SNPs of complex diseases in large-scale genome-wide association analysis is beneficial to the studies of pathogenesis, prevention, diagnosis and treatment of these diseases. However, existing applicable methods for large-scale data suffer from low accuracy. Developing powerful and accurate methods for detecting SNPs associated with complex diseases is highly desired. Results: We propose a score-based two-stage Bayesian network method to identify causal SNPs of complex diseases for case-control designs. This method combines the ideas of constraint-based methods and score-and-search methods to learn the structure of the disease-centered local Bayesian network. Simulation experiments are conducted to compare this new algorithm with several common methods that can achieve the same function. The results show that our method improves the accuracy and stability compared to several common methods. Our method based on Bayesian network theory results in lower false-positive rates when all correct loci are detected. Besides, real-world data application suggests that our algorithm has good performance when handling genome-wide association data. Conclusion: The proposed method is designed to identify the SNPs related to complex diseases, and is more accurate than other methods that can also be adapted to large-scale genome-wide association study data.
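The two-stage idea in the Results (a constraint-style screen followed by a score-and-search step around the disease node) can be illustrated with a small sketch. This is not the STS-BN algorithm itself, whose details the abstract does not give: a chi-square independence test stands in for the screening stage, a greedy forward search scored by BIC stands in for the second stage, and the names `chi_square`, `bic_score`, `two_stage` and the toy data are all hypothetical. The cutoff 3.84 is the 5% critical value for one degree of freedom, so it only suits the binary carrier/non-carrier coding assumed here.

```python
from collections import Counter
from math import log

def chi_square(x, y):
    """Pearson chi-square statistic for two discrete columns."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (cxy[(a, b)] - expected) ** 2 / expected
    return stat

def bic_score(disease, parents):
    """BIC of the disease node given a list of discrete parent columns.

    Assumption: the parameter count uses only the parent configurations
    observed in the data, a common simplification.
    """
    n = len(disease)
    configs = list(zip(*parents)) if parents else [()] * n
    joint = Counter(zip(configs, disease))
    marginal = Counter(configs)
    loglik = sum(c * log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    n_params = len(marginal) * (len(set(disease)) - 1)
    return loglik - 0.5 * log(n) * n_params

def two_stage(snps, disease, chi2_cutoff=3.84):
    # Stage 1 (screen): keep SNPs marginally associated with the disease.
    candidates = [s for s in snps if chi_square(snps[s], disease) > chi2_cutoff]
    # Stage 2 (score and search): greedily add the parent that most improves BIC.
    selected, best = [], bic_score(disease, [])
    while True:
        pick = None
        for s in candidates:
            if s in selected:
                continue
            score = bic_score(disease, [snps[t] for t in selected + [s]])
            if score > best + 1e-9:  # tolerance guards against float noise
                best, pick = score, s
        if pick is None:
            return candidates, selected
        selected.append(pick)

# Toy case-control data: 10 cases followed by 10 controls.
disease = [1] * 10 + [0] * 10
snps = {
    "snp1": [1] * 9 + [0] + [0] * 9 + [1],  # associated with disease
    "snp2": [1] * 9 + [0] + [0] * 9 + [1],  # perfect proxy of snp1
    "snp3": [1, 0] * 10,                    # unrelated to disease
}
candidates, selected = two_stage(snps, disease)
print(candidates, selected)
```

In this toy run the screen drops the unrelated SNP, and the search keeps only one of the two perfectly correlated SNPs because the second adds no score improvement.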
Affiliation(s)
- Yanran Ma
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Botao Fa
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Xi’an Jiaotong University, Xi’an, China
| | - Xin Yuan
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
| | - Yue Zhang
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- *Correspondence: Yue Zhang; Zhangsheng Yu
| | - Zhangsheng Yu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- *Correspondence: Yue Zhang; Zhangsheng Yu
| |
|
12
|
Identifying large scale interaction atlases using probabilistic graphs and external knowledge. J Clin Transl Sci 2022; 6:e27. [PMID: 35321220 PMCID: PMC8922291 DOI: 10.1017/cts.2022.18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Revised: 12/29/2021] [Accepted: 02/07/2022] [Indexed: 11/17/2022] Open
Abstract
Introduction: Reconstruction of gene interaction networks from experimental data provides a deep understanding of the underlying biological mechanisms. The noisy nature of the data and the large size of the network make this a very challenging task. Complex approaches handle the stochastic nature of the data but can only do this for small networks; simpler, linear models generate large networks but with less reliability. Methods: We propose a divide-and-conquer approach using probabilistic graph representations and external knowledge. We cluster the experimental data and learn an interaction network for each cluster, which are merged using the interaction network for the representative genes selected for each cluster. Results: We generated an interaction atlas for 337 human pathways yielding a network of 11,454 genes with 17,777 edges. Simulated gene expression data from this atlas formed the basis for reconstruction. Based on the area under the curve of the precision-recall curve, the proposed approach outperformed the baseline (random classifier) by ∼15-fold and conventional methods by ∼5–17-fold. The performance of the proposed workflow is significantly linked to the accuracy of the clustering step that tries to identify the modularity of the underlying biological mechanisms. Conclusions: We provide an interaction atlas generation workflow optimizing the algorithm/parameter selection. The proposed approach integrates external knowledge in the reconstruction of the interactome using probabilistic graphs. Network characterization and understanding long-range effects in interaction atlases provide means for comparative analysis with implications in biomarker discovery and therapeutic approaches. The proposed workflow is freely available at http://otulab.unl.edu/atlas.
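As a rough illustration of the baseline mentioned in this abstract: for a random classifier, the expected area under the precision-recall curve is approximately the fraction of candidate pairs that are true edges. The sketch below assumes an undirected network in which every unordered gene pair is a candidate edge. The abstract does not state how candidate pairs were counted, so the function names and the resulting figure are illustrative assumptions, not values from the study.

```python
def random_baseline_aupr(n_genes: int, n_edges: int) -> float:
    """Expected AUPR of a random classifier, taken as the prevalence of true edges.

    Assumption (not from the abstract): the network is undirected and
    every unordered gene pair counts as a candidate edge.
    """
    n_pairs = n_genes * (n_genes - 1) // 2
    return n_edges / n_pairs


def fold_improvement(aupr: float, baseline: float) -> float:
    """How many times larger an observed AUPR is than the baseline."""
    return aupr / baseline


# Atlas size reported in the abstract: 11,454 genes and 17,777 edges.
baseline = random_baseline_aupr(11_454, 17_777)
print(f"random-classifier AUPR is about {baseline:.2e}")
```

Because true edges are so sparse under this assumption, even a many-fold improvement over the baseline can correspond to a small absolute AUPR, which is worth keeping in mind when reading fold-change figures.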
|
13
|
Tripp BA, Otu HH. Integration of Multi-Omics Data Using Probabilistic Graph Models and External Knowledge. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210906141545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: High-throughput sequencing technologies have revolutionized the ability to perform systems-level biology and elucidate molecular mechanisms of disease through the comprehensive characterization of different layers of biological information. Integration of these heterogeneous layers can provide insight into the underlying biology but is challenged by modeling complex interactions. Objective: We introduce OBaNK: omics integration using Bayesian networks and external knowledge, an algorithm to model interactions between heterogeneous high-dimensional biological data to elucidate complex functional clusters and emergent relationships associated with an observed phenotype. Method: Using Bayesian network learning, we modeled the statistical dependencies and interactions between lipidomics, proteomics, and metabolomics data. The strength of a learned interaction between molecules was altered based on external knowledge. Results: Networks learned from synthetic datasets based on real pathways achieved an average area under the curve score of ~0.85, an improvement of ~0.23 from baseline methods. When applied to real multi-omics data collected during pregnancy, five distinct functional networks of heterogeneous biological data were identified, and the results were compared to other multi-omics integration approaches. Conclusion: OBaNK successfully improved the accuracy of learning interaction networks from data integrating external knowledge, identified heterogeneous functional networks from real data, and suggested potential novel interactions associated with the phenotype. These findings can guide future hypothesis generation. OBaNK source code is available at: https://github.com/bridgettripp/OBaNK.git, and a graphical user interface is available at: http://otulab.unl.edu/OBaNK.
Affiliation(s)
- Bridget A. Tripp
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- PhD Program of Complex Biosystems, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
| | - Hasan H. Otu
- Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
| |
|
14
|
Gogoshin G, Branciamore S, Rodin AS. Synthetic data generation with probabilistic Bayesian Networks. Math Biosci Eng 2021; 18:8603-8621. [PMID: 34814315 PMCID: PMC8848551 DOI: 10.3934/mbe.2021426] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 05/13/2023]
Abstract
Bayesian Network (BN) modeling is a prominent and increasingly popular computational systems biology method. It aims to construct network graphs from large heterogeneous biological datasets that reflect the underlying biological relationships. Currently, a variety of strategies exist for evaluating BN methodology performance, ranging from utilizing artificial benchmark datasets and models, to specialized biological benchmark datasets, to simulation studies that generate synthetic data from predefined network models. The last is arguably the most comprehensive approach; however, existing implementations often rely on explicit and implicit assumptions that may be unrealistic in a typical biological data analysis scenario, or are poorly equipped for automated arbitrary model generation. In this study, we develop a purely probabilistic simulation framework that addresses the demands of statistically sound simulation studies in an unbiased fashion. Additionally, we expand on our current understanding of the theoretical notions of causality and dependence / conditional independence in BNs and the Markov Blankets within.
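To make the idea of generating synthetic data from a predefined network model concrete, here is a minimal sketch of ancestral (forward) sampling from a small discrete Bayesian network. The three-node network, its probabilities, and all names are hypothetical, and this is a generic textbook procedure rather than the framework developed in the paper.

```python
import random

# Hypothetical three-node network A -> B, A -> C with binary variables.
# Each entry maps a tuple of parent values to P(node = 1 | parents).
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}
CPT = {
    "A": {(): 0.3},
    "B": {(0,): 0.1, (1,): 0.8},
    "C": {(0,): 0.5, (1,): 0.2},
}
ORDER = ["A", "B", "C"]  # a topological order: parents come before children


def sample_once(rng: random.Random) -> dict:
    """Draw one joint sample by visiting nodes in topological order."""
    sample = {}
    for node in ORDER:
        parent_values = tuple(sample[p] for p in PARENTS[node])
        p_one = CPT[node][parent_values]
        sample[node] = 1 if rng.random() < p_one else 0
    return sample


def generate(n: int, seed: int = 0) -> list:
    """Generate n independent samples from the network."""
    rng = random.Random(seed)
    return [sample_once(rng) for _ in range(n)]


data = generate(5000)
freq_a = sum(row["A"] for row in data) / len(data)
print(f"empirical P(A=1) = {freq_a:.3f}")  # should be close to 0.3
```

Data generated this way has a known ground-truth structure, so a structure-learning method can be scored by how well it recovers the edges A -> B and A -> C.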
|
15
|
Le J, Park JE, Ha VL, Luong A, Branciamore S, Rodin AS, Gogoshin G, Li F, Loh YHE, Camacho V, Patel SB, Welner RS, Parekh C. Single-Cell RNA-Seq Mapping of Human Thymopoiesis Reveals Lineage Specification Trajectories and a Commitment Spectrum in T Cell Development. Immunity 2020; 52:1105-1118.e9. [PMID: 32553173 DOI: 10.1016/j.immuni.2020.05.010] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 11/20/2019] [Revised: 04/20/2020] [Accepted: 05/22/2020] [Indexed: 12/21/2022]
Abstract
The challenges in recapitulating in vivo human T cell development in laboratory models have posed a barrier to understanding human thymopoiesis. Here, we used single-cell RNA sequencing (sRNA-seq) to interrogate the rare CD34+ progenitor and the more differentiated CD34- fractions in the human postnatal thymus. CD34+ thymic progenitors were comprised of a spectrum of specification and commitment states characterized by multilineage priming followed by gradual T cell commitment. The earliest progenitors in the differentiation trajectory were CD7- and expressed a stem-cell-like transcriptional profile, but had also initiated T cell priming. Clustering analysis identified a CD34+ subpopulation primed for the plasmacytoid dendritic lineage, suggesting an intrathymic dendritic specification pathway. CD2 expression defined T cell commitment stages where loss of B cell potential preceded that of myeloid potential. These datasets delineate gene expression profiles spanning key differentiation events in human thymopoiesis and provide a resource for the further study of human T cell development.
Affiliation(s)
- Justin Le
- Cancer and Blood Disease Institute, Children's Hospital Los Angeles, Los Angeles, CA, USA
| | - Jeong Eun Park
- Cancer and Blood Disease Institute, Children's Hospital Los Angeles, Los Angeles, CA, USA
| | - Vi Luan Ha
- Cancer and Blood Disease Institute, Children's Hospital Los Angeles, Los Angeles, CA, USA
| | - Annie Luong
- Cancer and Blood Disease Institute, Children's Hospital Los Angeles, Los Angeles, CA, USA
| | - Sergio Branciamore
- Department of Computational and Quantitative Medicine, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, Duarte, CA, USA
| | - Andrei S Rodin
- Department of Computational and Quantitative Medicine, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, Duarte, CA, USA
| | - Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, and Diabetes and Metabolism Research Institute, City of Hope National Medical Center, Duarte, CA, USA
| | - Fan Li
- Cancer and Blood Disease Institute, Children's Hospital Los Angeles, Los Angeles, CA, USA
| | | | - Virginia Camacho
- Division of Hematology and Oncology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Sweta B Patel
- Division of Hematology and Oncology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Robert S Welner
- Division of Hematology and Oncology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Chintan Parekh
- Cancer and Blood Disease Institute, Children's Hospital Los Angeles, Los Angeles, CA, USA; Department of Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
| |
|
16
|
Dórea FC, Revie CW. Data-Driven Surveillance: Effective Collection, Integration, and Interpretation of Data to Support Decision Making. Front Vet Sci 2021; 8:633977. [PMID: 33778039 PMCID: PMC7994248 DOI: 10.3389/fvets.2021.633977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2020] [Accepted: 02/18/2021] [Indexed: 11/20/2022] Open
Abstract
The biggest change brought about by the “era of big data” to health in general, and epidemiology in particular, relates arguably not to the volume of data encountered, but to its variety. An increasing number of new data sources, including many not originally collected for health purposes, are now being used for epidemiological inference and contextualization. Combining evidence from multiple data sources presents significant challenges, but discussions around this subject often confuse issues of data access and privacy, with the actual technical challenges of data integration and interoperability. We review some of the opportunities for connecting data, generating information, and supporting decision-making across the increasingly complex “variety” dimension of data in population health, to enable data-driven surveillance to go beyond simple signal detection and support an expanded set of surveillance goals.
Affiliation(s)
- Fernanda C Dórea
- Department of Disease Control and Epidemiology, National Veterinary Institute, Uppsala, Sweden
| | - Crawford W Revie
- Computer and Information Sciences, University of Strathclyde, Glasgow, United Kingdom
| |
|
17
|
Rodin AS, Gogoshin G, Hilliard S, Wang L, Egelston C, Rockne RC, Chao J, Lee PP. Dissecting Response to Cancer Immunotherapy by Applying Bayesian Network Analysis to Flow Cytometry Data. Int J Mol Sci 2021; 22:2316. [PMID: 33652558 PMCID: PMC7956201 DOI: 10.3390/ijms22052316] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/07/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 12/11/2022] Open
Abstract
Cancer immunotherapy, specifically immune checkpoint blockade, has been found to be effective in the treatment of metastatic cancers. However, only a subset of patients achieve clinical responses. Elucidating pretreatment biomarkers predictive of sustained clinical response is a major research priority. Another research priority is evaluating changes in the immune system before and after treatment in responders vs. nonresponders. Our group has been studying immune networks as an accurate reflection of the global immune state. Flow cytometry (FACS, fluorescence-activated cell sorting) data characterizing immune cell panels in peripheral blood mononuclear cells (PBMC) from gastroesophageal adenocarcinoma (GEA) patients were used to analyze changes in immune networks in this setting. Here, we describe a novel computational pipeline to perform secondary analyses of FACS data using systems biology/machine learning techniques and concepts. The pipeline is centered around comparative Bayesian network analyses of immune networks and is capable of detecting strong signals that conventional methods (such as FlowJo manual gating) might miss. Future studies are planned to validate and follow up the immune biomarkers (and combinations/interactions thereof) associated with clinical responses identified with this computational pipeline.
Affiliation(s)
- Andrei S. Rodin
- City of Hope National Medical Center, Department of Computational and Quantitative Medicine, Beckman Research Institute, 1500 East Duarte Road, Duarte, CA 91010, USA
- Correspondence: A.S.R.; P.P.L.
| | - Grigoriy Gogoshin
- City of Hope National Medical Center, Department of Computational and Quantitative Medicine, Beckman Research Institute, 1500 East Duarte Road, Duarte, CA 91010, USA
| | - Seth Hilliard
- City of Hope National Medical Center, Department of Computational and Quantitative Medicine, Beckman Research Institute, 1500 East Duarte Road, Duarte, CA 91010, USA
| | - Lei Wang
- City of Hope National Medical Center, Department of Immuno-Oncology, Beckman Research Institute, 1500 East Duarte Road, Duarte, CA 91010, USA
| | - Colt Egelston
- City of Hope National Medical Center, Department of Immuno-Oncology, Beckman Research Institute, 1500 East Duarte Road, Duarte, CA 91010, USA
| | - Russell C. Rockne
- City of Hope National Medical Center, Department of Computational and Quantitative Medicine, Beckman Research Institute, 1500 East Duarte Road, Duarte, CA 91010, USA
| | - Joseph Chao
- City of Hope National Medical Center, Department of Medical Oncology & Therapeutics Research, 1500 East Duarte Road, Duarte, CA 91010, USA
| | - Peter P. Lee
- City of Hope National Medical Center, Department of Immuno-Oncology, Beckman Research Institute, 1500 East Duarte Road, Duarte, CA 91010, USA
- Correspondence: A.S.R.; P.P.L.
| |
|
18
|
Wang X, Branciamore S, Gogoshin G, Ding S, Rodin AS. New Analysis Framework Incorporating Mixed Mutual Information and Scalable Bayesian Networks for Multimodal High Dimensional Genomic and Epigenomic Cancer Data. Front Genet 2020; 11:648. [PMID: 32625238 PMCID: PMC7314938 DOI: 10.3389/fgene.2020.00648] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2019] [Accepted: 05/28/2020] [Indexed: 12/14/2022] Open
Abstract
We propose a novel two-stage analysis strategy to discover candidate genes associated with particular cancer outcomes in large multimodal genomic cancer databases, such as The Cancer Genome Atlas (TCGA). During the first stage, we use mixed mutual information to perform variable selection; during the second stage, we use scalable Bayesian network (BN) modeling to identify candidate genes and their interactions. Two crucial features of the proposed approach are (i) the ability to handle mixed data types (continuous and discrete, genomic, epigenomic, etc.) and (ii) a flexible boundary between the variable selection and network modeling stages, which can be adjusted in accordance with the investigators' BN software scalability and hardware implementation. These two aspects result in high generalizability of the proposed analytical framework. We apply the above strategy to three different TCGA datasets (LGG, Brain Lower Grade Glioma; HNSC, Head and Neck Squamous Cell Carcinoma; STES, Stomach and Esophageal Carcinoma), linking multimodal molecular information (SNPs, mRNA expression, DNA methylation) to two clinical outcome variables (tumor status and patient survival). We identify 11 candidate genes, of which 6 have already been directly implicated in the cancer literature. One novel LGG prognostic factor suggested by our analysis, methylation of TMPRSS11F type II transmembrane serine protease, presents an intriguing direction for follow-up studies.
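The first stage described in this abstract, ranking variables by the information they share with an outcome, can be sketched as follows. This is a simplified illustration: continuous values are handled by equal-width binning, which is an assumption made here for brevity and not the mixed mutual information estimator used by the authors. All function and variable names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys) -> float:
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def discretize(values, n_bins: int = 3):
    """Equal-width binning, a simple stand-in for handling continuous data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def select_top(features: dict, outcome, k: int):
    """Keep the k features sharing the most information with the outcome."""
    scores = {name: mutual_information(col, outcome) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: one discrete feature, one continuous feature, one noise feature.
outcome = [0, 0, 0, 1, 1, 1]
features = {
    "snp_1": [0, 0, 0, 1, 1, 1],
    "expr_1": discretize([0.1, 0.2, 0.15, 0.9, 0.8, 0.95]),
    "noise": [0, 1, 0, 1, 0, 1],
}
selected = select_top(features, outcome, k=2)
print(selected)
```

The value of `k` plays the role of the adjustable boundary between stages: a larger `k` passes more variables on to the network modeling step at a higher computational cost.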
Affiliation(s)
- Xichun Wang
- Department of Computational and Quantitative Medicine, Beckman Research Institute and Diabetes and Metabolism Research Institute of the City of Hope, Duarte, CA, United States
| | - Sergio Branciamore
- Department of Computational and Quantitative Medicine, Beckman Research Institute and Diabetes and Metabolism Research Institute of the City of Hope, Duarte, CA, United States
| | - Grigoriy Gogoshin
- Department of Computational and Quantitative Medicine, Beckman Research Institute and Diabetes and Metabolism Research Institute of the City of Hope, Duarte, CA, United States
| | - Shuyu Ding
- Department of Computational and Quantitative Medicine, Beckman Research Institute and Diabetes and Metabolism Research Institute of the City of Hope, Duarte, CA, United States
| | - Andrei S Rodin
- Department of Computational and Quantitative Medicine, Beckman Research Institute and Diabetes and Metabolism Research Institute of the City of Hope, Duarte, CA, United States
| |
|
19
|
Gabaldón T. Recent trends in molecular diagnostics of yeast infections: from PCR to NGS. FEMS Microbiol Rev 2019; 43:517-547. [PMID: 31158289 PMCID: PMC8038933 DOI: 10.1093/femsre/fuz015] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 03/13/2019] [Accepted: 05/31/2019] [Indexed: 12/29/2022] Open
Abstract
The incidence of opportunistic yeast infections in humans has been increasing over recent years. These infections are difficult to treat and diagnose, in part due to the large number and broad diversity of species that can underlie the infection. In addition, resistance to one or several antifungal drugs in infecting strains is increasingly being reported, severely limiting therapeutic options and showcasing the need for rapid detection of the infecting agent and its drug susceptibility profile. Current methods for species and resistance identification lack satisfactory sensitivity and specificity, and often require prior culturing of the infecting agent, which delays diagnosis. Recently developed high-throughput technologies such as next generation sequencing or proteomics are opening completely new avenues for more sensitive, accurate and fast diagnosis of yeast pathogens. These approaches are the focus of intensive research, but translation into the clinics requires overcoming important challenges. In this review, we provide an overview of existing and recently emerged approaches that can be used in the identification of yeast pathogens and their drug resistance profiles. Throughout the text we highlight the advantages and disadvantages of each methodology and discuss the most promising developments in their path from bench to bedside.
Affiliation(s)
- Toni Gabaldón
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- ICREA, Pg Lluís Companys 23, 08010 Barcelona, Spain
| |
|
20
|
Moorthy K, Jaber AN, Ismail MA, Ernawan F, Mohamad MS, Deris S. Missing-Values Imputation Algorithms for Microarray Gene Expression Data. Methods Mol Biol 2019; 1986:255-266. [PMID: 31115893 DOI: 10.1007/978-1-4939-9442-7_12] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/09/2023]
Abstract
In gene expression studies, missing values are a common problem with important consequences for the interpretation of the final data (Satija et al., Nat Biotechnol 33(5):495, 2015). Numerous bioinformatics examination tools are used for cancer prediction, including the data set matrix (Bailey et al., Cell 173(2):371-385, 2018); thus, it is necessary to resolve the problem of missing-values imputation. This chapter presents a review of the research on missing-values imputation approaches for gene expression data. By using local and global correlation of the data, we were able to focus mostly on the differences between the algorithms. We classified the algorithms as global, hybrid, local, or knowledge-based techniques. Additionally, this chapter presents suitable assessments of the different approaches. The purpose of this review is to focus on developments in the current techniques for scientists rather than applying different or newly developed algorithms with identical functional goals. The aim was to adapt the algorithms to the characteristics of the data.
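As an example of the local class of techniques mentioned in this abstract, the sketch below imputes each missing value from the k most similar genes (rows), in the spirit of k-nearest-neighbour imputation. It is an illustrative assumption of how such a method can look, not an algorithm taken from the chapter. The function name, the distance definition, and the toy matrix are hypothetical.

```python
from math import sqrt


def knn_impute(matrix, k: int = 2):
    """Fill None entries using the mean of the k most similar rows (genes).

    Distance between two rows is the root mean squared difference over the
    columns observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy expression matrix: rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
]
filled = knn_impute(expr, k=2)
print(filled[1])
```

A global method would instead use structure from the whole matrix (for example, a low-rank approximation), which tends to help when no gene has close neighbours.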
Affiliation(s)
- Kohbalan Moorthy
- Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia.
| | - Aws Naser Jaber
- Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
| | - Mohd Arfian Ismail
- Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
| | - Ferda Ernawan
- Faculty of Computer Systems & Software Engineering, Universiti Malaysia Pahang, Kuantan, Pahang, Malaysia
| | - Mohd Saberi Mohamad
- Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
| | - Safaai Deris
- Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
| |
|
21
|
Branciamore S, Gogoshin G, Di Giulio M, Rodin AS. Intrinsic Properties of tRNA Molecules as Deciphered via Bayesian Network and Distribution Divergence Analysis. Life (Basel) 2018; 8:5. [PMID: 29419741 PMCID: PMC5871937 DOI: 10.3390/life8010005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/09/2017] [Revised: 01/22/2018] [Accepted: 01/23/2018] [Indexed: 12/27/2022] Open
Abstract
The identity/recognition of tRNAs, in the context of aminoacyl tRNA synthetases (and other molecules), is a complex phenomenon that has major implications ranging from the origins and evolution of translation machinery and genetic code to the evolution and speciation of tRNAs themselves to human mitochondrial diseases to artificial genetic code engineering. Deciphering it via laboratory experiments, however, is difficult and necessarily time- and resource-consuming. In this study, we propose a mathematically rigorous two-pronged in silico approach to identifying and classifying tRNA positions important for tRNA identity/recognition, rooted in machine learning and information-theoretic methodology. We apply Bayesian Network modeling to elucidate the structure of intra-tRNA-molecule relationships, and distribution divergence analysis to identify meaningful inter-molecule differences between various tRNA subclasses. We illustrate the complementary application of these two approaches using tRNA examples across the three domains of life, and identify and discuss important (informative) positions therein. In summary, we deliver to the tRNA research community a novel, comprehensive methodology for identifying the specific elements of interest in various tRNA molecules, which can be followed up by the corresponding experimental work and/or high-resolution position-specific statistical analyses.
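To illustrate what a distribution divergence analysis at a single sequence position can look like, the sketch below compares the nucleotide distributions of one alignment column between two tRNA subclasses using the Jensen-Shannon divergence. The abstract does not name the divergence measure used, so the choice of Jensen-Shannon, the function names, and the toy columns are all assumptions.

```python
from collections import Counter
from math import log2

ALPHABET = "ACGU"


def base_frequencies(column, pseudocount: float = 0.0):
    """Relative frequency of each nucleotide in one alignment column."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(ALPHABET)
    return [(counts[b] + pseudocount) / total for b in ALPHABET]


def kl(p, q) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical bases observed at one position in two tRNA subclasses.
class_a = "GGGGGGGA"
class_b = "UUUUUUUC"
score = js_divergence(base_frequencies(class_a), base_frequencies(class_b))
print(f"JS divergence at this position: {score:.3f}")
```

Positions with a divergence near 1 bit separate the two subclasses almost perfectly, while positions near 0 carry little discriminating information and are less likely to be identity elements.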
Affiliation(s)
- Sergio Branciamore
- Department of Diabetes Complications and Metabolism, Diabetes and Metabolism Research Institute, City of Hope, Duarte, 91010 CA, USA.
| | - Grigoriy Gogoshin
- Department of Diabetes Complications and Metabolism, Diabetes and Metabolism Research Institute, City of Hope, Duarte, 91010 CA, USA.
| | - Massimo Di Giulio
- Early Evolution of Life Laboratory, Institute of Biosciences and Bioresources, CNR, 80131 Naples, Italy.
| | - Andrei S Rodin
- Department of Diabetes Complications and Metabolism, Diabetes and Metabolism Research Institute, City of Hope, Duarte, 91010 CA, USA.
| |
|
22
|
Analysis of high-resolution 3D intrachromosomal interactions aided by Bayesian network modeling. Proc Natl Acad Sci U S A 2017; 114:E10359-E10368. [PMID: 29133398 PMCID: PMC5715735 DOI: 10.1073/pnas.1620425114] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022] Open
Abstract
Long-range intrachromosomal interactions play an important role in 3D chromosome structure and function, but our understanding of how various factors contribute to the strength of these interactions remains poor. In this study we used a recently developed analysis framework for Bayesian network (BN) modeling to analyze publicly available datasets for intrachromosomal interactions. We investigated how 106 variables affect the pairwise interactions of over 10 million 5-kb DNA segments in the B-lymphocyte cell line GM12878. Strictly data-driven BN modeling indicates that the strength of intrachromosomal interactions (hic_strength) is directly influenced by only four types of factors: distance between segments, Rad21 or SMC3 (cohesin components), transcription at transcription start sites (TSS), and the number of CCCTC-binding factor (CTCF)-cohesin complexes between the interacting DNA segments. Subsequent studies confirmed that most high-intensity interactions have a CTCF-cohesin complex in at least one of the interacting segments. However, 46% have CTCF on only one side, and 32% are without CTCF. As expected, high-intensity interactions are strongly dependent on the orientation of the ctcf motif, and, moreover, we find that the interaction between enhancers and promoters is similarly dependent on ctcf motif orientation. Dependency relationships between transcription factors were also revealed, including known lineage-determining B-cell transcription factors (e.g., Ebf1) as well as potential novel relationships. Thus, BN analysis of large intrachromosomal interaction datasets is a useful tool for gaining insight into DNA-DNA, protein-DNA, and protein-protein interactions.
|
23
|
Jinawath N, Bunbanjerdsuk S, Chayanupatkul M, Ngamphaiboon N, Asavapanumas N, Svasti J, Charoensawan V. Bridging the gap between clinicians and systems biologists: from network biology to translational biomedical research. J Transl Med 2016; 14:324. [PMID: 27876057 PMCID: PMC5120462 DOI: 10.1186/s12967-016-1078-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 08/23/2016] [Accepted: 11/08/2016] [Indexed: 01/22/2023] Open
Abstract
With the wealth of data accumulated from completely sequenced genomes and other high-throughput experiments, global studies of biological systems, which simultaneously investigate multiple biological entities (e.g. genes, transcripts, proteins), have become routine. Network representation is frequently used to capture the presence of these molecules as well as their relationships. Network biology has been widely used in molecular biology and genetics, where several network properties have been shown to be functionally important. Here, we discuss how such methodology can be useful to translational biomedical research, where scientists traditionally focus on one or a small set of genes, diseases, and drug candidates at any one time. We first give an overview of the network representations frequently used in biology, including what nodes and edges represent, and review their application in preclinical research to date. Using cancer as an example, we review how network biology can facilitate system-wide approaches to identify targeted small molecule inhibitors. These types of inhibitors have the potential to be more specific, resulting in high efficacy treatments with fewer side effects, compared to conventional treatments such as chemotherapy. Global analysis may provide better insight into the overall picture of human diseases, as well as identify previously overlooked problems, leading to rapid advances in medicine. From the clinicians’ point of view, it is necessary to bridge the gap between theoretical network biology and practical biomedical research, in order to improve the diagnosis, prevention, and treatment of the world’s major diseases.
Affiliation(s)
- Natini Jinawath
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Nakhon Pathom, Thailand; Program in Translational Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
| | - Sacarin Bunbanjerdsuk
- Program in Translational Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
| | - Maneerat Chayanupatkul
- Department of Physiology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand; Division of Gastroenterology and Hepatology, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Nuttapong Ngamphaiboon
- Medical Oncology Unit, Department of Medicine Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
| | - Nithi Asavapanumas
- Department of Physiology, Faculty of Science, Mahidol University, Bangkok, Thailand
| | - Jisnuson Svasti
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Nakhon Pathom, Thailand; Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, Thailand; Laboratory of Biochemistry, Chulabhorn Research Institute, Bangkok, Thailand
| | - Varodom Charoensawan
- Integrative Computational BioScience (ICBS) Center, Mahidol University, Nakhon Pathom, Thailand; Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, Thailand; Systems Biology of Diseases Research Unit, Faculty of Science, Mahidol University, Bangkok, Thailand
| |
|