1
Gandhi N, Wills L, Akers K, Su Y, Niccum P, Murali TM, Rajagopalan P. Comparative transcriptomic and phenotypic analysis of induced pluripotent stem cell hepatocyte-like cells and primary human hepatocytes. Cell Tissue Res 2024; 396:119-139. [PMID: 38369646 DOI: 10.1007/s00441-024-03868-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 01/22/2024] [Indexed: 02/20/2024]
Abstract
Primary human hepatocytes (PHHs) are used extensively for in vitro liver cultures to study hepatic functions. However, limited availability and invasive retrieval prevent their widespread use. Induced pluripotent stem cells exhibit significant potential since they can be obtained non-invasively and differentiated into hepatic lineages, such as hepatocyte-like cells (iHLCs). However, there are concerns about their fetal phenotypic characteristics and their hepatic functions compared to PHHs in culture. Therefore, we performed an RNA-sequencing (RNA-seq) analysis to understand pathways that are either up- or downregulated in each cell type. Analysis of the RNA-seq data showed upregulation of the bile secretion pathway in PHHs, where genes such as AQP9 and UGT1A1 were expressed 455- and 15-fold higher, respectively, than in iHLCs. Immunostaining showed that bile canaliculi were present in PHHs. The TCA cycle was upregulated in PHHs compared to iHLCs. Cellular analysis showed a 2- to 2.5-fold increase in normalized urea production in PHHs compared to iHLCs. In addition, drug metabolism pathways, including cytochrome P450 (CYP450) and UDP-glucuronosyltransferase enzymes, were upregulated in PHHs compared to iHLCs. Of note, CYP2E1 gene expression was significantly higher (21,810-fold) in PHHs. Acetaminophen and ethanol were administered to PHH and iHLC cultures to investigate differences in biotransformation. CYP450 activity of baseline and toxicant-treated samples was significantly higher in PHHs than in iHLCs. Our analysis revealed that iHLCs differ substantially from PHHs in critical hepatic functions. These results highlight the differences in gene expression and hepatic functions between PHHs and iHLCs and motivate future investigation.
Affiliation(s)
- Neeti Gandhi
  - Department of Chemical Engineering, Virginia Tech, 333 Kelly Hall, Blacksburg, VA, 24061, USA
- Lauren Wills
  - School of Biomedical Engineering and Sciences, Virginia Tech, Blacksburg, USA
- Kyle Akers
  - Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- Yiqi Su
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Parker Niccum
  - Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Padmavathy Rajagopalan
  - Department of Chemical Engineering, Virginia Tech, 333 Kelly Hall, Blacksburg, VA, 24061, USA
  - School of Biomedical Engineering and Sciences, Virginia Tech, Blacksburg, USA
2
Antony B, Blau H, Casiraghi E, Loomba JJ, Callahan TJ, Laraway BJ, Wilkins KJ, Antonescu CC, Valentini G, Williams AE, Robinson PN, Reese JT, Murali TM. Predictive models of long COVID. EBioMedicine 2023; 96:104777. [PMID: 37672869 PMCID: PMC10494314 DOI: 10.1016/j.ebiom.2023.104777] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 07/24/2023] [Accepted: 08/15/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future. METHODS We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID. We trained two machine learning (ML) models - logistic regression (LR) and random forest (RF). Features used to train predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741). FINDINGS LR and RF models yielded median AUROC of 0.76 and 0.75, respectively. Ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had average AUROC of 0.75. INTERPRETATION ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology. FUNDING NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.
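The AUROC values reported in the abstract above summarise how well a model ranks positive patients above negative ones. The following is a minimal sketch of that metric computed through the rank-sum identity; the function name `auroc` and the toy labels and scores are hypothetical illustrations and are not taken from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels; scores: matching predicted scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 marks a positive label, scores are model outputs.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
print(round(toy_auroc, 3))
```

The pairwise loop is quadratic and only meant for illustration; on cohorts of the size described above, a sort-based implementation from an established library would be the practical choice.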
Affiliation(s)
- Blessy Antony
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, 24061, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Elena Casiraghi
  - AnacletoLab, Computer Science Department, Dipartimento di Informatica, Università degli Studi di Milano, Milan, 20133, Italy; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; ELLIS - European Laboratory for Learning and Intelligent Systems, Milan Unit, Milan, 20133, Italy
- Johanna J Loomba
  - Integrated Translational Health Research Institute of Virginia, University of Virginia, Charlottesville, VA, 22904, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Bryan J Laraway
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Kenneth J Wilkins
  - Biostatistics Program, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20814, USA
- Giorgio Valentini
  - AnacletoLab, Computer Science Department, Dipartimento di Informatica, Università degli Studi di Milano, Milan, 20133, Italy; ELLIS - European Laboratory for Learning and Intelligent Systems, Milan Unit, Milan, 20133, Italy
- Andrew E Williams
  - Institute for Clinical Research and Health Policy Studies, Tufts University School of Medicine, Boston, MA, 02111, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, 06269, USA
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- T M Murali
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, 24061, USA
3
Law J, Orbach SM, Weston BR, Steele PA, Rajagopalan P, Murali TM. Computational Construction of Toxicant Signaling Networks. Chem Res Toxicol 2023; 36:1267-1277. [PMID: 37471124 PMCID: PMC10445288 DOI: 10.1021/acs.chemrestox.2c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Indexed: 07/21/2023]
Abstract
Humans and animals are regularly exposed to compounds that may have adverse effects on health. The Toxicity Forecaster (ToxCast) program was developed to use high throughput screening assays to quickly screen chemicals by measuring their effects on many biological end points. Many of these assays test for effects on cellular receptors and transcription factors (TFs), under the assumption that a toxicant may perturb normal signaling pathways in the cell. We hypothesized that we could reconstruct the intermediate proteins in these pathways that may be directly or indirectly affected by the toxicant, potentially revealing important physiological processes not yet tested for many chemicals. We integrate data from ToxCast with a human protein interactome to build toxicant signaling networks that contain physical and signaling protein interactions that may be affected as a result of toxicant exposure. To build these networks, we developed the EdgeLinker algorithm, which efficiently finds short paths in the interactome that connect the receptors to TFs for each toxicant. We performed multiple evaluations and found evidence suggesting that these signaling networks capture biologically relevant effects of toxicants. To aid in dissemination and interpretation, interactive visualizations of these networks are available at http://graphspace.org.
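The idea of connecting receptors to TFs through short paths in an interactome can be illustrated with a generic multi-source shortest-path search. The sketch below uses plain Dijkstra and is not the EdgeLinker algorithm itself, whose details the abstract does not give; the function name, the edge costs, and the node names (`R1`, `P1`, `TF1`, and so on) are hypothetical placeholders.

```python
import heapq


def shortest_paths_to_targets(graph, sources, targets):
    """Multi-source Dijkstra: cheapest path from any source to each target.

    graph: dict mapping node -> list of (neighbor, cost) directed edges,
           with non-negative costs.
    Returns a dict target -> (total_cost, path as a list of nodes);
    unreachable targets are omitted.
    """
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    result = {}
    for t in targets:
        if t not in dist:
            continue
        # Walk predecessor links back to whichever source started the path.
        path = [t]
        while path[-1] in prev:
            path.append(prev[path[-1]])
        result[t] = (dist[t], path[::-1])
    return result


# Hypothetical toy interactome: receptors R*, intermediates P*, factors TF*.
toy_graph = {
    "R1": [("P1", 1.0), ("P2", 4.0)],
    "R2": [("P2", 1.0)],
    "P1": [("TF1", 1.0)],
    "P2": [("TF1", 5.0), ("TF2", 1.0)],
}
toy_paths = shortest_paths_to_targets(toy_graph, ["R1", "R2"], ["TF1", "TF2", "TF3"])
print(toy_paths)
```

Starting the search from all receptors at once finds, for each TF, the nearest receptor in a single pass, which is one simple way to keep such a search efficient on a large interactome.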
Affiliation(s)
- Jeffrey N. Law
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Blacksburg, Virginia 24061, United States
- Sophia M. Orbach
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
- Bronson R. Weston
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Blacksburg, Virginia 24061, United States
- Peter A. Steele
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
- Padmavathy Rajagopalan
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States
- T. M. Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061, United States
4
Reese JT, Blau H, Casiraghi E, Bergquist T, Loomba JJ, Callahan TJ, Laraway B, Antonescu C, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Caufield JH, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. EBioMedicine 2023; 87:104413. [PMID: 36563487 PMCID: PMC9769411 DOI: 10.1016/j.ebiom.2022.104413] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 07/30/2022] [Revised: 11/23/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. METHODS We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. FINDINGS We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. INTERPRETATION Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. FUNDING NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
Affiliation(s)
- Justin T Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Elena Casiraghi
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Johanna J Loomba
  - The Integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Bryan Laraway
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Michael Gargano
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Kenneth J Wilkins
  - Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Nariman Ammar
  - Health Science Center, University of Tennessee, Memphis, TN, USA
- Blessy Antony
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- J Harry Caufield
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Guy Karlebach
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Julie A McMurry
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Andrew Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA; Tufts University School of Medicine, Institute for Clinical Research and Health Policy Studies, Boston, MA, USA; Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Richard Moffitt
  - Department of Biomedical Informatics and Stony Brook Cancer Center, Stony Brook University, Stony Brook, NY, USA
- Kristin Kostka
  - Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Christopher G Chute
  - Schools of Medicine, Public Health and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
5
Li YC, Wang L, Law JN, Murali TM, Pandey G. Integrating multimodal data through interpretable heterogeneous ensembles. Bioinform Adv 2022; 2:vbac065. [PMID: 36158455 PMCID: PMC9495448 DOI: 10.1093/bioadv/vbac065] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/26/2022] [Revised: 09/01/2022] [Accepted: 09/10/2022] [Indexed: 01/27/2023]
Abstract
Motivation Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms and uses heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data and mortality due to coronavirus disease 2019 (COVID-19) from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability and implementation Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration. 
Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Yan Chak Li
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Linhua Wang
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX 77030, USA
- Jeffrey N Law
  - Biosciences Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
6
Li YC, Wang L, Law JN, Murali TM, Pandey G. Integrating multimodal data through interpretable heterogeneous ensembles. bioRxiv 2022:2020.05.29.123497. [PMID: 35923321 PMCID: PMC9347276 DOI: 10.1101/2020.05.29.123497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Motivation Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities, but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms, and uses effective heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data, and mortality due to COVID-19 from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen (BUN) and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration . Contact gaurav.pandey@mssm.edu.
Affiliation(s)
- Yan Chak Li
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Linhua Wang
  - Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, Texas, USA
- Jeffrey N. Law
  - National Renewable Energy Laboratory, Golden, Colorado, USA
- T. M. Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
- Gaurav Pandey
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
7
Reese JT, Blau H, Bergquist T, Loomba JJ, Callahan T, Laraway B, Antonescu C, Casiraghi E, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN. Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs. medRxiv 2022:2022.05.24.22275398. [PMID: 35665012 PMCID: PMC9164456 DOI: 10.1101/2022.05.24.22275398] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 04/11/2023]
Abstract
Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
8
Law JN, Akers K, Tasnina N, Santina CMD, Deutsch S, Kshirsagar M, Klein-Seetharaman J, Crovella M, Rajagopalan P, Kasif S, Murali TM. Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2. Gigascience 2021; 10:giab082. [PMID: 34966926 PMCID: PMC8716363 DOI: 10.1093/gigascience/giab082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Revised: 09/21/2021] [Accepted: 11/28/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
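Provenance tracing of the kind described above relies on a general property of linear propagation schemes: the score of a node is the sum of the scores obtained by propagating from each input separately. The sketch below demonstrates this with a random walk with restart, which is an assumed stand-in and not necessarily the propagation method used in the paper; the function names, the restart parameter `alpha`, and the four-node toy network are hypothetical.

```python
def propagate(adj, seeds, alpha=0.5, iters=100):
    """Random walk with restart on an undirected graph.

    adj: dict node -> list of neighbors (every neighbor is also a key).
    seeds: dict node -> restart weight.
    Iterates s <- alpha * W s + (1 - alpha) * r, where W spreads each
    node's score evenly over its neighbors.
    """
    nodes = list(adj)
    r = {n: seeds.get(n, 0.0) for n in nodes}
    s = dict(r)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * r[n] for n in nodes}
        for u in nodes:
            if adj[u]:
                share = alpha * s[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        s = nxt
    return s


def contributions(adj, seeds, alpha=0.5, iters=100):
    """Score each node receives from each seed when propagated on its own."""
    return {
        seed: propagate(adj, {seed: weight}, alpha, iters)
        for seed, weight in seeds.items()
    }


# Hypothetical toy network: a path A - B - C - D with seeds at both ends.
toy_adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
toy_seeds = {"A": 1.0, "D": 1.0}
full_scores = propagate(toy_adj, toy_seeds)
per_seed = contributions(toy_adj, toy_seeds)
# Each node's score decomposes into per-seed contributions.
for node in toy_adj:
    parts = {seed: round(per_seed[seed][node], 4) for seed in toy_seeds}
    print(node, round(full_scores[node], 4), parts)
```

In this toy network the largest contribution to node B comes from seed A, its direct neighbor, which mirrors the kind of observation the abstract reports for top-ranking predictions.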
Affiliation(s)
- Jeffrey N Law
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Kyle Akers
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Nure Tasnina
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Shay Deutsch
  - Department of Mathematics, University of California, Los Angeles, CA 90095, USA
- Mark Crovella
  - Department of Computer Science, Boston University, Boston, MA 02215, USA
- Simon Kasif
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
9
Abstract
Adaptive modulation of the global cellular growth state of unicellular organisms is crucial for their survival in fluctuating nutrient environments. Because these organisms must be able to respond reliably to ever varying and unpredictable nutritional conditions, their nutrient signaling networks must have a certain inbuilt robustness. In eukaryotes, such as the budding yeast Saccharomyces cerevisiae, distinct nutrient signals are relayed by specific plasma membrane receptors to signal transduction pathways that are interconnected in complex information-processing networks, which have been well characterized. However, the complexity of the signaling network confounds the interpretation of the overall regulatory "logic" of the control system. Here, we propose a literature-curated molecular mechanism of the integrated nutrient signaling network in budding yeast, focusing on early temporal responses to carbon and nitrogen signaling. We build a computational model of this network to reconcile literature-curated quantitative experimental data with our proposed molecular mechanism. We evaluate the robustness of our estimates of the model's kinetic parameter values. We test the model by comparing predictions made in mutant strains with qualitative experimental observations made in the same strains. Finally, we use the model to predict nutrient-responsive transcription factor activities in a number of mutant strains undergoing complex nutrient shifts.
Affiliation(s)
- Amogh P Jalihal
  - Genetics, Bioinformatics, and Computational Biology PhD Program
- Pavel Kraikivski
  - Division of Systems Biology, Academy of Integrated Science, Virginia Tech, Blacksburg, VA 24061
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061
- John J Tyson
  - Division of Systems Biology, Academy of Integrated Science, Virginia Tech, Blacksburg, VA 24061
  - Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061
10
Law JN, Kale SD, Murali TM. Accurate and efficient gene function prediction using a multi-bacterial network. Bioinformatics 2021; 37:800-806. [PMID: 33063084 DOI: 10.1093/bioinformatics/btaa885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2020] [Revised: 09/23/2020] [Accepted: 09/30/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. RESULTS We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species. AVAILABILITY AND IMPLEMENTATION An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
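Label propagation of the kind mentioned above can be sketched generically: nodes with known annotations are held fixed, and every other node repeatedly takes the weighted average of its neighbors' scores. The code below is a plain fixed-iteration version written under that assumption. It does not implement the running-time bounds that the abstract credits for the FastSinkSource speed-up, and the function name, the iteration count, and the toy network are hypothetical.

```python
def sink_source(weights, positives, negatives, iters=200):
    """Generic sink-source style label propagation.

    weights: dict node -> dict neighbor -> edge weight (symmetric network).
    positives are clamped to score 1.0 and negatives to 0.0; every other
    node is repeatedly set to the weighted mean of its neighbors' scores.
    """
    scores = {n: 0.0 for n in weights}
    for p in positives:
        scores[p] = 1.0
    fixed = set(positives) | set(negatives)
    for _ in range(iters):
        new = dict(scores)
        for u, nbrs in weights.items():
            if u in fixed:
                continue  # annotated nodes keep their clamped score
            total = sum(nbrs.values())
            if total > 0:
                new[u] = sum(w * scores[v] for v, w in nbrs.items()) / total
        scores = new
    return scores


# Hypothetical toy network: a path POS - a - b - NEG with unit edge weights.
toy_weights = {
    "POS": {"a": 1.0},
    "a": {"POS": 1.0, "b": 1.0},
    "b": {"a": 1.0, "NEG": 1.0},
    "NEG": {"b": 1.0},
}
toy_result = sink_source(toy_weights, positives=["POS"], negatives=["NEG"])
print({n: round(s, 3) for n, s in toy_result.items()})
```

Running a fixed, generous number of iterations is the simplest stopping rule; a bound on how far the scores can still move would allow stopping much earlier, which is the kind of saving the abstract describes.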
Affiliation(s)
- Jeffrey N Law
- Genetics, Bioinformatics and Computational Biology Ph.D. Program, Blacksburg, VA 24061, USA
- Shiv D Kale
- Fralin Life Sciences Institute, Blacksburg, VA 24061, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
11
Mandoiu I, Murali TM, Narasimhan G, Rajasekaran S, Skums P, Zelikovsky A. Special Issue: 9th International Computational Advances in Bio and Medical Sciences (ICCABS 2019). J Comput Biol 2021; 28:115-116. [PMID: 33539275 DOI: 10.1089/cmb.2021.29034.im] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Ion Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, Connecticut, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
- Giri Narasimhan
- School of Computing & Information Sciences, Florida International University, Miami, Florida, USA
- Sanguthevar Rajasekaran
- Computer Science & Engineering Department, University of Connecticut, Storrs, Connecticut, USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
12
Kshirsagar M, Tasnina N, Ward MD, Law JN, Murali TM, Lavista Ferres JM, Bowman GR, Klein-Seetharaman J. Protein sequence models for prediction and comparative analysis of the SARS-CoV-2-human interactome. Pac Symp Biocomput 2021; 26:154-165. [PMID: 33691013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Viruses such as the novel coronavirus SARS-CoV-2, which is wreaking havoc on the world, depend on interactions of their own proteins with those of human host cells. Relatively small changes in sequence, such as those between SARS-CoV and SARS-CoV-2, can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV overlap in function. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 proteins to human proteins and compare it to that of the other viruses above. We build supervised machine learning models, using Generalized Additive Models, to predict interactions based on sequence features and find that our models perform well, with an AUC-PR of 0.65 at a class skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize which combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together, we find some functionally similar proteins from different viruses. For example, vif protein from HIV-1, vp24 from Ebola and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses; for example, orf3a interactions have diverged more than orf7b interactions when comparing SARS-CoV and SARS-CoV-2.
13
Abstract
MOTIVATION High-quality curation of the proteins and interactions in signaling pathways is slow and painstaking. As a result, many experimentally detected interactions are not annotated to any pathways. A natural question that arises is whether or not it is possible to automatically leverage existing pathway annotations to identify new interactions for inclusion in a given pathway. RESULTS We present RegLinker, an algorithm that achieves this purpose by computing multiple short paths from pathway receptors to transcription factors within a background interaction network. The key idea underlying RegLinker is the use of regular language constraints to control the number of non-pathway interactions that are present in the computed paths. We systematically evaluate RegLinker and five alternative approaches against a comprehensive set of 15 signaling pathways and demonstrate that RegLinker recovers withheld pathway proteins and interactions with the best precision and recall. We used RegLinker to propose new extensions to the pathways. We discuss the literature that supports the inclusion of these proteins in the pathways. These results show the broad potential of automated analysis to attenuate difficulties of traditional manual inquiry. AVAILABILITY AND IMPLEMENTATION https://github.com/Murali-group/RegLinker. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
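One way to picture the constraint described in this abstract is a shortest-path search that tracks how many non-pathway interactions a path has used. The Python sketch below does this with Dijkstra's algorithm on (node, count) states. "At most k non-pathway edges" is only one simple example of a regular-language constraint, and the function name `constrained_shortest_path`, the edge format, and the toy network are assumptions made for illustration, not the RegLinker code.

```python
import heapq


def constrained_shortest_path(edges, source, target, max_new_edges):
    """Cheapest path from source to target using at most `max_new_edges`
    edges that are not already annotated to the pathway.

    edges: iterable of (u, v, cost, is_pathway_edge), directed.
    Returns (cost, path) or None. Hypothetical helper for illustration.
    """
    adj = {}
    for u, v, cost, known in edges:
        adj.setdefault(u, []).append((v, cost, known))

    # A search state is (node, number of non-pathway edges used so far).
    best = {(source, 0): 0.0}
    heap = [(0.0, source, 0, [source])]
    while heap:
        dist, node, used, path = heapq.heappop(heap)
        if node == target:
            return dist, path
        if dist > best.get((node, used), float("inf")):
            continue  # stale heap entry
        for nxt, cost, known in adj.get(node, []):
            new_used = used if known else used + 1
            if new_used > max_new_edges:
                continue  # would violate the constraint
            new_dist = dist + cost
            if new_dist < best.get((nxt, new_used), float("inf")):
                best[(nxt, new_used)] = new_dist
                heapq.heappush(heap, (new_dist, nxt, new_used, path + [nxt]))
    return None


# Toy network: receptor R, transcription factor TF.
toy = [
    ("R", "A", 1, True),    # annotated pathway edge
    ("A", "TF", 5, True),   # annotated but expensive
    ("A", "B", 1, False),   # not annotated
    ("B", "TF", 1, False),  # not annotated
]
loose = constrained_shortest_path(toy, "R", "TF", max_new_edges=2)
strict = constrained_shortest_path(toy, "R", "TF", max_new_edges=1)
```

With two non-pathway edges allowed, the search takes the cheap detour through B; with only one allowed, it falls back to the annotated but costlier route.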
Affiliation(s)
- Aditya Pratapa
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
14
Gallegos JE, Adames NR, Rogers MF, Kraikivski P, Ibele A, Nurzynski-Loth K, Kudlow E, Murali TM, Tyson JJ, Peccoud J. Genetic interactions derived from high-throughput phenotyping of 6589 yeast cell cycle mutants. NPJ Syst Biol Appl 2020; 6:11. [PMID: 32376972 PMCID: PMC7203125 DOI: 10.1038/s41540-020-0134-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/25/2019] [Accepted: 04/06/2020] [Indexed: 11/09/2022] Open
Abstract
Over the last 30 years, computational biologists have developed increasingly realistic mathematical models of the regulatory networks controlling the division of eukaryotic cells. These models capture data resulting from two complementary experimental approaches: low-throughput experiments aimed at extensively characterizing the functions of small numbers of genes, and large-scale genetic interaction screens that provide a systems-level perspective on the cell division process. The former is insufficient to capture the interconnectivity of the genetic control network, while the latter is fraught with irreproducibility issues. Here, we describe a hybrid approach in which the 630 genetic interactions between 36 cell-cycle genes are quantitatively estimated by high-throughput phenotyping with an unprecedented number of biological replicates. Using this approach, we identify a subset of high-confidence genetic interactions, which we use to refine a previously published mathematical model of the cell cycle. We also present a quantitative dataset of the growth rate of these mutants under six different media conditions in order to inform future cell cycle models.
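The figure of 630 interactions is consistent with counting every unordered pair among 36 genes. A short Python check, assuming the interactions are all pairwise combinations (the variable names are illustrative only):

```python
import math

n_genes = 36
# Number of unordered gene pairs: n * (n - 1) / 2
n_pairs = math.comb(n_genes, 2)
print(n_pairs)
```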
Affiliation(s)
- Jenna E Gallegos
- Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- Neil R Adames
- Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA; New Culture, Inc., San Francisco, CA, USA
- Pavel Kraikivski
- Virginia Tech, Academy of Integrated Sciences, Blacksburg, VA, USA
- Aubrey Ibele
- Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- Kevin Nurzynski-Loth
- Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- Eric Kudlow
- Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA
- T M Murali
- Virginia Tech, Computer Science, Blacksburg, VA, USA
- John J Tyson
- Virginia Tech, Biological Sciences, Blacksburg, VA, USA
- Jean Peccoud
- Colorado State University, Chemical and Biological Engineering, Fort Collins, CO, USA; GenoFAB, Inc., Fort Collins, CO, USA
15
Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods 2020; 17:147-154. [PMID: 31907445 PMCID: PMC7098173 DOI: 10.1038/s41592-019-0690-6] [Citation(s) in RCA: 285] [Impact Index Per Article: 71.3] [Received: 06/04/2019] [Accepted: 11/22/2019] [Indexed: 01/10/2023]
Abstract
We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
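The early precision measure mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes early precision means the fraction of true edges among the top k predictions, where k is the number of edges in the reference network; the function name `early_precision` and the toy data are hypothetical and are not taken from BEELINE.

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of reference edges among the top-k predictions,
    with k = number of edges in the reference network.

    ranked_edges: predicted (regulator, target) pairs, best first.
    true_edges: set of (regulator, target) pairs in the reference network.
    Illustrative helper; the definition of k is an assumption.
    """
    k = len(true_edges)
    if k == 0:
        return 0.0
    top_k = ranked_edges[:k]
    hits = sum(1 for edge in top_k if edge in true_edges)
    return hits / k


predicted = [("g1", "g2"), ("g3", "g4"), ("g1", "g3")]
reference = {("g1", "g2"), ("g1", "g3")}
score = early_precision(predicted, reference)
```

Here k is 2, and one of the two top-ranked predictions is in the reference network, so the score is 0.5.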
Affiliation(s)
- Aditya Pratapa
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Amogh P Jalihal
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- Jeffrey N Law
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA, USA
- Aditya Bharadwaj
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
16
Franzese N, Groce A, Murali TM, Ritz A. Hypergraph-based connectivity measures for signaling pathway topologies. PLoS Comput Biol 2019; 15:e1007384. [PMID: 31652258 PMCID: PMC6834280 DOI: 10.1371/journal.pcbi.1007384] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/26/2019] [Revised: 11/06/2019] [Accepted: 09/09/2019] [Indexed: 12/12/2022] Open
Abstract
Characterizing cellular responses to different extrinsic signals is an active area of research, and curated pathway databases describe these complex signaling reactions. Here, we revisit a fundamental question in signaling pathway analysis: are two molecules “connected” in a network? This question is the first step towards understanding the potential influence of molecules in a pathway, and the answer depends on the choice of modeling framework. We examined the connectivity of Reactome signaling pathways using four different pathway representations. We find that Reactome is very well connected as a graph, moderately well connected as a compound graph or bipartite graph, and poorly connected as a hypergraph (which captures many-to-many relationships in reaction networks). We present a novel relaxation of hypergraph connectivity that iteratively increases connectivity from a node while preserving the hypergraph topology. This measure, B-relaxation distance, provides a parameterized transition between hypergraph connectivity and graph connectivity. B-relaxation distance is sensitive to the presence of small molecules that participate in many functionally unrelated reactions in the network. We also define a score that quantifies one pathway’s downstream influence on another, which can be calculated as B-relaxation distance gradually relaxes the connectivity constraint in hypergraphs. Computing this score across all pairs of 34 Reactome pathways reveals pairs of pathways with statistically significant influence. We present two such case studies, and we describe the specific reactions that contribute to the large influence score. Finally, we investigate the ability of connectivity measures to capture functional relationships among proteins, and use the evidence channels in the STRING database as a benchmark dataset. STRING interactions whose proteins are B-connected in Reactome have statistically significantly higher scores than interactions connected in the bipartite graph representation. Our method lays the groundwork for other generalizations of graph-theoretic concepts to hypergraphs in order to facilitate signaling pathway analysis.

Signaling pathways describe how cells respond to external signals through molecular interactions. As we gain a deeper understanding of these signaling reactions, it is important to understand how molecules may influence downstream responses and how pathways may affect each other. As the amount of information in signaling pathway databases continues to grow, we have the opportunity to analyze properties of pathway structure. We pose an intuitive question about signaling pathways: when are two molecules “connected” in a pathway? The answer varies dramatically based on the assumptions we make about how reactions link molecules. Here, we examine four approaches for modeling the structural topology of signaling pathways, and present methods to quantify whether two molecules are “connected” in a pathway database. We find that existing approaches are either too permissive (molecules are connected to many others) or too restrictive (molecules are connected to a handful of others), and we present a new measure that offers a continuum between these two extremes. We then expand our question to ask when an entire signaling pathway is “downstream” of another pathway, and show two case studies from the Reactome pathway database that uncover pathway influence. Finally, we show that the strict notion of connectivity can capture functional relationships among proteins using an independent benchmark dataset. Our approach to quantifying connectivity in pathways considers a biologically motivated definition of connectivity, laying the foundation for more sophisticated analyses that leverage the detailed information in pathway databases.
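The gap between graph connectivity and hypergraph connectivity can be made concrete with a small Python sketch. It assumes the standard notion of B-connectivity in directed hypergraphs, in which a hyperedge can be traversed only once every node in its tail has been reached. The function name `b_connected` and the toy reactions are hypothetical, and the B-relaxation distance itself is not implemented here.

```python
def b_connected(hyperedges, sources):
    """Return the set of nodes B-connected to `sources`.

    hyperedges: iterable of (tail, head) pairs, each a set of nodes.
    A hyperedge fires only when ALL of its tail nodes are reached.
    Illustrative helper under the assumption stated above.
    """
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if set(tail) <= reached and not set(head) <= reached:
                reached |= set(head)
                changed = True
    return reached


# Toy reactions: A + B -> C, then C -> D
reactions = [({"A", "B"}, {"C"}), ({"C"}, {"D"})]
from_a_only = b_connected(reactions, {"A"})
from_a_and_b = b_connected(reactions, {"A", "B"})
```

Starting from A alone, nothing downstream is reached because the first reaction also needs B. In an ordinary graph with edges A to C and B to C, A would reach both C and D, which illustrates why the graph view is more permissive.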
Affiliation(s)
- Nicholas Franzese
- Biology Department, Reed College, Portland, Oregon, United States of America
- Computer Science Department, Reed College, Portland, Oregon, United States of America
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Adam Groce
- Computer Science Department, Reed College, Portland, Oregon, United States of America
- T. M. Murali
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, Virginia, United States of America
- Anna Ritz
- Biology Department, Reed College, Portland, Oregon, United States of America
17
Pratapa A, Adames N, Kraikivski P, Franzese N, Tyson JJ, Peccoud J, Murali TM. CrossPlan: systematic planning of genetic crosses to validate mathematical models. Bioinformatics 2019; 34:2237-2244. [PMID: 29432533 DOI: 10.1093/bioinformatics/bty072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/19/2017] [Accepted: 02/07/2018] [Indexed: 12/27/2022] Open
Abstract
Motivation Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test. Results We present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow used in performing genetic crosses in budding yeast. We prove that the CrossPlan problem is NP-complete. We develop an integer-linear-program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. We also extend our solution to incorporate other experimental conditions such as a delay factor that decides the availability of a mutant and genetic markers to confirm gene deletions. The experimental flow that underlies our work is quite generic and our ILP-based algorithm is easy to modify. Hence, our framework should be relevant in plant and animal systems as well. Availability and implementation CrossPlan code is freely available under GNU General Public Licence v3.0 at https://github.com/Murali-group/crossplan. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aditya Pratapa
- Department of Computer Science, Virginia Tech, Blacksburg, USA
- Neil Adames
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- Pavel Kraikivski
- Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- John J Tyson
- Department of Biological Sciences, Virginia Tech, Blacksburg, USA
- Jean Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, USA
18
Abstract
Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred).
Affiliation(s)
- Linhua Wang
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Jeffrey Law
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Shiv D Kale
- Biocomplexity Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- T M Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Gaurav Pandey
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
19
Tegge AN, Rodrigues RR, Larkin AL, Vu L, Murali TM, Rajagopalan P. Transcriptomic Analysis of Hepatic Cells in Multicellular Organotypic Liver Models. Sci Rep 2018; 8:11306. [PMID: 30054499 PMCID: PMC6063915 DOI: 10.1038/s41598-018-29455-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/27/2017] [Accepted: 07/11/2018] [Indexed: 02/08/2023] Open
Abstract
Liver homeostasis requires the presence of both parenchymal and non-parenchymal cells (NPCs). However, systems biology studies of the liver have primarily focused on hepatocytes. Using an organotypic three-dimensional (3D) hepatic culture, we report the first transcriptomic study of liver sinusoidal endothelial cells (LSECs) and Kupffer cells (KCs) cultured with hepatocytes. Through computational pathway and interaction network analyses, we demonstrate that hepatocytes, LSECs and KCs have distinct expression profiles and functional characteristics. Our results show that LSECs in the presence of KCs exhibit decreased expression of focal adhesion kinase (FAK) signaling, a pathway linked to LSEC dedifferentiation. We report the novel result that peroxisome proliferator-activated receptor alpha (PPARα) is transcribed in LSECs. The expression of downstream processes corroborates active PPARα signaling in LSECs. We uncover transcriptional evidence in LSECs for a feedback mechanism between PPARα and farnesoid X-activated receptor (FXR) that maintains bile acid homeostasis; previously, this feedback was known to occur only in HepG2 cells. We demonstrate that KCs in 3D liver models display expression patterns consistent with an anti-inflammatory phenotype when compared to monocultures. These results highlight the distinct roles of LSECs and KCs in maintaining liver function and emphasize the need for mechanistic studies of NPCs, in addition to hepatocytes, in liver-mimetic microenvironments.
Affiliation(s)
- Allison N Tegge
- Department of Computer Science, Virginia Tech, Blacksburg, USA
- Department of Statistics, Virginia Tech, Blacksburg, USA
- Richard R Rodrigues
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, USA
- Adam L Larkin
- Department of Chemical Engineering, Virginia Tech, Blacksburg, USA
- Lucas Vu
- Department of Chemical Engineering, Virginia Tech, Blacksburg, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, USA
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, USA
- Padmavathy Rajagopalan
- Department of Chemical Engineering, Virginia Tech, Blacksburg, USA
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, USA
- Virginia Tech-Wake Forest School of Biomedical Engineering and Sciences, Virginia Tech, Blacksburg, USA
20
Bharadwaj A, Singh DP, Ritz A, Tegge AN, Poirel CL, Kraikivski P, Adames N, Luther K, Kale SD, Peccoud J, Tyson JJ, Murali TM. GraphSpace: stimulating interdisciplinary collaborations in network biology. Bioinformatics 2018; 33:3134-3136. [PMID: 28957495 DOI: 10.1093/bioinformatics/btx382] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 11/06/2016] [Accepted: 06/09/2017] [Indexed: 01/23/2023] Open
Abstract
Summary Networks have become ubiquitous in systems biology. Visualization is a crucial component in their analysis. However, collaborations within research teams in network biology are hampered by software systems that are either specific to a computational algorithm, create visualizations that are not biologically meaningful, or have limited features for sharing networks and visualizations. We present GraphSpace, a web-based platform that fosters team science by allowing collaborating research groups to easily store, interact with, layout and share networks. Availability and implementation Anyone can upload and share networks at http://graphspace.org. In addition, the GraphSpace code is available at http://github.com/Murali-group/graphspace if a user wants to run his or her own server. Contact murali@cs.vt.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aditya Bharadwaj
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Divit P Singh
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Anna Ritz
- Biology Department, Reed College, Portland, OR 97202, USA
- Allison N Tegge
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA; Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
- Pavel Kraikivski
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Neil Adames
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, USA
- Kurt Luther
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA; Center for Human-Computer Interaction, Virginia Tech, Blacksburg, VA 24061, USA
- Jean Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, USA
- John J Tyson
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA; ICTAS Centre for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA 24061, USA
21
Abstract
PathLinker is a graph-theoretic algorithm originally developed to reconstruct the interactions in a signaling pathway of interest. It efficiently computes multiple short paths within a background protein interaction network from the receptors to transcription factors (TFs) in a pathway. Since December 2015, PathLinker has been available as an app for Cytoscape. This paper describes how we automated the app to use the CyRest infrastructure and how users can incorporate PathLinker into their software pipelines.
Affiliation(s)
- Li Jun Huang
- Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA
- Jeffrey N Law
- Genetics, Bioinformatics, and Computational Biology Ph.D. program, Virginia Tech, Blacksburg, VA, 24061, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA; ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA, 24061, USA
22
Abstract
Signaling pathways play an important role in the cell's response to its environment. Signaling pathways are often represented as directed graphs, which are not adequate for modeling reactions such as complex assembly and dissociation, combinatorial regulation, and protein activation/inactivation. More accurate representations such as directed hypergraphs remain underutilized. In this paper, we present an extension of a directed hypergraph that we call a signaling hypergraph. We formulate a problem that asks what proteins and interactions must be involved in order to stimulate a specific response downstream of a signaling pathway. We relate this problem to computing the shortest acyclic B-hyperpath in a signaling hypergraph (an NP-hard problem) and present a mixed integer linear program to solve it. We demonstrate that the shortest hyperpaths computed in signaling hypergraphs are far more informative than shortest paths, Steiner trees, and subnetworks containing many short paths found in corresponding graph representations. Our results illustrate the potential of signaling hypergraphs as an improved representation of signaling pathways and motivate the development of novel hypergraph algorithms.
23
Abstract
PathLinker is a graph-theoretic algorithm for reconstructing the interactions in a signaling pathway of interest. It efficiently computes multiple short paths within a background protein interaction network from the receptors to transcription factors (TFs) in a pathway. We originally developed PathLinker to complement manual curation of signaling pathways, which is slow and painstaking. The method can be used in general to connect any set of sources to any set of targets in an interaction network. The app presented here makes the PathLinker functionality available to Cytoscape users. We present an example where we used PathLinker to compute and analyze the network of interactions connecting proteins that are perturbed by the drug lovastatin.
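The core task described here, computing several short paths from a set of sources to a set of targets, can be sketched in Python with a best-first enumeration of loop-free paths. This is a simple stand-in for illustration, not the PathLinker algorithm or the Cytoscape app: the function name `k_shortest_paths`, the edge format, and the toy network are assumptions, and the enumeration can be slow on large networks.

```python
import heapq


def k_shortest_paths(edges, sources, targets, k):
    """Enumerate up to k cheapest loop-free paths from any source to any
    target, in nondecreasing order of cost (edge costs must be >= 0).

    edges: iterable of directed (u, v, cost) triples.
    A path ends at the first target it reaches.
    Hypothetical helper for illustration only.
    """
    adj = {}
    for u, v, cost in edges:
        adj.setdefault(u, []).append((v, cost))

    # Each heap entry is (cost so far, partial path).
    heap = [(0.0, [s]) for s in sorted(sources)]
    heapq.heapify(heap)
    found = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in targets:
            found.append((cost, path))
            continue
        for nxt, w in adj.get(node, []):
            if nxt not in path:  # keep paths loop-free
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return found


# Toy network with two receptors and one transcription factor.
toy_network = [("r1", "a", 1), ("a", "tf", 1), ("r1", "tf", 3), ("r2", "a", 2)]
paths = k_shortest_paths(toy_network, sources={"r1", "r2"}, targets={"tf"}, k=3)
```

Because edge costs are non-negative, partial paths leave the heap in order of cost, so the first k complete paths popped are the k cheapest.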
Affiliation(s)
- Daniel P Gil
- Department of Computer Science, Virginia Tech, Blacksburg, USA
- Jeffrey N Law
- Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, USA; ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, USA
24
Sam SA, Teel J, Tegge AN, Bharadwaj A, Murali TM. XTalkDB: a database of signaling pathway crosstalk. Nucleic Acids Res 2016; 45:D432-D439. [PMID: 27899583 PMCID: PMC5210533 DOI: 10.1093/nar/gkw1037] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 07/23/2016] [Revised: 09/28/2016] [Accepted: 10/20/2016] [Indexed: 01/01/2023] Open
Abstract
Analysis of signaling pathways and their crosstalk is a cornerstone of systems biology. Thousands of papers have been published on these topics. Surprisingly, there is no database that carefully and explicitly documents crosstalk between specific pairs of signaling pathways. We have developed XTalkDB (http://www.xtalkdb.org) to fill this very important gap. XTalkDB contains curated information for 650 pairs of pathways from over 1600 publications. In addition, the database reports the molecular components (e.g. proteins, hormones, microRNAs) that mediate crosstalk between a pair of pathways and the species and tissue in which the crosstalk was observed. The XTalkDB website provides an easy-to-use interface for scientists to browse crosstalk information by querying one or more pathways or molecules of interest.
Affiliation(s)
- Sarah A Sam
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA; School of Neuroscience, Virginia Tech, Blacksburg, VA 24061, USA
- Joelle Teel
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Allison N Tegge
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA; Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA
- Aditya Bharadwaj
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA; ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA 24061, USA
25
Tegge AN, Sharp N, Murali TM. Xtalk: a path-based approach for identifying crosstalk between signaling pathways. Bioinformatics 2015; 32:242-51. [PMID: 26400040 DOI: 10.1093/bioinformatics/btv549] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/22/2014] [Accepted: 09/04/2015] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION Cells communicate with their environment via signal transduction pathways. On occasion, the activation of one pathway can produce an effect downstream of another pathway, a phenomenon known as crosstalk. Existing computational methods to discover such pathway pairs rely on simple overlap statistics. RESULTS We present Xtalk, a path-based approach for identifying pairs of pathways that may crosstalk. Xtalk computes the statistical significance of the average length of multiple short paths that connect receptors in one pathway to the transcription factors in another. By design, Xtalk reports the precise interactions and mechanisms that support the identified crosstalk. We applied Xtalk to signaling pathways in the KEGG and NCI-PID databases. We manually curated a gold standard set of 132 crosstalking pathway pairs and a set of 140 pairs that did not crosstalk, for which Xtalk achieved an area under the receiver operator characteristic curve of 0.65, a 12% improvement over the closest competing approach. The area under the receiver operator characteristic curve varied with the pathway, suggesting that crosstalk should be evaluated on a pathway-by-pathway level. We also analyzed an extended set of 658 pathway pairs in KEGG and a set of more than 7000 pathway pairs in NCI-PID. For the top-ranking pairs, we found substantial support in the literature (81% for KEGG and 78% for NCI-PID). We provide examples of networks computed by Xtalk that accurately recovered known mechanisms of crosstalk. AVAILABILITY AND IMPLEMENTATION The XTALK software is available at http://bioinformatics.cs.vt.edu/~murali/software. Crosstalk networks are available at http://graphspace.org/graphs?tags=2015-bioinformatics-xtalk. CONTACT ategge@vt.edu, murali@cs.vt.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
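The path-based idea in this abstract can be illustrated with a small sketch. It is not the Xtalk implementation: it uses a single shortest path per receptor-transcription factor pair instead of multiple short paths, and the null model (random transcription-factor sets of the same size) is an assumption made only for illustration. The function names and the toy network are hypothetical.

```python
from collections import deque
import random


def bfs_distances(graph, source):
    """Hop distances from source to every node reachable in a directed graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_path_length(graph, receptors, tfs):
    """Mean shortest-path length over all reachable receptor-TF pairs."""
    lengths = []
    for r in receptors:
        dist = bfs_distances(graph, r)
        lengths.extend(dist[t] for t in tfs if t in dist and t != r)
    return sum(lengths) / len(lengths) if lengths else float("inf")


def empirical_p_value(graph, receptors, tfs, trials=1000, seed=0):
    """Fraction of random TF sets (assumed null model) with an average
    path length no longer than the observed one."""
    rng = random.Random(seed)
    nodes = sorted(set(graph) | {n for nbrs in graph.values() for n in nbrs})
    observed = average_path_length(graph, receptors, tfs)
    hits = 0
    for _ in range(trials):
        random_tfs = rng.sample(nodes, len(tfs))
        if average_path_length(graph, receptors, random_tfs) <= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Hypothetical toy network: receptor R1 in one pathway, TF1 and TF2 in another.
toy = {"R1": ["A"], "A": ["TF1", "B"], "B": ["TF2"], "TF1": [], "TF2": []}
print(average_path_length(toy, ["R1"], ["TF1", "TF2"]))
```

A shorter average path than expected by chance would suggest that signals from the first pathway's receptors can reach the second pathway's transcription factors through few interactions.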
Affiliation(s)
- Allison N Tegge
- Department of Computer Science, Department of Statistics and
- T M Murali
- Department of Computer Science, ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA 24061, USA
|
26
|
Adames NR, Schuck PL, Chen KC, Murali TM, Tyson JJ, Peccoud J. Experimental testing of a new integrated model of the budding yeast Start transition. Mol Biol Cell 2015; 26:3966-84. [PMID: 26310445 PMCID: PMC4710230 DOI: 10.1091/mbc.e15-06-0358] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 06/11/2015] [Accepted: 08/19/2015] [Indexed: 01/29/2023]
Abstract
Mathematical modeling of the cell cycle has unveiled recurrent features and emergent behaviors of cellular networks. Constructing new mutants and performing experimental tests during development of a new model of the budding yeast cell cycle yields a more efficient modeling process and results in several testable hypotheses. The cell cycle is composed of bistable molecular switches that govern the transitions between gap phases (G1 and G2) and the phases in which DNA is replicated (S) and partitioned between daughter cells (M). Many molecular details of the budding yeast G1–S transition (Start) have been elucidated in recent years, especially with regard to its switch-like behavior due to positive feedback mechanisms. These results led us to reevaluate and expand a previous mathematical model of the yeast cell cycle. The new model incorporates Whi3 inhibition of Cln3 activity, Whi5 inhibition of SBF and MBF transcription factors, and feedback inhibition of Whi5 by G1–S cyclins. We tested the accuracy of the model by simulating various mutants not described in the literature. We then constructed these novel mutant strains and compared their observed phenotypes to the model’s simulations. The experimental results reported here led to further changes of the model, which will be fully described in a later article. Our study demonstrates the advantages of combining model design, simulation, and testing in a coordinated effort to better understand a complex biological network.
Affiliation(s)
- Neil R Adames
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061
- P Logan Schuck
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061
- Katherine C Chen
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061; ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA 24061
- John J Tyson
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061; Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061
- Jean Peccoud
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061; ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA 24061
|
27
|
Abstract
Signaling pathways function as the information-passing mechanisms of cells. A number of databases with extensive manual curation represent the current knowledge base for signaling pathways. These databases motivate the development of computational approaches for prediction and analysis. Such methods require an accurate and computable representation of signaling pathways. Pathways are often described as sets of proteins or as pairwise interactions between proteins. However, many signaling mechanisms cannot be described using these representations. In this opinion, we highlight a representation of signaling pathways that is underutilized: the hypergraph. We demonstrate the usefulness of hypergraphs in this context and discuss challenges and opportunities for the scientific community.
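To make the contrast with pairwise interactions concrete, here is a minimal sketch of a directed hypergraph in which each hyperedge maps a set of reactants (tail) to a set of products (head). The tail/head convention, the class and function names, and the molecule names are assumptions chosen for illustration; they are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    tail: frozenset  # every entity that must be present for the reaction
    head: frozenset  # every entity the reaction produces


def reachable(sources, hyperedges):
    """Entities producible from sources when a hyperedge fires only
    once ALL of its tail entities are available."""
    known = set(sources)
    changed = True
    while changed:
        changed = False
        for edge in hyperedges:
            if edge.tail <= known and not edge.head <= known:
                known |= edge.head
                changed = True
    return known


# Hypothetical pathway: complex assembly followed by phosphorylation.
pathway = [
    Hyperedge(frozenset({"Ligand", "Receptor"}), frozenset({"Ligand:Receptor"})),
    Hyperedge(frozenset({"Ligand:Receptor", "Kinase"}), frozenset({"Kinase-P"})),
]
print(sorted(reachable({"Ligand", "Receptor", "Kinase"}, pathway)))
```

A pairwise graph would record Ligand-Receptor and Receptor-Kinase edges but could not express that the kinase is phosphorylated only after the ligand-receptor complex has formed; the all-of-the-tail firing rule captures that requirement.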
Affiliation(s)
- Anna Ritz
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Allison N Tegge
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Hyunju Kim
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA; ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA, USA
|
28
|
Poirel CL, Rodrigues RR, Chen KC, Tyson JJ, Murali TM. Top-down network analysis to drive bottom-up modeling of physiological processes. J Comput Biol 2013; 20:409-18. [PMID: 23641868 DOI: 10.1089/cmb.2012.0274] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Top-down analyses in systems biology can automatically find correlations among genes and proteins in large-scale datasets. However, it is often difficult to design experiments from these results. In contrast, bottom-up approaches painstakingly craft detailed models that can be simulated computationally to suggest wet lab experiments. However, developing the models is a manual process that can take many years. These approaches have largely been developed independently. We present LINKER, an efficient and automated data-driven method that can analyze molecular interactomes to propose extensions to models that can be simulated. LINKER combines teleporting random walks and k-shortest path computations to discover connections from a source protein to a set of proteins collectively involved in a particular cellular process. We evaluate the efficacy of LINKER by applying it to a well-known dynamic model of the cell division cycle in Saccharomyces cerevisiae. Compared to other state-of-the-art methods, subnetworks computed by LINKER are heavily enriched in Gene Ontology (GO) terms relevant to the cell cycle. Finally, we highlight how networks computed by LINKER elucidate the role of a protein kinase (Cdc5) in the mitotic exit network of a dynamic model of the cell cycle.
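One of the two ingredients named in this abstract, a teleporting random walk, can be sketched generically as follows. This is a standard random walk with restart computed by power iteration, not LINKER itself: the k-shortest-paths step is omitted, and the teleport probability, the handling of dead-end nodes, and the toy network are assumptions.

```python
def random_walk_with_teleport(graph, source, teleport=0.15, iterations=100):
    """Visit probabilities of a walker that, at each step, jumps back to
    `source` with probability `teleport` and otherwise moves to a
    uniformly chosen neighbor. `graph` maps every node to its neighbors."""
    nodes = list(graph)
    prob = {n: 0.0 for n in nodes}
    prob[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            nbrs = graph[n]
            if nbrs:
                share = (1 - teleport) * prob[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Assumption: a walker stuck at a dead end returns to the source.
                nxt[source] += (1 - teleport) * prob[n]
        nxt[source] += teleport * sum(prob.values())
        prob = nxt
    return prob


# Hypothetical undirected chain of three proteins, written as adjacency lists.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
scores = random_walk_with_teleport(chain, "A")
print({node: round(p, 3) for node, p in scores.items()})
```

Nodes with high visit probability are close to the source in the sense of many short connecting routes, which is why such scores are often used to rank candidate proteins before a path search.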
|
29
|
Kidane YH, Lawrence C, Murali TM. Computational approaches for discovery of common immunomodulators in fungal infections: towards broad-spectrum immunotherapeutic interventions. BMC Microbiol 2013; 13:224. [PMID: 24099000 PMCID: PMC3853472 DOI: 10.1186/1471-2180-13-224] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/27/2013] [Accepted: 09/17/2013] [Indexed: 01/16/2023]
Abstract
Background Fungi are the second most abundant type of human pathogens. Invasive fungal pathogens are leading causes of life-threatening infections in clinical settings. Toxicity to the host and drug-resistance are two major deleterious issues associated with existing antifungal agents. Increasing a host’s tolerance and/or immunity to fungal pathogens has potential to alleviate these problems. A host’s tolerance may be improved by modulating the immune system such that it responds more rapidly and robustly in all facets, ranging from the recognition of pathogens to their clearance from the host. An understanding of biological processes and genes that are perturbed during attempted fungal exposure, colonization, and/or invasion will help guide the identification of endogenous immunomodulators and/or small molecules that activate host-immune responses such as specialized adjuvants. Results In this study, we present computational techniques and approaches using publicly available transcriptional data sets, to predict immunomodulators that may act against multiple fungal pathogens. Our study analyzed data sets derived from host cells exposed to five fungal pathogens, namely, Alternaria alternata, Aspergillus fumigatus, Candida albicans, Pneumocystis jirovecii, and Stachybotrys chartarum. We observed statistically significant associations between host responses to A. fumigatus and C. albicans. Our analysis identified biological processes that were consistently perturbed by these two pathogens. These processes contained both immune response-inducing genes such as MALT1, SERPINE1, ICAM1, and IL8, and immune response-repressing genes such as DUSP8, DUSP6, and SPRED2. We hypothesize that these genes belong to a pool of common immunomodulators that can potentially be activated or suppressed (agonized or antagonized) in order to render the host more tolerant to infections caused by A. fumigatus and C. albicans. 
Conclusions Our computational approaches and methodologies described here can now be applied to newly generated or expanded data sets for further elucidation of additional drug targets. Moreover, identified immunomodulators may be used to generate experimentally testable hypotheses that could help in the discovery of broad-spectrum immunotherapeutic interventions. All of our results are available at the following supplementary website: http://bioinformatics.cs.vt.edu/~murali/supplements/2013-kidane-bmc
Affiliation(s)
- Yared H Kidane
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA.
|
30
|
Lasher CD, Rajagopalan P, Murali TM. Summarizing cellular responses as biological process networks. BMC Syst Biol 2013; 7:68. [PMID: 23895181 PMCID: PMC3751784 DOI: 10.1186/1752-0509-7-68] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/05/2012] [Accepted: 06/26/2013] [Indexed: 12/02/2022]
Abstract
Background Microarray experiments can simultaneously identify thousands of genes that show significant perturbation in expression between two experimental conditions. Response networks, computed through the integration of gene interaction networks with expression perturbation data, may themselves contain tens of thousands of interactions. Gene set enrichment has become standard for summarizing the results of these analyses in terms of functionally coherent collections of genes such as biological processes. However, even these methods can yield hundreds of enriched functions that may overlap considerably. Results We describe a new technique called Markov chain Monte Carlo Biological Process Networks (MCMC-BPN) capable of reporting a highly non-redundant set of links between processes that describe the molecular interactions that are perturbed under a specific biological context. Each link in the BPN represents the perturbed interactions that serve as the interfaces between the two processes connected by the link. We apply MCMC-BPN to publicly available liver-related datasets to demonstrate that the networks formed by the most probable inter-process links reported by MCMC-BPN show high relevance to each biological condition. We demonstrate MCMC-BPN's ability to discern the few key links in a very large solution space by comparing its results with those of two other methods for detecting inter-process links. Conclusions MCMC-BPN is successful in using few inter-process links to explain as many of the perturbed gene-gene interactions as possible. Thereby, BPNs summarize the important biological trends within a response network by reporting a digestible number of inter-process links that can be explored in greater detail.
Affiliation(s)
- Christopher D Lasher
- Genetics, Bioinformatics, and Computational Biology Ph.D. Program, Virginia Tech, Blacksburg, VA 24061 USA
|
31
|
Larkin AL, Rodrigues RR, Murali TM, Rajagopalan P. Designing a multicellular organotypic 3D liver model with a detachable, nanoscale polymeric Space of Disse. Tissue Eng Part C Methods 2013; 19:875-84. [PMID: 23556413 DOI: 10.1089/ten.tec.2012.0700] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 01/07/2023]
Abstract
The design of in vitro models that mimic the stratified multicellular hepatic microenvironment continues to be challenging. Although several in vitro hepatic cultures have been shown to exhibit liver functions, their physiological relevance is limited due to significant deviation from in vivo cellular composition. We report the assembly of a novel three-dimensional (3D) organotypic liver model incorporating three different cell types (hepatocytes, liver sinusoidal endothelial cells, and Kupffer cells) and a polymeric interface that mimics the Space of Disse. The nanoscale interface is detachable, optically transparent, derived from self-assembled polyelectrolyte multilayers, and exhibits a Young's modulus similar to in vivo values for liver tissue. Only the 3D liver models simultaneously maintain hepatic phenotype and elicit proliferation, while achieving cellular ratios found in vivo. The nanoscale detachable polymeric interfaces can be modulated to mimic basement membranes that exhibit a wide range of physical properties. This facile approach offers a versatile new avenue in the assembly of engineered tissues. These results demonstrate the ability of the tri-cellular 3D cultures to serve as an organotypic hepatic model that elicits proliferation and maintenance of phenotype and in vivo-like cellular ratios.
Affiliation(s)
- Adam L Larkin
- Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia
|
32
|
Kidane YH, Lawrence C, Murali TM. The landscape of host transcriptional response programs commonly perturbed by bacterial pathogens: towards host-oriented broad-spectrum drug targets. PLoS One 2013; 8:e58553. [PMID: 23516507 PMCID: PMC3596304 DOI: 10.1371/journal.pone.0058553] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/08/2012] [Accepted: 02/07/2013] [Indexed: 12/19/2022]
Abstract
BACKGROUND The emergence of drug-resistant pathogen strains and new infectious agents poses major challenges to public health. A promising approach to combat these problems is to target the host's genes or proteins, especially to discover targets that are effective against multiple pathogens, i.e., host-oriented broad-spectrum (HOBS) drug targets. An important first step in the discovery of such drug targets is the identification of host responses that are commonly perturbed by multiple pathogens. RESULTS In this paper, we present a methodology to identify common host responses elicited by multiple pathogens. First, we identified host responses perturbed by each pathogen using a gene set enrichment analysis of publicly available genome-wide transcriptional datasets. Then, we used biclustering to identify groups of host pathways and biological processes that were perturbed only by a subset of the analyzed pathogens. Finally, we tested the enrichment of each bicluster in human genes that are known drug targets, on the basis of which we elicited putative HOBS targets for specific groups of bacterial pathogens. We identified 84 up-regulated and three down-regulated statistically significant biclusters. Each bicluster contained a group of pathogens that commonly dysregulated a group of biological processes. We validated our approach by checking whether these biclusters correspond to known hallmarks of bacterial infection. Indeed, these biclusters contained biological processes such as inflammation, activation of dendritic cells, pro- and anti-apoptotic responses and other innate immune responses. Next, we identified biclusters containing pathogens that infected the same tissue.
After a literature-based analysis of the drug targets contained in these biclusters, we suggested new uses of the drugs Anakinra, Etanercept, and Infliximab for gastrointestinal pathogens Yersinia enterocolitica, Helicobacter pylori kx2 strain, and enterohemorrhagic Escherichia coli and the drug Simvastatin for hematopoietic pathogen Ehrlichia chaffeensis. CONCLUSIONS Using a combination of automated analysis of host-response gene expression data and manual study of the literature, we have been able to suggest host-oriented treatments for specific bacterial infections. The analyses and suggestions made in this study may be utilized to generate concrete hypotheses on which gene sets to probe further in the quest for HOBS drug targets for bacterial infections. All our results are available at the following supplementary website: http://bioinformatics.cs.vt.edu/~murali/supplements/2013-kidane-plos-one.
Affiliation(s)
- Yared H. Kidane
- Genetics, Bioinformatics, and Computational Biology PhD Program, Virginia Tech, Blacksburg, Virginia, United States of America
- Christopher Lawrence
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, Virginia, United States of America
- Department of Biology, Virginia Tech, Blacksburg, Virginia, United States of America
- T. M. Murali
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, Virginia, United States of America
|
33
|
Poirel CL, Rahman A, Rodrigues RR, Krishnan A, Addesa JR, Murali TM. Reconciling differential gene expression data with molecular interaction networks. Bioinformatics 2013; 29:622-9. [PMID: 23314326 DOI: 10.1093/bioinformatics/btt007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease. RESULTS To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression P-values such that two genes connected by an interaction show similar changes in their gene expression values. We provide a novel evaluation of four methods from this class of algorithms. We enumerate three desirable properties that this class of algorithms should address. These properties seek to maintain that the returned gene rankings are specific to the condition being studied. Moreover, to ease interpretation, highly ranked genes should participate in coherent network structures and should be functionally enriched with relevant biological pathways. We comprehensively evaluate the extent to which each algorithm addresses these properties on a compendium of gene expression data for 54 diverse human diseases. We show that the reconciled gene rankings can identify novel disease-related functions that are missed by analyzing expression data alone. AVAILABILITY C++ software implementing our algorithms is available in the NetworkReconciliation package as part of the Biorithm software suite under the GNU General Public License: http://bioinformatics.cs.vt.edu/~murali/software/biorithm-docs.
|
34
|
Abstract
Background The normal functioning of a living cell is characterized by complex interaction networks involving many different types of molecules. Associations detected between diseases and perturbations in well-defined pathways within such interaction networks have the potential to illuminate the molecular mechanisms underlying disease progression and response to treatment. Results In this paper, we present a computational method that compares expression profiles of genes in cancer samples to samples from normal tissues in order to detect perturbations of pre-defined pathways in the cancer. In contrast to many previous methods, our scoring function approach explicitly takes into account the interactions between the gene products in a pathway. Moreover, we compute the sub-pathway that has the highest score, as opposed to merely computing the score for the entire pathway. We use a permutation test to assess the statistical significance of the most perturbed sub-pathway. We apply our method to 20 pathways in the Netpath database and to the Global Cancer Map of gene expression in 18 cancers. We demonstrate that our method yields more sensitive results than alternatives that do not consider interactions or measure the perturbation of a pathway as a whole. We perform a sensitivity analysis to show that our approach is robust to modest changes in the input data. Our method confirms numerous well-known connections between pathways and cancers. Conclusions Our results indicate that integrating differential gene expression with the interaction structure in a pathway is a powerful approach for detecting links between a cancer and the pathways perturbed in it. Our results also suggest that even well-studied pathways may be perturbed only partially in any given cancer. Further analysis of cancer-specific sub-pathways may shed new light on the similarities and differences between cancers.
Affiliation(s)
- Corban G Rivera
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
|
35
|
Abstract
The complexity, diversity, and richness of experimental data on cellular systems are inspiring the development of computational analysis techniques that can directly prioritize and suggest new experiments.
|
36
|
Abstract
Background Many methods have been developed to infer and reason about molecular interaction networks. These approaches often yield networks with hundreds or thousands of nodes and up to an order of magnitude more edges. It is often desirable to summarize the biological information in such networks. A very common approach is to use gene function enrichment analysis for this task. A major drawback of this method is that it ignores information about the edges in the network being analyzed, i.e., it treats the network simply as a set of genes. In this paper, we introduce a novel method for functional enrichment that explicitly takes network interactions into account. Results Our approach naturally generalizes Fisher’s exact test, a gene set-based technique. Given a function of interest, we compute the subgraph of the network induced by genes annotated to this function. We use the sequence of sizes of the connected components of this sub-network to estimate its connectivity. We estimate the statistical significance of the connectivity empirically by a permutation test. We present three applications of our method: i) determine which functions are enriched in a given network, ii) given a network and an interesting sub-network of genes within that network, determine which functions are enriched in the sub-network, and iii) given two networks, determine the functions for which the connectivity improves when we merge the second network into the first. Through these applications, we show that our approach is a natural alternative to network clustering algorithms. Conclusions We presented a novel approach to functional enrichment that takes into account the pairwise relationships among genes annotated by a particular function. Each of the three applications discovers highly relevant functions. We used our methods to study biological data from three different organisms. Our results demonstrate the wide applicability of our methods. 
Our algorithms are implemented in C++ and are freely available under the GNU General Public License at our supplementary website. Additionally, all our input data and results are available at http://bioinformatics.cs.vt.edu/~murali/supplements/2011-incob-nbe/.
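The steps described in the Results of this abstract (induce a subgraph on the genes annotated to a function, take the sizes of its connected components, test significance by permutation) can be sketched as follows. The specific connectivity statistic used here, the fraction of annotated genes that fall in the largest component, and the null model of uniformly sampled gene sets are assumptions for illustration; the abstract does not state either. All names and the toy network are hypothetical.

```python
import random


def component_sizes(graph, genes):
    """Sizes (largest first) of the connected components of the subgraph
    induced by `genes`. `graph` maps every node to its neighbors."""
    genes = set(genes)
    seen, sizes = set(), []
    for start in genes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in graph.get(node, ()):
                if nbr in genes and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


def connectivity(graph, genes):
    """Assumed statistic: share of the annotated genes in the largest component."""
    sizes = component_sizes(graph, genes)
    return sizes[0] / sum(sizes) if sizes else 0.0


def permutation_p_value(graph, genes, trials=1000, seed=0):
    """Empirical p-value against random gene sets of the same size."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    k = len(set(genes))
    observed = connectivity(graph, genes)
    hits = sum(
        connectivity(graph, rng.sample(nodes, k)) >= observed
        for _ in range(trials)
    )
    return (hits + 1) / (trials + 1)


# Hypothetical network and annotation set.
net = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"], "f": []}
print(component_sizes(net, ["a", "b", "c", "f"]), connectivity(net, ["a", "b", "c", "f"]))
```

Unlike a set-based test, two functions with the same number of annotated genes can score differently here, because only the one whose genes are linked to each other forms a large component.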
|
37
|
Murali TM, Dyer MD, Badger D, Tyler BM, Katze MG. Network-based prediction and analysis of HIV dependency factors. PLoS Comput Biol 2011; 7:e1002164. [PMID: 21966263 PMCID: PMC3178628 DOI: 10.1371/journal.pcbi.1002164] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 12/20/2010] [Accepted: 06/30/2011] [Indexed: 01/27/2023]
Abstract
HIV Dependency Factors (HDFs) are a class of human proteins that are essential for HIV replication, but are not lethal to the host cell when silenced. Three previous genome-wide RNAi experiments identified HDF sets with little overlap. We combine data from these three studies with a human protein interaction network to predict new HDFs, using an intuitive algorithm called SinkSource and four other algorithms published in the literature. Our algorithm achieves high precision and recall upon cross validation, as do the other methods. A number of HDFs that we predict are known to interact with HIV proteins. They belong to multiple protein complexes and biological processes that are known to be manipulated by HIV. We also demonstrate that many predicted HDF genes show significantly different programs of expression in early response to SIV infection in two non-human primate species that differ in AIDS progression. Our results suggest that many HDFs are yet to be discovered and that they have potential value as prognostic markers to determine pathological outcome and the likelihood of AIDS development. More generally, if multiple genome-wide gene-level studies have been performed at independent labs to study the same biological system or phenomenon, our methodology is applicable to interpret these studies simultaneously in the context of molecular interaction networks and to ask if they reinforce or contradict each other. Medicines to cure infectious diseases usually target proteins in the pathogens. Since pathogens have short life cycles, the targeted proteins can rapidly evolve and make the medicines ineffective, especially in viruses such as HIV. However, since viruses have very small genomes, they must exploit the cellular machinery of the host to propagate. Therefore, disrupting the activity of selected host proteins may impede viruses. Three recent experiments have discovered hundreds of such proteins in human cells that HIV depends upon. 
Surprisingly, these three sets have very little overlap. In this work, we demonstrate that this discrepancy can be explained by considering physical interactions between the human proteins in these studies. Moreover, we exploit these interactions to predict new dependency factors for HIV. Our predictions show very significant overlaps with human proteins that are known to interact with HIV proteins and with human cellular processes that are known to be subverted by the virus. Most importantly, we show that proteins predicted by us may play a prominent role in affecting HIV-related disease progression in lymph nodes. Therefore, our predictions constitute a powerful resource for experimentalists who desire to discover new human proteins that can control the spread of HIV.
Affiliation(s)
- T. M. Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- * E-mail: (TMM) (TM); (MGK) (MK)
- Matthew D. Dyer
- Applied Biosystems, Foster City, California, United States of America
- David Badger
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Brett M. Tyler
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Michael G. Katze
- Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- * E-mail: (TMM) (TM); (MGK) (MK)
|
38
|
Dyer MD, Murali TM, Sobral BW. Supervised learning and prediction of physical interactions between human and HIV proteins. Infect Genet Evol 2011; 11:917-23. [PMID: 21382517 DOI: 10.1016/j.meegid.2011.02.022] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 12/23/2010] [Revised: 02/22/2011] [Accepted: 02/24/2011] [Indexed: 02/08/2023]
Abstract
BACKGROUND Infectious diseases result in millions of deaths each year. Physical interactions between pathogen and host proteins often form the basis of such infections. While a number of methods have been proposed for predicting protein-protein interactions (PPIs), they have primarily focused on intra-species protein-protein interactions. METHODOLOGY We present an application of a supervised learning method for predicting physical interactions between host and pathogen proteins, using the human-HIV system. Using a Support Vector Machine with a linear kernel, we explore the use of a number of features including domain profiles, protein sequence k-mers, and properties of human proteins in a human PPI network. We achieve the best cross-validation performance when we use a combination of all three of these features. At a precision value of 70% we obtain recall values greater than 40%, depending on the ratio of positive examples to negative examples used during training. We use a classifier trained using these features to predict new PPIs between human and HIV proteins. We focus our discussion on those predicted interactions that involve human proteins known to be critical for HIV replication and propagation. Examples of predicted interactions with support in the literature include those necessary for viral attachment to the host membrane and subsequent invasion of the host cell. SIGNIFICANCE Unlike intra-species PPIs, host-pathogen PPIs have not yet been experimentally detected on a large scale, though they are likely to play important roles in pathogenesis and disease outcomes. Computational methods that can robustly and accurately predict host-pathogen PPIs hold the promise of guiding future experiments and gaining insights into potential mechanisms of pathogenesis.
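Of the feature types listed in this abstract, protein sequence k-mers are the simplest to illustrate. The sketch below turns a sequence into a vector of normalized k-mer frequencies and evaluates a linear kernel (a plain dot product) between two such vectors. The choice of k, the normalization, and the example sequences are assumptions; the abstract does not specify how the features were encoded.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_features(sequence, k=2):
    """Frequency of every possible k-mer over the standard amino acids,
    in a fixed lexicographic order (20**k entries). Windows containing
    other characters are counted in the denominator but have no entry."""
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = len(windows) or 1  # avoid division by zero for short sequences
    return [counts["".join(p)] / total for p in product(AMINO_ACIDS, repeat=k)]


def linear_kernel(u, v):
    """Dot product of two feature vectors, as used by a linear-kernel SVM."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical sequences, not real proteins.
x = kmer_features("MKMK")
y = kmer_features("MKVL")
print(len(x), linear_kernel(x, y))
```

In a host-pathogen setting, one vector of this kind per protein could be concatenated for each human-HIV pair before training; that pairing scheme is likewise an assumption here.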
Affiliation(s)
- Matthew D Dyer
- Virginia Bioinformatics Institute, Virginia Tech, 1 Washington St, Blacksburg, VA 24061, USA
|
39
|
Abstract
The liver plays a vital role in glucose homeostasis, the synthesis of bile acids and the detoxification of foreign substances. Liver culture systems are widely used to test adverse effects of drugs and environmental toxicants. The two most prevalent liver culture systems are hepatocyte monolayers (HMs) and collagen sandwiches (CS). Despite their wide use, comprehensive transcriptional programs and interaction networks in these culture systems have not been systematically investigated. We integrated an existing temporal transcriptional dataset for HM and CS cultures of rat hepatocytes with a functional interaction network of rat genes. We aimed to exploit the functional interactions to identify statistically significant linkages between perturbed biological processes. To this end, we developed a novel approach to compute Contextual Biological Process Linkage Networks (CBPLNs). CBPLNs revealed numerous meaningful connections between different biological processes and gene sets, which we were successful in interpreting within the context of liver metabolism. Multiple phenomena captured by CBPLNs at the process level such as regulation, downstream effects, and feedback loops have well described counterparts at the gene and protein level. CBPLNs reveal high-level linkages between pathways and processes, making the identification of important biological trends more tractable than through interactions between individual genes and molecules alone. Our approach may provide a new route to explore, analyze, and understand cellular responses to internal and external cues within the context of the intricate networks of molecular interactions that control cellular behavior.
Affiliation(s)
- Christopher D. Lasher
- Genetics, Bioinformatics, and Computational Biology PhD Program, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Padmavathy Rajagopalan
- Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- T. M. Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- ICTAS Center for Systems Biology of Engineered Tissues, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- * E-mail:
|
40
|
Dyer MD, Neff C, Dufford M, Rivera CG, Shattuck D, Bassaganya-Riera J, Murali TM, Sobral BW. The human-bacterial pathogen protein interaction networks of Bacillus anthracis, Francisella tularensis, and Yersinia pestis. PLoS One 2010; 5:e12089. [PMID: 20711500 PMCID: PMC2918508 DOI: 10.1371/journal.pone.0012089] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.6] [Received: 04/13/2010] [Accepted: 07/17/2010] [Indexed: 01/01/2023] Open
Abstract
Background Bacillus anthracis, Francisella tularensis, and Yersinia pestis are bacterial pathogens that can cause anthrax, lethal acute pneumonic disease, and bubonic plague, respectively, and are listed as NIAID Category A priority pathogens for possible use as biological weapons. However, the interactions between human proteins and proteins in these bacteria remain poorly characterized, leading to an incomplete understanding of their pathogenesis and mechanisms of immune evasion. Methodology In this study, we used a high-throughput yeast two-hybrid assay to identify physical interactions between human proteins and proteins from each of these three pathogens. From more than 250,000 screens performed, we identified 3,073 human-B. anthracis, 1,383 human-F. tularensis, and 4,059 human-Y. pestis protein-protein interactions including interactions involving 304 B. anthracis, 52 F. tularensis, and 330 Y. pestis proteins that are uncharacterized. Computational analysis revealed that pathogen proteins preferentially interact with human proteins that are hubs and bottlenecks in the human PPI network. In addition, we computed modules of human-pathogen PPIs that are conserved amongst the three networks. Functionally, such conserved modules reveal commonalities between how the different pathogens interact with crucial host pathways involved in inflammation and immunity. Significance These data constitute the first extensive protein interaction networks constructed for bacterial pathogens and their human hosts. This study provides novel insights into host-pathogen interactions.
Affiliation(s)
- Matthew D. Dyer
- Virginia Bioinformatics Institute, Blacksburg, Virginia, United States of America
- Chris Neff
- Myriad Genetics, Salt Lake City, Utah, United States of America
- Max Dufford
- Myriad Genetics, Salt Lake City, Utah, United States of America
- Corban G. Rivera
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- Donna Shattuck
- Myriad Genetics, Salt Lake City, Utah, United States of America
- T. M. Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
- * E-mail: (TMM); (BWS)
- Bruno W. Sobral
- Virginia Bioinformatics Institute, Blacksburg, Virginia, United States of America
- * E-mail: (TMM); (BWS)
|
41
|
Kim Y, Lasher CD, Milford LM, Murali TM, Rajagopalan P. A comparative study of genome-wide transcriptional profiles of primary hepatocytes in collagen sandwich and monolayer cultures. Tissue Eng Part C Methods 2010; 16:1449-60. [PMID: 20412007 DOI: 10.1089/ten.tec.2010.0012] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
Two commonly used culture systems in hepatic tissue engineering are the collagen sandwich (CS) and monolayers of cells. In this study, genome-wide gene expression profiles of primary hepatocytes were measured over an 8-day period for each cell culture system using Affymetrix GeneChips and compared via gene set enrichment analysis to elicit biologically meaningful information at the level of gene sets. Our results demonstrate that gene expression in hepatocytes in CS cultures steadily and comprehensively diverges from that in monolayer cultures. Gene sets up-regulated in CS cultures include several associated with liver metabolic and synthesis functions, such as metabolism of lipids, amino acids, carbohydrates, and alcohol, and synthesis of bile acids. Monooxygenases such as Cytochrome-P450 enzymes do not show any change between the culture systems after 1 day, but exhibit significant up-regulation in CS cultures after 3 days in comparison to hepatocyte monolayers. These data provide insights into the up- and down-regulation of several liver-critical gene sets and their subsequent effects on liver-specific functions. These results provide a baseline for further explorations into the systems biology of engineered liver mimics.
Affiliation(s)
- Yeonhee Kim
- Department of Chemical Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
|
42
|
|
43
|
Abstract
Protein-protein interactions (PPIs) play a vital role in initiating infection in a number of pathogens. Identifying which interactions allow a pathogen to infect its host can help us to understand methods of pathogenesis and provide potential targets for therapeutics. Public resources for studying host-pathogen systems, in particular PPIs, are scarce. To facilitate the study of host-pathogen PPIs, we have collected and integrated host-pathogen PPI (HP-PPI) data from a number of public resources to create the Pathogen Interaction Gateway (PIG). PIG provides a text-based search and a BLAST interface for searching the HP-PPI data. Each entry in PIG includes information such as the functional annotations and the domains present in the interacting proteins. PIG provides links to external databases to allow for easy navigation among the various websites. Additionally, PIG includes a tool for visualizing a single HP-PPI network or two HP-PPI networks. PIG can be accessed at http://pig.vbi.vt.edu.
Affiliation(s)
- Tim Driscoll
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|
44
|
Affiliation(s)
- T. M. Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA
- Corban G. Rivera
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA
|
45
|
Abstract
MOTIVATION Infectious diseases such as malaria result in millions of deaths each year. An important aspect of any host-pathogen system is the mechanism by which a pathogen can infect its host. One method of infection is via protein-protein interactions (PPIs) where pathogen proteins target host proteins. Developing computational methods that identify which PPIs enable a pathogen to infect a host has great implications in identifying potential targets for therapeutics. RESULTS We present a method that integrates known intra-species PPIs with protein-domain profiles to predict PPIs between host and pathogen proteins. Given a set of intra-species PPIs, we identify the functional domains in each of the interacting proteins. For every pair of functional domains, we use Bayesian statistics to assess the probability that two proteins with that pair of domains will interact. We apply our method to the Homo sapiens-Plasmodium falciparum host-pathogen system. Our system predicts 516 PPIs between proteins from these two organisms. We show that pairs of human proteins we predict to interact with the same Plasmodium protein are close to each other in the human PPI network and that Plasmodium pairs predicted to interact with the same human protein are co-expressed in DNA microarray datasets measured during various stages of the Plasmodium life cycle. Finally, we identify functionally enriched sub-networks spanned by the predicted interactions and discuss the plausibility of our predictions. AVAILABILITY Supplementary data are available at http://staff.vbi.vt.edu/dyermd/publications/dyer2007a.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
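The domain-pair step described in this abstract can be illustrated with a plain application of Bayes' rule to counts. This is a minimal sketch under stated assumptions: the function name, the four count arguments, and the example numbers are hypothetical, and the paper's actual estimator (priors, smoothing, how multiple domain pairs are combined) is not given in the abstract and may differ.

```python
def domain_pair_posterior(n_interacting_with_pair, n_interacting,
                          n_all_with_pair, n_all):
    """Estimate P(interact | domain pair) from counts via Bayes' rule.

    n_interacting_with_pair: known interacting protein pairs carrying the domain pair
    n_interacting:           all known interacting protein pairs
    n_all_with_pair:         all protein pairs carrying the domain pair
    n_all:                   all protein pairs considered
    """
    prior = n_interacting / n_all                          # P(interact)
    likelihood = n_interacting_with_pair / n_interacting   # P(pair | interact)
    evidence = n_all_with_pair / n_all                     # P(pair)
    return likelihood * prior / evidence

# Hypothetical counts: 8 of 100 known interactions carry the domain pair,
# while 20 of 10,000 candidate protein pairs carry it overall.
posterior = domain_pair_posterior(8, 100, 20, 10_000)
```

A host-pathogen protein pair sharing a high-posterior domain pair would then be a candidate predicted interaction; how candidates are thresholded is not specified here.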
Affiliation(s)
- Matthew D Dyer
- Genetics, Bioinformatics and Computational Biology Program, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA.
|
46
|
|
47
|
Li P, Sioson A, Mane SP, Ulanov A, Grothaus G, Heath LS, Murali TM, Bohnert HJ, Grene R. Response diversity of Arabidopsis thaliana ecotypes in elevated [CO2] in the field. Plant Mol Biol 2006; 62:593-609. [PMID: 16941220 DOI: 10.1007/s11103-006-9041-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/05/2006] [Accepted: 06/27/2006] [Indexed: 05/02/2023]
Abstract
Free Air [CO2] Enrichment (FACE) allows for plant growth under fully open-air conditions of elevated [CO2] at concentrations expected to be reached by mid-century. We used Arabidopsis thaliana ecotypes Col-0, Cvi-0, and WS to analyze changes in gene expression and metabolite profiles of plants grown in "SoyFACE" (http://www.soyface.uiuc.edu/), a system of open-air rings within which [CO2] is elevated to approximately 550 ppm. Data from multiple rings, comparing plants in ambient air and elevated [CO2], were analyzed by mixed model ANOVA, linear discriminant analysis (LDA) and data-mining tools. In elevated [CO2], decreases in the expression of genes related to chloroplast functions characterized all lines but individual members of distinct multi-gene families were regulated differently between lines. Also, different strategies distinguished the lines with respect to the regulation of genes related to carbohydrate biosynthesis and partitioning, N-allocation and amino acid metabolism, cell wall biosynthesis, and hormone responses, irrespective of the plants' developmental status. Metabolite results paralleled reactions seen at the level of transcript expression. Evolutionary adaptation of species to their habitat and intrinsic genetic plasticity seem to determine the nature of responses to elevated [CO2]. Irrespective of their underlying genetic diversity, and evolutionary adaptation to different habitats, a small number of common, predominantly stress-responsive, signature transcripts appear to characterize responses of the Arabidopsis ecotypes in FACE.
Affiliation(s)
- Pinghua Li
- Department of Plant Biology, University of Illinois, 1201 W Gregory Drive, Urbana, IL 61801, USA.
|
48
|
Abstract
Background Biclustering has emerged as a powerful algorithmic tool for analyzing measurements of gene expression. A number of different methods have emerged for computing biclusters in gene expression data. Many of these algorithms may output a very large number of biclusters with varying degrees of overlap. There are no systematic methods that create a two-dimensional layout of the computed biclusters and display overlaps between them. Results We develop a novel algorithm for laying out biclusters in a two-dimensional matrix whose rows (respectively, columns) are rows (respectively, columns) of the original dataset. We display each bicluster as a contiguous submatrix in the layout. We allow the layout to have repeated rows and/or columns from the original matrix as required, but we seek a layout of the smallest size. We also develop a web-based search interface for the user to query the genes and samples of interest and visualise the layout of biclusters matching the queries. Conclusion We demonstrate the usefulness of our approach on gene expression data for two types of leukaemia and on protein-DNA binding data for two growth conditions in Saccharomyces cerevisiae. The software implementing the layout algorithm is available at .
Affiliation(s)
- Gregory A Grothaus
- Department of Computer Science, 660 McBryde Hall, Virginia Polytechnic Institute and State University, Blacksburg VA 24061, USA
- Google Inc., 1600 Amphitheater Parkway, Mountain View CA 94043, USA
- Adeel Mufti
- Department of Computer Science, 660 McBryde Hall, Virginia Polytechnic Institute and State University, Blacksburg VA 24061, USA
- TM Murali
- Department of Computer Science, 660 McBryde Hall, Virginia Polytechnic Institute and State University, Blacksburg VA 24061, USA
|
49
|
Abstract
Dramatic advances in sequencing technology and sophisticated experimental assays that interrogate the cell, combined with the public availability of the resulting data, herald the era of systems biology. However, the biological functions of more than 40% of the genes in sequenced genomes are unknown, posing a fundamental barrier to progress in systems biology. The large scale and diversity of available data requires the development of techniques that can automatically utilize these datasets to make quantified and robust predictions of gene function that can be experimentally verified. We present a service called the VIRtual Gene Ontology (VIRGO) that (i) constructs a functional linkage network (FLN) from gene expression and molecular interaction data, (ii) labels genes in the FLN with their functional annotations in the Gene Ontology and (iii) systematically propagates these labels across the FLN in order to precisely predict the functions of unlabelled genes. VIRGO assigns confidence estimates to predicted functions so that a biologist can prioritize predictions for further experimental study. For each prediction, VIRGO also provides an informative ‘propagation diagram’ that traces the flow of information in the FLN that led to the prediction. VIRGO is available at .
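The idea of propagating functional labels across a functional linkage network can be sketched with a generic neighbor-averaging scheme. This is an illustrative assumption, not VIRGO's algorithm, which the abstract does not spell out; the function name, the edge-weight dictionary, and the toy network below are all hypothetical.

```python
def propagate_labels(edges, seeds, iterations=50):
    """Spread label scores over a weighted undirected network.

    edges: dict mapping (gene_u, gene_v) -> positive linkage weight
    seeds: dict mapping annotated gene -> fixed score (1.0 has the function, 0.0 does not)
    Returns a dict gene -> score; unlabelled genes receive the weighted
    average of their neighbours' scores, repeated for `iterations` rounds.
    """
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))

    scores = {g: seeds.get(g, 0.0) for g in neighbors}
    for _ in range(iterations):
        updated = {}
        for g, nbrs in neighbors.items():
            if g in seeds:
                updated[g] = seeds[g]  # known annotations stay fixed
                continue
            total = sum(w for _, w in nbrs)
            updated[g] = sum(w * scores[n] for n, w in nbrs) / total
        scores = updated
    return scores

# Toy network: gene B is unlabelled and linked more strongly to annotated gene A.
toy_scores = propagate_labels({("A", "B"): 3.0, ("B", "C"): 1.0}, {"A": 1.0, "C": 0.0})
```

In this toy case the score for the unlabelled gene can be read as a rough confidence for the predicted function, loosely analogous to the confidence estimates mentioned in the abstract.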
Affiliation(s)
- T. M. Murali
- To whom correspondence should be addressed. Tel: +1 540 231 8534; Fax: +1 540 231 6075;
|
50
|
Abstract
BACKGROUND Modeling of cis-elements or regulatory motifs in promoter (upstream) regions of genes is a challenging computational problem. In this work, a set of regulatory motifs simultaneously present in the promoters of a set of genes is modeled as a biclique in a suitably defined bipartite graph. A biologically meaningful co-occurrence of multiple cis-elements in a gene promoter is assessed by the combined analysis of genomic and gene expression data. Greater statistical significance is associated with a set of genes that shares a common set of regulatory motifs, while simultaneously exhibiting highly correlated gene expression under given experimental conditions. METHODS XcisClique, the system developed in this work, is a comprehensive infrastructure that associates annotated genome and gene expression data, models known cis-elements as regular expressions, identifies maximal bicliques in a bipartite gene-motif graph, and ranks bicliques based on their computed statistical significance. Significance is a function of the probability of occurrence of those motifs in a biclique (a hypergeometric distribution), and of the new sum of absolute values (SAV) statistic that uses Spearman correlations of gene expression vectors. SAV is a statistic well-suited for this purpose as described in the discussion. RESULTS XcisClique identifies new motif and gene combinations that might indicate as yet unidentified involvement of sets of genes in biological functions and processes. It currently supports Arabidopsis thaliana and can be adapted to other organisms, assuming the existence of annotated genomic sequences, suitable gene expression data, and identified regulatory motifs. A subset of XcisClique functionalities, including the motif visualization component MotifSee, source code, and supplementary material are available at https://bioinformatics.cs.vt.edu/xcisclique/.
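The hypergeometric component of the significance score mentioned in this abstract can be illustrated with a standard upper-tail probability. This is a minimal sketch: the function name, its parameters, and the example numbers are assumptions for illustration, the mapping from bicliques to these four counts is not specified in the abstract, and the SAV statistic is not reproduced.

```python
from math import comb

def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered
    successes:  genes whose promoter contains the motif set
    draws:      size of the gene set being tested
    observed:   genes in that set whose promoter contains the motif set
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total

# Hypothetical example: 5 of 10 genes carry the motifs; both genes in a set of 2 do.
p_value = hypergeom_tail(10, 5, 2, 2)
```

A smaller tail probability means the motif co-occurrence is less likely to arise by chance in a gene set of that size.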
Affiliation(s)
- Amrita Pati
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Cecilia Vasquez-Robinet
- Department of Plant Pathology, Physiology, and Weed Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Lenwood S Heath
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Ruth Grene
- Department of Plant Pathology, Physiology, and Weed Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- TM Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|