1
Fuadah YN, Qauli AI, Marcellinus A, Pramudito MA, Lim KM. Machine learning approach to evaluate TdP risk of drugs using cardiac electrophysiological model including inter-individual variability. Front Physiol 2023; 14:1266084. [PMID: 37860622 PMCID: PMC10584148 DOI: 10.3389/fphys.2023.1266084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023]
Abstract
Introduction: Predicting the ventricular arrhythmia Torsade de Pointes (TdP) caused by drug-induced cardiotoxicity is essential in drug development. Several studies used single biomarkers such as qNet and Repolarization Abnormality (RA) in a single cardiac cell model to evaluate TdP risk. However, a single biomarker may not encompass the full range of factors contributing to TdP risk, leading to divergent TdP risk predictions, particularly when evaluated on unseen data. We addressed this issue by using multiple in silico features from a population of human ventricular cell models, which can represent the underlying mechanisms contributing to TdP risk and so provide a more reliable assessment of drug-induced cardiotoxicity.
Method: We generated a virtual population of human ventricular cell models using a modified O'Hara-Rudy model, allowing inter-individual variation. IC50 values and Hill coefficients from 67 drugs were used as input to simulate drug effects on cardiac cells. Fourteen features (dVm/dt_repol, dVm/dt_max, Vm_peak, Vm_resting, APD_tri, APD90, APD50, Ca_peak, Ca_diastole, Ca_tri, CaD90, CaD50, qNet, qInward) were generated from the simulation and used as input to several machine learning models, including k-nearest neighbor (KNN), Random Forest (RF), XGBoost, and Artificial Neural Networks (ANN). The machine learning models were optimized using a grid search to select the best parameters of the proposed model. We applied five-fold cross-validation while training the model with 42 drugs and evaluated the model's performance with test data from 25 drugs.
Result: The proposed ANN model showed the highest performance in predicting the TdP risk of drugs, with an accuracy of 0.923 (0.908-0.937), sensitivity of 0.926 (0.909-0.942), specificity of 0.921 (0.906-0.935), and AUC score of 0.964 (0.954-0.975).
Discussion and conclusion: According to the performance results, combining the electrophysiological model including inter-individual variation and optimization of machine learning showed good generalization ability when evaluated using the unseen dataset and produced a reliable drug-induced TdP risk prediction system.
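The Method above takes IC50 values and Hill coefficients as inputs for simulating drug effects, but the abstract does not state the block formula. A common convention in in silico cardiotoxicity work is a simple Hill-type scaling of channel conductance; the sketch below assumes that convention. The function and parameter names are hypothetical, and the numbers are illustrative only, not taken from the paper.

```python
def block_fraction(conc: float, ic50: float, hill: float) -> float:
    """Fraction of channels blocked at drug concentration `conc` (Hill equation).

    Assumed form: 1 / (1 + (IC50 / C)^h). `conc` and `ic50` share the same units.
    """
    if conc <= 0.0:
        return 0.0  # no drug, no block
    return 1.0 / (1.0 + (ic50 / conc) ** hill)


def scaled_conductance(g_control: float, conc: float, ic50: float, hill: float) -> float:
    """Channel conductance remaining after drug block."""
    return g_control * (1.0 - block_fraction(conc, ic50, hill))


if __name__ == "__main__":
    # Illustrative values only (hypothetical drug and channel).
    print(block_fraction(conc=1.0, ic50=1.0, hill=1.0))
    print(scaled_conductance(g_control=2.0, conc=1.0, ic50=1.0, hill=1.0))
```

At a concentration equal to the IC50 the block fraction is one half regardless of the Hill coefficient, which is a quick sanity check for any implementation of this kind.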
Affiliation(s)
- Yunendah Nur Fuadah
  - Computational Medicine Lab, Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Republic of Korea
  - School of Electrical Engineering, Telkom University, Bandung, Indonesia
- Ali Ikhsanul Qauli
  - Computational Medicine Lab, Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Republic of Korea
  - Department of Engineering, Faculty of Advanced Technology and Multidiscipline, Universitas Airlangga, Surabaya, Jawa Timur, Indonesia
- Aroli Marcellinus
  - Computational Medicine Lab, Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Republic of Korea
- Muhammad Adnan Pramudito
  - Computational Medicine Lab, Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Republic of Korea
- Ki Moo Lim
  - Computational Medicine Lab, Department of IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Republic of Korea
  - Computational Medicine Lab, Department of Medical IT Convergence Engineering, Kumoh National Institute of Technology, Gumi, Republic of Korea
  - Meta Heart Co., Ltd., Gumi, Republic of Korea
2
Tenenbaum D, Inlow K, Friedman LJ, Cai A, Gelles J, Kondev J. RNA polymerase sliding on DNA can couple the transcription of nearby bacterial operons. Proc Natl Acad Sci U S A 2023; 120:e2301402120. [PMID: 37459525 PMCID: PMC10372574 DOI: 10.1073/pnas.2301402120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/01/2023] [Accepted: 05/19/2023] [Indexed: 07/20/2023]
Abstract
DNA transcription initiates after an RNA polymerase (RNAP) molecule binds to the promoter of a gene. In bacteria, the canonical picture is that RNAP comes from the cytoplasmic pool of freely diffusing RNAP molecules. Recent experiments suggest the possible existence of a separate pool of polymerases, competent for initiation, which freely slide on the DNA after having terminated one round of transcription. Promoter-dependent transcription reinitiation from this pool of posttermination RNAP may lead to coupled initiation at nearby operons, but it is unclear whether this can occur over the distance and timescales needed for it to function widely on a bacterial genome in vivo. Here, we mathematically model the hypothesized reinitiation mechanism as a diffusion-to-capture process and compute the distances over which significant interoperon coupling can occur and the time required. These quantities depend on molecular association and dissociation rate constants between DNA, RNAP, and the transcription initiation factor σ70; we measure these rate constants using single-molecule experiments in vitro. Our combined theory/experimental results demonstrate that efficient coupling can occur at physiologically relevant σ70 concentrations and on timescales appropriate for transcript synthesis. Coupling is efficient over terminator-promoter distances up to ∼1,000 bp, which includes the majority of terminator-promoter nearest neighbor pairs in the Escherichia coli genome. The results suggest a generalized mechanism that couples the transcription of nearby operons and breaks the paradigm that each binding of RNAP to DNA can produce at most one messenger RNA.
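The abstract above describes reinitiation as a diffusion-to-capture process. The paper's full model, which includes σ70 association and dissociation, is not reproduced in the abstract; as a rough illustration only, the sketch below uses the textbook result for an idealized case: a particle diffusing in one dimension with coefficient D, dissociating at a constant rate k, reaches an absorbing target at distance L with probability exp(-L·sqrt(k/D)). All names and numerical values are hypothetical.

```python
import math


def capture_probability(distance_bp: float,
                        diffusion_bp2_per_s: float,
                        dissociation_per_s: float) -> float:
    """Probability that a 1D-diffusing particle reaches a target `distance_bp`
    away before dissociating (idealized: single absorbing target, constant
    dissociation rate, no other obstacles)."""
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    return math.exp(-distance_bp * math.sqrt(dissociation_per_s / diffusion_bp2_per_s))


def characteristic_distance(diffusion_bp2_per_s: float,
                            dissociation_per_s: float) -> float:
    """Distance at which the capture probability falls to 1/e."""
    return math.sqrt(diffusion_bp2_per_s / dissociation_per_s)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the functions.
    D, k = 100.0, 1.0
    print(characteristic_distance(D, k))
    for distance in (0.0, 10.0, 50.0):
        print(distance, capture_probability(distance, D, k))
```

In this simplified picture the coupling range is set by the single length scale sqrt(D/k), so a slower dissociation or faster sliding extends the distance over which a neighboring promoter can be reached.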
Affiliation(s)
- Debora Tenenbaum
  - Department of Biochemistry, Brandeis University, Waltham, MA 02453
  - Department of Physics, Brandeis University, Waltham, MA 02453
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724
- Koe Inlow
  - Department of Biochemistry, Brandeis University, Waltham, MA 02453
- Anthony Cai
  - Department of Biochemistry, Brandeis University, Waltham, MA 02453
- Jeff Gelles
  - Department of Biochemistry, Brandeis University, Waltham, MA 02453
- Jane Kondev
  - Department of Physics, Brandeis University, Waltham, MA 02453
3
Functional cooperativity between the trigger factor chaperone and the ClpXP proteolytic complex. Nat Commun 2021; 12:281. [PMID: 33436616 PMCID: PMC7804408 DOI: 10.1038/s41467-020-20553-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/23/2019] [Accepted: 12/08/2020] [Indexed: 01/29/2023]
Abstract
A functional association is uncovered between the ribosome-associated trigger factor (TF) chaperone and the ClpXP degradation complex. Bioinformatic analyses demonstrate conservation of the close proximity of tig, the gene coding for TF, and genes coding for ClpXP, suggesting a functional interaction. The effect of TF on ClpXP-dependent degradation varies with the nature of the substrate. While degradation of some substrates is slowed down or is unaffected by TF, surprisingly, TF increases the degradation rate of a third class of substrates. These include λ phage replication protein λO, master regulator of stationary phase RpoS, and SsrA-tagged proteins. Globally, TF acts to enhance the degradation of about 2% of newly synthesized proteins. TF is found to interact through multiple sites with ClpX in a highly dynamic fashion to promote protein degradation. This chaperone-protease cooperation constitutes a unique and likely ancestral aspect of cellular protein homeostasis in which TF acts as an adaptor for ClpXP.
4
Pannier L, Merino E, Marchal K, Collado-Vides J. Effect of genomic distance on coexpression of coregulated genes in E. coli. PLoS One 2017; 12:e0174887. [PMID: 28419102 PMCID: PMC5395161 DOI: 10.1371/journal.pone.0174887] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 12/13/2016] [Accepted: 03/16/2017] [Indexed: 12/26/2022]
Abstract
In prokaryotes, genomic distance is a feature that, in addition to coregulation, affects coexpression. Several observations, such as genomic clustering of highly coexpressed small regulons, support the idea that the coexpression behavior of coregulated genes is affected by the distance between them. However, the specific contribution of distance, in addition to coregulation, in determining the degree of coexpression has not yet been studied systematically. In this work, we exploit the rich information in RegulonDB to study how the genomic distance between coregulated genes affects their degree of coexpression, measured by pairwise similarity of expression profiles obtained under a large number of conditions. We observed that, in general, coregulated genes display higher degrees of coexpression the more closely they are located on the genome. This contribution of genomic distance to the degree of coexpression was relatively small compared to the contribution of the tightness of the coregulation (degree of overlap of regulatory programs) but was shown to be evolutionarily constrained. In addition, the distance effect was sufficient to guarantee coexpression of coregulated genes that are located at very short distances, irrespective of their tightness of coregulation. This is partly but definitely not always because the close distance is also the cause of the coregulation. In cases where it is not, we hypothesize that the effect of the distance on coexpression could be caused by the fact that coregulated genes located close to each other are also relatively more equidistant from their common TF and therefore subject to more similar levels of TF molecules. The absolute genomic distance of the coregulated genes to their common TF-coding gene tends to be less important in determining the degree of coexpression.
Our results pinpoint the importance of taking into account the combined effect of distance and coregulation when studying prokaryotic coexpression and transcriptional regulation.
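The abstract measures coexpression as the pairwise similarity of expression profiles across many conditions but does not name the similarity measure. A minimal sketch, assuming Pearson correlation as that measure (a common choice, not confirmed by the source), with made-up profiles:

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles measured
    over the same ordered set of conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical expression values for two genes under four conditions.
    gene_a = [1.0, 2.0, 3.0, 4.0]
    gene_b = [1.0, 3.0, 2.0, 4.0]
    print(pearson(gene_a, gene_b))
```

To relate such scores to genomic distance, one would compute them for every coregulated gene pair and compare the score distribution across distance bins.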
Affiliation(s)
- Lucia Pannier
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Enrique Merino
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Kathleen Marchal
  - Department of Microbial and Molecular Systems, KU Leuven, Centre of Microbial and Plant Genetics, Leuven, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark, Ghent, Belgium
  - Department of Information Technology, Ghent University, IMinds, Ghent, Belgium
  - Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria, South Africa
- Julio Collado-Vides
  - Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
5
Coman D, Rütimann P, Gruissem W. A flexible protocol for targeted gene co-expression network analysis. Methods Mol Biol 2014; 1153:285-99. [PMID: 24777806 DOI: 10.1007/978-1-4939-0606-2_21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022]
Abstract
The inference of gene co-expression networks is a valuable resource for novel hypotheses in experimental research. Routine high-throughput microarray transcript profiling experiments and the rapid development of next-generation sequencing (NGS) technologies generate a large amount of publicly available data, enabling in silico reconstruction of regulatory networks. Analysis of the transcriptome under various experimental conditions proved that genes with an overall similar expression pattern often have similar functions. Consistently, genes involved in the same metabolic pathway are found in co-expressed modules. In this chapter, we describe a detailed workflow for analyzing gene co-expression networks using large-scale gene expression data and explain critical steps from design and data analysis to prediction of functionally related modules. This protocol is platform independent and can be used for data generated by ATH1 arrays, tiling arrays, or RNA sequencing for any organism. The most important feature of this workflow is that it can infer statistically significant gene co-expression networks for any number of genes and transcriptome data sets and it does not involve any particular hardware requirements.
Affiliation(s)
- Diana Coman
  - Department of Biology, Plant Biotechnology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland
6
Lin L, Song H, Tu Q, Qin Y, Zhou A, Liu W, He Z, Zhou J, Xu J. The Thermoanaerobacter glycobiome reveals mechanisms of pentose and hexose co-utilization in bacteria. PLoS Genet 2011; 7:e1002318. [PMID: 22022280 PMCID: PMC3192829 DOI: 10.1371/journal.pgen.1002318] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 03/12/2011] [Accepted: 08/07/2011] [Indexed: 11/18/2022]
Abstract
Thermoanaerobic bacteria are of interest in cellulosic-biofuel production, due to their simultaneous pentose and hexose utilization (co-utilization) and thermophilic nature. In this study, we experimentally reconstructed the structure and dynamics of the first genome-wide carbon utilization network of thermoanaerobes. The network uncovers numerous novel pathways and identifies previously unrecognized but crucial pathway interactions and the associated key junctions. First, glucose, xylose, fructose, and cellobiose catabolism are each featured in distinct functional modules; the transport systems of hexose and pentose are apparently both regulated by transcriptional antiterminators of the BglG family, which is consistent with pentose and hexose co-utilization. Second, glucose and xylose modules cooperate in that the activity of the former promotes the activity of the latter via activating xylose transport and catabolism, while xylose delays cell lysis by sustaining coenzyme and ion metabolism. Third, the vitamin B12 pathway appears to promote ethanologenesis through ethanolamine and 1,2-propanediol, while the arginine deiminase pathway probably contributes to cell survival in stationary phase. Moreover, by experimentally validating the distinct yet collaborative nature of glucose and xylose catabolism, we demonstrated that these novel network-derived features can be rationally exploited for product-yield enhancement via optimized timing and balanced loading of the carbon supply in a substrate-specific manner. Thus, this thermoanaerobic glycobiome reveals novel genetic features in carbon catabolism that may have immediate industrial implications and provides novel strategies and targets for fermentation and genome engineering.

Renewable liquid fuels derived from lignocellulosic biomass could alleviate global energy shortage and climate change. Cellulose and hemicellulose are the main components of lignocellulosic biomass. Therefore, the ability to simultaneously utilize pentose and hexose (i.e., co-utilization) has been a crucial challenge for industrial microbes producing lignocellulosic biofuels. Certain thermoanaerobic bacteria demonstrate this unusual talent, but the genetic foundation and molecular mechanism of this process remain unknown. In this study, we reconstructed the structure and dynamics of the first genome-wide carbon utilization network of thermoanaerobes. This transcriptome-based co-expression network reveals that glucose, xylose, fructose, and cellobiose catabolism are each featured in distinct functional modules. Furthermore, the dynamics of the network suggest a distinct yet collaborative relationship between glucose and xylose catabolism. In addition, we experimentally demonstrated that these novel network-derived features can be rationally exploited for product-yield enhancement via optimized timing and balanced loading of the carbon supply in a substrate-specific manner. Thus, the newly discovered modular and precisely regulated network elucidates unique features of thermoanaerobic glycobiomes and reveals novel perturbation strategies and targets for the enhanced thermophilic production of lignocellulosic biofuels.
Affiliation(s)
- Lu Lin
  - CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and BioEnergy Genome Center, Qingdao Institute of BioEnergy and BioProcess Technology, Chinese Academy of Sciences, Qingdao, China
  - Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma, United States of America
- Houhui Song
  - CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and BioEnergy Genome Center, Qingdao Institute of BioEnergy and BioProcess Technology, Chinese Academy of Sciences, Qingdao, China
- Qichao Tu
  - Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma, United States of America
- Yujia Qin
  - Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma, United States of America
- Aifen Zhou
  - Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma, United States of America
- Wenbin Liu
  - Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma, United States of America
- Zhili He
  - Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma, United States of America
- Jizhong Zhou
  - Institute for Environmental Genomics and Department of Botany and Microbiology, University of Oklahoma, Norman, Oklahoma, United States of America
  - * E-mail: (JZ); (JX)
- Jian Xu
  - CAS Key Laboratory of Biofuels, Shandong Key Laboratory of Energy Genetics and BioEnergy Genome Center, Qingdao Institute of BioEnergy and BioProcess Technology, Chinese Academy of Sciences, Qingdao, China
  - * E-mail: (JZ); (JX)
7
Michoel T, De Smet R, Joshi A, Van de Peer Y, Marchal K. Comparative analysis of module-based versus direct methods for reverse-engineering transcriptional regulatory networks. BMC Syst Biol 2009; 3:49. [PMID: 19422680 PMCID: PMC2684101 DOI: 10.1186/1752-0509-3-49] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 08/21/2008] [Accepted: 05/07/2009] [Indexed: 12/20/2022]
Abstract
BACKGROUND: A myriad of methods to reverse-engineer transcriptional regulatory networks have been developed in recent years. Direct methods directly reconstruct a network of pairwise regulatory interactions while module-based methods predict a set of regulators for modules of coexpressed genes treated as a single unit. To date, there has been no systematic comparison of the relative strengths and weaknesses of both types of methods.
RESULTS: We have compared a recently developed module-based algorithm, LeMoNe (Learning Module Networks), to a mutual information based direct algorithm, CLR (Context Likelihood of Relatedness), using benchmark expression data and databases of known transcriptional regulatory interactions for Escherichia coli and Saccharomyces cerevisiae. A global comparison using recall versus precision curves hides the topologically distinct nature of the inferred networks and is not informative about the specific subtasks for which each method is most suited. Analysis of the degree distributions and a regulator specific comparison show that CLR is 'regulator-centric', making true predictions for a higher number of regulators, while LeMoNe is 'target-centric', recovering a higher number of known targets for fewer regulators, with limited overlap in the predicted interactions between both methods. Detailed biological examples in E. coli and S. cerevisiae are used to illustrate these differences and to prove that each method is able to infer parts of the network where the other fails. Biological validation of the inferred networks cautions against over-interpreting recall and precision values computed using incomplete reference networks.
CONCLUSION: Our results indicate that module-based and direct methods retrieve largely distinct parts of the underlying transcriptional regulatory networks. The choice of algorithm should therefore be based on the particular biological problem of interest and not on global metrics which cannot be transferred between organisms. The development of sound statistical methods for integrating the predictions of different reverse-engineering strategies emerges as an important challenge for future research.
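The abstract compares inferred networks against reference networks using recall and precision. As a minimal sketch of how those two numbers are computed for a set of predicted regulator-target edges (the gene and regulator names below are hypothetical placeholders, and this is not the evaluation code of the study):

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def precision_recall(predicted: Set[Edge], reference: Set[Edge]) -> Tuple[float, float]:
    """Precision and recall of predicted edges against a reference network.

    An edge absent from the reference counts as a false positive here, even
    though it may be a real interaction missing from an incomplete reference.
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical edges for illustration only.
    predicted = {("tfA", "g1"), ("tfA", "g2"), ("tfB", "g3")}
    reference = {("tfA", "g1"), ("tfB", "g3"), ("tfB", "g4"), ("tfC", "g5")}
    print(precision_recall(predicted, reference))
```

Because unlisted edges are scored as errors, precision computed this way is a lower bound when the reference network is incomplete, which is the over-interpretation risk the abstract warns about.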
Affiliation(s)
- Tom Michoel
  - Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Gent, Belgium.