1
Du Q, Guo Y, Zhang J, Lu F, Peng C, Zhou C. Predicting Promoters in Multiple Prokaryotes with Prompt. Interdiscip Sci 2024; 16:814-828. [PMID: 39110340 DOI: 10.1007/s12539-024-00637-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/17/2024] [Accepted: 05/21/2024] [Indexed: 10/27/2024]
Abstract
Promoters are important cis-regulatory elements that regulate gene expression, and predicting them accurately is crucial for elucidating the biological functions and potential mechanisms of genes. Many previous prokaryotic promoter prediction methods achieve encouraging prediction performance, but most of them focus on recognizing promoters in only one or a few bacterial species. Moreover, because existing methods ignore promoter sequence motifs, the interpretability of their predictions is limited. In this work, we present Prompt (Promoters in multiple prokaryotes), a generalized method that predicts promoters in 16 prokaryotes and improves the interpretability of the prediction results. Prompt integrates three methods, RSK (Regression based on Selected k-mer), CL (Contrastive Learning) and MLP (Multilayer Perceptron), and employs a voting strategy to divide the datasets into high-confidence and low-confidence categories. Results on promoter prediction tasks in 16 prokaryotes show that the performance of Prompt (accuracy and Matthews correlation coefficient) is greater than 80% on the high-confidence datasets of all 16 prokaryotes and greater than 90% in 12 of them, and that Prompt performs best compared with other existing methods. Moreover, by identifying promoter sequence motifs, Prompt can improve the interpretability of the predictions. Prompt is freely available at https://github.com/duqimeng/PromptPrompt, and will contribute to research on promoters in prokaryotes.
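The voting step can be illustrated with a minimal sketch. The abstract does not state the exact voting rule, so the rule below (unanimous agreement among the three classifiers means high confidence, anything else means low confidence) is an assumption, as are the function name `vote` and the example labels. It is not taken from the Prompt code base.

```python
from collections import Counter


def vote(labels):
    """Combine binary labels (1 = promoter, 0 = non-promoter) from several classifiers.

    Assumed rule (hypothetical, not confirmed by the source): the majority
    label wins, and the sample is "high" confidence only when every
    classifier agrees; otherwise it is "low" confidence.
    """
    counts = Counter(labels)
    label, n_votes = counts.most_common(1)[0]
    confidence = "high" if n_votes == len(labels) else "low"
    return label, confidence


# Hypothetical outputs of the three component methods for four sequences,
# ordered as (RSK, CL, MLP).
predictions = [(1, 1, 1), (1, 0, 1), (0, 0, 0), (0, 0, 1)]
for labels in predictions:
    print(labels, vote(labels))
```

Under this assumed rule, reported metrics for the high-confidence category would be computed only over samples on which all three methods agree.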
Affiliation(s)
- Qimeng Du
  - School of Engineering, Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Dali University, Dali, 671003, China
- Yixue Guo
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, 300457, China
- Junpeng Zhang
  - School of Engineering, Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Dali University, Dali, 671003, China
- Fuping Lu
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, 300457, China
- Chong Peng
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, 300457, China
- Chichun Zhou
  - School of Engineering, Air-Space-Ground Integrated Intelligence and Big Data Application Engineering Research Center of Yunnan Provincial Department of Education, Dali University, Dali, 671003, China
2
Howerton E, Runge MC, Bogich TL, Borchering RK, Inamine H, Lessler J, Mullany LC, Probert WJM, Smith CP, Truelove S, Viboud C, Shea K. Context-dependent representation of within- and between-model uncertainty: aggregating probabilistic predictions in infectious disease epidemiology. J R Soc Interface 2023; 20:20220659. [PMID: 36695018 PMCID: PMC9874266 DOI: 10.1098/rsif.2022.0659] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/07/2022] [Accepted: 01/03/2023] [Indexed: 01/26/2023] Open
Abstract
Probabilistic predictions support public health planning and decision making, especially in infectious disease emergencies. Aggregating outputs from multiple models yields more robust predictions of outcomes and associated uncertainty. While the selection of an aggregation method can be guided by retrospective performance evaluations, this is not always possible. For example, if predictions are conditional on assumptions about how the future will unfold (e.g. possible interventions), these assumptions may never materialize, precluding any direct comparison between predictions and observations. Here, we summarize literature on aggregating probabilistic predictions, illustrate various methods for infectious disease predictions via simulation, and present a strategy for choosing an aggregation method when empirical validation cannot be used. We focus on the linear opinion pool (LOP) and Vincent average, common methods that make different assumptions about between-prediction uncertainty. We contend that assumptions of the aggregation method should align with a hypothesis about how uncertainty is expressed within and between predictions from different sources. The LOP assumes that between-prediction uncertainty is meaningful and should be retained, while the Vincent average assumes that between-prediction uncertainty is akin to sampling error and should not be preserved. We provide an R package for implementation. Given the rising importance of multi-model infectious disease hubs, our work provides useful guidance on aggregation and a deeper understanding of the benefits and risks of different approaches.
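The difference between the two aggregation methods can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the R package mentioned in the abstract. The two normal predictions, the equal weights, and the function names are assumptions chosen for the example. The LOP averages the cumulative distribution functions at a fixed value, while the Vincent average averages the quantiles at a fixed probability.

```python
from statistics import NormalDist


def lop_cdf(dists, x):
    """Linear opinion pool: average the CDFs at a fixed value x (equal weights)."""
    return sum(d.cdf(x) for d in dists) / len(dists)


def lop_quantile(dists, p, lo=-100.0, hi=100.0, tol=1e-9):
    """Invert the pooled CDF by bisection; the pooled CDF is monotone in x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lop_cdf(dists, mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def vincent_quantile(dists, p):
    """Vincent average: average the quantiles at a fixed probability p (equal weights)."""
    return sum(d.inv_cdf(p) for d in dists) / len(dists)


# Two hypothetical model predictions that disagree about the central value.
predictions = [NormalDist(0, 1), NormalDist(4, 1)]

lop_width = lop_quantile(predictions, 0.95) - lop_quantile(predictions, 0.05)
vin_width = vincent_quantile(predictions, 0.95) - vincent_quantile(predictions, 0.05)
print(f"90% interval width, LOP:     {lop_width:.2f}")
print(f"90% interval width, Vincent: {vin_width:.2f}")
```

In this example the LOP interval is wider because the disagreement between the two predictions is retained in the pooled distribution, whereas the Vincent average yields an interval whose width is the mean of the individual widths, centred between the two predictions.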
Affiliation(s)
- Emily Howerton
  - Department of Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, University Park, PA, USA
- Michael C. Runge
  - Eastern Ecological Science Center at the Patuxent Research Refuge, U.S. Geological Survey, Laurel, MD, USA
- Tiffany L. Bogich
  - Department of Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, University Park, PA, USA
- Rebecca K. Borchering
  - Department of Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, University Park, PA, USA
- Hidetoshi Inamine
  - Department of Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, University Park, PA, USA
- Justin Lessler
  - Department of Epidemiology and Carolina Population Center, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Luke C. Mullany
  - Applied Physics Laboratory, Johns Hopkins University, Baltimore, MD, USA
- William J. M. Probert
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
- Claire P. Smith
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Shaun Truelove
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
  - Department of International Health, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Cécile Viboud
  - Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
- Katriona Shea
  - Department of Biology and Center for Infectious Disease Dynamics, The Pennsylvania State University, University Park, PA, USA
3
Panovska-Griffiths J, Waites W, Ackland GJ. Technical challenges of modelling real-life epidemics and examples of overcoming these. Philos Trans A Math Phys Eng Sci 2022; 380:20220179. [PMID: 35965472 PMCID: PMC9376714 DOI: 10.1098/rsta.2022.0179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 07/05/2022] [Indexed: 06/15/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has highlighted the importance of mathematical modelling in informing and advising policy decision-making. Effective practice of mathematical modelling has challenges. These include choosing the technical modelling framework and how different techniques are combined; using mathematical formalisms or computational languages appropriately to accurately capture the intended mechanism or process being studied; ensuring transparency and robustness of models and numerical code; simulating the appropriate scenarios by explicitly identifying underlying assumptions about the process in nature and the simplifying approximations that facilitate modelling; correctly quantifying the uncertainty of the model parameters and projections; taking into account the variable quality of data sources; and applying established software engineering practices to avoid duplication of effort and ensure reproducibility of numerical results. Via a collection of 16 technical papers, this special issue aims to address some of these challenges alongside showcasing the usefulness of modelling as applied in this pandemic. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.
Affiliation(s)
- J. Panovska-Griffiths
  - The Big Data Institute and the Pandemic Sciences Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - The Queen’s College, University of Oxford, Oxford, UK
- W. Waites
  - Department of Computer and Information Sciences, University of Strathclyde, Glasgow G1 1XH, UK
- G. J. Ackland
  - Institute of Condensed Matter and Complex Systems, School of Physics and Astronomy, University of Edinburgh, Edinburgh EH9 3FD, UK
4
Dykes J, Abdul-Rahman A, Archambault D, Bach B, Borgo R, Chen M, Enright J, Fang H, Firat EE, Freeman E, Gönen T, Harris C, Jianu R, John NW, Khan S, Lahiff A, Laramee RS, Matthews L, Mohr S, Nguyen PH, Rahat AAM, Reeve R, Ritsos PD, Roberts JC, Slingsby A, Swallow B, Torsney-Weir T, Turkay C, Turner R, Vidal FP, Wang Q, Wood J, Xu K. Visualization for epidemiological modelling: challenges, solutions, reflections and recommendations. Philos Trans A Math Phys Eng Sci 2022; 380:20210299. [PMID: 35965467 PMCID: PMC9376715 DOI: 10.1098/rsta.2021.0299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/04/2023]
Abstract
We report on an ongoing collaboration between epidemiological modellers and visualization researchers by documenting and reflecting upon knowledge constructs (a series of ideas, approaches and methods taken from existing visualization research and practice) deployed and developed to support modelling of the COVID-19 pandemic. Structured independent commentary on these efforts is synthesized through iterative reflection to develop: evidence of the effectiveness and value of visualization in this context; open problems upon which the research communities may focus; guidance for future activity of this type; and recommendations to safeguard the achievements and promote, advance, secure and prepare for future collaborations of this kind. In describing and comparing a series of related projects that were undertaken in unprecedented conditions, our hope is that this unique report, and its rich interactive supplementary materials, will guide the scientific community in embracing visualization in its observation, analysis and modelling of data as well as in disseminating findings. Equally we hope to encourage the visualization community to engage with impactful science in addressing its emerging data challenges. If we are successful, this showcase of activity may stimulate mutually beneficial engagement between communities with complementary expertise to address problems of significance in epidemiology and beyond. See https://ramp-vis.github.io/RAMPVIS-PhilTransA-Supplement/. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.
Affiliation(s)
- Min Chen
  - University of Oxford, Oxford, UK
- Hui Fang
  - Loughborough University, Loughborough, UK
- Claire Harris
  - Biomathematics and Statistics Scotland, Edinburgh, UK
- Qiru Wang
  - University of Nottingham, Nottingham, UK
- Jo Wood
  - City, University of London, London, UK
- Kai Xu
  - Middlesex University, London, UK
5
Dykes J, Abdul-Rahman A, Archambault D, Bach B, Borgo R, Chen M, Enright J, Fang H, Firat EE, Freeman E, Gönen T, Harris C, Jianu R, John NW, Khan S, Lahiff A, Laramee RS, Matthews L, Mohr S, Nguyen PH, Rahat AAM, Reeve R, Ritsos PD, Roberts JC, Slingsby A, Swallow B, Torsney-Weir T, Turkay C, Turner R, Vidal FP, Wang Q, Wood J, Xu K. Visualization for epidemiological modelling: challenges, solutions, reflections and recommendations. Philos Trans A Math Phys Eng Sci 2022. [PMID: 35965467 DOI: 10.6084/m9.figshare.c.6080807] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/07/2023]