1
Lamoureux CR, Phaneuf PV, Palsson B, Zielinski D. Escherichia coli non-coding regulatory regions are highly conserved. NAR Genom Bioinform 2024; 6:lqae041. [PMID: 38774514 PMCID: PMC11106028 DOI: 10.1093/nargab/lqae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/05/2024] [Accepted: 05/15/2024] [Indexed: 05/24/2024] Open
Abstract
Microbial genome sequences are rapidly accumulating, enabling large-scale studies of sequence variation. Existing studies primarily focus on coding regions to study amino acid substitution patterns in proteins. However, non-coding regulatory regions also play a distinct role in determining physiologic responses. To investigate intergenic sequence variation on a large scale, we identified non-coding regulatory region alleles across 2350 Escherichia coli strains. This 'alleleome' consists of 117 781 unique alleles for 1169 reference regulatory regions (transcribing 1975 genes) at single base-pair resolution. We find that 64% of nucleotide positions are invariant, and variant positions vary in a median of just 0.6% of strains. Additionally, non-coding alleles are sufficient to recover E. coli phylogroups. We find that core promoter elements and transcription factor binding sites are significantly conserved, especially those located upstream of essential or highly-expressed genes. However, variability in conservation of transcription factor binding sites is significant both within and across regulons. Finally, we contrast mutations acquired during adaptive laboratory evolution with wild-type variation, finding that the former preferentially alter positions that the latter conserves. Overall, this analysis elucidates the wealth of information found in E. coli non-coding sequence variation and expands pangenomic studies to non-coding regulatory regions at single-nucleotide resolution.
Affiliation(s)
- Cameron R Lamoureux
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Patrick V Phaneuf
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kgs. Lyngby, Denmark
- Bernhard O Palsson
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kgs. Lyngby, Denmark
- Daniel C Zielinski
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
2
Josephs-Spaulding J, Rajput A, Hefner Y, Szubin R, Balasubramanian A, Li G, Zielinski DC, Jahn L, Sommer M, Phaneuf P, Palsson BO. Reconstructing the transcriptional regulatory network of probiotic L. reuteri is enabled by transcriptomics and machine learning. mSystems 2024; 9:e0125723. [PMID: 38349131 PMCID: PMC10949432 DOI: 10.1128/msystems.01257-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 01/09/2024] [Indexed: 03/20/2024] Open
Abstract
Limosilactobacillus reuteri, a probiotic microbe instrumental to human health and sustainable food production, adapts to diverse environmental shifts via dynamic gene expression. We applied independent component analysis (ICA) to 117 RNA-seq data sets to decode its transcriptional regulatory network (TRN), identifying 35 distinct signals that modulate specific gene sets. Our findings indicate that ICA provides a qualitative advancement and captures nuanced relationships within gene clusters that other methods may miss. This study uncovers the fundamental properties of L. reuteri's TRN and deepens our understanding of its arginine metabolism and the co-regulation of riboflavin metabolism and fatty acid conversion. It also sheds light on conditions that regulate genes within a specific biosynthetic gene cluster and allows for speculation on the potential role of isoprenoid biosynthesis in L. reuteri's adaptive response to environmental changes. By integrating transcriptomics and machine learning, we provide a system-level understanding of L. reuteri's response mechanism to environmental fluctuations, thus setting the stage for modeling the probiotic transcriptome for applications in microbial food production.
IMPORTANCE
We have studied Limosilactobacillus reuteri, a beneficial probiotic microbe that plays a significant role in our health and in the production of sustainable foods, which are nutritionally dense, healthier, and lower in carbon emissions than traditional foods. Similar to how humans adapt their lifestyles to different environments, this microbe adjusts its behavior by modulating the expression of genes. We applied machine learning to analyze large-scale data sets on how these genes behave across diverse conditions. From this, we identified 35 unique patterns demonstrating how L. reuteri adjusts its genes based on 50 unique environmental conditions (such as various sugars, salts, microbial cocultures, human milk, and fruit juice). This research helps us better understand how L. reuteri functions, especially in processes like breaking down certain nutrients and adapting to stressful changes. More importantly, with our findings, we move closer to using this knowledge to improve how we produce more sustainable and healthier foods with the help of microbes.
Affiliation(s)
- Jonathan Josephs-Spaulding
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Akanksha Rajput
  - Department of Bioengineering, University of California, San Diego, California, USA
- Ying Hefner
  - Department of Bioengineering, University of California, San Diego, California, USA
- Richard Szubin
  - Department of Bioengineering, University of California, San Diego, California, USA
- Gaoyuan Li
  - Department of Bioengineering, University of California, San Diego, California, USA
- Daniel C. Zielinski
  - Department of Bioengineering, University of California, San Diego, California, USA
- Leonie Jahn
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Morten Sommer
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Patrick Phaneuf
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Bernhard O. Palsson
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
  - Department of Bioengineering, University of California, San Diego, California, USA
3
Qiu S, Wan X, Liang Y, Lamoureux CR, Akbari A, Palsson BO, Zielinski DC. Inferred regulons are consistent with regulator binding sequences in E. coli. PLoS Comput Biol 2024; 20:e1011824. [PMID: 38252668 PMCID: PMC10833566 DOI: 10.1371/journal.pcbi.1011824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 02/01/2024] [Accepted: 01/12/2024] [Indexed: 01/24/2024] Open
Abstract
The transcriptional regulatory network (TRN) of E. coli consists of thousands of interactions between regulators and DNA sequences. Regulons are typically determined either from resource-intensive experimental measurement of functional binding sites, or inferred from analysis of high-throughput gene expression datasets. Recently, independent component analysis (ICA) of RNA-seq compendia has been shown to be a powerful method for inferring bacterial regulons. However, it remains unclear to what extent regulons predicted by ICA structure have a biochemical basis in promoter sequences. Here, we address this question by developing machine learning models that predict inferred regulon structures in E. coli based on promoter sequence features. Models were constructed successfully (cross-validation AUROC ≥ 0.8) for 85% (40/47) of ICA-inferred E. coli regulons. We found that: (1) the presence of a high-scoring regulator motif in the promoter region was sufficient to specify regulatory activity in 40% (19/47) of the regulons; (2) additional features, such as DNA shape and extended motifs that can account for regulator multimeric binding, helped to specify regulon structure for the remaining 60% of regulons (28/47); and (3) investigating regulons where initial machine learning models failed revealed new regulator-specific sequence features that improved model accuracy. Finally, we found that strong regulatory binding sequences underlie both the genes shared between ICA-inferred and experimental regulons and the genes in the E. coli core pan-regulon of Fur. This work demonstrates that the structure of ICA-inferred regulons can largely be understood through the strength of regulator binding sites in promoter regions, reinforcing the utility of top-down inference for regulon discovery.
Affiliation(s)
- Sizhe Qiu
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States of America
- Xinlong Wan
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States of America
- Yueshan Liang
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States of America
- Cameron R. Lamoureux
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States of America
- Amir Akbari
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States of America
- Bernhard O. Palsson
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States of America
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
- Daniel C. Zielinski
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States of America
4
Singh V, Pandey S, Bhardwaj A. From the reference human genome to human pangenome: Premise, promise and challenge. Front Genet 2022; 13:1042550. [PMID: 36437921 PMCID: PMC9684177 DOI: 10.3389/fgene.2022.1042550] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/12/2022] [Accepted: 10/21/2022] [Indexed: 11/11/2022] Open
Abstract
The Reference Human Genome remains the single most important resource for mapping genetic variations and assessing their impact. However, it is monophasic, incomplete and not representative of the variation that exists in the population. Given the extent of ethno-geographic diversity and the consequent diversity in clinical manifestations of these variations, population-specific references were developed over time. The dramatically plummeting cost of sequencing whole genomes and the advent of third-generation long-range sequencers allowing accurate, error-free, telomere-to-telomere assemblies of human genomes present us with a unique and unprecedented opportunity to develop a more composite standard reference consisting of a collection of multiple genomes that capture the maximal variation existing in the population, with the deepest annotation possible, enabling a realistic, reliable and actionable estimation of the clinical significance of specific variations. The Human Pangenome Project is thus a logical next step, promising a more accurate and global representation of genomic variations. The pangenome effort must be reciprocally complemented with precise variant discovery tools and exhaustive annotation to ensure unambiguous clinical assessment of the variant in its ethno-geographical context. Here we discuss a broad roadmap, the challenges and the way forward in developing a universal pangenome reference, including data visualization techniques, integration of the prior knowledge base into the new graph-based architecture, and tools to submit, compare, query, annotate and retrieve relevant information from the pangenomes. The biggest challenge, however, will be the ethical, legal and social implications and the training of human resources in the new reference paradigm.
Affiliation(s)
- Vipin Singh
  - University Institute of Biotechnology, Chandigarh University, Mohali, India
- Shweta Pandey
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Anshu Bhardwaj
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
  - Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
  - *Correspondence: Anshu Bhardwaj,
5
Venter JC, Glass JI, Hutchison CA, Vashee S. Synthetic chromosomes, genomes, viruses, and cells. Cell 2022; 185:2708-2724. [PMID: 35868275 PMCID: PMC9347161 DOI: 10.1016/j.cell.2022.06.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 10/17/2022]
Abstract
Synthetic genomics is the construction of viruses, bacteria, and eukaryotic cells with synthetic genomes. It involves two basic processes: synthesis of complete genomes or chromosomes and booting up of those synthetic nucleic acids to make viruses or living cells. The first synthetic genomics efforts resulted in the construction of viruses. This led to a revolution in viral reverse genetics and improvements in vaccine design and manufacture. The first bacterium with a synthetic genome led to construction of a minimal bacterial cell and recoded Escherichia coli strains able to incorporate multiple non-standard amino acids in proteins and resistant to phage infection. Further advances led to a yeast strain with a synthetic genome and new approaches for animal and plant artificial chromosomes. On the horizon there are dramatic advances in DNA synthesis that will enable extraordinary new opportunities in medicine, industry, agriculture, and research.
Affiliation(s)
- J Craig Venter
  - The J. Craig Venter Institute, La Jolla, CA, and Rockville, MD, USA
- John I Glass
  - The J. Craig Venter Institute, La Jolla, CA, and Rockville, MD, USA
- Sanjay Vashee
  - The J. Craig Venter Institute, La Jolla, CA, and Rockville, MD, USA
6
Decker KT, Gao Y, Rychel K, Al Bulushi T, Chauhan S, Kim D, Cho BK, Palsson B. proChIPdb: a chromatin immunoprecipitation database for prokaryotic organisms. Nucleic Acids Res 2022; 50:D1077-D1084. [PMID: 34791440 PMCID: PMC8728212 DOI: 10.1093/nar/gkab1043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/14/2021] [Revised: 10/05/2021] [Accepted: 10/14/2021] [Indexed: 12/03/2022] Open
Abstract
The transcriptional regulatory network in prokaryotes controls global gene expression mostly through transcription factors (TFs), which are DNA-binding proteins. Chromatin immunoprecipitation (ChIP) with DNA sequencing methods can identify TF binding sites across the genome, providing a bottom-up, mechanistic understanding of how gene expression is regulated. ChIP provides indispensable evidence toward the goal of acquiring a comprehensive understanding of cellular adaptation and regulation, including condition-specificity. ChIP-derived data's importance and labor-intensiveness motivate its broad dissemination and reuse, which is currently an unmet need in the prokaryotic domain. To fill this gap, we present proChIPdb (prochipdb.org), an information-rich, interactive web database. This website collects public ChIP-seq/-exo data across several prokaryotes and presents them in dashboards that include curated binding sites, nucleotide-resolution genome viewers, and summary plots such as motif enrichment sequence logos. Users can search for TFs of interest or their target genes, download all data, dashboards, and visuals, and follow external links to understand regulons through biological databases and the literature. This initial release of proChIPdb covers diverse organisms, including most major TFs of Escherichia coli, and can be expanded to support regulon discovery across the prokaryotic domain.
Affiliation(s)
- Katherine T Decker
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Ye Gao
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Kevin Rychel
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Tahani Al Bulushi
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Siddharth M Chauhan
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
- Byung-Kwan Cho
  - Department of Biological Sciences and KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Bernhard O Palsson
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
7
Phaneuf PV, Zielinski DC, Yurkovich JT, Johnsen J, Szubin R, Yang L, Kim SH, Schulz S, Wu M, Dalldorf C, Ozdemir E, Lennen RM, Palsson BO, Feist AM. Escherichia coli Data-Driven Strain Design Using Aggregated Adaptive Laboratory Evolution Mutational Data. ACS Synth Biol 2021; 10:3379-3395. [PMID: 34762392 PMCID: PMC8870144 DOI: 10.1021/acssynbio.1c00337] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
Microbes are being engineered for an increasingly large and diverse set of applications. However, the designing of microbial genomes remains challenging due to the general complexity of biological systems. Adaptive Laboratory Evolution (ALE) leverages nature’s problem-solving processes to generate optimized genotypes currently inaccessible to rational methods. The large amount of public ALE data now represents a new opportunity for data-driven strain design. This study describes how novel strain designs, or genome sequences not yet observed in ALE experiments or published designs, can be extracted from aggregated ALE data and demonstrates this by designing, building, and testing three novel Escherichia coli strains with fitnesses comparable to ALE mutants. These designs were achieved through a meta-analysis of aggregated ALE mutations data (63 Escherichia coli K-12 MG1655 based ALE experiments, described by 93 unique environmental conditions, 357 independent evolutions, and 13 957 observed mutations), which additionally revealed global ALE mutation trends that inform on ALE-derived strain design principles. Such informative trends anticipate ALE-derived strain designs as largely gene-centric, as opposed to noncoding, and composed of a relatively small number of beneficial variants (approximately 6). These results demonstrate how strain design efforts can be enhanced by the meta-analysis of aggregated ALE data.
Affiliation(s)
- Patrick V. Phaneuf
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California 92093, United States
- Daniel C. Zielinski
  - Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- James T. Yurkovich
  - Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States
- Josefin Johnsen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Richard Szubin
  - Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States
- Lei Yang
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Se Hyeuk Kim
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Sebastian Schulz
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Muyao Wu
  - Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States
- Christopher Dalldorf
  - Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States
- Emre Ozdemir
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Rebecca M. Lennen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Bernhard O. Palsson
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California 92093, United States
  - Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States
  - Department of Pediatrics, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Adam M. Feist
  - Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, United States
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
8
Zielinski DC, Patel A, Palsson BO. The Expanding Computational Toolbox for Engineering Microbial Phenotypes at the Genome Scale. Microorganisms 2020; 8:E2050. [PMID: 33371386 PMCID: PMC7767376 DOI: 10.3390/microorganisms8122050] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/16/2020] [Revised: 12/07/2020] [Accepted: 12/16/2020] [Indexed: 02/06/2023] Open
Abstract
Microbial strains are being engineered for an increasingly diverse array of applications, from chemical production to human health. While traditional engineering disciplines are driven by predictive design tools, these tools have been difficult to build for biological design due to the complexity of biological systems and many unknowns of their quantitative behavior. However, due to many recent advances, the gap between design in biology and other engineering fields is closing. In this work, we discuss promising areas of development of computational tools for engineering microbial strains. We define five frontiers of active research: (1) Constraint-based modeling and metabolic network reconstruction, (2) Kinetics and thermodynamic modeling, (3) Protein structure analysis, (4) Genome sequence analysis, and (5) Regulatory network analysis. Experimental and machine learning drivers have enabled these methods to improve by leaps and bounds in both scope and accuracy. Modern strain design projects will require these tools to be comprehensively applied to the entire cell and efficiently integrated within a single workflow. We expect that these frontiers, enabled by the ongoing revolution of big data science, will drive forward more advanced and powerful strain engineering strategies.
Affiliation(s)
- Daniel Craig Zielinski
  - Department of Bioengineering, University of California, San Diego, San Diego, CA 92093, USA
- Arjun Patel
  - Department of Bioengineering, University of California, San Diego, San Diego, CA 92093, USA
- Bernhard O. Palsson
  - Department of Bioengineering, University of California, San Diego, San Diego, CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Lyngby, Denmark