1
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets. Brief Bioinform 2024; 25:bbae366. [PMID: 39082650 PMCID: PMC11289684 DOI: 10.1093/bib/bbae366] [Received: 03/19/2024] [Revised: 06/21/2024] [Accepted: 07/18/2024] [Indexed: 08/03/2024] Open
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Affiliation(s)
- Zeyu Lu
- Department of Statistics and Data Science, Moody School of Graduate and Advanced Studies, Southern Methodist University, 3225 Daniel Ave., P.O. Box 750332, Dallas, TX, United States
- Xue Xiao
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Qiang Zheng
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Xinlei Wang
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Department of Mathematics, University of Texas at Arlington, 411 S. Nedderman Dr., Arlington, TX 76019, United States
- Lin Xu
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Department of Pediatrics, Division of Hematology/Oncology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, United States
2
Patiyal S, Tiwari P, Ghai M, Dhapola A, Dhall A, Raghava GPS. A hybrid approach for predicting transcription factors. FRONTIERS IN BIOINFORMATICS 2024; 4:1425419. [PMID: 39119181 PMCID: PMC11306938 DOI: 10.3389/fbinf.2024.1425419] [Received: 04/29/2024] [Accepted: 07/03/2024] [Indexed: 08/10/2024] Open
Abstract
Transcription factors are essential DNA-binding proteins that regulate the transcription rate of several genes and control the expression of genes inside a cell. The prediction of transcription factors with high precision is important for understanding biological processes such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods for predicting transcription factors with higher accuracy. All models have been trained, tested, and evaluated on a large dataset that contains 19,406 transcription factors and 523,560 non-transcription factor protein sequences. To avoid biases in evaluation, the datasets were divided into training and validation/independent datasets, where 80% of the data was used for training, and the remaining 20% was used for external validation. In the case of alignment-free methods, models were developed using machine learning techniques and the composition-based features of a protein. Our best alignment-free model obtained an AUC of 0.97 on an independent dataset. In the case of the alignment-based method, we used BLAST at different cut-offs to predict the transcription factors. Although the alignment-based method demonstrated excellent performance, it was unable to cover all transcription factors due to instances of no hits. To combine the strengths of both methods, we developed a hybrid method that combines alignment-free and alignment-based methods. In the hybrid method, we added the scores of the alignment-free and alignment-based methods and achieved a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models in the webserver/Python Package Index/standalone package of "TransFacPred" (https://webs.iiitd.edu.in/raghava/transfacpred).
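The score-addition step described in the abstract can be sketched as follows. This is a minimal illustration only: the abstract says the alignment-free and alignment-based scores are added, but it does not give the BLAST score values or the decision threshold, so the constants and function names below are assumptions rather than the TransFacPred implementation.

```python
from typing import Optional

# Hypothetical score contributions; the abstract does not state the actual values.
BLAST_TF_HIT = 0.5       # top BLAST hit is a known transcription factor
BLAST_NON_TF_HIT = -0.5  # top BLAST hit is a known non-transcription factor
BLAST_NO_HIT = 0.0       # no hit at the chosen e-value cut-off


def blast_score(top_hit_is_tf: Optional[bool]) -> float:
    """Map a BLAST outcome to a score; None means no hit was found."""
    if top_hit_is_tf is None:
        return BLAST_NO_HIT
    return BLAST_TF_HIT if top_hit_is_tf else BLAST_NON_TF_HIT


def hybrid_score(ml_probability: float, top_hit_is_tf: Optional[bool]) -> float:
    """Add the alignment-free (ML) score to the alignment-based (BLAST) score."""
    return ml_probability + blast_score(top_hit_is_tf)


def predict_tf(ml_probability: float, top_hit_is_tf: Optional[bool],
               threshold: float = 0.5) -> bool:
    """Call a sequence a transcription factor if the summed score exceeds the
    (assumed) threshold."""
    return hybrid_score(ml_probability, top_hit_is_tf) > threshold
```

With these assumed values, a sequence with no BLAST hit falls back on the machine-learning score alone, which is how a hybrid of this kind can cover the sequences that the alignment-based method misses.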
Affiliation(s)
- Gajendra P. S. Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
3
McReynolds E, Elshahed MS, Youssef NH. An ecological-evolutionary perspective on the genomic diversity and habitat preferences of the Acidobacteriota. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.05.601421. [PMID: 39005473 PMCID: PMC11245096 DOI: 10.1101/2024.07.05.601421] [Indexed: 07/16/2024]
Abstract
Members of the phylum Acidobacteriota inhabit a wide range of ecosystems including soils. We analyzed the global patterns of distribution and habitat preferences of various Acidobacteriota lineages across major ecosystems (soil, engineered, host-associated, marine, non-marine saline and alkaline, and terrestrial non-soil ecosystem) in 248,559 publicly available metagenomic datasets. Classes Terriglobia, Vicinamibacteria, Blastocatellia, and Thermoanaerobaculia were highly ubiquitous and showed clear preference to soil over non-soil habitats, class Polarisedimenticolia showed comparable ubiquity and preference between soil and non-soil habitats, while classes Aminicenantia and Holophagae showed preferences to non-soil habitats. However, while specific preferences were observed, most Acidobacteriota lineages were habitat generalists rather than specialists, with genomic and/or metagenomic fragments recovered from soil and non-soil habitats at various levels of taxonomic resolution. Comparative analysis of 1930 genomes strongly indicates that phylogenetic affiliation plays a more important role than the habitat from which the genome was recovered in shaping the genomic characteristics and metabolic capacities of the Acidobacteriota. The observed lack of strong habitat specialization and habitat transition driven lineage evolution in the Acidobacteriota suggest ready cross colonization between soil and non-soil habitats. We posit that such capacity is key to the successful establishment of Acidobacteriota as a major component in soil microbiomes post ecosystem disturbance events or during pedogenesis.
Affiliation(s)
- Ella McReynolds
- Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, OK, USA
- Mostafa S. Elshahed
- Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, OK, USA
- Noha H. Youssef
- Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, OK, USA
4
Joshi SHN, Jenkins C, Ulaeto D, Gorochowski TE. Accelerating Genetic Sensor Development, Scale-up, and Deployment Using Synthetic Biology. BIODESIGN RESEARCH 2024; 6:0037. [PMID: 38919711 PMCID: PMC11197468 DOI: 10.34133/bdr.0037] [Received: 01/30/2024] [Accepted: 04/23/2024] [Indexed: 06/27/2024] Open
Abstract
Living cells are exquisitely tuned to sense and respond to changes in their environment. Repurposing these systems to create engineered biosensors has seen growing interest in the field of synthetic biology and provides a foundation for many innovative applications spanning environmental monitoring to improved biobased production. In this review, we present a detailed overview of currently available biosensors and the methods that have supported their development, scale-up, and deployment. We focus on genetic sensors in living cells whose outputs affect gene expression. We find that emerging high-throughput experimental assays and evolutionary approaches combined with advanced bioinformatics and machine learning are establishing pipelines to produce genetic sensors for virtually any small molecule, protein, or nucleic acid. However, more complex sensing tasks based on classifying compositions of many stimuli and the reliable deployment of these systems into real-world settings remain challenges. We suggest that recent advances in our ability to precisely modify nonmodel organisms and the integration of proven control engineering principles (e.g., feedback) into the broader design of genetic sensing systems will be necessary to overcome these hurdles and realize the immense potential of the field.
Affiliation(s)
- Christopher Jenkins
- CBR Division, Defence Science and Technology Laboratory, Porton Down, Wiltshire SP4 0JQ, UK
- David Ulaeto
- CBR Division, Defence Science and Technology Laboratory, Porton Down, Wiltshire SP4 0JQ, UK
- Thomas E. Gorochowski
- School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
- BrisEngBio, School of Chemistry, University of Bristol, Bristol BS8 1TS, UK
5
Nuhamunada M, Mohite OS, Phaneuf P, Palsson B, Weber T. BGCFlow: systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets. Nucleic Acids Res 2024; 52:5478-5495. [PMID: 38686794 PMCID: PMC11162802 DOI: 10.1093/nar/gkae314] [Received: 06/29/2023] [Revised: 03/22/2024] [Accepted: 04/11/2024] [Indexed: 05/02/2024] Open
Abstract
Genome mining is revolutionizing natural products discovery efforts. The rapid increase in available genomes demands comprehensive computational platforms to effectively extract biosynthetic knowledge encoded across bacterial pangenomes. Here, we present BGCFlow, a novel systematic workflow integrating analytics for large-scale genome mining of bacterial pangenomes. BGCFlow incorporates several genome analytics and mining tools grouped into five common stages of analysis such as: (i) data selection, (ii) functional annotation, (iii) phylogenetic analysis, (iv) genome mining, and (v) comparative analysis. Furthermore, BGCFlow provides easy configuration of different projects, parallel distribution, scheduled job monitoring, an interactive database to visualize tables, exploratory Jupyter Notebooks, and customized reports. Here, we demonstrate the application of BGCFlow by investigating the phylogenetic distribution of various biosynthetic gene clusters detected across 42 genomes of the Saccharopolyspora genus, known to produce industrially important secondary/specialized metabolites. The BGCFlow-guided analysis predicted more accurate dereplication of BGCs and guided the targeted comparative analysis of selected RiPPs. The scalable, interoperable, adaptable, re-entrant, and reproducible nature of the BGCFlow will provide an effective novel way to extract the biosynthetic knowledge from the ever-growing genomic datasets of biotechnologically relevant bacterial species.
Affiliation(s)
- Matin Nuhamunada
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Omkar S Mohite
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Patrick V Phaneuf
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Bernhard O Palsson
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Tilmann Weber
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby 2800, Denmark
6
Chen L, Li C, Li B, Zhou X, Bai Y, Zou X, Zhou Z, He Q, Chen B, Wang M, Xue Y, Jiang Z, Feng J, Zhou T, Liu Z, Xu P. Evolutionary divergence of subgenomes in common carp provides insights into speciation and allopolyploid success. FUNDAMENTAL RESEARCH 2024; 4:589-602. [PMID: 38933191 PMCID: PMC11197550 DOI: 10.1016/j.fmre.2023.06.011] [Received: 10/25/2022] [Revised: 06/29/2023] [Accepted: 06/30/2023] [Indexed: 06/28/2024] Open
Abstract
Hybridization and polyploidization have made great contributions to speciation, heterosis, and agricultural production within plants, but there is still limited understanding and utilization in animals. Subgenome structure and expression reorganization and cooperation post hybridization and polyploidization are essential for speciation and allopolyploid success. However, the mechanisms have not yet been comprehensively assessed in animals. Here, we produced a high-fidelity reference genome sequence for common carp, a typical allotetraploid fish species cultured worldwide. This genome enabled in-depth analysis of the evolution of subgenome architecture and expression responses. Most genes were expressed with subgenome biases, with a trend of transition from the expression of subgenome A during the early stages to that of subgenome B during the late stages of embryonic development. While subgenome A evolved more rapidly, subgenome B contributed to a greater level of expression during development and under stressful conditions. Stable dominant patterns for homoeologous gene pairs both during development and under thermal stress suggest a potential fixed heterosis in the allotetraploid genome. Preferentially expressing either copy of a homoeologous gene at higher levels to confer development and response to stress indicates the dominant effect of heterosis. The plasticity of subgenomes and their shifting of dominant expression during early development, and in response to stressful conditions, provide novel insights into the molecular basis of the successful speciation, evolution, and heterosis of the allotetraploid common carp.
Affiliation(s)
- Lin Chen
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Chengyu Li
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Bijun Li
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Xiaofan Zhou
- Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou 510642, China
- Yulin Bai
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Xiaoqing Zou
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Zhixiong Zhou
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Qian He
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Baohua Chen
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Mei Wang
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Yaguo Xue
- College of Fisheries, Henan Normal University, Xinxiang 453007, China
- Zhou Jiang
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Jianxin Feng
- Henan Academy of Fishery Science, Zhengzhou 450044, China
- Tao Zhou
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Zhanjiang Liu
- Department of Biology, College of Arts and Sciences, Syracuse University, Syracuse 13244, USA
- Peng Xu
- State Key Laboratory of Mariculture Breeding, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
7
Ko YJ, Lee ME, Cho BH, Kim M, Hyeon JE, Han JH, Han SO. Bioproduction of porphyrins, phycobilins, and their proteins using microbial cell factories: engineering, metabolic regulations, challenges, and perspectives. Crit Rev Biotechnol 2024; 44:373-387. [PMID: 36775664 DOI: 10.1080/07388551.2023.2168512] [Received: 09/07/2022] [Revised: 11/21/2022] [Accepted: 01/03/2023] [Indexed: 02/14/2023]
Abstract
Porphyrins, phycobilins, and their proteins have abundant π-electrons and strongly absorb visible light, some of which bind a metal ion in the center. Because of the structural and optical properties, they not only play critical roles as an essential component in natural systems but also have attracted much attention as a high value specialty chemical in various fields, including renewable energy, cosmetics, medicines, and foods. However, their commercial application seems to be still limited because the market price of porphyrins and phycobilins is generally expensive to apply them easily. Furthermore, their petroleum-based chemical synthesis is energy-intensive and emits a pollutant. Recently, to replace petroleum-based production, many studies on the bioproduction of metalloporphyrins, including Zn-porphyrin, Co-porphyrin, and heme, porphyrin derivatives including chlorophyll, biliverdin, and phycobilins, and their proteins including hemoproteins, phycobiliproteins, and phytochromes from renewable carbon sources using microbial cell factories have been reported. This review outlines recent advances in the bioproduction of porphyrins, phycobilins, and their proteins using microbial cell factories developed by various microbial biotechnology techniques, provides well-organized information on metabolic regulations of the porphyrin metabolism, and then critically discusses challenges and future perspectives. Through these, it is expected to be able to achieve possible solutions and insights and to develop an outstanding platform to be applied to the industry in future research.
Affiliation(s)
- Young Jin Ko
- Department of Biotechnology, Korea University, Seoul, Republic of Korea
- Institute of Life Science and Natural Resources, Korea University, Seoul, Korea
- Myeong-Eun Lee
- Department of Biotechnology, Korea University, Seoul, Republic of Korea
- Byeong-Hyeon Cho
- Department of Biotechnology, Korea University, Seoul, Republic of Korea
- Minhye Kim
- Department of Biotechnology, Korea University, Seoul, Republic of Korea
- Jeong Eun Hyeon
- Department of Next Generation Applied Sciences, The Graduate School of Sungshin University, Seoul, Korea
- Department of Food Science and Biotechnology, College of Knowledge-Based Services Engineering, Sungshin Women's University, Seoul, Korea
- Joo Hee Han
- Department of Next Generation Applied Sciences, The Graduate School of Sungshin University, Seoul, Korea
- Department of Food Science and Biotechnology, College of Knowledge-Based Services Engineering, Sungshin Women's University, Seoul, Korea
- Sung Ok Han
- Department of Biotechnology, Korea University, Seoul, Republic of Korea
8
Ledesma-Dominguez L, Carbajal-Degante E, Moreno-Hagelsieb G, Perez-Rueda E. DeepReg: a deep learning hybrid model for predicting transcription factors in eukaryotic and prokaryotic genomes. Sci Rep 2024; 14:9155. [PMID: 38644393 DOI: 10.1038/s41598-024-59487-5] [Received: 09/13/2023] [Accepted: 04/11/2024] [Indexed: 04/23/2024] Open
Abstract
Deep learning models (DLMs) have gained importance in predicting, detecting, translating, and classifying a diversity of inputs. In bioinformatics, DLMs have been used to predict protein structures, transcription factor-binding sites, and promoters. In this work, we propose a hybrid model to identify transcription factors (TFs) among prokaryotic and eukaryotic protein sequences, named Deep Regulation (DeepReg) model. Two architectures were used in the DL model: a convolutional neural network (CNN), and a bidirectional long-short-term memory (BiLSTM). DeepReg reached a precision of 0.99, a recall of 0.97, and an F1-score of 0.98. The quality of our predictions, the bias-variance trade-off approach, and the characterization of new TF predictions were evaluated and compared against those produced by DeepTFactor, as well as against experimental data from three model organisms. Predictions based on our DLM tended to exhibit less variance and bias than those from DeepTFactor, thus increasing reliability and decreasing overfitting.
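The three metrics reported in the abstract are mutually consistent, because the F1-score is the harmonic mean of precision and recall. The short check below uses only the figures quoted above; the function name is illustrative and not part of DeepReg.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract.
reported_precision = 0.99
reported_recall = 0.97

# 2 * 0.99 * 0.97 / (0.99 + 0.97) = 1.9206 / 1.96, which rounds to 0.98.
print(round(f1_score(reported_precision, reported_recall), 2))  # 0.98
```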
Affiliation(s)
- Leonardo Ledesma-Dominguez
- Posgrado en Ciencia en Ingeniería de la Computación, Universidad Nacional Autónoma de México, 04510, Mexico City, Mexico.
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, UNAM, 04510, Mexico City, México.
- Erik Carbajal-Degante
- Coordinación de Universidad Abierta y Educación Digital (CUAED), Universidad Nacional Autónoma de México, 04510, Mexico City, México
- Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Unidad Académica del Estado de Yucatán, Universidad Nacional Autónoma de México, Mérida, Yucatán, México.
9
Pandey U, Behara SM, Sharma S, Patil RS, Nambiar S, Koner D, Bhukya H. DeePNAP: A Deep Learning Method to Predict Protein-Nucleic Acid Binding Affinity from Their Sequences. J Chem Inf Model 2024; 64:1806-1815. [PMID: 38458968 DOI: 10.1021/acs.jcim.3c01151] [Indexed: 03/10/2024]
Abstract
Predicting the protein-nucleic acid (PNA) binding affinity solely from their sequences is of paramount importance for the experimental design and analysis of PNA interactions (PNAIs). A large number of currently developed models for binding affinity prediction are limited to specific PNAIs while also relying on the sequence and structural information of the PNA complexes for both training and testing, and also as inputs. As the PNA complex structures available are scarce, this significantly limits the diversity and generalizability due to the small training data set. Additionally, a majority of the tools predict a single parameter, such as binding affinity or free energy changes upon mutations, rendering a model less versatile for usage. Hence, we propose DeePNAP, a machine learning-based model built from a vast and heterogeneous data set with 14,401 entries (from both eukaryotes and prokaryotes) from the ProNAB database, consisting of wild-type and mutant PNA complex binding parameters. Our model precisely predicts the binding affinity and free energy changes due to the mutation(s) of PNAIs exclusively from their sequences. While other similar tools extract features from both sequence and structure information, DeePNAP employs sequence-based features to yield high correlation coefficients between the predicted and experimental values with low root mean squared errors for PNA complexes in predicting KD and ΔΔG, implying the generalizability of DeePNAP. Additionally, we have also developed a web interface hosting DeePNAP that can serve as a powerful tool to rapidly predict binding affinities for a myriad of PNAIs with high precision toward developing a deeper understanding of their implications in various biological systems. Web interface: http://14.139.174.41:8080/.
Affiliation(s)
- Uddeshya Pandey
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Sasi M Behara
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Siddhant Sharma
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Rachit S Patil
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Souparnika Nambiar
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Debasish Koner
- Department of Chemistry, Indian Institute of Technology Hyderabad, Kandi 502284, India
- Hussain Bhukya
- Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
10
Martinez GS, Perez-Rueda E, Kumar A, Dutt M, Maya CR, Ledesma-Dominguez L, Casa PL, Kumar A, de Avila e Silva S, Kelvin DJ. CDBProm: the Comprehensive Directory of Bacterial Promoters. NAR Genom Bioinform 2024; 6:lqae018. [PMID: 38385146 PMCID: PMC10880602 DOI: 10.1093/nargab/lqae018] [Received: 09/12/2023] [Revised: 01/12/2024] [Accepted: 01/29/2024] [Indexed: 02/23/2024] Open
Abstract
The decreasing cost of whole genome sequencing has produced high volumes of genomic information that require annotation. The experimental identification of promoter sequences, pivotal for regulating gene expression, is a laborious and cost-prohibitive task. To expedite this, we introduce the Comprehensive Directory of Bacterial Promoters (CDBProm), a directory of in-silico predicted bacterial promoter sequences. We first identified that an Extreme Gradient Boosting (XGBoost) algorithm would distinguish promoters from random downstream regions with an accuracy of 87%. To capture distinctive promoter signals, we generated a second XGBoost classifier trained on the instances misclassified in our first classifier. The predictor of CDBProm is then fed with over 55 million upstream regions from more than 6000 bacterial genomes. Upon finding potential promoter sequences in upstream regions, each promoter is mapped to the genomic data of the organism, linking the predicted promoter with its coding DNA sequence, and identifying the function of the gene regulated by the promoter. The collection of bacterial promoters available in CDBProm enables the quantitative analysis of a plethora of bacterial promoters. Our collection with over 24 million promoters is publicly available at https://aw.iimas.unam.mx/cdbprom/.
Affiliation(s)
- Gustavo Sganzerla Martinez
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
- Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autonóma de México, Unidad Académica del Estado de Yucatán, Mérida 97302, Yucatán, Mexico
- Anuj Kumar
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
- Mansi Dutt
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
- Cinthia Rodríguez Maya
- Facultad de Ciencias e Ingeniería, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Leonardo Ledesma-Dominguez
- Instituto de Investigaciones en Matematicas Aplicadas y en Sistemas, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Pedro Lenz Casa
- Biotechnology Institute, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul 95070-560, Brazil
- Aditya Kumar
- Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam 784028, India
- Scheila de Avila e Silva
- Biotechnology Institute, Universidade de Caxias do Sul, Caxias do Sul, Rio Grande do Sul 95070-560, Brazil
| | - David J Kelvin
- Microbiology and Immunology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
- Pediatrics, Izaak Walton Killam (IWK) Health Center. Canadian Center for Vaccinology (CCfV), Halifax, Nova Scotia B3H 4H7, Canada
- BioForge Canada Limited, Halifax, Nova Scotia B3N 3B9, Canada
| |
|
11
|
Brungardt J, Alarcon Y, Shiller J, Young C, Monteros MJ, Randall JJ, Bock CH. Transcriptome profile of pecan scab resistant and susceptible trees from a pecan provenance collection. BMC Genomics 2024; 25:180. [PMID: 38355402 PMCID: PMC10868059 DOI: 10.1186/s12864-024-10010-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 01/12/2024] [Indexed: 02/16/2024] Open
Abstract
Pecan scab is a devastating disease that causes damage to pecan (Carya illinoinensis (Wangenh.) K. Koch) fruit and leaves. The disease is caused by the fungus Venturia effusa (G. Winter), and the main management practice for controlling it is application of fungicides at 2-to-3-week intervals throughout the growing season. Besides disease-related yield loss, application of fungicides can result in considerable cost and increase the likelihood of fungicide resistance developing in the pathogen. Resistant cultivars are available for pecan growers, although in several cases resistance has been overcome as the pathogen adapts to infect resistant hosts. Despite the importance of host resistance in scab management, there is little information regarding the molecular basis of genetic resistance to pecan scab. The purpose of this study was to elucidate mechanisms of natural pecan scab resistance by analyzing transcripts that are differentially expressed in pecan leaf samples from scab resistant and susceptible trees. The leaf samples were collected from trees in a provenance collection orchard that represents the natural range of pecan in the US and Mexico. Trees in the orchard have been exposed to natural scab infections since planting in 1989, and scab ratings were collected over three seasons. Based on these data, ten susceptible trees and ten resistant trees were selected for analysis. RNA-seq data were collected and analyzed for diseased and non-diseased parts of susceptible trees as well as for resistant trees. A total of 313 genes were found to be differentially expressed when comparing resistant and susceptible trees without disease. For susceptible samples showing scab symptoms, 1,454 genes were identified as differentially expressed compared to non-diseased susceptible samples. Many genes involved in pathogen recognition, defense responses, and signal transduction were up-regulated in diseased samples of susceptible trees, whereas differentially expressed genes in pecan scab resistant samples were generally down-regulated compared to non-diseased susceptible samples. Our results provide the first account of candidate genes involved in resistance/susceptibility to pecan scab under natural conditions in a pecan orchard. This information can be used to aid pecan breeding programs and development of biotechnology-based approaches for generating pecan cultivars with more durable scab resistance.
Affiliation(s)
| | - Yanina Alarcon
- Noble Research Institute, Ardmore, OK, USA
- University of Texas Southwestern, Dallas, TX, USA
| | - Jason Shiller
- Noble Research Institute, Ardmore, OK, USA
- The New Zealand Institute for Plant and Food Research, Auckland, New Zealand
| | - Carolyn Young
- Noble Research Institute, Ardmore, OK, USA.
- Entomology and Plant Pathology, NC State University, Raleigh, NC, USA.
| | - Maria J Monteros
- Noble Research Institute, Ardmore, OK, USA
- Bayer Crop Science, Chesterfield, MO, USA
| | | | | |
|
12
|
Zhang J, Li F, Liu D, Liu Q, Song H. Engineering extracellular electron transfer pathways of electroactive microorganisms by synthetic biology for energy and chemicals production. Chem Soc Rev 2024; 53:1375-1446. [PMID: 38117181 DOI: 10.1039/d3cs00537b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
The excessive consumption of fossil fuels causes massive emission of CO2, leading to climate deterioration and environmental pollution. The development of substitutes and sustainable energy sources to replace fossil fuels has become a worldwide priority. Bio-electrochemical systems (BESs), employing redox reactions of electroactive microorganisms (EAMs) on electrodes to achieve a meritorious combination of biocatalysis and electrocatalysis, provide a green and sustainable alternative approach for bioremediation, CO2 fixation, and energy and chemicals production. EAMs, including exoelectrogens and electrotrophs, perform extracellular electron transfer (EET) (i.e., outward and inward EET), respectively, to exchange energy with the environment, whose rate determines the efficiency and performance of BESs. Therefore, we review the synthetic biology strategies developed in the last decade for engineering EAMs to enhance the EET rate in cell-electrode interfaces for facilitating the production of electricity energy and value-added chemicals, which include (1) progress in genetic manipulation and editing tools to achieve the efficient regulation of gene expression, knockout, and knockdown of EAMs; (2) synthetic biological engineering strategies to enhance the outward EET of exoelectrogens to anodes for electricity power production and anodic electro-fermentation (AEF) for chemicals production, including (i) broadening and strengthening substrate utilization, (ii) increasing the intracellular releasable reducing equivalents, (iii) optimizing c-type cytochrome (c-Cyts) expression and maturation, (iv) enhancing conductive nanowire biosynthesis and modification, (v) promoting electron shuttle biosynthesis, secretion, and immobilization, (vi) engineering global regulators to promote EET rate, (vii) facilitating biofilm formation, and (viii) constructing cell-material hybrids; (3) the mechanisms of inward EET, CO2 fixation pathway, and engineering strategies for improving the inward EET of electrotrophic cells for CO2 reduction and chemical production, including (i) programming metabolic pathways of electrotrophs, (ii) rewiring bioelectrical circuits for enhancing inward EET, and (iii) constructing microbial (photo)electrosynthesis by cell-material hybridization; (4) perspectives on future challenges and opportunities for engineering EET to develop highly efficient BESs for sustainable energy and chemical production. We expect that this review will provide a theoretical basis for the future development of BESs in energy harvesting, CO2 fixation, and chemical synthesis.
Affiliation(s)
- Junqi Zhang
- Frontier Science Center for Synthetic Biology (Ministry of Education), Key Laboratory of Systems Bioengineering, and School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
| | - Feng Li
- Frontier Science Center for Synthetic Biology (Ministry of Education), Key Laboratory of Systems Bioengineering, and School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
| | - Dingyuan Liu
- Frontier Science Center for Synthetic Biology (Ministry of Education), Key Laboratory of Systems Bioengineering, and School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
| | - Qijing Liu
- Frontier Science Center for Synthetic Biology (Ministry of Education), Key Laboratory of Systems Bioengineering, and School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
| | - Hao Song
- Frontier Science Center for Synthetic Biology (Ministry of Education), Key Laboratory of Systems Bioengineering, and School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072, China.
| |
|
13
|
Zhang J, Basu S, Kurgan L. HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins. Nucleic Acids Res 2024; 52:e10. [PMID: 38048333 PMCID: PMC10810184 DOI: 10.1093/nar/gkad1131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 11/10/2023] [Indexed: 12/06/2023] Open
Abstract
Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on annotations from intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none performs highly accurately for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, PR China
| | - Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| |
|
14
|
Feldmeyer B, Bornberg-Bauer E, Dohmen E, Fouks B, Heckenhauer J, Huylmans AK, Jones ARC, Stolle E, Harrison MC. Comparative Evolutionary Genomics in Insects. Methods Mol Biol 2024; 2802:473-514. [PMID: 38819569 DOI: 10.1007/978-1-0716-3838-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Genome sequencing quality, in terms of both read length and accuracy, is constantly improving. By combining long-read sequencing technologies with various scaffolding techniques, chromosome-level genome assemblies are now achievable at an affordable price for non-model organisms. Insects represent an exciting taxon for studying the genomic underpinnings of evolutionary innovations, due to ancient origins, immense species-richness, and broad phenotypic diversity. Here we summarize some of the most important methods for carrying out a comparative genomics study on insects. We describe available tools and offer concrete tips on all stages of such an endeavor from DNA extraction through genome sequencing, annotation, and several evolutionary analyses. Along the way we describe important insect-specific aspects, such as DNA extraction difficulties or gene families that are particularly difficult to annotate, and offer solutions. We describe results from several examples of comparative genomics analyses on insects to illustrate the fascinating questions that can now be addressed in this new age of genomics research.
Affiliation(s)
- Barbara Feldmeyer
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Molecular Ecology, Frankfurt, Germany
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
| | - Elias Dohmen
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Bertrand Fouks
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Jacqueline Heckenhauer
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
- Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
| | - Ann Kathrin Huylmans
- Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
| | - Alun R C Jones
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
| | - Eckart Stolle
- Museum Koenig, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Bonn, Germany
| | - Mark C Harrison
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany.
| |
|
15
|
Al-Tohamy A, Grove A. Targeting bacterial transcription factors for infection control: opportunities and challenges. Transcription 2023:1-28. [PMID: 38126125 DOI: 10.1080/21541264.2023.2293523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 12/07/2023] [Indexed: 12/23/2023] Open
Abstract
The rising threat of antibiotic resistance in pathogenic bacteria emphasizes the need for new therapeutic strategies. This review focuses on bacterial transcription factors (TFs), which play crucial roles in bacterial pathogenesis. We discuss the regulatory roles of these factors through examples, and we outline potential therapeutic strategies targeting bacterial TFs. Specifically, we discuss the use of small molecules to interfere with TF function and the development of transcription factor decoys, oligonucleotides that compete with promoters for TF binding. We also cover peptides that target the interaction between the bacterial TF and other factors, such as RNA polymerase, and the targeting of sigma factors. These strategies, while promising, come with challenges, from identifying targets to designing interventions, managing side effects, and accounting for changing bacterial resistance patterns. We also delve into how Artificial Intelligence contributes to these efforts and how it may be exploited in the future, and we touch on the roles of multidisciplinary collaboration and policy to advance this research domain.
Abbreviations: AI, artificial intelligence; CNN, convolutional neural networks; DTI, drug-target interaction; HTH, helix-turn-helix; IHF, integration host factor; LTTRs, LysR-type transcriptional regulators; MarR, multiple antibiotic resistance regulator; MRSA, methicillin resistant Staphylococcus aureus; MSA, multiple sequence alignment; NAP, nucleoid-associated protein; PROTACs, proteolysis targeting chimeras; RNAP, RNA polymerase; TF, transcription factor; TFD, transcription factor decoying; TFTRs, TetR-family transcriptional regulators; wHTH, winged helix-turn-helix.
Affiliation(s)
- Ahmed Al-Tohamy
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Department of Cell Biology, Biotechnology Research Institute, National Research Centre, Cairo, Egypt
| | - Anne Grove
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| |
|
16
|
Jin P, Zhu B, Jia Y, Zhang Y, Wang W, Shen Y, Zhong Y, Zheng Y, Wang Y, Tong Y, Zhang W, Li S. Single-cell transcriptomics reveals the brain evolution of web-building spiders. Nat Ecol Evol 2023; 7:2125-2142. [PMID: 37919396 PMCID: PMC10697844 DOI: 10.1038/s41559-023-02238-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/27/2023] [Accepted: 09/29/2023] [Indexed: 11/04/2023]
Abstract
Spiders are renowned for their efficient capture of flying insects using intricate aerial webs. How the spider nervous systems evolved to cope with this specialized hunting strategy and various environmental clues in an aerial space remains unknown. Here we report a brain-cell atlas of >30,000 single-cell transcriptomes from a web-building spider (Hylyphantes graminicola). Our analysis revealed the preservation of ancestral neuron types in spiders, including the potential coexistence of noradrenergic and octopaminergic neurons, and many peptidergic neuronal types that are lost in insects. By comparing the genome of two newly sequenced plesiomorphic burrowing spiders with three aerial web-building spiders, we found that the positively selected genes in the ancestral branch of web-building spiders were preferentially expressed (42%) in the brain, especially in the three mushroom body-like neuronal types. By gene enrichment analysis and RNAi experiments, these genes were suggested to be involved in the learning and memory pathway and may influence the spiders' web-building and hunting behaviour. Our results provide key sources for understanding the evolution of behaviour in spiders and reveal how molecular evolution drives neuron innovation and the diversification of associated complex behaviours.
Affiliation(s)
- Pengyu Jin
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
| | - Bingyue Zhu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yinjun Jia
- School of Life Sciences, IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Yiming Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wei Wang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Guangxi Normal University, Guilin, China
| | - Yunxiao Shen
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yu Zhong
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yami Zheng
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yang Wang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yan Tong
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wei Zhang
- School of Life Sciences, IDG/McGovern Institute for Brain Research, Tsinghua University, Beijing, China
- Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Shuqiang Li
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
| |
|
17
|
Kim GB, Kim JY, Lee JA, Norsigian CJ, Palsson BO, Lee SY. Functional annotation of enzyme-encoding genes using deep learning with transformer layers. Nat Commun 2023; 14:7370. [PMID: 37963869 PMCID: PMC10645960 DOI: 10.1038/s41467-023-43216-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/28/2023] [Accepted: 11/03/2023] [Indexed: 11/16/2023] Open
Abstract
Functional annotation of open reading frames in microbial genomes remains substantially incomplete. Enzymes constitute the most prevalent functional gene class in microbial genomes and can be described by their specific catalytic functions using the Enzyme Commission (EC) number. Consequently, the ability to predict EC numbers could substantially reduce the number of un-annotated genes. Here we present a deep learning model, DeepECtransformer, which utilizes transformer layers as a neural network architecture to predict EC numbers. Using the extensively studied Escherichia coli K-12 MG1655 genome, DeepECtransformer predicted EC numbers for 464 un-annotated genes. We experimentally validated the enzymatic activities predicted for three proteins (YgfF, YciO, and YjdM). Further examination of the neural network's reasoning process revealed that the trained neural network relies on functional motifs of enzymes to predict EC numbers. Thus, DeepECtransformer is a method that facilitates the functional annotation of uncharacterized genes.
Affiliation(s)
- Gi Bae Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Ji Yeon Kim
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Jong An Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea
| | - Charles J Norsigian
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Bernhard O Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, 2800, Kongens Lyngby, Denmark
| | - Sang Yup Lee
- Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
- Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST, Daejeon, 34141, Republic of Korea.
- KAIST Institute for the BioCentury and KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea.
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, 34141, Republic of Korea.
| |
|
18
|
Lamoureux CR, Decker KT, Sastry AV, Rychel K, Gao Y, McConn J, Zielinski D, Palsson BO. A multi-scale expression and regulation knowledge base for Escherichia coli. Nucleic Acids Res 2023; 51:10176-10193. [PMID: 37713610 PMCID: PMC10602906 DOI: 10.1093/nar/gkad750] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/22/2023] [Revised: 08/02/2023] [Accepted: 09/05/2023] [Indexed: 09/17/2023] Open
Abstract
Transcriptomic data is accumulating rapidly; thus, scalable methods for extracting knowledge from this data are critical. Here, we assembled a top-down expression and regulation knowledge base for Escherichia coli. The expression component is a 1035-sample, high-quality RNA-seq compendium consisting of data generated in our lab using a single experimental protocol. The compendium contains diverse growth conditions, including: 9 media; 39 supplements, including antibiotics; 42 heterologous proteins; and 76 gene knockouts. Using this resource, we elucidated global expression patterns. We used machine learning to extract 201 modules that account for 86% of known regulatory interactions, creating the regulatory component. With these modules, we identified two novel regulons and quantified systems-level regulatory responses. We also integrated 1675 curated, publicly-available transcriptomes into the resource. We demonstrated workflows for analyzing new data against this knowledge base via deconstruction of regulation during aerobic transition. This resource illuminates the E. coli transcriptome at scale and provides a blueprint for top-down transcriptomic analysis of non-model organisms.
Affiliation(s)
- Cameron R Lamoureux
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Katherine T Decker
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Anand V Sastry
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Kevin Rychel
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Ye Gao
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - John Luke McConn
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Daniel C Zielinski
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Bernhard O Palsson
- Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet, Building 220, 2800 Kgs. Lyngby, Denmark
| |
|
19
|
Glasscock CJ, Pecoraro R, McHugh R, Doyle LA, Chen W, Boivin O, Lonnquist B, Na E, Politanska Y, Haddox HK, Cox D, Norn C, Coventry B, Goreshnik I, Vafeados D, Lee GR, Gordan R, Stoddard BL, DiMaio F, Baker D. Computational design of sequence-specific DNA-binding proteins. bioRxiv 2023:2023.09.20.558720. [PMID: 37790440 PMCID: PMC10542524 DOI: 10.1101/2023.09.20.558720] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 10/05/2023]
Abstract
Sequence-specific DNA-binding proteins (DBPs) play critical roles in biology and biotechnology, and there has been considerable interest in the engineering of DBPs with new or altered specificities for genome editing and other applications. While there has been some success in reprogramming naturally occurring DBPs using selection methods, the computational design of new DBPs that recognize arbitrary target sites remains an outstanding challenge. We describe a computational method for the design of small DBPs that recognize specific target sequences through interactions with bases in the major groove, and employ this method in conjunction with experimental screening to generate binders for 5 distinct DNA targets. These binders exhibit specificity closely matching the computational models for the target DNA sequences at as many as 6 base positions and affinities as low as 30-100 nM. The crystal structure of a designed DBP-target site complex is in close agreement with the design model, highlighting the accuracy of the design method. The designed DBPs function in both Escherichia coli and mammalian cells to repress and activate transcription of neighboring genes. Our method is a substantial step towards a general route to small and hence readily deliverable sequence-specific DBPs for gene regulation and editing.
Affiliation(s)
- Cameron J. Glasscock
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Robert Pecoraro
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Physics, University of Washington, Seattle, WA, USA
| | - Ryan McHugh
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Lindsey A. Doyle
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| | - Wei Chen
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Olivier Boivin
- Program in Genetics and Genomic, Duke University, Durham, NC, USA
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
| | - Beau Lonnquist
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
| | - Emily Na
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Yuliya Politanska
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Hugh K. Haddox
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| | - David Cox
- Department of Biochemistry, Stanford University School of Medicine, Palo Alto, CA USA
- Department of Medicine, Division of Hematology, Stanford University, Stanford, CA, USA
| | - Christoffer Norn
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- BioInnovation Institute, DK2200 Copenhagen N, Denmark
| | - Brian Coventry
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Inna Goreshnik
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Dionne Vafeados
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Raluca Gordan
- Center for Advanced Genomic Technologies, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Department of Computer Science, Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, USA
| | - Barry L. Stoddard
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| | - Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- BioInnovation Institute, DK2200 Copenhagen N, Denmark
| |
|
20
|
Chen RJ, Wang JJ, Williamson DFK, Chen TY, Lipkova J, Lu MY, Sahai S, Mahmood F. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat Biomed Eng 2023; 7:719-742. [PMID: 37380750 PMCID: PMC10632090 DOI: 10.1038/s41551-023-01056-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 10/01/2021] [Accepted: 04/13/2023] [Indexed: 06/30/2023]
Abstract
In healthcare, the development and deployment of insufficiently fair systems of artificial intelligence (AI) can undermine the delivery of equitable care. Assessments of AI models stratified across subpopulations have revealed inequalities in how patients are diagnosed, treated and billed. In this Perspective, we outline fairness in machine learning through the lens of healthcare, and discuss how algorithmic biases (in data acquisition, genetic variation and intra-observer labelling variability, in particular) arise in clinical workflows and the resulting healthcare disparities. We also review emerging technology for mitigating biases via disentanglement, federated learning and model explainability, and their role in the development of AI-based software as a medical device.
Affiliation(s)
- Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Judy J Wang
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Boston University School of Medicine, Boston, MA, USA
| | - Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Sharifa Sahai
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
| |
|
21
|
Barbero-Aparicio JA, Olivares-Gil A, Díez-Pastor JF, García-Osorio C. Deep learning and support vector machines for transcription start site identification. PeerJ Comput Sci 2023; 9:e1340. [PMID: 37346545 PMCID: PMC10280436 DOI: 10.7717/peerj-cs.1340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 03/21/2023] [Indexed: 06/23/2023]
Abstract
Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have proven exceptionally effective for this task, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works neither compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied in related problems with remarkable results, we compared their performance in transcription start site prediction, concluding that SVMs are computationally much slower and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited than SVMs to work with sequences. For this purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets including negative instances for any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data.
We also create a transcription start site (TSS) dataset large enough to be used in deep learning experiments.
Affiliation(s)
| | - Alicia Olivares-Gil
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
| | - José F. Díez-Pastor
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
| | - César García-Osorio
- Departamento de Ingeniería Informática, Universidad de Burgos, Burgos, Spain
| |
|
22
|
Cho C, Lee D, Jeong D, Kim S, Kim MK, Srinivasan S. Characterization of radiation-resistance mechanism in Spirosoma montaniterrae DY10T in terms of transcriptional regulatory system. Sci Rep 2023; 13:4739. [PMID: 36959250 PMCID: PMC10036542 DOI: 10.1038/s41598-023-31509-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 03/13/2023] [Indexed: 03/25/2023] Open
Abstract
To respond to external environmental changes for survival, bacteria regulate the expression of a number of genes, including transcription factors (TFs). To characterize complex biological phenomena, a biological system-level approach is necessary. Here we utilized six computational biology methods to infer the regulatory network and to characterize the underlying biological mechanisms relevant to radiation resistance. In particular, we inferred the gene regulatory network (GRN) and operons of the radiation-resistant bacterium Spirosoma montaniterrae DY10T and identified the major regulators for radiation resistance. Our results showed that DNA repair and reactive oxygen species (ROS) scavenging mechanisms are key processes and that a Crp/Fnr family transcriptional regulator works as a master regulatory TF in the early response to radiation.
Affiliation(s)
- Changyun Cho
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Dohoon Lee
- Bioinformatics Institute, Seoul National University, Seoul, 08826, Republic of Korea
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826, Republic of Korea
| | - Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
| | - Myung Kyum Kim
- Department of Bio & Environmental Technology, College of Natural Science, Seoul Women's University, Seoul, 01797, Republic of Korea
| | - Sathiyaraj Srinivasan
- Department of Bio & Environmental Technology, College of Natural Science, Seoul Women's University, Seoul, 01797, Republic of Korea
| |
|
23
|
Genomic Features Predict Bacterial Life History Strategies in Soil, as Identified by Metagenomic Stable Isotope Probing. mBio 2023; 14:e0358422. [PMID: 36877031 PMCID: PMC10128055 DOI: 10.1128/mbio.03584-22] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/07/2023] Open
Abstract
Bacteria catalyze the formation and destruction of soil organic matter, but the bacterial dynamics in soil that govern carbon (C) cycling are not well understood. Life history strategies explain the complex dynamics of bacterial populations and activities based on trade-offs in energy allocation to growth, resource acquisition, and survival. Such trade-offs influence the fate of soil C, but their genomic basis remains poorly characterized. We used multisubstrate metagenomic DNA stable isotope probing to link genomic features of bacteria to their C acquisition and growth dynamics. We identify several genomic features associated with patterns of bacterial C acquisition and growth, notably genomic investment in resource acquisition and regulatory flexibility. Moreover, we identify genomic trade-offs defined by numbers of transcription factors, membrane transporters, and secreted products, which match predictions from life history theory. We further show that genomic investment in resource acquisition and regulatory flexibility can predict bacterial ecological strategies in soil. IMPORTANCE Soil microbes are major players in the global carbon cycle, yet we still have little understanding of how the carbon cycle operates in soil communities. A major limitation is that carbon metabolism lacks discrete functional genes that define carbon transformations. Instead, carbon transformations are governed by anabolic processes associated with growth, resource acquisition, and survival. We use metagenomic stable isotope probing to link genome information to microbial growth and carbon assimilation dynamics as they occur in soil. From these data, we identify genomic traits that can predict bacterial ecological strategies which define bacterial interactions with soil carbon.
|
24
|
Flores-Díaz A, Escoto-Sandoval C, Cervantes-Hernández F, Ordaz-Ortiz JJ, Hayano-Kanashiro C, Reyes-Valdés H, Garcés-Claver A, Ochoa-Alejo N, Martínez O. Gene Functional Networks from Time Expression Profiles: A Constructive Approach Demonstrated in Chili Pepper (Capsicum annuum L.). Plants (Basel, Switzerland) 2023; 12:1148. [PMID: 36904008 PMCID: PMC10005043 DOI: 10.3390/plants12051148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 02/20/2023] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
Gene co-expression networks are powerful tools to understand functional interactions between genes. However, large co-expression networks are difficult to interpret and do not guarantee that the relations found will hold for different genotypes. Statistically verified time expression profiles give information about significant changes in expression through time, and genes with highly correlated time expression profiles that are annotated in the same biological process are likely to be functionally connected. A method to obtain robust networks of functionally related genes will be useful to understand the complexity of the transcriptome, leading to biologically relevant insights. We present an algorithm to construct gene functional networks for genes annotated in a given biological process or other aspects of interest. We assume that there are genome-wide time expression profiles for a set of representative genotypes of the species of interest. The method is based on the correlation of time expression profiles, bounded by a set of thresholds that ensure both a given false discovery rate and the discarding of correlation outliers. The novelty of the method is that a gene expression relation must be repeatedly found in a given set of independent genotypes to be considered valid. This automatically discards relations particular to specific genotypes, ensuring a level of network robustness that can be set a priori. Additionally, we present an algorithm to find transcription factor candidates for regulating hub genes within a network. The algorithms are demonstrated with data from a large experiment studying gene expression during the development of the fruit in a diverse set of chili pepper genotypes. The algorithm is implemented and demonstrated in a new version of the publicly available R package "Salsa" (version 1.0).
Affiliation(s)
- Alan Flores-Díaz
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato 36824, Mexico
| | - Christian Escoto-Sandoval
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato 36824, Mexico
| | - Felipe Cervantes-Hernández
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato 36824, Mexico
| | - José J. Ordaz-Ortiz
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato 36824, Mexico
| | - Corina Hayano-Kanashiro
- Departamento de Investigaciones Científicas y Tecnológicas de la Universidad de Sonora, Hermosillo 83000, Mexico
| | - Humberto Reyes-Valdés
- Department of Plant Breeding, Universidad Autónoma Agraria Antonio Narro, Saltillo 25315, Mexico
| | - Ana Garcés-Claver
- Unidad de Hortofruticultura, Centro de Investigación y Tecnología Agroalimentaria de Aragón, Instituto Agroalimentario de Aragón-IA2 (CITA-Universidad de Zaragoza), 50059 Zaragoza, Spain
| | - Neftalí Ochoa-Alejo
- Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato 36824, Mexico
| | - Octavio Martínez
- Unidad de Genómica Avanzada (Langebio), Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Irapuato 36824, Mexico
| |
|
25
|
Du Z, Huang T, Uversky VN, Li J. Predicting TF Proteins by Incorporating Evolution Information Through PSSM. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1319-1326. [PMID: 35981062 DOI: 10.1109/tcbb.2022.3199758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Transcription factors (TFs) are DNA-binding proteins involved in the regulation of gene expression. They exist in all organisms and activate or repress transcription by binding to specific DNA sequences. Traditionally, TFs have been identified by experimental methods that are time-consuming and costly. In recent years, various computational methods have been developed to identify TFs and overcome these limitations. However, there is room for further improvement in the predictive accuracy of these tools. We report here a novel computational tool, TFnet, that provides accurate and comprehensive TF predictions from protein sequences. The accuracy of these predictions is substantially better than the results of the existing TF predictors and methods. In particular, it significantly outperforms comparable methods when sequence similarity to other known sequences in the database drops below 40%. Ablation tests reveal that the high predictive performance stems from the innovative ways in which TFnet derives the sequence position-specific scoring matrix (PSSM) and encodes inputs.
|
26
|
Tellechea-Luzardo J, Martín Lázaro H, Moreno López R, Carbonell P. Sensbio: an online server for biosensor design. BMC Bioinformatics 2023; 24:71. [PMID: 36855083 PMCID: PMC9972687 DOI: 10.1186/s12859-023-05201-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 02/22/2023] [Indexed: 03/02/2023] Open
Abstract
Allosteric transcription factor (aTF)-based biosensors can be used to engineer genetic circuits for a wide range of applications. The literature and online databases contain hundreds of experimentally validated molecule-TF pairs; however, the knowledge is scattered and often incomplete. Additionally, compared to the number of compounds that can be produced in living systems, the number with known associated TF-compound interactions is low. For these reasons, new tools that help researchers find new possible TF-ligand pairs are called for. In this work, we present Sensbio, a computational tool that, through similarity comparison against a TF-ligand reference database, is able to identify putative transcription factors that can be activated by a given input molecule. In addition to the collection of algorithms, an online application has also been developed, together with a predictive model created to find new possible matches based on machine learning.
Collapse
Affiliation(s)
- Jonathan Tellechea-Luzardo
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), 46022 Valencia, Spain
| | - Hèctor Martín Lázaro
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), 46022 Valencia, Spain
| | - Raúl Moreno López
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), 46022 Valencia, Spain
| | - Pablo Carbonell
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), 46022 Valencia, Spain
- Institute for Integrative Systems Biology I2SysBio, Universitat de València-CSIC, 46980 Paterna, Spain
| |
|
27
|
Tellechea-Luzardo J, Stiebritz MT, Carbonell P. Transcription factor-based biosensors for screening and dynamic regulation. Front Bioeng Biotechnol 2023; 11:1118702. [PMID: 36814719 PMCID: PMC9939652 DOI: 10.3389/fbioe.2023.1118702] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/07/2022] [Accepted: 01/26/2023] [Indexed: 02/09/2023] Open
Abstract
Advances in synthetic biology and genetic engineering are bringing into the spotlight a wide range of bio-based applications that demand better sensing and control of biological behaviours. Transcription factor (TF)-based biosensors are promising tools that can be used to detect several types of chemical compounds and elicit a response according to the desired application. However, the wider use of this type of device is still hindered by several challenges, which can be addressed by increasing the current metabolite-activated transcription factor knowledge base, developing better methods to identify new transcription factors, and improving the overall workflow for the design of novel biosensor circuits. These improvements are particularly important in the bioproduction field, where researchers need better biosensor-based approaches for screening production-strains and precise dynamic regulation strategies. In this work, we summarize what is currently known about transcription factor-based biosensors, discuss recent experimental and computational approaches targeted at their modification and improvement, and suggest possible future research directions based on two applications: bioproduction screening and dynamic regulation of genetic circuits.
Affiliation(s)
- Jonathan Tellechea-Luzardo
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
| | - Martin T. Stiebritz
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
| | - Pablo Carbonell
- Institute of Industrial Control Systems and Computing (AI2), Universitat Politècnica de València (UPV), Valencia, Spain
- Institute for Integrative Systems Biology I2SysBio, Universitat de València-CSIC, Paterna, Spain
| |
|
28
|
Sieow BFL, De Sotto R, Seet ZRD, Hwang IY, Chang MW. Synthetic Biology Meets Machine Learning. Methods Mol Biol 2023; 2553:21-39. [PMID: 36227537 DOI: 10.1007/978-1-0716-2617-7_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
This chapter outlines the myriad applications of machine learning (ML) in synthetic biology, specifically in engineering cell and protein activity, and metabolic pathways. Though by no means comprehensive, the chapter highlights several prominent computational tools applied in the field and their potential use cases. The examples detailed reinforce how ML algorithms can enhance synthetic biology research by providing data-driven insights into the behavior of living systems, even without detailed knowledge of their underlying mechanisms. By doing so, ML promises to increase the efficiency of research projects by modeling hypotheses in silico that can then be tested through experiments. While challenges related to training dataset generation and computational costs remain, ongoing improvements in ML tools are paving the way for smarter and more streamlined synthetic biology workflows that can be readily employed to address grand challenges across manufacturing, medicine, engineering, agriculture, and beyond.
Affiliation(s)
- Brendan Fu-Long Sieow
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- NUS Graduate School for Integrative Sciences and Engineering Programme, National University of Singapore, Singapore, Singapore
| | - Ryan De Sotto
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Zhi Ren Darren Seet
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - In Young Hwang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Matthew Wook Chang
- NUS Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore, Singapore, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| |
|
29
|
Volk MJ, Tran VG, Tan SI, Mishra S, Fatma Z, Boob A, Li H, Xue P, Martin TA, Zhao H. Metabolic Engineering: Methodologies and Applications. Chem Rev 2022; 123:5521-5570. [PMID: 36584306 DOI: 10.1021/acs.chemrev.2c00403] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Indexed: 12/31/2022]
Abstract
Metabolic engineering aims to improve the production of economically valuable molecules through the genetic manipulation of microbial metabolism. While the discipline is a little over 30 years old, advancements in metabolic engineering have given way to industrial-level molecule production benefitting multiple industries such as chemical, agriculture, food, pharmaceutical, and energy industries. This review describes the design, build, test, and learn steps necessary for leading a successful metabolic engineering campaign. Moreover, we highlight major applications of metabolic engineering, including synthesizing chemicals and fuels, broadening substrate utilization, and improving host robustness with a focus on specific case studies. Finally, we conclude with a discussion on perspectives and future challenges related to metabolic engineering.
Affiliation(s)
- Michael J Volk
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Vinh G Tran
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Shih-I Tan
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
| | - Shekhar Mishra
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Zia Fatma
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Aashutosh Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Hongxiang Li
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Pu Xue
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Teresa A Martin
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| | - Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- DOE Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
| |
|
30
|
Singh D, Roy J. A large-scale benchmark study of tools for the classification of protein-coding and non-coding RNAs. Nucleic Acids Res 2022; 50:12094-12111. [PMID: 36420898 PMCID: PMC9757047 DOI: 10.1093/nar/gkac1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 10/22/2022] [Accepted: 10/28/2022] [Indexed: 11/27/2022] Open
Abstract
Identification of protein-coding and non-coding transcripts is paramount for understanding their biological roles. Computational approaches have been addressing this task for over a decade; however, generalized and high-performance models are still unreliable. This benchmark study assessed the performance of 24 tools producing >55 models on the datasets covering a wide range of species. We have collected 135 small and large transcriptomic datasets from existing studies for comparison and identified the potential bottlenecks hampering the performance of current tools. The key insights of this study include lack of standardized training sets, reliance on homogeneous training data, gradual changes in annotated data, lack of augmentation with homology searches, the presence of false positives and negatives in datasets and the lower performance of end-to-end deep learning models. We also derived a new dataset, RNAChallenge, from the benchmark considering hard instances that may include potential false alarms. The best and least well performing models under- and overfit the dataset, respectively, thereby serving a dual purpose. For computational approaches, it will be valuable to develop accurate and unbiased models. The identification of false alarms will be of interest for genome annotators, and experimental study of hard RNAs will help to untangle the complexity of the RNA world.
Affiliation(s)
- Dalwinder Singh
- To whom correspondence should be addressed. Tel: +91 172 5221206;
| | - Joy Roy
- Correspondence may also be addressed to Joy Roy.
| |
|
31
|
Qin R, Mahal LK, Bojar D. Deep learning explains the biology of branched glycans from single-cell sequencing data. iScience 2022; 25:105163. [PMID: 36217547 PMCID: PMC9547197 DOI: 10.1016/j.isci.2022.105163] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/03/2022] [Revised: 09/06/2022] [Accepted: 09/16/2022] [Indexed: 11/03/2022] Open
Abstract
Glycosylation is ubiquitous and often dysregulated in disease. However, the regulation and functional significance of various types of glycosylation at cellular levels is hard to unravel experimentally. Multi-omics, single-cell measurements such as SUGAR-seq, which quantifies transcriptomes and cell surface glycans, facilitate addressing this issue. Using SUGAR-seq data, we pioneered a deep learning model to predict the glycan phenotypes of cells (mouse T lymphocytes) from transcripts, with the example of predicting β1,6GlcNAc-branching across T cell subtypes (test set F1 score: 0.9351). Model interpretation via SHAP (SHapley Additive exPlanations) identified highly predictive genes, in part known to impact (i) branched glycan levels and (ii) the biology of branched glycans. These genes included physiologically relevant low-abundance genes that were not captured by conventional differential expression analysis. Our work shows that interpretable deep learning models are promising for uncovering novel functions and regulatory mechanisms of glycans from integrated transcriptomic and glycomic datasets.
Affiliation(s)
- Rui Qin
- Department of Chemistry, University of Alberta, Edmonton, AB T6G 2G2, Canada
| | - Lara K. Mahal
- Department of Chemistry, University of Alberta, Edmonton, AB T6G 2G2, Canada
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, 405 30 Gothenburg, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 405 30 Gothenburg, Sweden
| |
|
32
|
Dai Z, Zhang Z, Zhu L, Zhu Z, Jiang L. Complete Genome Sequencing Analysis of Deinococcus wulumuqiensis R12, an Extremely Radiation-Resistant Strain. Curr Microbiol 2022; 79:292. [PMID: 35972568 DOI: 10.1007/s00284-022-02984-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2021] [Accepted: 07/20/2022] [Indexed: 11/03/2022]
Abstract
Genome sequencing was performed by the PacBio RS II platform and Illumina HiSeq 4000 platform to discover the metabolic profile of the Deinococcus wulumuqiensis R12, which was isolated from radiation-contaminated soils in Xinjiang Uygur Autonomous Region of northwest China. The genome of 3.5 Mbp comprises one circular chromosome and four circular plasmids with 3679 genes and a GC content of 66.97%. A total of 41 new transcriptional factors were identified using the DeepTFactor tool. Genomic analysis revealed the presence of genes for homologous recombination repair, which suggested high recombination efficiency in R12. Three Type I and one Type II RM systems, two CRISPR arrays, and one Cas-Type IC protein were found, allowing the development of endogenous CRISPR-Cas gene-editing tools. Additionally, we found that R12 has a broad spectrum of substrate utilization, which was validated by physiological experiments. Genes involved in the carotenoid biosynthesis pathway and the antioxidative system were also identified. Overall, the comprehensive description of the genome of R12 will facilitate the additional exploitation of this strain as a versatile cell factory for biotechnological applications.
Affiliation(s)
- Zijie Dai
- State Key Laboratory of Materials-Oriented Chemical Engineering, College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China
| | - Zhidong Zhang
- Xinjiang Key Laboratory of Special Environmental Microbiology, Institute of Applied Microbiology, Xinjiang Academy of Agricultural Sciences, Urumqi, 830091, Xinjiang, China
| | - Liying Zhu
- College of Chemical and Molecular Engineering, Nanjing Tech University, Nanjing, 211816, China
| | - Zhengming Zhu
- State Key Laboratory of Materials-Oriented Chemical Engineering, College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China; College of Food Science and Light Industry, Nanjing Tech University, Nanjing, 211816, China.
| | - Ling Jiang
- State Key Laboratory of Materials-Oriented Chemical Engineering, College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211816, China; College of Food Science and Light Industry, Nanjing Tech University, Nanjing, 211816, China.
| |
|
33
|
Liu Q, Wang F, Shuai Y, Huang L, Zhang X. Integrated Analysis of Single-Molecule Real-Time Sequencing and Next-Generation Sequencing Reveals Insights into Drought Tolerance Mechanism of Lolium multiflorum. Int J Mol Sci 2022; 23:ijms23147921. [PMID: 35887272 PMCID: PMC9320196 DOI: 10.3390/ijms23147921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/08/2022] [Revised: 07/13/2022] [Accepted: 07/14/2022] [Indexed: 02/01/2023] Open
Abstract
Lolium multiflorum is widely planted in temperate and subtropical regions globally, and it has high economic value owing to its use as forage grass for a wide variety of livestock and poultry. However, drought seriously restricts its yield and quality. At present, owing to the lack of available genomic resources, many types of basic research cannot be conducted, which severely limits the in-depth functional analysis of genes in L. multiflorum. Therefore, we used single-molecule real-time (SMRT) and next-generation sequencing (NGS) to sequence the complex transcriptome of L. multiflorum under drought. We identified 41,141 DEGs in leaves and 35,559 DEGs in roots. Moreover, we identified 1243 alternative splicing events under drought. LmPIP5K9 produced two different transcripts with opposite expression patterns, possibly through the phospholipid signaling pathway or the negatively regulated sugar-mediated root growth response to drought stress, respectively. Additionally, 13,079 transcription factors in 90 families were obtained. An in-depth analysis of R2R3-MYB gene family members was performed to preliminarily demonstrate their functions by utilizing subcellular localization and overexpression in yeast. Our data make a significant contribution to the genetics of L. multiflorum, offering a current understanding of plant adaptation to drought stress.
|
34
|
Fu X, Bates PA. Application of deep learning methods: From molecular modelling to patient classification. Exp Cell Res 2022; 418:113278. [PMID: 35810775 DOI: 10.1016/j.yexcr.2022.113278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2022] [Revised: 06/16/2022] [Accepted: 07/05/2022] [Indexed: 11/28/2022]
Abstract
We are now well into the information-driven age, with complex, heterogeneous datasets in the biological sciences continuing to grow at a rapid pace. Moreover, the distilling of such datasets to find new governing principles is underway. Leading the surge are new and exciting algorithmic developments in computer simulation and machine learning, most notably, for the latter, those centred on deep learning. However, practical applications of cell-centric computations within the biological sciences, even when carefully benchmarked against existing experimental datasets, remain challenging. Here we discuss the application of deep learning methodologies to support our understanding of cell functionality and as an aid to patient classification. Whilst comprehensive end-to-end deep learning approaches that utilise knowledge of the cell and its molecular components to aid human disease classification, which are important for opening the door to more effective molecular and cell-based therapies, are yet to be implemented, we illustrate that many deep learning applications have been developed to tackle components of such an ambitious pipeline. We end our discussion on what the future may hold, especially how an integrated framework of computer simulations and deep learning, in conjunction with wet-bench experimentation, could reveal the governing principles underlying cell functionalities within the tissue environments in which cells operate.
Affiliation(s)
- Xiao Fu
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK.
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Rd, London, NW1 1AT, UK.
| |
|
35
|
Wang L, Zhang J, Wang D, Song C. Membrane contact probability: An essential and predictive character for the structural and functional studies of membrane proteins. PLoS Comput Biol 2022; 18:e1009972. [PMID: 35353812 PMCID: PMC9000120 DOI: 10.1371/journal.pcbi.1009972] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/11/2021] [Revised: 04/11/2022] [Accepted: 02/25/2022] [Indexed: 11/20/2022] Open
Abstract
One of the unique traits of membrane proteins is that a significant fraction of their hydrophobic amino acids is exposed to the hydrophobic core of lipid bilayers rather than being embedded in the protein interior, which is often not explicitly considered in protein structure and function predictions. Here, we propose a characteristic and predictive quantity, the membrane contact probability (MCP), to describe the likelihood of the amino acids of a given sequence being in direct contact with the acyl chains of lipid molecules. We show that MCP is complementary to solvent accessibility in characterizing the outer surface of membrane proteins, and it can be predicted for any given sequence with a machine learning-based method by utilizing a training dataset extracted from MemProtMD, a database generated from molecular dynamics simulations for the membrane proteins with a known structure. As the first of many potential applications, we demonstrate that MCP can be used to systematically improve the prediction precision of the protein contact maps and structures.
The distribution of residues on protein surfaces is largely determined by the surrounding environment. For soluble proteins, most of the residues on the outer surface are hydrophilic, and people use the quantity “solvent accessibility” to describe and predict these surface residues. In contrast, for membrane proteins that are embedded in a lipid bilayer, many of their surface residues are hydrophobic and membrane-contacting, but there is not yet a widely accepted quantity for the description or prediction of this characteristic property. Here, we propose a new quantity termed “membrane contact probability (MCP)”, which can be used to describe and predict the membrane-contacting surface residues of proteins. We also propose a machine learning-based method to predict MCP from protein sequences, utilizing the dataset generated by physics-based computer simulations. We demonstrate that a quantity such as MCP is helpful for protein structure prediction, and we believe that it will find broad applications in the structure and function studies of membrane proteins.
Affiliation(s)
- Lei Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary studies, Peking University, Beijing, China
| | - Jiangguo Zhang
- School of Life Sciences, Peking University, Beijing, China
| | - Dali Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| | - Chen Song
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
| |
|
36
|
Barrows JK, Van Dyke MW. Biolayer interferometry for DNA-protein interactions. PLoS One 2022; 17:e0263322. [PMID: 35108320 PMCID: PMC8809612 DOI: 10.1371/journal.pone.0263322] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/25/2021] [Accepted: 01/14/2022] [Indexed: 11/18/2022] Open
Abstract
Biolayer interferometry (BLI) is a widely utilized technique for determining macromolecular interaction dynamics in real time. Using changes in the interference pattern of white light reflected off a biosensor tip, BLI can determine binding parameters for protein-protein (e.g., antibody-substrate kinetics) or protein-small molecule (e.g., drug discovery) interactions. However, a less-appreciated application for BLI analysis is DNA-protein interactions. DNA-binding proteins play an immense role in cellular biology, controlling critical processes including transcription, DNA replication, and DNA repair. Understanding how proteins interact with DNA often provides important insight into their biological function, and novel technologies to assay DNA-protein interactions are of broad interest. Currently, a detailed protocol utilizing BLI for DNA-protein interactions is lacking. In the following protocol, we describe the use of BLI and biotinylated-DNA probes to determine the binding kinetics of a transcription factor to a specific DNA sequence. The experimental steps include the generation of biotinylated-DNA probes, the execution of the BLI experiment, and data analysis by scientific graphing and statistical software (e.g., GraphPad Prism). Although the example experiment used throughout this protocol involves a prokaryotic transcription factor, this technique can be easily translated to any DNA-binding protein. Pitfalls and potential solutions for investigating DNA-binding proteins by BLI are also presented.
Affiliation(s)
- John K. Barrows
- Department of Chemistry and Biochemistry, Kennesaw State University, Kennesaw, GA, United States of America
| | - Michael W. Van Dyke
- Department of Chemistry and Biochemistry, Kennesaw State University, Kennesaw, GA, United States of America
| |
|
37
|
Oliveira Monteiro LM, Saraiva JP, Brizola Toscan R, Stadler PF, Silva-Rocha R, Nunes da Rocha U. PredicTF: prediction of bacterial transcription factors in complex microbial communities using deep learning. Environmental Microbiome 2022; 17:7. [PMID: 35135629 PMCID: PMC8822659 DOI: 10.1186/s40793-021-00394-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/15/2021] [Accepted: 12/03/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND Transcription factors (TFs) are proteins controlling the flow of genetic information by regulating cellular gene expression. A better understanding of TFs in a bacterial community context may open novel avenues for exploring gene regulation in ecosystems where bacteria play a key role. Here we describe PredicTF, a platform supporting the prediction and classification of novel bacterial TFs in single species and complex microbial communities. PredicTF is based on a deep learning algorithm. RESULTS To train PredicTF, we created a TF database (BacTFDB) by manually curating a total of 11,961 TFs distributed in 99 TF families. Five model organisms were used to test the performance and the accuracy of PredicTF. PredicTF was able to identify 24-62% of the known TFs with an average precision of 88% in our five model organisms. We demonstrated PredicTF using pure cultures and a complex microbial community. In these demonstrations, we used (meta)genomes for TF prediction and (meta)transcriptomes for determining the expression of putative TFs. CONCLUSION PredicTF demonstrated high accuracy in predicting transcription factors in model organisms. We prepared the pipeline to be easily implemented in studies profiling TFs using (meta)genomes and (meta)transcriptomes. PredicTF is open-source software available at https://github.com/mdsufz/PredicTF.
Affiliation(s)
- Lummy Maria Oliveira Monteiro
- Helmholtz Center for Environmental Research (UFZ), Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ribeirão Preto Medical School (FMRP), University of São Paulo (USP), Ribeirão Preto, Brazil
| | | | | | - Peter F. Stadler
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
| | - Rafael Silva-Rocha
- Ribeirão Preto Medical School (FMRP), University of São Paulo (USP), Ribeirão Preto, Brazil
| | | |
|
38
|
Ledesma L, Hernandez-Guerrero R, Perez-Rueda E. Prediction of DNA-Binding Transcription Factors in Bacteria and Archaea Genomes. Methods Mol Biol 2022; 2516:103-112. [PMID: 35922624 DOI: 10.1007/978-1-0716-2413-5_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
DNA-binding transcription factors (TFs) play a central role in the gene expression of all organisms, from viruses to humans, including bacteria and archaea. These proteins determine the fate of gene expression in the context of environmental challenges. Because thousands of genomes have been sequenced to date, predictions of the encoded proteins are validated through the use of bioinformatics tools to obtain the necessary experimental, posterior knowledge. In this chapter, we describe three approaches to identify TFs in protein sequences. The first approach integrates the results of sequence comparisons and PFAM assignments, using as reference a manually curated collection of TFs. The second approach considers the prediction of DNA-binding structures, such as the classical helix-turn-helix (HTH); and the third approach considers a deep learning model. We suggest that all approaches must be considered together to increase the possibility of identifying new TFs in bacterial and archaeal genomes.
Affiliation(s)
- Leonardo Ledesma
- Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México, Ciudad de México, Mexico
| | - Rafael Hernandez-Guerrero
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, Mexico
| | - Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, Mexico.
| |
|
39
|
Wan X, Saltepe B, Yu L, Wang B. Programming living sensors for environment, health and biomanufacturing. Microb Biotechnol 2021; 14:2334-2342. [PMID: 33960658 PMCID: PMC8601174 DOI: 10.1111/1751-7915.13820] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/19/2021] [Revised: 04/05/2021] [Accepted: 04/11/2021] [Indexed: 01/10/2023] Open
Abstract
Synthetic biology offers new tools and capabilities for engineering cells with desired functions, for example as new biosensing platforms leveraging engineered microbes. In the last two decades, bacterial cells have been programmed to sense and respond to various input cues for versatile purposes including environmental monitoring, disease diagnosis and adaptive biomanufacturing. Despite demonstrated proof-of-concept success in the laboratory, the real-world applications of microbial sensors have been restricted due to certain technical and societal limitations. Yet, most limitations can be addressed by new technological developments in synthetic biology such as circuit design, biocontainment and machine learning. Here, we summarize the latest advances in synthetic biology and discuss how they could accelerate the development, enhance the performance and address the present limitations of microbial sensors to facilitate their use in the field. We view programmable living sensors as promising sensing platforms to achieve sustainable, affordable and easy-to-use on-site detection in diverse settings.
Affiliation(s)
- Xinyi Wan
- Centre for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, UK
- Hangzhou Innovation Center, Zhejiang University, Hangzhou, 311200, China
| | - Behide Saltepe
- Centre for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, UK
| | - Luyang Yu
- The Provincial International Science and Technology Cooperation Base for Engineering Biology, International Campus, Zhejiang University, Haining, 314400, China
- College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Baojun Wang
- Centre for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, UK
- Hangzhou Innovation Center, Zhejiang University, Hangzhou, 311200, China
- The Provincial International Science and Technology Cooperation Base for Engineering Biology, International Campus, Zhejiang University, Haining, 314400, China
- College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| |
|
40
|
Gao Y, Lim HG, Verkler H, Szubin R, Quach D, Rodionova I, Chen K, Yurkovich JT, Cho BK, Palsson BO. Unraveling the functions of uncharacterized transcription factors in Escherichia coli using ChIP-exo. Nucleic Acids Res 2021; 49:9696-9710. [PMID: 34428301 PMCID: PMC8464067 DOI: 10.1093/nar/gkab735] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 10/29/2020] [Revised: 08/08/2021] [Accepted: 08/11/2021] [Indexed: 02/07/2023] Open
Abstract
Bacteria regulate gene expression to adapt to changing environments through transcriptional regulatory networks (TRNs). Although extensively studied, no TRN is fully characterized since the identity and activity of all the transcriptional regulators comprising a TRN are not known. Here, we experimentally evaluate 40 uncharacterized proteins in Escherichia coli K-12 MG1655, which were computationally predicted to be transcription factors (TFs). First, we used a multiplexed chromatin immunoprecipitation method combined with lambda exonuclease digestion (multiplexed ChIP-exo) assay to characterize binding sites for these candidate TFs; 34 of them were found to be DNA-binding proteins. We then compared the relative location between binding sites and RNA polymerase (RNAP). We found 48% (283/588) overlap between the TFs and RNAP. Finally, we used these data to infer potential functions for 10 of the 34 TFs with validated DNA binding sites and consensus binding motifs. Taken together, this study: (i) significantly expands the number of confirmed TFs to 276, close to the estimated total of about 280 TFs; (ii) provides putative functions for the newly discovered TFs and (iii) confirms the functions of four representative TFs through mutant phenotypes.
Affiliation(s)
- Ye Gao
- Department of Biological Sciences, University of California San Diego, La Jolla, CA 92093, USA.,Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Hyun Gyu Lim
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Hans Verkler
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Richard Szubin
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Daniel Quach
- Department of Biological Sciences, University of California San Diego, La Jolla, CA 92093, USA.,Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Irina Rodionova
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Ke Chen
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - James T Yurkovich
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Byung-Kwan Cho
- Department of Biological Sciences and KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
| | - Bernhard O Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.,Department of Pediatrics, University of California San Diego, La Jolla, CA 92093, USA.,Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA.,Novo Nordisk Foundation Center for Biosustainability, 2800, Kongens Lyngby, Denmark
| |
|
41
|
Escherichia coli as a platform microbial host for systems metabolic engineering. Essays Biochem 2021; 65:225-246. [PMID: 33956149 DOI: 10.1042/ebc20200172] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/18/2021] [Revised: 04/12/2021] [Accepted: 04/14/2021] [Indexed: 12/19/2022]
Abstract
Bio-based production of industrially important chemicals and materials from non-edible and renewable biomass has become increasingly important to resolve the urgent worldwide issues including climate change. Also, bio-based production, instead of chemical synthesis, of food ingredients and natural products has gained ever increasing interest for health benefits. Systems metabolic engineering allows more efficient development of microbial cell factories capable of sustainable, green, and human-friendly production of diverse chemicals and materials. Escherichia coli is unarguably the most widely employed host strain for the bio-based production of chemicals and materials. In the present paper, we review the tools and strategies employed for systems metabolic engineering of E. coli. Next, representative examples and strategies for the production of chemicals including biofuels, bulk and specialty chemicals, and natural products are discussed, followed by discussion on materials including polyhydroxyalkanoates (PHAs), proteins, and nanomaterials. Lastly, future perspectives and challenges remaining for systems metabolic engineering of E. coli are discussed.
|
42
|
Affiliation(s)
- Aashutosh Boob
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| |
|
43
|
Transforming traditional nutrition paradigms with synthetic biology driven microbial production platforms. Current Research in Biotechnology 2021. [DOI: 10.1016/j.crbiot.2021.07.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/23/2023] Open
|