1
Krutz NL, Kimber I, Winget J, Nguyen MN, Limviphuvadh V, Maurer-Stroh S, Mahony C, Gerberick GF. Identification and semi-quantification of protein allergens in complex mixtures using proteomic and AllerCatPro 2.0 bioinformatic analyses: a proof-of-concept investigation. J Immunotoxicol 2024; 21:2305452. [PMID: 38291955 DOI: 10.1080/1547691x.2024.2305452] [Received: 09/12/2023] [Accepted: 01/09/2024] [Indexed: 02/01/2024] Open
Abstract
The demand for botanicals and natural substances in consumer products has increased in recent years. These substances usually contain proteins and these, in turn, can pose a risk for immunoglobulin E (IgE)-mediated sensitization and allergy. However, no method has yet been accepted or validated for assessment of potential allergenic hazards in such materials. In the studies here, a dual proteomic-bioinformatic approach is proposed to holistically evaluate allergenic hazards in complex mixtures of plant, insect, or animal proteins. Twelve commercial preparations of source materials (plant products, dust mite extract, and preparations of animal dander) known to contain allergenic proteins were analyzed by label-free proteomic analyses to identify and semi-quantify proteins. These were then evaluated by bioinformatics using AllerCatPro 2.0 (https://allercatpro.bii.a-star.edu.sg/) to predict no, weak, or strong evidence for allergenicity and similarity to source-specific allergens. In total, 4,586 protein sequences were identified in the 12 source materials combined. Of these, 1,665 sequences were predicted with weak or strong evidence for allergenic potential. This first-tier approach provided top-level information about the occurrence and abundance of proteins and potential allergens. With regard to source-specific allergens, 129 allergens were identified. The sum of the relative abundance of these allergens ranged from 0.8% (lamb's quarters) to 63% (olive pollen). It is proposed here that this dual proteomic-bioinformatic approach has the potential to provide detailed information on the presence and relative abundance of allergens, and can play an important role in identifying potential allergenic hazards in complex protein mixtures for the purposes of safety assessments.
Affiliation(s)
- Nora L Krutz
  - NV Procter & Gamble Services Company SA, Global Product Stewardship, Strombeek-Bever, Belgium
- Ian Kimber
  - Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Minh N Nguyen
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute, Singapore, Singapore
- Vachiranee Limviphuvadh
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute, Singapore, Singapore
- Sebastian Maurer-Stroh
  - Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute, Singapore, Singapore
  - Yong Loo Lin School of Medicine and Department of Biological Sciences, National University of Singapore (NUS), Singapore, Singapore
2
McCoubrey LE, Shen C, Mwasambu S, Favaron A, Sangfuang N, Thomaidou S, Orlu M, Globisch D, Basit AW. Characterising and preventing the gut microbiota's inactivation of trifluridine, a colorectal cancer drug. Eur J Pharm Sci 2024; 203:106922. [PMID: 39368784 DOI: 10.1016/j.ejps.2024.106922] [Received: 06/17/2024] [Revised: 10/01/2024] [Accepted: 10/02/2024] [Indexed: 10/07/2024]
Abstract
The gut microbiome can metabolise hundreds of drugs, potentially affecting their bioavailability and pharmacological effect. As most gut bacteria reside in the colon, drugs that reach the colon in significant proportions may be most impacted by microbiome metabolism. In this study, the anti-colorectal cancer drug trifluridine was used as a model drug for characterising metabolism by the colonic microbiota, identifying correlations between bacterial species and individuals' rates of microbiome drug inactivation, and developing strategies to prevent drug inactivation following targeted colonic delivery. High performance liquid chromatography and ultra-high performance liquid chromatography coupled with high resolution tandem mass spectrometry demonstrated trifluridine's variable and multi-route metabolism by the faecal microbiota sourced from six healthy humans. Here, four drug metabolites were linked to the microbiome for the first time. Metagenomic sequencing of the human microbiota samples revealed their composition, which facilitated prediction of individual donors' microbial trifluridine inactivation. Notably, the abundance of Clostridium perfringens strongly correlated with the extent of trifluridine inactivation by microbiota samples after 2 hours (R² = 0.8966). Finally, several strategies were trialled for the prevention of microbial trifluridine metabolism. It was shown that uridine, a safe and well-tolerated molecule, significantly reduced the microbiota's metabolism of trifluridine by acting as a competitive enzyme inhibitor. Further, uridine was found to provide prebiotic effects. The findings in this study greatly expand knowledge on trifluridine's interactions with the gut microbiome and provide valuable insights for investigating the microbiome metabolism of other drugs. The results demonstrate how protection strategies could enhance the colonic stability of microbiome-sensitive drugs.
Affiliation(s)
- Laura E McCoubrey
  - UCL School of Pharmacy, 29-39 Brunswick Square, London, WC1N 1AX, United Kingdom
- Chenghao Shen
  - UCL School of Pharmacy, 29-39 Brunswick Square, London, WC1N 1AX, United Kingdom
- Sydney Mwasambu
  - Department of Chemistry - BMC, Science for Life Laboratory, Uppsala University, 75124 Uppsala, Sweden
- Alessia Favaron
  - UCL School of Pharmacy, 29-39 Brunswick Square, London, WC1N 1AX, United Kingdom
- Nannapat Sangfuang
  - UCL School of Pharmacy, 29-39 Brunswick Square, London, WC1N 1AX, United Kingdom
- Stavrina Thomaidou
  - UCL School of Pharmacy, 29-39 Brunswick Square, London, WC1N 1AX, United Kingdom
- Mine Orlu
  - UCL School of Pharmacy, 29-39 Brunswick Square, London, WC1N 1AX, United Kingdom
- Daniel Globisch
  - Department of Chemistry - BMC, Science for Life Laboratory, Uppsala University, 75124 Uppsala, Sweden
- Abdul W Basit
  - UCL School of Pharmacy, 29-39 Brunswick Square, London, WC1N 1AX, United Kingdom
3
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] Open
Abstract
The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
4
Hu Y, Wang Y, Hu X, Chao H, Li S, Ni Q, Zhu Y, Hu Y, Zhao Z, Chen M. T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors. Comput Struct Biotechnol J 2024; 23:801-812. [PMID: 38328004 PMCID: PMC10847861 DOI: 10.1016/j.csbj.2024.01.015] [Received: 09/21/2023] [Revised: 01/20/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024] Open
Abstract
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.
Affiliation(s)
- Yueming Hu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yejun Wang
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
  - Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen, China
- Xiaotian Hu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Haoyu Chao
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Sida Li
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Qinyang Ni
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yanyan Zhu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yixue Hu
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ziyi Zhao
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ming Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
  - Institute of Hematology, Zhejiang University School of Medicine, The First Affiliated Hospital, Zhejiang University, Hangzhou 310058, China
5
Schmidt B, Kallenborn F, Chacon A, Hundt C. CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search. BMC Bioinformatics 2024; 25:342. [PMID: 39488701 PMCID: PMC11531700 DOI: 10.1186/s12859-024-05965-6] [Received: 10/12/2023] [Accepted: 10/22/2024] [Indexed: 11/04/2024] Open
Abstract
BACKGROUND: The maximal sensitivity for local pairwise alignment makes the Smith-Waterman algorithm a popular choice for protein sequence database search. However, its quadratic time complexity makes it compute-intensive. Unfortunately, current state-of-the-art software tools are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. This motivates the need for more efficient implementations. RESULTS: CUDASW++4.0 is a fast software tool for scanning protein sequence databases with the Smith-Waterman algorithm on CUDA-enabled GPUs. Our approach achieves high efficiency for dynamic programming-based alignment computation by minimizing memory accesses and instructions. We provide both efficient matrix tiling and sequence database partitioning schemes, and exploit next-generation floating point arithmetic and novel DPX instructions. This leads to close-to-peak performance on modern GPU generations (Ampere, Ada, Hopper) with throughput rates of up to 1.94 TCUPS, 5.01 TCUPS, and 5.71 TCUPS on an A100, L40S, and H100, respectively. Evaluation on the Swiss-Prot, UniRef50, and TrEMBL databases shows that CUDASW++4.0 improves performance by more than an order of magnitude over previous GPU-based approaches (CUDASW++3.0, ADEPT, SW#DB). In addition, our algorithm demonstrates significant speedups over top-performing CPU-based tools (BLASTP, SWIPE, SWIMM2.0), can exploit multi-GPU nodes with linear scaling, and features an impressive energy efficiency of up to 15.7 GCUPS/Watt. CONCLUSION: CUDASW++4.0 changes the standing of GPUs in protein sequence database search with Smith-Waterman alignment by providing close-to-peak performance on modern GPUs. It is freely available at https://github.com/asbschmidt/CUDASW4.
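To make the quadratic cost and the CUPS unit mentioned in this abstract concrete, here is a minimal CPU sketch of the Smith-Waterman score recurrence. It is not the CUDASW++4.0 implementation: the function names `smith_waterman_score` and `gcups` are hypothetical, and the fixed match/mismatch scores with a linear gap penalty are simplifying assumptions (protein search tools typically use a substitution matrix and affine gaps).

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (illustrative scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # One "cell update": the unit counted by CUPS.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second for a query scanned against a database."""
    return query_len * db_residues / seconds / 1e9


print(smith_waterman_score("XXACGTXX", "ACGT"))  # 8: four matches at +2 each
```

The nested loops visit every (i, j) pair once, which is where the quadratic time complexity comes from; throughput figures such as TCUPS count how many of these cell updates are completed per second.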
Affiliation(s)
- Bertil Schmidt
  - Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
- Felix Kallenborn
  - Department of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
6
Freire-Zapata V, Holland-Moritz H, Cronin DR, Aroney S, Smith DA, Wilson RM, Ernakovich JG, Woodcroft BJ, Bagby SC, Rich VI, Sullivan MB, Stegen JC, Tfaily MM. Microbiome-metabolite linkages drive greenhouse gas dynamics over a permafrost thaw gradient. Nat Microbiol 2024; 9:2892-2908. [PMID: 39354152 PMCID: PMC11522005 DOI: 10.1038/s41564-024-01800-z] [Received: 12/21/2023] [Accepted: 07/30/2024] [Indexed: 10/03/2024]
Abstract
Interactions between microbiomes and metabolites play crucial roles in the environment, yet how these interactions drive greenhouse gas emissions during ecosystem changes remains unclear. Here we analysed microbial and metabolite composition across a permafrost thaw gradient in Stordalen Mire, Sweden, using paired genome-resolved metagenomics and high-resolution Fourier transform ion cyclotron resonance mass spectrometry guided by principles from community assembly theory to test whether microorganisms and metabolites show concordant responses to changing drivers. Our analysis revealed divergence between the inferred microbial versus metabolite assembly processes, suggesting distinct responses to the same selective pressures. This contradicts common assumptions in trait-based microbial models and highlights the limitations of measuring microbial community-level data alone. Furthermore, feature-scale analysis revealed connections between microbial taxa, metabolites and observed CO2 and CH4 porewater variations. Our study showcases insights gained by using feature-level data and microorganism-metabolite interactions to better understand metabolic processes that drive greenhouse gas emissions during ecosystem changes.
Affiliation(s)
- Hannah Holland-Moritz
  - Department of Natural Resources and the Environment, University of New Hampshire, Durham, NH, USA
  - Center for Soil Biogeochemistry and Microbial Ecology, University of New Hampshire, Durham, NH, USA
- Dylan R Cronin
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, The Ohio State University, Columbus, OH, USA
- Sam Aroney
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, QLD, Australia
- Derek A Smith
  - Department of Biology, Case Western Reserve University, Cleveland, OH, USA
- Rachel M Wilson
  - Department of Earth Ocean and Atmospheric Sciences, Florida State University, Tallahassee, FL, USA
- Jessica G Ernakovich
  - Department of Natural Resources and the Environment, University of New Hampshire, Durham, NH, USA
- Ben J Woodcroft
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, QLD, Australia
- Sarah C Bagby
  - Department of Biology, Case Western Reserve University, Cleveland, OH, USA
- Virginia I Rich
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
- Matthew B Sullivan
  - Department of Microbiology, The Ohio State University, Columbus, OH, USA
  - Center of Microbiome Science, The Ohio State University, Columbus, OH, USA
  - Department of Civil, Environmental, and Geodetic Engineering, The Ohio State University, Columbus, OH, USA
- James C Stegen
  - Terrestrial and Aquatic Integration Team, Pacific Northwest National Laboratory, Richland, WA, USA
  - School of the Environment, Washington State University, Pullman, WA, USA
- Malak M Tfaily
  - Department of Environmental Science, The University of Arizona, Tucson, AZ, USA
  - Bio5 Institute, The University of Arizona, Tucson, AZ, USA
7
Walden KKO, Cao Y, Fields CJ, Hernandez AG, Rendon GA, Robinson GE, Skinner RK, Stein JA, Dietrich CH. High-quality genome assemblies for nine non-model North American insect species representing six orders (Insecta: Coleoptera, Diptera, Hemiptera, Hymenoptera, Lepidoptera, Neuroptera). Mol Ecol Resour 2024; 24:e14010. [PMID: 39155537 DOI: 10.1111/1755-0998.14010] [Received: 01/11/2024] [Revised: 07/24/2024] [Accepted: 08/06/2024] [Indexed: 08/20/2024]
Abstract
Field-collected specimens were used to obtain nine high-quality genome assemblies from a total of 10 insect species native to prairies and savannas of central Illinois (USA): Mellilla xanthometata (Lepidoptera: Geometridae), Stenolophus ochropezus (Coleoptera: Carabidae), Forcipata loca (Hemiptera: Cicadellidae), Coelinius sp. (Hymenoptera: Braconidae), Thaumatomyia glabra (Diptera: Chloropidae), Brachynemurus abdominalus (Neuroptera: Myrmeleontidae), Catonia carolina (Hemiptera: Achilidae), Oncometopia orbona (Hemiptera: Cicadellidae), Flexamia atlantica (Hemiptera: Cicadellidae) and Stictocephala bisonia (Hemiptera: Membracidae). Sequencing library preparation from single specimens was successful despite extremely small DNA yields (<0.1 μg) for some samples. Additional sequencing and assembly workflows were adapted to each sample depending on the initial DNA yield. PacBio circular consensus (CCS/HiFi) or continuous long reads (CLR) libraries were used to sequence DNA fragments up to 50 kb in length, with Illumina sequenced linked-reads (TellSeq libraries) and Omni-C libraries used for scaffolding and gap-filling. Assembled genome sizes ranged from 135 MB to 3.2 GB. The number of assembled scaffolds ranged from 47 to >13,000, with the longest scaffold per assembly ranging from ~23 to 439 Mb. Genome completeness was high, with BUSCO scores ranging from 85.5% completeness for the largest genome (Stictocephala bisonia) to 98.8% completeness for the smallest genome (Coelinius sp.). Unique content, estimated using RepeatMasker and GenomeScope2, ranged from 50.7% to 75.8% and roughly decreased with increasing genome size. Structural annotation predicted a range of 19,281-72,469 protein models for sequenced species. Per genome, sequencing costs at the time ranged from US$3k to US$5k, compute averaged ~1600 CPU-hours on a high-performance cluster, and bioinformatics analyses required approximately 14 h for samples using PacBio HiFi data. Most assemblies would benefit from further manual curation to correct possible scaffold misjoins and translocations suggested by off-diagonal or depleted signals in Omni-C contact maps.
Affiliation(s)
- Kimberly K O Walden
  - Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Yanghui Cao
  - Key Laboratory of Plant Protection Resources and Pest Management of the Ministry of Education, Entomological Museum, Northwest A&F University, Yangling, Shaanxi, China
- Christopher J Fields
  - Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Alvaro G Hernandez
  - Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Gloria A Rendon
  - Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Gene E Robinson
  - Department of Entomology, Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Rachel K Skinner
  - Pennsylvania State University at Brandywine, Media, Pennsylvania, USA
- Jeffrey A Stein
  - Prairie Research Institute, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Christopher H Dietrich
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
8
Qi D, Song C, Liu T. PreDBP-PLMs: Prediction of DNA-binding proteins based on pre-trained protein language models and convolutional neural networks. Anal Biochem 2024; 694:115603. [PMID: 38986796 DOI: 10.1016/j.ab.2024.115603] [Received: 04/25/2024] [Revised: 06/15/2024] [Accepted: 07/06/2024] [Indexed: 07/12/2024]
Abstract
The recognition of DNA-binding proteins (DBPs) is a crucial step toward understanding their roles in various biological processes such as genetic regulation, gene expression, cell cycle control, DNA repair, and replication within cells. However, conventional experimental methods for identifying DBPs are usually time-consuming and expensive. Therefore, there is an urgent need to develop rapid and efficient computational methods for the prediction of DBPs. In this study, we proposed a novel predictor named PreDBP-PLMs to further improve the identification accuracy of DBPs by fusing the pre-trained protein language model (PLM) ProtT5 embedding with evolutionary features as input to the classic convolutional neural network (CNN) model. First, the ProtT5 embedding was combined with different evolutionary features derived from the position-specific scoring matrix (PSSM) to represent protein sequences. Then, the optimal feature combination was selected and input to the CNN classifier for the prediction of DBPs. Finally, 5-fold cross-validation (CV), leave-one-out CV (LOOCV), and an independent set test were adopted to examine the performance of PreDBP-PLMs on the benchmark datasets. Compared to the existing state-of-the-art predictors, PreDBP-PLMs exhibits an accuracy improvement of 0.5% and 5.2% on the PDB186 and PDB2272 datasets, respectively. These results demonstrate that the proposed method could serve as a useful tool for the recognition of DBPs.
Affiliation(s)
- Dawei Qi
  - College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
- Chen Song
  - College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
9
Hu J, Chen KX, Rao B, Ni JY, Thafar MA, Albaradei S, Arif M. Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism. Anal Biochem 2024; 694:115637. [PMID: 39121938 DOI: 10.1016/j.ab.2024.115637] [Received: 04/27/2024] [Revised: 07/28/2024] [Accepted: 08/06/2024] [Indexed: 08/12/2024]
Abstract
Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, resulting in low computational efficiency and poor predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequence only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models that can extract two different high-latent feature representations from protein sequences relevant for protein structures and functions. A novel feature fusion module is then designed in E2EPep to fuse and optimize the above two feature representations of binding residues. In addition, we have also designed E2EPep+, which integrates the E2EPep and PepBCL models, to improve the prediction performance. Experimental results on two independent testing data sets demonstrate that E2EPep and E2EPep+ achieve average AUC values of 0.846 and 0.842 while achieving an average Matthews correlation coefficient value that is significantly higher than that of most existing sequence-based methods and comparable to that of the state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in the effectiveness of its feature representation, which uses a cross-attention mechanism to fuse the embeddings generated by two fine-tuned protein language models. The standalone package of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
  - Center for AI and Computational Biology, Suzhou Institution of Systems Medicine, Suzhou, 215123, China
- Kai-Xin Chen
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Bing Rao
  - School of Information & Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China
- Jing-Yuan Ni
  - NUIST Reading Academy, Nanjing University of Information Science & Technology, Nanjing, 210044, China
- Maha A Thafar
  - Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
- Somayah Albaradei
  - Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, 34110, Qatar
10
Pomianowski K, Kulczykowska E, Burzyński A. Genome guided, organ-specific transcriptome assembly of the European flounder (P. flesus) from the Baltic Sea. Sci Data 2024; 11:1184. [PMID: 39477936 PMCID: PMC11525550 DOI: 10.1038/s41597-024-04004-6] [Received: 01/26/2024] [Accepted: 10/15/2024] [Indexed: 11/02/2024] Open
Abstract
Although the European flounder is frequently used in research and has economic importance, there is still a lack of comprehensive transcriptome data for this species. In the present research we show RNA-Seq data from ten selected organs of a P. flesus female inhabiting brackish waters of the Gulf of Gdańsk (southern Baltic Sea). The high-throughput next-generation sequencing platform NovaSeq 6000 was used to generate 500 M sequencing reads. These were mapped against the European flounder reference genome, and reads extracted from the mapping were assembled, producing 61k reliable contigs. Gene ontology (GO) terms were assigned to the majority of annotated contigs/unigenes based on the results of PFAM, PANTHER, UniProt and InterPro protein database searches. BUSCO statistics for the eukaryota, metazoa, vertebrata and actinopterygii databases showed that the reported transcriptome represents a high level of completeness. The data set can be successfully used as a tool in the design of experiments in various research fields including biology, aquaculture and toxicology.
Affiliation(s)
- Konrad Pomianowski
- Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Powstańców Warszawy 55 Str., 81-712, Sopot, Poland.
- Ewa Kulczykowska
- Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Powstańców Warszawy 55 Str., 81-712, Sopot, Poland
- Artur Burzyński
- Department of Genetics and Marine Biotechnology, Institute of Oceanology, Polish Academy of Sciences, Powstańców Warszawy 55 Str., 81-712, Sopot, Poland

11
Hu X, Li J, Liu T. Alg-MFDL: A multi-feature deep learning framework for allergenic proteins prediction. Anal Biochem 2024:115701. [PMID: 39481588 DOI: 10.1016/j.ab.2024.115701] [Received: 08/12/2024] [Revised: 10/26/2024] [Accepted: 10/28/2024] [Indexed: 11/02/2024]
Abstract
The escalating global incidence of allergy illustrates the growing impact of allergic disease on global health. Allergens are small molecule antigens that trigger allergic reactions. A widely recognized strategy for allergy prevention involves identifying allergens and avoiding re-exposure. However, laboratory methods for identifying allergenic proteins are often time-consuming and resource-intensive. There is a crucial need to establish efficient and reliable computational approaches for the identification of allergenic proteins. In this study, we developed a novel allergenic protein predictor named Alg-MFDL, which integrates pre-trained protein language models (PLMs) and traditional handcrafted features to achieve a more complete protein representation. First, we compared the performance of eight pre-trained PLMs from ProtTrans and ESM-2 and selected the best-performing one from each of the two groups. In addition, we evaluated the performance of three handcrafted features and different combinations of them to select the optimal feature or feature combination. Then, these three protein representations were fused and used as inputs to train a convolutional neural network (CNN). Finally, independent validation was performed on benchmark datasets to evaluate the performance of Alg-MFDL. As a result, Alg-MFDL achieved an accuracy of 0.973, a precision of 0.996, a sensitivity of 0.951, and an F1 value of 0.973, outperforming most current state-of-the-art (SOTA) methods across all key metrics. We anticipate that the proposed model could be a useful tool for predicting allergenic proteins. The datasets and code utilized in this study are freely available at https://github.com/Hupenpen/Alg-MFDL.
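The F1 value quoted in this abstract can be cross-checked against the quoted precision and sensitivity, because F1 is the harmonic mean of the two. The Python sketch below performs only that arithmetic. The function name `f1_from_precision_recall` is illustrative and is not taken from the Alg-MFDL repository.

```python
def f1_from_precision_recall(precision: float, sensitivity: float) -> float:
    """Return F1, the harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values as quoted in the abstract.
reported_precision = 0.996
reported_sensitivity = 0.951

print(round(f1_from_precision_recall(reported_precision, reported_sensitivity), 3))  # 0.973
```

The result agrees with the F1 value of 0.973 given in the abstract.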
Affiliation(s)
- Xiang Hu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Jingyi Li
- AIEN Institute, Shanghai Ocean University, Shanghai 201306, China
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.

12
Macandog ADG, Catozzi C, Capone M, Nabinejad A, Nanaware PP, Liu S, Vinjamuri S, Stunnenberg JA, Galiè S, Jodice MG, Montani F, Armanini F, Cassano E, Madonna G, Mallardo D, Mazzi B, Pece S, Tagliamonte M, Vanella V, Barberis M, Ferrucci PF, Blank CU, Bouvier M, Andrews MC, Xu X, Santambrogio L, Segata N, Buonaguro L, Cocorocchio E, Ascierto PA, Manzo T, Nezi L. Longitudinal analysis of the gut microbiota during anti-PD-1 therapy reveals stable microbial features of response in melanoma patients. Cell Host Microbe 2024:S1931-3128(24)00392-5. [PMID: 39481388 DOI: 10.1016/j.chom.2024.10.006] [Received: 10/17/2023] [Revised: 09/15/2024] [Accepted: 10/07/2024] [Indexed: 11/02/2024]
Abstract
Immune checkpoint inhibitors (ICIs) improve outcomes in advanced melanoma, but many patients are refractory or experience relapse. The gut microbiota modulates antitumor responses. However, inconsistent baseline predictors point to heterogeneity in responses and inadequacy of cross-sectional data. We followed patients with unresectable melanoma from baseline and during anti-PD-1 therapy, collecting fecal and blood samples that were surveyed for changes in the gut microbiota and immune markers. Varying patient responses were linked to different gut microbiota dynamics during ICI treatment. We select complete responders by their stable microbiota functions and validate them using multiple external cohorts and experimentally. We identify major histocompatibility complex class I (MHC class I)-restricted peptides derived from flagellin-related genes of Lachnospiraceae (FLach) as structural homologs of tumor-associated antigens, detect FLach-reactive CD8+ T cells in complete responders before ICI therapy, and demonstrate that FLach peptides improve antitumor immunity. These findings highlight the prognostic value of microbial functions and therapeutic potential of tumor-mimicking microbial peptides.
Affiliation(s)
- Angeli D G Macandog
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Carlotta Catozzi
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Mariaelena Capone
- Melanoma, Cancer Immunotherapy and Development Therapeutics Unit, Istituto Nazionale Tumori-IRCCS Fondazione G. Pascale, Naples 80131, Italy
- Amir Nabinejad
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Padma P Nanaware
- Department of Radiation Oncology, Weill Cornell Medicine, New York, NY 10065, USA
- Shujing Liu
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104-4238, USA
- Smita Vinjamuri
- Department of Microbiology and Immunology, College of Medicine, University of Illinois at Chicago, Chicago, IL 60612-7342, USA
- Johanna A Stunnenberg
- Netherlands Cancer Institute (NKI)-AVL, North Holland, Amsterdam 1066 CX, the Netherlands
- Serena Galiè
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Maria Giovanna Jodice
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Francesca Montani
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Federica Armanini
- Department of CIBIO, University of Trento, Trento, Povo 38123, Italy
- Ester Cassano
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Gabriele Madonna
- Melanoma, Cancer Immunotherapy and Development Therapeutics Unit, Istituto Nazionale Tumori-IRCCS Fondazione G. Pascale, Naples 80131, Italy
- Domenico Mallardo
- Melanoma, Cancer Immunotherapy and Development Therapeutics Unit, Istituto Nazionale Tumori-IRCCS Fondazione G. Pascale, Naples 80131, Italy
- Salvatore Pece
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Maria Tagliamonte
- Innovative Immunological Models, Istituto Nazionale Tumori-IRCCS Fondazione G. Pascale, Naples 80131, Italy
- Vito Vanella
- Melanoma, Cancer Immunotherapy and Development Therapeutics Unit, Istituto Nazionale Tumori-IRCCS Fondazione G. Pascale, Naples 80131, Italy
- Massimo Barberis
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Christian U Blank
- Netherlands Cancer Institute (NKI)-AVL, North Holland, Amsterdam 1066 CX, the Netherlands
- Marlene Bouvier
- Department of Microbiology and Immunology, College of Medicine, University of Illinois at Chicago, Chicago, IL 60612-7342, USA
- Miles C Andrews
- Department of Medicine, School of Translational Medicine, Monash University, Melbourne, VIC 3004, Australia
- Xiaowei Xu
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104-4238, USA
- Laura Santambrogio
- Department of Radiation Oncology, Weill Cornell Medicine, New York, NY 10065, USA
- Nicola Segata
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy; Department of CIBIO, University of Trento, Trento, Povo 38123, Italy
- Luigi Buonaguro
- Innovative Immunological Models, Istituto Nazionale Tumori-IRCCS Fondazione G. Pascale, Naples 80131, Italy
- Emilia Cocorocchio
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy
- Paolo A Ascierto
- Melanoma, Cancer Immunotherapy and Development Therapeutics Unit, Istituto Nazionale Tumori-IRCCS Fondazione G. Pascale, Naples 80131, Italy
- Teresa Manzo
- Department of Molecular Biotechnology and Health Sciences, University of Torino, Turin 10126, Italy
- Luigi Nezi
- Department of Experimental Oncology, Istituto Europeo di Oncologia-IRCCS, Milan 20139, Italy.

13
Sun X, Wu Z, Su J, Li C. GraphPBSP: Protein binding site prediction based on Graph Attention Network and pre-trained model ProstT5. Int J Biol Macromol 2024:136933. [PMID: 39471921 DOI: 10.1016/j.ijbiomac.2024.136933] [Received: 09/28/2024] [Revised: 10/21/2024] [Accepted: 10/24/2024] [Indexed: 11/01/2024]
Abstract
Protein-protein/peptide interactions play crucial roles in various biological processes, and their study attracts wide attention. However, accurately predicting their binding sites remains a challenging task. Here, we develop GraphPBSP, a model for protein-protein/peptide binding site prediction based on a Graph Attention Network combined with a Convolutional Neural Network and a Multilayer Perceptron. It uses various feature types derived from protein sequence and structure, including an interface residue pairwise propensity developed by us and sequence embeddings obtained from a new pre-trained model, ProstT5, alongside physicochemical properties and structural features. To the best of our knowledge, this is the first use of ProstT5 sequence embeddings and residue pairwise propensity for protein-protein/peptide binding site prediction. Additionally, we propose a spatial neighbor-based feature statistic method that takes key spatially neighboring information into account and significantly improves the model's prediction ability. For model training, a multi-scale objective function is constructed, which enhances the learning capability across samples of the same or different classes. On multiple protein-protein/peptide binding site test sets, GraphPBSP outperforms the currently available state-of-the-art methods. Additionally, its performance on protein-DNA/RNA binding site test sets demonstrates good generalization ability. In conclusion, GraphPBSP is a promising method that can offer valuable information for protein engineering and drug design.
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.

14
Dong B, Su H, Xu D, Hou C, Liu Z, Niu N, Wang G. ILMCNet: A Deep Neural Network Model That Uses PLM to Process Features and Employs CRF to Predict Protein Secondary Structure. Genes (Basel) 2024; 15:1350. [PMID: 39457474 PMCID: PMC11507629 DOI: 10.3390/genes15101350] [Received: 09/07/2024] [Revised: 10/07/2024] [Accepted: 10/18/2024] [Indexed: 10/28/2024] Open
Abstract
BACKGROUND: Protein secondary structure prediction (PSSP) is a critical task in computational biology, pivotal for understanding protein function and advancing medical diagnostics. Recently, approaches that integrate multiple amino acid sequence features have gained significant attention in PSSP research. OBJECTIVES: We aim to automatically extract additional features represented by evolutionary information from a large number of sequences while simultaneously incorporating positional information for more comprehensive sequence features. Additionally, we consider the interdependence between secondary structures during the prediction stage. METHODS: To this end, we propose a deep neural network model, ILMCNet, which utilizes a language model and Conditional Random Field (CRF). Protein language models (PLMs) pre-trained on sequences from multiple large databases can provide sequence features that incorporate evolutionary information. ILMCNet uses positional encoding to ensure that the input features include positional information. To better utilize these features, we propose a hybrid network architecture that employs a Transformer Encoder to enhance features and integrates a feature extraction module combining a Convolutional Neural Network (CNN) with a Bidirectional Long Short-Term Memory Network (BiLSTM). This design enables deep extraction of localized features while capturing global bidirectional information. In the prediction stage, ILMCNet employs CRF to capture the interdependencies between secondary structures. RESULTS: Experimental results on benchmark datasets such as CB513, TS115, NEW364, CASP11, and CASP12 demonstrate that the prediction performance of our method surpasses that of comparable approaches. CONCLUSIONS: This study proposes a new approach to PSSP research and is expected to play an important role in other protein-related research fields, such as protein tertiary structure prediction.
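To illustrate why a CRF layer can help at the prediction stage, the minimal Python sketch below runs Viterbi decoding over per-residue label scores plus label-to-label transition scores. It is not the ILMCNet implementation: the three-state alphabet (H, E, C), every score, and the function name `viterbi_decode` are assumptions made up for this example.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label path for a linear-chain model.

    emissions:   list of dicts, one per residue, mapping label -> score
    transitions: dict mapping (previous_label, current_label) -> score
    """
    labels = list(emissions[0])
    best = {lab: emissions[0][lab] for lab in labels}  # best score ending in each label
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[(p, cur)])
            new_best[cur] = best[prev] + transitions[(prev, cur)] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores: the middle residue slightly prefers E when scored alone.
emissions = [
    {"H": 2.0, "E": 0.0, "C": 0.5},
    {"H": 0.9, "E": 1.0, "C": 0.0},
    {"H": 2.0, "E": 0.0, "C": 0.5},
]
labels = ["H", "E", "C"]
transitions = {(a, b): 0.0 for a in labels for b in labels}
for lab in labels:
    transitions[(lab, lab)] = 0.5          # staying in the same state is favoured
transitions[("H", "E")] = transitions[("E", "H")] = -2.0  # direct helix/strand switch is penalised

print([max(e, key=e.get) for e in emissions])  # ['H', 'E', 'H']
print(viterbi_decode(emissions, transitions))   # ['H', 'H', 'H']
```

Scoring each residue independently yields an isolated strand residue inside a helix, whereas decoding with transition scores returns a consistent helix. This is the kind of interdependence between neighbouring secondary-structure labels that the abstract refers to.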
Affiliation(s)
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China; (B.D.); (H.S.); (D.X.); (C.H.); (Z.L.); (N.N.)

15
Qi D, Liu T. VotePLMs-AFP: Identification of antifreeze proteins using transformer-embedding features and ensemble learning. Biochim Biophys Acta Gen Subj 2024; 1868:130721. [PMID: 39426757 DOI: 10.1016/j.bbagen.2024.130721] [Received: 07/26/2024] [Revised: 09/24/2024] [Accepted: 10/11/2024] [Indexed: 10/21/2024]
Abstract
Antifreeze proteins (AFPs) are a unique class of biomolecules capable of protecting other proteins, cell membranes, and cellular structures within organisms from damage caused by freezing conditions. Given the significance of AFPs in various domains such as biotechnology, agriculture, and medicine, several machine learning methods have been developed to identify AFPs. However, due to the complexity and diversity of AFPs, the predictive performance of existing methods is limited. Therefore, there is an urgent need to develop an efficient and rapid computational method for accurately predicting AFPs. In this study, we proposed a novel predictor based on transformer-embedding features and ensemble learning for the identification of AFPs, termed VotePLMs-AFP. Firstly, three types of feature descriptors were extracted from pre-trained protein language models (PLMs) during the feature extraction process. Subsequently, we analyzed six combinations generated by these three embeddings to explore the optimal feature set, which was input into the soft voting-based ensemble learning classifier for the identification of AFPs. Finally, we evaluated the model on the two benchmark datasets. The experimental results show that our model achieves high prediction accuracy in 10-fold cross-validation (CV) and independent set testing, outperforming existing state-of-the-art methods. Therefore, our model could serve as an effective tool for predicting AFPs.
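Soft voting, as named in this abstract, averages the class probabilities produced by several classifiers and thresholds the average. The Python sketch below shows the generic idea only. The probabilities are invented, the 0.5 threshold and equal weights are assumptions, and `soft_vote` is an illustrative name rather than code from VotePLMs-AFP.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average per-classifier probabilities and threshold the mean.

    probabilities: list with one inner list per classifier; each inner list
                   holds P(AFP) for the same sequences in the same order.
    Returns (averaged_probabilities, predicted_labels), where 1 means AFP.
    """
    n_classifiers = len(probabilities)
    n_sequences = len(probabilities[0])
    averaged = [
        sum(clf[i] for clf in probabilities) / n_classifiers
        for i in range(n_sequences)
    ]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Three hypothetical classifiers scoring two sequences.
probs = [
    [0.9, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
]
averaged, labels = soft_vote(probs)
print(labels)  # [1, 0]
```

In this example the third classifier disagrees with the other two on both sequences, and the averaged probability settles each call.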
Affiliation(s)
- Dawei Qi
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.

16
Tripp A, Braun M, Wieser F, Oberdorfer G, Lechner H. Click, Compute, Create: A Review of Web-based Tools for Enzyme Engineering. Chembiochem 2024; 25:e202400092. [PMID: 38634409 DOI: 10.1002/cbic.202400092] [Received: 01/31/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/19/2024]
Abstract
Enzyme engineering, though pivotal across various biotechnological domains, is often plagued by its time-consuming and labor-intensive nature. This review aims to offer an overview of supportive in silico methodologies for this demanding endeavor. Starting with methods to predict protein structures, classify their activity, and even discover new enzymes, we continue by describing tools used to increase the thermostability and production yields of selected targets. Subsequently, we discuss computational methods to modulate both the activity and the selectivity of enzymes. Last, we present recent approaches based on cutting-edge machine learning methods to redesign enzymes. With the exception of the last chapter, there is a strong focus on methods easily accessible via web interfaces or simple Python scripts, and therefore readily usable by a diverse and broad community.
Affiliation(s)
- Adrian Tripp
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Markus Braun
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Florian Wieser
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- Gustav Oberdorfer
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- BioTechMed, Graz, Austria
- Horst Lechner
- Institute of Biochemistry, Graz University of Technology, Petersgasse 12/2, 8010, Graz, Austria
- BioTechMed, Graz, Austria

17
Leibovitzh H, Sarbagili Shabat C, Hirsch A, Zittan E, Mentella MC, Petito V, Cohen NA, Ron Y, Fliss Isakov N, Pfeffer J, Yaakov M, Fanali C, Turchini L, Masucci L, Quaranta G, Kolonimos N, Godneva A, Weinberger A, Scaldaferri F, Maharshak N. Faecal Transplantation for Ulcerative Colitis From Diet Conditioned Donors Followed by Dietary Intervention Results in Favourable Gut Microbial Profile Compared to Faecal Transplantation Alone. J Crohns Colitis 2024; 18:1606-1614. [PMID: 38720628 DOI: 10.1093/ecco-jcc/jjae062] [Received: 10/31/2023] [Revised: 02/17/2024] [Accepted: 05/08/2024] [Indexed: 10/17/2024]
Abstract
BACKGROUND AND AIMS: Several faecal microbial transplantation [FMT] approaches for ulcerative colitis [UC] have been investigated with conflicting results. We have recently published the clinical outcomes from the CRAFT UC Trial using FMT with the UC Exclusion Diet [UCED], compared with FMT alone. Here we aimed to compare the two FMT strategies in terms of microbial profile and function. METHODS: Subjects recruited to the CRAFT UC study with available pre- and post-intervention faecal samples were included. Donors received diet conditioning for 14 days based on the UCED principles. Group 1 received a single FMT by colonoscopy [Day 1] and enemas [Days 2 and 14] without donors' dietary conditioning [N = 11]. Group 2 received FMT but with donors' dietary pre-conditioning and UCED for the patients [N = 10]. Faecal samples were assessed by DNA shotgun metagenomic sequencing. RESULTS: Following diet conditioning, donors showed depletion in metabolic pathways involved in biosynthesis of sulphur-containing amino acids. Only Group 2 showed significant shifts towards the donors' microbial composition [ADONIS: R² = 0.15, p = 0.008] and significantly increased Eubacterium_sp_AF228LB post-intervention [β-coefficient 2.66, 95% confidence interval 2.1-3.3, q < 0.05], which was inversely correlated with faecal calprotectin [rho = -0.52, p = 0.035]. Moreover, pathways involved in gut inflammation and barrier function, including branched chain amino acids, were enriched post-intervention in Group 2 and were significantly inversely correlated with faecal calprotectin. CONCLUSION: FMT from diet-conditioned donors followed by the UCED led to microbial alterations associated with favourable microbial profiles, which correlated with decreased faecal calprotectin. Our findings support further exploration of the additive benefit of dietary intervention for both donors and patients undergoing FMT as a potential treatment of UC.
Affiliation(s)
- Haim Leibovitzh
- Department of Gastroenterology and Hepatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Chen Sarbagili Shabat
- Pediatric Gastroenterology Unit, PIBD Research Center, Wolfson Medical Center, Holon, Israel
- Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Ayal Hirsch
- Department of Gastroenterology and Hepatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Eran Zittan
- Gastroenterology Institute, IBD Unit, Haemek Medical Center, Afula, Israel
- Maria Chiara Mentella
- UOC di Nutrizione Clinica, Fondazione Policlinico Universitario A. Gemelli IRCCS, Università Cattolica del Sacro Cuore, Rome, Italy
- Valentina Petito
- Cemad [CENTER for Digestive Disease], UOC Medicina Interna e Gastroenterologia, Fondazione Policlinico 'A. Gemelli' IRCCS, Rome, Italy
- Nathaniel Aviv Cohen
- Department of Gastroenterology and Hepatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Yulia Ron
- Department of Gastroenterology and Hepatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel
- Naomi Fliss Isakov
- Department of Gastroenterology and Hepatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Department of Health, School of Public Health, Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Jorge Pfeffer
- Department of Gastroenterology and Hepatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Michal Yaakov
- Pediatric Gastroenterology Unit, PIBD Research Center, Wolfson Medical Center, Holon, Israel
- Caterina Fanali
- Cemad [CENTER for Digestive Disease], UOC Medicina Interna e Gastroenterologia, Fondazione Policlinico 'A. Gemelli' IRCCS, Rome, Italy
- Laura Turchini
- Cemad [CENTER for Digestive Disease], UOC Medicina Interna e Gastroenterologia, Fondazione Policlinico 'A. Gemelli' IRCCS, Rome, Italy
- Luca Masucci
- Istituto di Microbiologia, Università Cattolica del Sacro Cuore - Fondazione Policlinico 'A. Gemelli' IRCCS, Rome, Italy
- Dipartimento Scienze di Laboratorio e Infettivologiche, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Gianluca Quaranta
- Istituto di Microbiologia, Università Cattolica del Sacro Cuore - Fondazione Policlinico 'A. Gemelli' IRCCS, Rome, Italy
- Dipartimento Scienze di Laboratorio e Infettivologiche, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
- Nitzan Kolonimos
- Gastroenterology Institute, IBD Unit, Haemek Medical Center, Afula, Israel
- Anastasia Godneva
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Adina Weinberger
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel
- Franco Scaldaferri
- Cemad [CENTER for Digestive Disease], UOC Medicina Interna e Gastroenterologia, Fondazione Policlinico 'A. Gemelli' IRCCS, Rome, Italy
- Dipartimento di Medicina e Chirurgia Traslazionale, Università Cattolica del Sacro Cuore - Fondazione Policlinico 'A. Gemelli' IRCCS, Rome, Italy
- Nitsan Maharshak
- Department of Gastroenterology and Hepatology, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Faculty of Medical and Health Sciences, Tel Aviv University, Tel Aviv, Israel

18
Yang S, Xu P. LLM4THP: a computing tool to identify tumor homing peptides by molecular and sequence representation of large language model based on two-layer ensemble model strategy. Amino Acids 2024; 56:62. [PMID: 39404804 PMCID: PMC11480143 DOI: 10.1007/s00726-024-03422-5] [Received: 06/18/2024] [Accepted: 10/04/2024] [Indexed: 10/19/2024]
Abstract
Tumor homing peptides (THPs) have a distinctive capacity to specifically attach to tumor cells, providing a promising approach for targeted cancer treatment and detection. Although THPs have the potential for significant impact, their detection by conventional methods is both time-consuming and expensive. To tackle this issue, we present LLM4THP, a computational approach that utilizes large language models (LLMs) to quickly and effectively detect THPs. LLM4THP utilizes two protein LLMs, ESM2 and Prot_T5_XL_UniRef50, to encode peptide sequences. This allows for the capture of complex patterns and relationships within the peptide data. In addition, we utilize inherent sequence characteristics such as Amino Acid Composition (AAC), Pseudo Amino Acid Composition (PAAC), Amphiphilic Pseudo Amino Acid Composition (APAAC), and Composition, Transition, and Distribution (CTD) to improve the representation of peptides. The RDKitDescriptors feature representation approach transforms peptide sequences into molecular objects and computes chemical characteristics, resulting in enhanced THP identification. The LLM4THP ensemble strategy incorporates various features into a two-layer learning architecture. The first layer consists of LightGBM, XGBoost, Random Forest, and Extremely Randomized Trees, which generate a set of meta results. The second layer utilizes Logistic Regression to further refine the identification of sequences as either THP or non-THP. LLM4THP outperforms the most advanced existing methods, with improvements in accuracy, Matthews correlation coefficient, F1 score, area under the curve, and average precision. The source code and dataset can be accessed at the following URL: https://github.com/abcair/LLM4THP.
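Of the handcrafted descriptors listed in this abstract, Amino Acid Composition (AAC) is the simplest: the fraction of each of the 20 standard amino acids in a sequence. The Python sketch below computes it under that standard definition. The function name, the toy peptide, and the choice to ignore non-standard residue codes are assumptions of this example and are not taken from the LLM4THP repository.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence: str) -> list[float]:
    """Return a 20-dimensional vector of residue fractions, ordered as AMINO_ACIDS.

    Characters outside the 20 standard codes are ignored (an assumption here).
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy peptide made up for illustration.
vector = amino_acid_composition("ACCA")
print({aa: v for aa, v in zip(AMINO_ACIDS, vector) if v > 0})  # {'A': 0.5, 'C': 0.5}
```

A fixed-length vector like this can be concatenated with embedding features before being passed to a classifier, which is the role such descriptors play in feature-fusion pipelines.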
Affiliation(s)
- Sen Yang
- School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou, 213164, China
- The Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China
- Piao Xu
- College of Economics and Management, Nanjing Forestry University, Nanjing, 210037, China.

19
Mancin L, Paoli A, Berry S, Gonzalez JT, Collins AJ, Lizarraga MA, Mota JF, Nicola S, Rollo I. Standardization of gut microbiome analysis in sports. Cell Rep Med 2024; 5:101759. [PMID: 39368478 PMCID: PMC11514603 DOI: 10.1016/j.xcrm.2024.101759] [Received: 05/14/2024] [Revised: 08/09/2024] [Accepted: 09/10/2024] [Indexed: 10/07/2024]
Abstract
The gut microbiome plays a significant role in physiological functions such as nutrient processing, vitamin production, inflammatory response, and immune modulation, which, in turn, are important contributors to athlete health and performance. To date, the interpretation, discussion, and visualization of microbiome results of athletes are challenging, due to a lack of standard parameters and reference data for collection and comparison. The purpose of this perspective piece is to provide researchers with an easy-to-understand framework for the collection, analysis, and data management related to the gut microbiome with a specific focus on athletic populations. In the absence of a consensus on microbiome research in the sports field, we hope that these considerations serve as foundational "best practice." Adherence to these standard operating procedures will accelerate the path toward improving the quality of data and ultimately our understanding of the influence of the gut microbiome in sport settings.
Affiliation(s)
- Laura Mancin
- Department of Biomedical Sciences, University of Padua, Padua, Italy; Human Inspired Technology Research Center HIT, University of Padua, Padua, Italy.
- Antonio Paoli
- Department of Biomedical Sciences, University of Padua, Padua, Italy; Human Inspired Technology Research Center HIT, University of Padua, Padua, Italy
- Sara Berry
- Department of Nutritional Sciences, King's College London, London, UK
- Adam J Collins
- Department for Health, University of Bath, BA2 7AY Bath, UK
- Joao Felipe Mota
- APC Microbiome Ireland, Department of Medicine, School of Microbiology, University College Cork, T12 YT20 Cork, Ireland
- Segata Nicola
- Centre for Integrative Biology, University of Trento, Trento, Italy
- Ian Rollo
- Gatorade Sports Science Institute, PepsiCo Life Sciences, Global R&D, Leicestershire, UK; School of Sports Exercise and Health Sciences, Loughborough University, Leicestershire, UK

20
Gao W, Zhao J, Gui J, Wang Z, Chen J, Yue Z. Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides. J Chem Inf Model 2024; 64:7772-7785. [PMID: 39316765 DOI: 10.1021/acs.jcim.4c00507] [Indexed: 09/26/2024]
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence due to their high antibacterial activity and reduced susceptibility to drug resistance, making them potential antibiotic substitutes. To advance the field of AMP recognition, an increasing number of natural language processing methods are being applied. These methods differ in their pretraining models, pretraining data sets, word vector embeddings, feature encoding methods, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. An independent benchmark test data set is constructed to evaluate the predictive capabilities of the surveyed tools. Furthermore, we compared the predictive performance of these computational methods based on six different AMP public databases. LM_pred (BFD) outperformed all other surveyed tools due to its abundant pretraining data set and its unique vector embedding approach. To avoid the impact of varying training data sets used by different methods on prediction performance, we retrained the models and performed 5-fold cross-validation experiments using the same data set. Additionally, to explore the applicability and generalization ability of the models, we constructed a short peptide data set and an external data set to test the retrained models. Although these BERT-based prediction methods can achieve good prediction performance, there is still room for improvement in recognition accuracy. Building on the continuous improvement of protein language models, we propose an AMP prediction method based on the ESM-2 pretrained model, called iAMP-bert. Experimental results demonstrate that iAMP-bert outperforms other approaches. iAMP-bert is freely accessible to the public at http://iamp.aielab.cc/.
Affiliation(s)
- Wanling Gao
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Jun Zhao
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Jianfeng Gui
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Zehan Wang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Jie Chen
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, Guangdong 518060, China
- Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China

21
Fernández I, Bontems F, Brun D, Coquin Y, Goverde CA, Correia BE, Gessain A, Buseyne F, Rey FA, Backovic M. Structures of the Foamy virus fusion protein reveal an unexpected link with the F protein of paramyxo- and pneumoviruses. Sci Adv 2024; 10:eado7035. [PMID: 39392890 PMCID: PMC11468914 DOI: 10.1126/sciadv.ado7035] [Received: 02/20/2024] [Accepted: 09/06/2024] [Indexed: 10/13/2024]
Abstract
Foamy viruses (FVs) constitute a subfamily of retroviruses. Their envelope (Env) glycoprotein drives the merger of viral and cellular membranes during entry into cells. The only available structures of retroviral Envs are those from human and simian immunodeficiency viruses from the subfamily of orthoretroviruses, which are only distantly related to the FVs. We report the cryo-electron microscopy structures of the FV Env ectodomain in the pre- and post-fusion states, which unexpectedly demonstrate structural similarity with the fusion protein (F) of paramyxo- and pneumoviruses, implying an evolutionary link between the viral fusogens. We describe the structural features that are unique to the FV Env and propose a mechanistic model for its conformational change, highlighting how the interplay of its structural elements could drive membrane fusion and viral entry. The structural knowledge on the FV Env now provides a framework for functional investigations, which can benefit the design of FV Env variants with improved features for use as gene therapy vectors.
Affiliation(s)
- Ignacio Fernández
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité de Virologie Structurale, 75015 Paris, France
- François Bontems
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité de Virologie Structurale, 75015 Paris, France
- Institut de Chimie des Substances Naturelles, CNRS UPR2301, Université Paris Saclay, 91190 Gif-sur-Yvette, France
- Delphine Brun
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité de Virologie Structurale, 75015 Paris, France
- Youna Coquin
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité d’Epidémiologie et Physiopathologie des Virus Oncogènes, 75015 Paris, France
- Casper A. Goverde
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Bruno E. Correia
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Antoine Gessain
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité d’Epidémiologie et Physiopathologie des Virus Oncogènes, 75015 Paris, France
- Florence Buseyne
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité d’Epidémiologie et Physiopathologie des Virus Oncogènes, 75015 Paris, France
- Felix A. Rey
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité de Virologie Structurale, 75015 Paris, France
- Marija Backovic
- Institut Pasteur, Université Paris Cité, CNRS UMR3569, Unité de Virologie Structurale, 75015 Paris, France

22
Zeng S, Wang D, Jiang L, Xu D. Parameter-efficient fine-tuning on large protein language models improves signal peptide prediction. Genome Res 2024; 34:1445-1454. [PMID: 39060029 PMCID: PMC11529868 DOI: 10.1101/gr.279132.124] [Received: 02/15/2024] [Accepted: 07/15/2024] [Indexed: 07/28/2024]
Abstract
Signal peptides (SPs) play a crucial role in protein translocation in cells. The development of large protein language models (PLMs) and prompt-based learning provide a new opportunity for SP prediction, especially for the categories with limited annotated data. We present a parameter-efficient fine-tuning (PEFT) framework for SP prediction, PEFT-SP, to effectively utilize pretrained PLMs. We integrated low-rank adaptation (LoRA) into ESM-2 models to better leverage the protein sequence evolutionary knowledge of PLMs. Experiments show that PEFT-SP using LoRA enhances state-of-the-art results, leading to a maximum Matthews correlation coefficient (MCC) gain of 87.3% for SPs with small training samples and an overall MCC gain of 6.1%. Furthermore, we also employed two other PEFT methods, prompt tuning and adapter tuning, in ESM-2 for SP prediction. More elaborate experiments show that PEFT-SP using adapter tuning can also improve the state-of-the-art results by up to 28.1% MCC gain for SPs with small training samples and an overall MCC gain of 3.8%. LoRA requires fewer computing resources and less memory than the adapter tuning during the training stage, making it possible to adapt larger and more powerful protein models for SP prediction.
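To make the low-rank adaptation (LoRA) idea in the abstract above concrete, the following is a minimal pure-Python sketch of a single adapted linear layer. It is not the PEFT-SP code: the function names, the scaling factor `alpha`, the toy dimensions, and the zero initialisation of `b` follow the general LoRA formulation and are assumptions chosen for illustration.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(w, a, b, x, alpha=16.0):
    """Compute h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only the low-rank
    factors A (r x d_in) and B (d_out x r) would be updated in training.
    alpha is a hypothetical scaling hyperparameter.
    """
    r = len(a)                          # rank = number of rows of A
    base = matvec(w, x)                 # frozen path
    delta = matvec(b, matvec(a, x))     # low-rank update path
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]

def trainable_params(d_out, d_in, r):
    """Number of trainable values in the factors A and B."""
    return r * (d_in + d_out)

# Toy example with illustrative sizes (not taken from the paper).
d_out, d_in, r = 8, 8, 2
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]   # B starts at zero, so the layer is unchanged
x = [1.0] * d_in
h = lora_forward(w, a, b, x)
```

Because `b` starts at zero, the adapted layer initially reproduces the frozen layer exactly. For a square weight of width 1280 and rank 8 (illustrative values), `trainable_params(1280, 1280, 8)` gives 20,480 trainable values against 1,638,400 in the full matrix, which is the kind of saving that makes adapting larger models feasible.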
Affiliation(s)
- Shuai Zeng
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
- Duolin Wang
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
- Lei Jiang
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA

23
Ge Y, Yang M, Yu X, Zhou Y, Zhang Y, Mou M, Chen Z, Sun X, Ni F, Fu T, Liu S, Han L, Zhu F. MolBiC: the cell-based landscape illustrating molecular bioactivities. Nucleic Acids Res 2024:gkae868. [PMID: 39373530 DOI: 10.1093/nar/gkae868] [Received: 08/15/2024] [Revised: 09/13/2024] [Accepted: 09/20/2024] [Indexed: 10/08/2024] Open
Abstract
The measurement of cell-based molecular bioactivity (CMB) is critical for almost every step of drug development. With the booming application of AI in biomedicine, it is essential to have the CMB data to promote the learning of cell-based patterns for guiding modern drug discovery, but no database providing such information has been constructed yet. In this study, we introduce MolBiC, a knowledge base designed to describe valuable data on molecular bioactivity measured within a cellular context. MolBiC features 550 093 experimentally validated CMBs, encompassing 321 086 molecules and 2666 targets across 988 cell lines. Our MolBiC database is unique in describing the valuable data of CMB, which meets the critical demands for CMB-based big data promoting the learning of cell-based molecular/pharmaceutical pattern in drug discovery and development. MolBiC is now freely accessible without any login requirement at: https://idrblab.org/MolBiC/.
Affiliation(s)
- Yichao Ge
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Department of Dermatology, Huashan Hospital, Fudan University, Shanghai Institute of Dermatology, Shanghai 200040, China
- Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Guangzhou, Guangzhou 511458, China
- Mengjie Yang
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Xinyuan Yu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ying Zhou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Yintao Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhen Chen
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Ni
- Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- LeadArt Biotechnologies Ltd., Ningbo 315201, China
- Tingting Fu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiping Liu
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Lianyi Han
- Department of Dermatology, Huashan Hospital, Fudan University, Shanghai Institute of Dermatology, Shanghai 200040, China
- Greater Bay Area Institute of Precision Medicine, School of Life Sciences, Guangzhou, Guangzhou 511458, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China

24
Qian Y, Liu X, Hu P, Gao L, Gu JD. Identifying the major metabolic potentials of microbial-driven carbon, nitrogen and sulfur cycling on stone cultural heritage worldwide. Sci Total Environ 2024; 954:176757. [PMID: 39378943 DOI: 10.1016/j.scitotenv.2024.176757] [Received: 06/30/2024] [Revised: 10/02/2024] [Accepted: 10/04/2024] [Indexed: 10/10/2024]
Abstract
Microbial activities and biochemical reactions are responsible for the biodeterioration of stone cultural heritage, but information on microbial metabolic potentials remains elusive. Here we profiled microbial community signatures and their functional traits on stone cultural heritage from different climate zones globally using publicly available sequencing datasets. The bacterial community on stone cultural heritage shows a significant separation between BSk (cold semi-arid climate) and Cfb (temperate oceanic climate), with Aw (tropical savanna climate) as a transition region. Importantly, the ubiquity of ammonia oxidizers and nitrite oxidizers on stone cultural heritage under different climates supports the active production and accumulation of nitrates, while ammonia/ammonium can be supplied by dinitrogen fixation and dissimilatory nitrate reduction to ammonium (DNRA), together with the hydrolysis of urea, arginine, formamide and cyanate. Sulfate accumulation on stone cultural heritage results mainly from the microbial-driven transformation of organosulfur and thiosulfate, with little dissimilatory reduction of sulfate. Pseudorhodoplanes was identified and reported in elemental sulfur turnover for the first time. Notably, carbon sequestration via the reductive tricarboxylic acid (rTCA) cycle and an incomplete 3-hydroxypropionate/4-hydroxybutyrate (HP/HB) cycle other than the Calvin-Benson-Bassham (CBB) cycle is also significant on stone cultural heritage under relatively humid climates. These results advance our understanding of microbial metabolic potentials and their genetic partitioning patterns on stone cultural heritage of different climate zones globally.
Affiliation(s)
- Youfen Qian
- Civil and Environmental Engineering, Technion - Israel Institute of Technology, Haifa 320003, Israel; Environmental Science and Engineering Research Group, Guangdong Technion - Israel Institute of Technology, 241 Daxue Road, Shantou, Guangdong 515063, China
- Xiaobo Liu
- School of Environmental and Biological Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei Street, Nanjing, Jiangsu 210094, China
- Pengfei Hu
- Civil and Environmental Engineering, Technion - Israel Institute of Technology, Haifa 320003, Israel; Environmental Science and Engineering Research Group, Guangdong Technion - Israel Institute of Technology, 241 Daxue Road, Shantou, Guangdong 515063, China
- Lin Gao
- Civil and Environmental Engineering, Technion - Israel Institute of Technology, Haifa 320003, Israel; Environmental Science and Engineering Research Group, Guangdong Technion - Israel Institute of Technology, 241 Daxue Road, Shantou, Guangdong 515063, China
- Ji-Dong Gu
- Civil and Environmental Engineering, Technion - Israel Institute of Technology, Haifa 320003, Israel; Environmental Science and Engineering Research Group, Guangdong Technion - Israel Institute of Technology, 241 Daxue Road, Shantou, Guangdong 515063, China; Guangdong Provincial Key Laboratory of Materials and Technologies for Energy Conversion, Guangdong Technion - Israel Institute of Technology, 241 Daxue Road, Shantou, Guangdong 515063, China.

25
Groseclose T, Kober EA, Clark M, Moore B, Banerjee S, Bemmer V, Beckham GT, Pickford AR, Dale TT, Nguyen HB. A High-Throughput Screening Platform for Engineering Poly(ethylene Terephthalate) Hydrolases. ACS Catal 2024; 14:14622-14638. [PMID: 39386920 PMCID: PMC11459431 DOI: 10.1021/acscatal.4c04321] [Received: 07/19/2024] [Revised: 08/28/2024] [Accepted: 09/06/2024] [Indexed: 10/12/2024]
Abstract
The ability of enzymes to hydrolyze the ubiquitous polyester, poly(ethylene terephthalate) (PET), has enabled the potential for bioindustrial recycling of this waste plastic. To date, many of these PET hydrolases have been engineered for improved catalytic activity and stability, but current screening methods have limitations in screening large libraries, including under high-temperature conditions. Here, we developed a platform that can simultaneously interrogate PET hydrolase libraries of 10^4 to 10^5 variants (per round) for protein solubility, thermostability, and activity via paired, plate-based split green fluorescent protein and model substrate screens. We then applied this platform to improve the performance of a benchmark PET hydrolase, leaf-branch compost cutinase, by directed evolution. Our engineered enzyme exhibited higher catalytic activity relative to the benchmark, LCC-ICCG, on amorphous PET film coupon substrates (∼9.4% crystallinity) in pH-controlled bioreactors at both 65 °C (8.5% higher conversion at 48 h and 38% higher maximum rate, at 2.9% substrate loading) and 68 °C (11.2% higher conversion at 48 h and 43% higher maximum rate, at 16.5% substrate loading), up to 48 h, highlighting the potential of this screening platform to accelerate enzyme development for PET recycling.
Affiliation(s)
- Thomas M. Groseclose
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Erin A. Kober
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Matilda Clark
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Centre for Enzyme Innovation, School of the Environmental and Life Sciences, University of Portsmouth, Portsmouth, PO1 2DT, U.K.
- Benjamin Moore
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Centre for Enzyme Innovation, School of the Environmental and Life Sciences, University of Portsmouth, Portsmouth, PO1 2DT, U.K.
- Shounak Banerjee
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Victoria Bemmer
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Centre for Enzyme Innovation, School of the Environmental and Life Sciences, University of Portsmouth, Portsmouth, PO1 2DT, U.K.
- Gregg T. Beckham
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
- Andrew R. Pickford
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Centre for Enzyme Innovation, School of the Environmental and Life Sciences, University of Portsmouth, Portsmouth, PO1 2DT, U.K.
- Taraka T. Dale
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- BOTTLE Consortium, Golden, Colorado 80401, United States
- Hau B. Nguyen
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- BOTTLE Consortium, Golden, Colorado 80401, United States

26
Zhang L, Liu T. PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models. Int J Biol Macromol 2024; 281:136147. [PMID: 39357703 DOI: 10.1016/j.ijbiomac.2024.136147] [Received: 05/28/2024] [Revised: 09/11/2024] [Accepted: 09/27/2024] [Indexed: 10/04/2024]
Abstract
Protein-DNA interactions play critical roles in various biological processes and are essential for drug discovery. However, traditional experimental methods are labor-intensive and unable to keep pace with the increasing volume of protein sequences, leading to a substantial number of proteins lacking DNA-binding annotations. Therefore, developing an efficient computational method to identify protein-DNA binding sites is crucial. Unfortunately, most existing computational methods rely on manually selected features or protein structure information, making these methods inapplicable to large-scale prediction tasks. In this study, we introduced PDNAPred, a sequence-based method that combines two pre-trained protein language models with a designed CNN-GRU network to identify DNA-binding sites. Additionally, to tackle the issue of imbalanced dataset samples, we employed focal loss. Our comprehensive experiments demonstrated that PDNAPred significantly improved the accuracy of DNA-binding site prediction, outperforming existing state-of-the-art sequence-based methods. Remarkably, PDNAPred also achieved results comparable to advanced structure-based methods. The designed CNN-GRU network enhances its capability to detect DNA-binding sites accurately. Furthermore, we validated the versatility of PDNAPred by training it on RNA-binding site datasets, showing its potential as a general framework for amino acid binding site prediction. Finally, we conducted model interpretability analysis to elucidate the reasons behind PDNAPred's outstanding performance.
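The focal loss mentioned in the abstract above can be illustrated with a short sketch. This is the standard binary focal loss applied per residue, not the PDNAPred implementation; the `gamma` and `alpha` values are commonly used defaults and are assumptions here, since the abstract does not state the settings used.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one residue.

    p: predicted probability that the residue is a binding site.
    y: true label, 1 (binding) or 0 (non-binding).
    gamma, alpha: hypothetical hyperparameters, not taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified residues
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a sequence of residues."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

# Toy example: four residues, one of which is a binding site.
probs = [0.05, 0.10, 0.30, 0.02]
labels = [0, 0, 1, 0]
loss = mean_focal_loss(probs, labels)
```

With `gamma=0` the modulating factor disappears and the expression reduces to weighted cross-entropy; larger `gamma` values concentrate the loss on hard, misclassified residues, which is why the loss is often chosen for imbalanced data such as binding-site labels.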
Affiliation(s)
- Lingrong Zhang
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.

27
Bradley D, Hogrebe A, Dandage R, Dubé AK, Leutert M, Dionne U, Chang A, Villén J, Landry CR. The fitness cost of spurious phosphorylation. EMBO J 2024; 43:4720-4751. [PMID: 39256561 PMCID: PMC11480408 DOI: 10.1038/s44318-024-00200-7] [Received: 10/25/2023] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 09/12/2024] Open
Abstract
The fidelity of signal transduction requires the binding of regulatory molecules to their cognate targets. However, the crowded cell interior risks off-target interactions between proteins that are functionally unrelated. How such off-target interactions impact fitness is not generally known. Here, we use Saccharomyces cerevisiae to inducibly express tyrosine kinases. Because yeast lacks bona fide tyrosine kinases, the resulting tyrosine phosphorylation is biologically spurious. We engineered 44 yeast strains each expressing a tyrosine kinase, and quantitatively analysed their phosphoproteomes. This analysis resulted in ~30,000 phosphosites mapping to ~3500 proteins. The number of spurious pY sites generated correlates strongly with decreased growth, and we predict over 1000 pY events to be deleterious. However, we also find that many of the spurious pY sites have a negligible effect on fitness, possibly because of their low stoichiometry. This result is consistent with our evolutionary analyses demonstrating a lack of phosphotyrosine counter-selection in species with tyrosine kinases. Our results suggest that, alongside the risk for toxicity, the cell can tolerate a large degree of non-functional crosstalk as interaction networks evolve.
Affiliation(s)
- David Bradley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Alexander Hogrebe
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rohan Dandage
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Mario Leutert
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Ugo Dionne
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada
- Department of Biology, Université Laval, Québec, QC, Canada
- Alexis Chang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Judit Villén
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC, Canada.
- Department of Biochemistry, Microbiology and Bioinformatics, Université Laval, Québec, QC, Canada.
- Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Université du Québec à Montréal, Montréal, QC, Canada.
- Université Laval Big Data Research Center (BDRC_UL), Québec, QC, Canada.
- Department of Biology, Université Laval, Québec, QC, Canada.

28
Wang X, Zhang Z, Liu C. iACP-DFSRA: Identification of Anticancer Peptides Based on a Dual-channel Fusion Strategy of ResCNN and Attention. J Mol Biol 2024; 436:168810. [PMID: 39362624 DOI: 10.1016/j.jmb.2024.168810] [Received: 07/09/2024] [Revised: 09/10/2024] [Accepted: 09/27/2024] [Indexed: 10/05/2024]
Abstract
Anticancer peptides (ACPs) have been widely applied in the treatment of cancer owing to good safety, rational side effects, and high selectivity. However, the number of ACPs that have been experimentally validated is limited, as identification of ACPs is extremely expensive. Hence, accurate and cost-effective identification methods for ACPs are urgently needed. In this work, we proposed a deep learning-based model, named iACP-DFSRA, for ACP identification. Specifically, we adopted two kinds of sequence embedding technologies, the ProtBert_BFD pre-trained language model and handcrafted features, to encode protein sequences. Then, LightGBM was used for feature selection, and the selected features were input into ResCNN and an Attention mechanism, respectively, to extract local and global features. Finally, the concatenated features were deeply fused using the Attention mechanism so that the model pays more attention to key features, and predictions were made by a fully connected layer. The results of 10-fold cross-validation demonstrated that the iACP-DFSRA model delivered improved results in most metrics, with Sp of 94.15%, Sn of 95.32%, Acc of 94.74% and MCC of 89.48%, compared to the latest AACFlow model. Indeed, the iACP-DFSRA model is the only model with Acc > 90% and MCC > 80% on this independent test dataset. Furthermore, we have demonstrated the superiority of our model on additional datasets. In addition, t-SNE and SHAP interpretation analysis demonstrated that it is crucial to use two channels for feature extraction and to use the Attention mechanism for deep fusion, which helps iACP-DFSRA predict ACPs more effectively.
Affiliation(s)
- Xin Wang
- School of Science, Dalian Maritime University, Dalian 116026, China.
- Zimeng Zhang
- School of Science, Dalian Maritime University, Dalian 116026, China
- Chang Liu
- School of Science, Dalian Maritime University, Dalian 116026, China

29
Mirabello C, Wallner B. DockQ v2: improved automatic quality measure for protein multimers, nucleic acids, and small molecules. Bioinformatics 2024; 40:btae586. [PMID: 39348158 PMCID: PMC11467047 DOI: 10.1093/bioinformatics/btae586] [Received: 06/24/2024] [Revised: 09/24/2024] [Accepted: 09/27/2024] [Indexed: 10/01/2024] Open
Abstract
MOTIVATION: It is important to assess the quality of modeled biomolecules to benchmark and assess the performance of different prediction methods. DockQ has emerged as the standard tool for assessing the quality of protein interfaces in model structures against given references. However, as predictions of large multimers with multiple chains become more common, DockQ needs to be updated with more functionality for robustness and speed. Moreover, as the field progresses and more methods are released to predict interactions between proteins and other types of molecules, such as nucleic acids and small molecules, it becomes necessary to have a tool that can assess all types of interactions. RESULTS: Here, we present a complete reimplementation of DockQ in pure Python. The updated version of DockQ is more portable, faster and introduces novel functionalities, such as automatic DockQ calculations for multiple interfaces and automatic chain mapping with multi-threading. These enhancements are designed to facilitate comparative analyses of protein complexes, particularly large multi-chain complexes. Furthermore, DockQ is now also able to score interfaces between proteins, nucleic acids, and small molecules. AVAILABILITY AND IMPLEMENTATION: DockQ v2 is available online at: https://wallnerlab.org/DockQ.
Affiliation(s)
- Claudio Mirabello
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping, Sweden
- National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Linköping University, SE-581 83 Linköping, Sweden
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping, Sweden

30
Fleskes RE, Johnson SJ, Honap TP, Abin CA, Gilmore JK, Oubré L, Bueschgen WD, Abel SM, Ofunniyin AA, Lewis CM, Schurr TG. Oral microbial diversity in 18th century African individuals from South Carolina. Commun Biol 2024; 7:1213. [PMID: 39342044 PMCID: PMC11439080 DOI: 10.1038/s42003-024-06893-0] [Received: 11/15/2023] [Accepted: 09/13/2024] [Indexed: 10/01/2024] Open
Abstract
As part of the Anson Street African Burial Ground Project, we characterized the oral microbiomes of twelve 18th century African-descended individuals (Ancestors) from Charleston, South Carolina, USA, to study their oral health and diet. We found that their oral microbiome composition resembled that of other historic (18th-19th century) dental calculus samples but differed from that of modern samples, and was not influenced by indicators of oral health and wear observed in the dentition. Phylogenetic analysis of the oral bacteria, Tannerella forsythia and Pseudoramibacter alactolyticus, revealed varied patterns of lineage diversity and replacement in the Americas, with the Ancestors carrying strains similar to historic period Europeans and Africans. Functional profiling of metabolic pathways suggested that the Ancestors consumed a diet low in animal protein. Overall, our study reveals important insights into the oral microbial histories of African-descended individuals, particularly oral health and diet in colonial North American enslavement contexts.
Affiliation(s)
- Raquel E Fleskes
- Department of Anthropology, Dartmouth College, Hanover, NH, USA.
- The Anson Street African Burial Ground Project, Mount Pleasant, SC, USA.
- Sarah J Johnson
- Laboratories of Molecular Anthropology and Microbiome Research (LMAMR), University of Oklahoma, Norman, OK, USA
- Department of Anthropology, University of Oklahoma, Norman, OK, USA
- Tanvi P Honap
- Laboratories of Molecular Anthropology and Microbiome Research (LMAMR), University of Oklahoma, Norman, OK, USA
- Department of Anthropology, University of Oklahoma, Norman, OK, USA
- Christopher A Abin
- Laboratories of Molecular Anthropology and Microbiome Research (LMAMR), University of Oklahoma, Norman, OK, USA
- Department of Anthropology, University of Oklahoma, Norman, OK, USA
- Joanna K Gilmore
- The Anson Street African Burial Ground Project, Mount Pleasant, SC, USA
- Department of Sociology and Anthropology, College of Charleston, Charleston, SC, USA
- La'Sheia Oubré
- The Anson Street African Burial Ground Project, Mount Pleasant, SC, USA
- Suzanne M Abel
- Charleston County Coroner's Office, North Charleston, SC, USA
- Ade A Ofunniyin
- The Anson Street African Burial Ground Project, Mount Pleasant, SC, USA
- Department of Sociology and Anthropology, College of Charleston, Charleston, SC, USA
- Cecil M Lewis
- Laboratories of Molecular Anthropology and Microbiome Research (LMAMR), University of Oklahoma, Norman, OK, USA.
- Department of Anthropology, University of Oklahoma, Norman, OK, USA.
- Theodore G Schurr
- The Anson Street African Burial Ground Project, Mount Pleasant, SC, USA.
- Department of Anthropology, University of Pennsylvania, Philadelphia, PA, USA.
31
Cheng M, Xu Y, Cui X, Wei X, Chang Y, Xu J, Lei C, Xue L, Zheng Y, Wang Z, Huang L, Zheng M, Luo H, Leng Y, Jiang C. Deep longitudinal lower respiratory tract microbiome profiling reveals genome-resolved functional and evolutionary dynamics in critical illness. Nat Commun 2024; 15:8361. [PMID: 39333527 PMCID: PMC11436904 DOI: 10.1038/s41467-024-52713-8] [Received: 02/14/2024] [Accepted: 09/18/2024] [Indexed: 09/29/2024] Open
Abstract
The lower respiratory tract (LRT) microbiome impacts human health, especially among critically ill patients. However, comprehensive characterizations of the LRT microbiome remain challenging due to low microbial mass and host contamination. We develop a chelex100-based low-biomass microbial-enrichment method (CMEM) that enables deep metagenomic profiling of LRT samples to recover near-complete microbial genomes. We apply the method to 453 longitudinal LRT samples from 157 intensive care unit (ICU) patients in three geographically distant hospitals. We recover 120 high-quality metagenome-assembled genomes (MAGs) and associated plasmids without culturing. We detect divergent longitudinal microbiome dynamics and hospital-specific dominant opportunistic pathogens and resistomes in pneumonia patients. Diagnosed pneumonia and the ICU stay duration were associated with the abundance of specific antibiotic-resistance genes (ARGs). Moreover, CMEM can serve as a robust tool for genome-resolved analyses. MAG-based analyses reveal strain-specific resistome and virulome among opportunistic pathogen strains. Evolutionary analyses discover increased mobilome in prevailing opportunistic pathogens, highly conserved plasmids, and new recombination hotspots associated with conjugative elements and prophages. Integrative analysis with epidemiological data reveals frequent putative inter-patient strain transmissions in ICUs. In summary, we present a genome-resolved functional, transmission, and evolutionary landscape of the LRT microbiota in critically ill patients.
Affiliation(s)
- Minghui Cheng
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, 310030, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310009, China
- Yingjie Xu
- Department of Pulmonary and Critical Care Medicine, the Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Xiao Cui
- Department of Intensive Care Unit, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Xin Wei
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, 310030, China
- Yundi Chang
- Department of Intensive Care Unit, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Jun Xu
- Department of Critical Care Medicine, the First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Cheng Lei
- Department of Pulmonary and Critical Care Medicine, the Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Lei Xue
- Department of Intensive Care Unit, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China
- Yifan Zheng
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, 310030, China
- Zhang Wang
- School of Life Sciences, South China Normal University, Guangzhou, Guangdong Province, China
- Lingtong Huang
- Department of Critical Care Medicine, the First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Min Zheng
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310009, China
- Hong Luo
- Department of Pulmonary and Critical Care Medicine, the Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China.
- Yuxin Leng
- Department of Intensive Care Unit, Peking University Third Hospital, 49 North Garden Road, Haidian District, Beijing, 100191, China.
- Chao Jiang
- MOE Key Laboratory of Biosystems Homeostasis & Protection, and Zhejiang Provincial Key Laboratory of Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang, 310030, China.
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, 310009, China.
- Center for Life Sciences, Shaoxing Institute, Zhejiang University, Shaoxing, 321000, China.
32
Borges KCM, Costa VAF, Neves B, Kipnis A, Junqueira-Kipnis AP. New antibacterial candidates against Acinetobacter baumannii discovered by in silico-driven chemogenomics repurposing. PLoS One 2024; 19:e0307913. [PMID: 39325805 PMCID: PMC11426455 DOI: 10.1371/journal.pone.0307913] [Received: 03/25/2024] [Accepted: 07/14/2024] [Indexed: 09/28/2024] Open
Abstract
Acinetobacter baumannii is a worldwide Gram-negative bacterium with a high resistance rate, responsible for a broad spectrum of hospital-acquired infections. A computational chemogenomics framework was applied to investigate the repurposing of approved drugs to target A. baumannii. This comprehensive approach involved compiling and preparing proteomic data, identifying homologous proteins in drug-target databases, evaluating the evolutionary conservation of targets, and conducting molecular docking studies and in vitro assays. Seven drugs were selected for experimental assays. Among them, tavaborole exhibited the most promising antimicrobial activity with a minimum inhibitory concentration (MIC) value of 2 μg/ml, potent activity against several clinically relevant strains, and robust efficacy against biofilms from multidrug-resistant strains at a concentration of 16 μg/ml. Molecular docking studies elucidated the binding modes of tavaborole in the editing and active domains of leucyl-tRNA synthetase, providing insights into its structural basis for antimicrobial activity. Tavaborole shows promise as an antimicrobial agent for combating A. baumannii infections and warrants further investigation in preclinical studies.
Affiliation(s)
- Kellen Christina Malheiros Borges
- Molecular Bacteriology Laboratory, Institute of Tropical Pathology and Public Health, Federal University of Goiás, Goiânia, Goiás, Brazil
- Microbiology Laboratory, Department of Biology, Academic Areas, Federal Institute of Goiás, Anápolis, Goiás, Brazil
- Bruno Neves
- Laboratory of Cheminformatics, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Goiás, Brazil
- André Kipnis
- Molecular Bacteriology Laboratory, Institute of Tropical Pathology and Public Health, Federal University of Goiás, Goiânia, Goiás, Brazil
- Ana Paula Junqueira-Kipnis
- Molecular Bacteriology Laboratory, Institute of Tropical Pathology and Public Health, Federal University of Goiás, Goiânia, Goiás, Brazil
33
Zhang L, Liu T. PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neutral network. Int J Biol Macromol 2024; 280:135762. [PMID: 39322150 DOI: 10.1016/j.ijbiomac.2024.135762] [Received: 08/05/2024] [Revised: 09/03/2024] [Accepted: 09/16/2024] [Indexed: 09/27/2024]
Abstract
Allergy is a prevalent phenomenon, involving allergens such as nuts and milk. Avoiding exposure to allergens is the most effective preventive measure against allergic reactions. However, current homology-based methods for identifying allergenic proteins encounter challenges when dealing with non-homologous data. Traditional machine learning approaches rely on manually extracted features, which lack important protein functional characteristics, including evolutionary information. Consequently, there is still considerable room for improvement in existing methods. In this study, we present PreAlgPro, a method for identifying allergenic proteins based on pre-trained protein language models and deep learning techniques. Specifically, we employed the ProtT5 model to extract protein embedding features, replacing the manual feature extraction step. Furthermore, we devised an Attention-CNN neural network architecture to identify potential features that contribute to the classification of allergenic proteins. The performance of our model was evaluated on four independent test sets, and the experimental results demonstrate that PreAlgPro surpasses existing state-of-the-art methods. Additionally, we collected allergenic protein samples to validate the robustness of the model and conducted an analysis of model interpretability.
Affiliation(s)
- Lingrong Zhang
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
34
Németh BZ, Kiss B, Sahin-Tóth M, Magyar C, Pál G. The High-Affinity Chymotrypsin Inhibitor Eglin C Poorly Inhibits Human Chymotrypsin-Like Protease: Gln192 and Lys218 Are Key Determinants. Proteins 2024. [PMID: 39301701 DOI: 10.1002/prot.26750] [Received: 05/28/2024] [Revised: 08/17/2024] [Accepted: 09/06/2024] [Indexed: 09/22/2024]
Abstract
Eglin C, a small protein from the medicinal leech, has been long considered a general high-affinity inhibitor of chymotrypsins and elastases. Here, we demonstrate that eglin C inhibits human chymotrypsin-like protease (CTRL) weaker by several orders of magnitude than other chymotrypsins. In order to identify the underlying structural aspects of this unique deviation, we performed comparative molecular dynamics simulations on experimental and AlphaFold model structures of bovine CTRA and human CTRL. Our results indicate that in CTRL, the primary determinants of the observed weak inhibition are amino-acid positions 192 and 218 (using conventional chymotrypsin numbering), which participate in shaping the S1 substrate-binding pocket and thereby affect the stability of the protease-inhibitor complexes.
Affiliation(s)
- Bálint Zoltán Németh
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- Institute of Molecular Life Sciences, Protein Bioinformatics Research Group, Hungarian Research Network, Budapest, Hungary
- Bence Kiss
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Hungary
- Miklós Sahin-Tóth
- Department of Surgery, University of California Los Angeles, Los Angeles, California, USA
- Csaba Magyar
- Institute of Molecular Life Sciences, Protein Bioinformatics Research Group, Hungarian Research Network, Budapest, Hungary
- Gábor Pál
- Department of Biochemistry, ELTE Eötvös Loránd University, Budapest, Hungary
35
Kouraki A, Zheng AS, Miller S, Kelly A, Ashraf W, Bazzani D, Bonadiman A, Tonidandel G, Bolzan M, Vijay A, Nightingale J, Menni C, Ollivere BJ, Valdes AM. Metagenomic changes in response to antibiotic treatment in severe orthopedic trauma patients. iScience 2024; 27:110783. [PMID: 39286492 PMCID: PMC11403444 DOI: 10.1016/j.isci.2024.110783] [Received: 05/13/2024] [Revised: 06/21/2024] [Accepted: 08/19/2024] [Indexed: 09/19/2024] Open
Abstract
We investigated changes in microbiome composition and abundance of antimicrobial resistance (AMR) genes post-antibiotic treatment in severe trauma patients. Shotgun sequencing revealed beta diversity (Bray-Curtis) differences between 16 hospitalized multiple rib fractures patients and 10 age- and sex-matched controls (p = 0.043), and between antibiotic-treated and untreated patients (p = 0.015). Antibiotic-treated patients had lower alpha diversity (Shannon) at discharge (p = 0.003) and 12-week post-discharge (p = 0.007). At 12 weeks, they also exhibited a 5.50-fold (95% confidence interval [CI]: 2.86-8.15) increase in Escherichia coli (p = 0.0004) compared to controls. Differential analysis identified nine AMRs that increased in antibiotic-treated compared to untreated patients between hospital discharge and 6 and 12 weeks follow-up (false discovery rate [FDR] < 0.20). Two aminoglycoside genes and a beta-lactamase gene were directly related to antibiotics administered, while five were unrelated. In trauma patients, lower alpha diversity, higher abundance of pathobionts, and increases in AMRs persisted for 12 weeks post-discharge, suggesting prolonged microbiome disruption. Probiotic or symbiotic therapies may offer future treatment avenues.
Affiliation(s)
- Afroditi Kouraki
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Amy S Zheng
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Suzanne Miller
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Anthony Kelly
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Waheed Ashraf
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Amrita Vijay
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Jessica Nightingale
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Cristina Menni
- Department of Twin Research, King's College London, London SE1 7EH, UK
- Benjamin J Ollivere
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
- Ana M Valdes
- Academic Unit of Injury, Recovery and Inflammation Sciences, School of Medicine, University of Nottingham, Nottingham NG7 2UH, UK
- NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals NHS Trust and the University of Nottingham, Nottingham NG7 2UH, UK
36
Cheng J, Pu Z, Chen J, Chen D, Li B, Wen Z, Jin Y, Yao Y, Shao K, Gu X, Yang G. Development of a green Komagataella phaffii cell factory for sustainable production of plant-derived sesquiterpene (-)-α-bisabolol. Synth Syst Biotechnol 2024; 10:120-126. [PMID: 39493337 PMCID: PMC11530781 DOI: 10.1016/j.synbio.2024.09.006] [Received: 07/20/2024] [Revised: 08/18/2024] [Accepted: 09/12/2024] [Indexed: 11/05/2024] Open
Abstract
(-)-α-Bisabolol is a plant-derived sesquiterpene derived from Eremanthus erythropappus, which can be used as a raw material in cosmetics and has anti-inflammatory function. In this study, we designed six mutation sites of the (-)-α-bisabolol synthase BOS using the plmDCA algorithm. Among these, the F324Y mutation demonstrated exceptional performance, increasing the product yield by 73 %. We constructed a de novo (-)-α-bisabolol biosynthesis pathways through systematic synthetic biology strategies, including the enzyme design of BOS, selection of different linkers in fusion expression, and optimization of the mevalonate pathway, weakening the branching metabolic flow and multi-copy strategies, the yield of (-)-α-bisabolol was significantly increased, which was nearly 35-fold higher than that of the original strain (2.03 mg/L). The engineered strain was capable of producing 69.7 mg/L in shake flasks. To the best of our knowledge, this is the first report on the biosynthesis of (-)-α-bisabolol in Komagataella phaffii, implying this is a robust cell factory for sustainable production of other terpenoids.
Affiliation(s)
- Zhongji Pu
- Xianghu Laboratory, Hangzhou, 310027, China
- Jiali Chen
- Xianghu Laboratory, Hangzhou, 310027, China
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, 310032, China
- Dingfeng Chen
- Xianghu Laboratory, Hangzhou, 310027, China
- School of Food and Pharmacy, Zhejiang Ocean University, Zhoushan, 316022, China
- Baoxian Li
- Xianghu Laboratory, Hangzhou, 310027, China
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, 310032, China
- Zhengshun Wen
- Xianghu Laboratory, Hangzhou, 310027, China
- School of Food and Pharmacy, Zhejiang Ocean University, Zhoushan, 316022, China
- Yuanxiang Jin
- Xianghu Laboratory, Hangzhou, 310027, China
- College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, 310032, China
- Yanlai Yao
- Xianghu Laboratory, Hangzhou, 310027, China
- Kan Shao
- Department of Environmental and Occupational Health, School of Public Health, Indiana University, Bloomington, IN, 47405, USA
- Xiaosong Gu
- Hubei Province Key Lab Yeast Function, Yichang, 443003, China
- Guiling Yang
- Xianghu Laboratory, Hangzhou, 310027, China
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Laboratory (Hangzhou) for Risk Assessment of Agricultural Products of Ministry of Agriculture, Institute of Agro-product Safety and Nutrition, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310021, Zhejiang, China
37
Anselmi NK, Vanyo ST, Clark ND, Rodriguez DML, Jones MM, Rosenthal S, Patel D, Marconi RT, Visser MB. Topology and functional characterization of major outer membrane proteins of Treponema maltophilum and Treponema lecithinolyticum. Mol Oral Microbiol 2024. [PMID: 39263909 DOI: 10.1111/omi.12484] [Received: 05/09/2024] [Revised: 08/21/2024] [Accepted: 08/22/2024] [Indexed: 09/13/2024]
Abstract
Numerous Treponema species are prevalent in the dysbiotic subgingival microbial community during periodontitis. The major outer sheath protein is a highly expressed virulence factor of the well-characterized species Treponema denticola. Msp forms an oligomeric membrane protein complex with adhesin and porin properties and contributes to host-microbial interaction. Treponema maltophilum and Treponema lecithinolyticum species are also prominent during periodontitis but are relatively understudied. Msp-like membrane surface proteins exist in T. maltophilum (MspA) and T. lecithinolyticum (MspTL), but limited information exists regarding their structural features or functionality. Protein profiling reveals numerous differences between these species, but minimal differences between strains of the same species. Using protein modeling tools, we predict MspA and MspTL monomeric forms to be large β-barrel structures composed of 20 all-next-neighbor antiparallel β strands which most likely adopt a homotrimer formation. Using cell fractionation, Triton X-114 phase partitioning, heat modifiability, and chemical and detergent release assays, we found evidence of amphiphilic integral membrane-associated oligomerization for both native MspA and MspTL in intact spirochetes. Proteinase K accessibility and immunofluorescence assays demonstrate surface exposure of MspA and MspTL. Functionally, purified recombinant MspA or MspTL monomer proteins can impair neutrophil chemotaxis. Expressions of MspA or MspTL with a PelB leader sequence in Escherichia coli also demonstrate surface exposure and can impair neutrophil chemotaxis in an in vivo air pouch model of inflammation. Collectively, our data demonstrate that MspA and MspTL membrane proteins can contribute to pathogenesis of these understudied oral spirochete species.
Affiliation(s)
- Natalie K Anselmi
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, New York, USA
- Stephen T Vanyo
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, New York, USA
- Nicholas D Clark
- Department of Structural Biology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, The State University of New York, Buffalo, New York, USA
- Dayron M Leyva Rodriguez
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, New York, USA
- Megan M Jones
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, New York, USA
- Sara Rosenthal
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, New York, USA
- Dhara Patel
- Department of Microbiology and Immunology, Virginia Commonwealth University Medical Center, Richmond, Virginia, USA
- Richard T Marconi
- Department of Microbiology and Immunology, Virginia Commonwealth University Medical Center, Richmond, Virginia, USA
- Michelle B Visser
- Department of Oral Biology, School of Dental Medicine, University at Buffalo, The State University of New York, Buffalo, New York, USA
38
Zeng W, Dou Y, Pan L, Xu L, Peng S. Improving prediction performance of general protein language model by domain-adaptive pretraining on DNA-binding protein. Nat Commun 2024; 15:7838. [PMID: 39244557 PMCID: PMC11380688 DOI: 10.1038/s41467-024-52293-7] [Received: 12/18/2023] [Accepted: 08/29/2024] [Indexed: 09/09/2024] Open
Abstract
DNA-protein interactions exert the fundamental structure of many pivotal biological processes, such as DNA replication, transcription, and gene regulation. However, accurate and efficient computational methods for identifying these interactions are still lacking. In this study, we propose a method ESM-DBP through refining the DNA-binding protein sequence repertory and domain-adaptive pretraining based the general protein language model. Our method considers the lacking exploration of general language model for DNA-binding protein domain-specific knowledge, so we screen out 170,264 DNA-binding protein sequences to construct the domain-adaptive language model. Experimental results on four downstream tasks show that ESM-DBP provides a better feature representation of DNA-binding protein compared to the original language model, resulting in improved prediction performance and outperforming the state-of-the-art methods. Moreover, ESM-DBP can still perform well even for those sequences with only a few homologous sequences. ChIP-seq on two predicted cases further support the validity of the proposed method.
Affiliation(s)
- Wenwu Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Yutao Dou
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Liangrui Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
- Liwen Xu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China.
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China.
39
Ravikrishnan A, Wijaya I, Png E, Chng KR, Ho EXP, Ng AHQ, Mohamed Naim AN, Gounot JS, Guan SP, Hanqing JL, Guan L, Li C, Koh JY, de Sessions PF, Koh WP, Feng L, Ng TP, Larbi A, Maier AB, Kennedy BK, Nagarajan N. Gut metagenomes of Asian octogenarians reveal metabolic potential expansion and distinct microbial species associated with aging phenotypes. Nat Commun 2024; 15:7751. [PMID: 39237540 PMCID: PMC11377447 DOI: 10.1038/s41467-024-52097-9] [Received: 11/26/2023] [Accepted: 08/23/2024] [Indexed: 09/07/2024] Open
Abstract
While rapid demographic changes in Asia are driving the incidence of chronic aging-related diseases, the limited availability of high-quality in vivo data hampers our ability to understand complex multi-factorial contributions, including gut microbial, to healthy aging. Leveraging a well-phenotyped cohort of community-living octogenarians in Singapore, we used deep shotgun-metagenomic sequencing for high-resolution taxonomic and functional characterization of their gut microbiomes (n = 234). Joint species-level analysis with other Asian cohorts identified distinct age-associated shifts characterized by reduction in microbial richness, and specific Alistipes and Bacteroides species enrichment (e.g., Alistipes shahii and Bacteroides xylanisolvens). Functional analysis confirmed these changes correspond to metabolic potential expansion in aging towards alternate pathways synthesizing and utilizing amino-acid precursors, vis-à-vis dominant microbial guilds producing butyrate in gut from pyruvate (e.g., Faecalibacterium prausnitzii, Roseburia inulinivorans). Extending these observations to key clinical markers helped identify >10 robust microbial associations to inflammation, cardiometabolic and liver health, including potential probiotic species (e.g., Parabacteroides goldsteinii) and pathobionts (e.g., Klebsiella pneumoniae), highlighting the microbiome's role as biomarkers and potential targets for promoting healthy aging.
Affiliation(s)
- Aarthi Ravikrishnan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Indrik Wijaya
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Eileen Png
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Kern Rei Chng
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Eliza Xin Pei Ho
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Amanda Hui Qi Ng
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Ahmad Nazri Mohamed Naim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Jean-Sebastien Gounot
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Shou Ping Guan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Jasinda Lee Hanqing
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
| | - Lihuan Guan
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
| | - Chenhao Li
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
| | - Jia Yu Koh
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
| | - Paola Florez de Sessions
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
| | - Woon-Puay Koh
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Singapore Institute for Clinical Sciences (SICS), Agency for Science Technology and Research (A*STAR), 30 Medical Drive, Brenner Centre for Molecular Medicine, Singapore, 117609, Republic of Singapore
| | - Lei Feng
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
| | - Tze Pin Ng
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
| | - Anis Larbi
- Singapore Immunology Network (SigN), Agency for Science Technology and Research (A*STAR), 8A Biomedical Grove, Immunos, Singapore, 138648, Republic of Singapore
| | - Andrea B Maier
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
- Department of Human Movement Sciences, @AgeAmsterdam, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences, Amsterdam, The Netherlands
| | - Brian K Kennedy
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore
| | - Niranjan Nagarajan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore.
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore.
| |
Collapse
|
40
|
Erckert K, Rost B. Assessing the role of evolutionary information for enhancing protein language model embeddings. Sci Rep 2024; 14:20692. [PMID: 39237735 PMCID: PMC11377704 DOI: 10.1038/s41598-024-71783-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] Open
Abstract
Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based outperformed MSA-based methods, and the combination of both even decreased performance for some (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs.
Affiliation(s)
- Kyra Erckert
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany.
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
- Burkhard Rost
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
|
41
|
Rimal P, Panday SK, Xu W, Peng Y, Alexov E. SAAMBE-MEM: a sequence-based method for predicting binding free energy change upon mutation in membrane protein-protein complexes. Bioinformatics (Oxford, England) 2024; 40:btae544. [PMID: 39240325 PMCID: PMC11407696 DOI: 10.1093/bioinformatics/btae544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2024] [Revised: 08/04/2024] [Accepted: 09/04/2024] [Indexed: 09/07/2024]
Abstract
MOTIVATION: Mutations in protein-protein interactions can affect the corresponding complexes, impacting function and potentially leading to disease. Given the abundance of membrane proteins, it is crucial to assess the impact of mutations on the binding affinity of these proteins. Although several methods exist to predict the binding free energy change due to mutations in protein-protein complexes, most require structural information of the protein complex and are primarily trained on the SKEMPI database, which is composed mainly of soluble proteins.
RESULTS: A novel sequence-based method (SAAMBE-MEM) for predicting binding free energy changes (ΔΔG) in membrane protein-protein complexes due to mutations has been developed. This method utilized the MPAD database, which contains binding affinities for wild-type and mutant membrane protein complexes. A machine learning model was developed to predict ΔΔG by leveraging features such as amino acid indices and position-specific scoring matrices (PSSM). Through extensive dataset curation and feature extraction, SAAMBE-MEM was trained and validated using the XGBoost regression algorithm. The optimal feature set, including PSSM-related features, achieved a Pearson correlation coefficient of 0.64, outperforming existing methods trained on the SKEMPI database. Furthermore, it was demonstrated that SAAMBE-MEM performs much better when utilizing evolution-based features in contrast to physicochemical features.
AVAILABILITY AND IMPLEMENTATION: The method is accessible via a web server and standalone code at http://compbio.clemson.edu/SAAMBE-MEM/. The cleaned MPAD database is available at the website.
Affiliation(s)
- Prawin Rimal
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
- Shailesh Kumar Panday
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
- Wang Xu
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, Hubei 430079, China
- Yunhui Peng
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, Hubei 430079, China
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, United States
|
42
|
Ahmed KT, Ansari MI, Zhang W. DTI-LM: language model powered drug-target interaction prediction. Bioinformatics 2024; 40:btae533. [PMID: 39221997 PMCID: PMC11520403 DOI: 10.1093/bioinformatics/btae533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 08/05/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024] Open
Abstract
MOTIVATION: The identification and understanding of drug-target interactions (DTIs) play a pivotal role in the drug discovery and development process. Sequence representations of drugs and proteins in computational models offer advantages such as their widespread availability, easier input quality control, and reduced computational resource requirements. These make them efficient and accessible tools for various computational biology and drug discovery applications. Many sequence-based DTI prediction methods have been developed over the years. Despite the advancement in methodology, cold start DTI prediction involving an unknown drug or protein remains a challenging task, particularly for sequence-based models. Introducing DTI-LM, a novel framework leveraging advanced pretrained language models, we harness their exceptional context-capturing abilities along with neighborhood information to predict DTIs. DTI-LM is specifically designed to rely solely on sequence representations for drugs and proteins, aiming to bridge the gap between warm start and cold start predictions.
RESULTS: Large-scale experiments on four datasets show that DTI-LM can achieve state-of-the-art performance on DTI predictions. Notably, it excels in overcoming the common challenges faced by sequence-based models in cold start predictions for proteins, yielding impressive results. The incorporation of neighborhood information through a graph attention network further enhances prediction accuracy. Nevertheless, a disparity persists between cold start predictions for proteins and drugs. A detailed examination of DTI-LM reveals that language models exhibit contrasting capabilities in capturing similarities between drugs and proteins.
AVAILABILITY AND IMPLEMENTATION: Source code is available at: https://github.com/compbiolabucf/DTI-LM.
Affiliation(s)
- Khandakar Tanvir Ahmed
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Md Istiaq Ansari
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Wei Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
|
43
|
Velecký J, Berezný M, Musil M, Damborsky J, Bednar D, Mazurenko S. BenchStab: a tool for automated querying of web-based stability predictors. Bioinformatics (Oxford, England) 2024; 40:btae553. [PMID: 39259175 PMCID: PMC11427696 DOI: 10.1093/bioinformatics/btae553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 08/02/2024] [Accepted: 09/10/2024] [Indexed: 09/12/2024]
Abstract
SUMMARY: Protein design requires information about how mutations affect protein stability. Many web-based predictors are available for this purpose, yet comparing them or using them en masse is difficult. Here, we present BenchStab, a console tool/Python package for easy and quick execution of 19 predictors and result collection on a list of mutants. Moreover, the tool is easily extensible with additional predictors. We created an independent dataset derived from the FireProtDB and evaluated 24 different prediction methods.
AVAILABILITY AND IMPLEMENTATION: BenchStab is an open-source Python package available at https://github.com/loschmidt/BenchStab with a detailed README and example usage at https://loschmidt.chemi.muni.cz/benchstab. The BenchStab dataset is available on Zenodo: https://zenodo.org/records/10637728.
Affiliation(s)
- Jan Velecký
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
- Matej Berezný
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 00 Brno, Czech Republic
- Milos Musil
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, 612 00 Brno, Czech Republic
  - International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- David Bednar
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
- Stanislav Mazurenko
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic
  - International Clinical Research Centre, St. Anne's University Hospital, 656 91 Brno, Czech Republic
|
44
|
Kroll A, Niebuhr N, Butler G, Lercher MJ. SPOT: A machine learning model that predicts specific substrates for transport proteins. PLoS Biol 2024; 22:e3002807. [PMID: 39325691 PMCID: PMC11426516 DOI: 10.1371/journal.pbio.3002807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Accepted: 08/13/2024] [Indexed: 09/28/2024] Open
Abstract
Transport proteins play a crucial role in cellular metabolism and are central to many aspects of molecular biology and medicine. Determining the function of transport proteins experimentally is challenging, as they become unstable when isolated from cell membranes. Machine learning-based predictions could provide an efficient alternative. However, existing methods are limited to predicting a small number of specific substrates or broad transporter classes. These limitations stem partly from using small data sets for model training and a choice of input features that lack sufficient information about the prediction problem. Here, we present SPOT, the first general machine learning model that can successfully predict specific substrates for arbitrary transport proteins, achieving an accuracy above 92% on independent and diverse test data covering widely different transporters and a broad range of metabolites. SPOT uses Transformer Networks to represent transporters and substrates numerically. To overcome the problem of missing negative data for training, it augments a large data set of known transporter-substrate pairs with carefully sampled random molecules as non-substrates. SPOT not only predicts specific transporter-substrate pairs, but also outperforms previously published models designed to predict broad substrate classes for individual transport proteins. We provide a web server and a Python function that allow users to explore the substrate scope of arbitrary transporters.
Affiliation(s)
- Alexander Kroll
  - Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Nico Niebuhr
  - Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
- Gregory Butler
  - Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada
- Martin J Lercher
  - Institute for Computer Science and Department of Biology, Heinrich Heine University, Düsseldorf, Germany
|
45
|
Gong X, Zhang J, Gan Q, Teng Y, Hou J, Lyu Y, Liu Z, Wu Z, Dai R, Zou Y, Wang X, Zhu D, Zhu H, Liu T, Yan Y. Advancing microbial production through artificial intelligence-aided biology. Biotechnol Adv 2024; 74:108399. [PMID: 38925317 DOI: 10.1016/j.biotechadv.2024.108399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 05/20/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
Microbial cell factories (MCFs) have been leveraged to construct sustainable platforms for value-added compound production. To optimize metabolism and reach optimal productivity, synthetic biology has developed various genetic devices to engineer microbial systems by gene editing, high-throughput protein engineering, and dynamic regulation. However, current synthetic biology methodologies still rely heavily on manual design, laborious testing, and exhaustive analysis. The emerging interdisciplinary field of artificial intelligence (AI) and biology has become pivotal in addressing the remaining challenges. AI-aided microbial production harnesses the power of processing, learning, and predicting vast amounts of biological data within seconds, providing outputs with high probability. With well-trained AI models, the conventional Design-Build-Test (DBT) cycle has been transformed into a multidimensional Design-Build-Test-Learn-Predict (DBTLP) workflow, leading to significantly improved operational efficiency and reduced labor consumption. Here, we comprehensively review the main components and recent advances in AI-aided microbial production, focusing on genome annotation, AI-aided protein engineering, artificial functional protein design, and AI-enabled pathway prediction. Finally, we discuss the challenges of integrating novel AI techniques into biology and propose the potential of large language models (LLMs) in advancing microbial production.
Affiliation(s)
- Xinyu Gong
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jianli Zhang
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Qi Gan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Yuxi Teng
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Jixin Hou
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Yanjun Lyu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Zhengliang Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Zihao Wu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Runpeng Dai
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yusong Zou
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA
- Xianqiao Wang
  - School of ECAM, College of Engineering, University of Georgia, Athens, GA 30602, USA
- Dajiang Zhu
  - Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington 76019, USA
- Hongtu Zhu
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tianming Liu
  - School of Computing, The University of Georgia, Athens, GA 30602, USA
- Yajun Yan
  - School of Chemical, Materials, and Biomedical Engineering, College of Engineering, The University of Georgia, Athens, GA 30602, USA.
|
46
|
Gemler BT, Mukherjee C, Fullerton PA, Diggans J, Bartling C. A Sensitivity Study for Interpreting Nucleic Acid Sequence Screening Regulatory and Guidance Documentation: Toward a Foundational Synthetic Nucleic Acid Sequence Screening Framework. Applied Biosafety 2024; 29:150-158. [PMID: 39372510 PMCID: PMC11447129 DOI: 10.1089/apb.2023.0026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Objectives: The primary objectives of this study were to develop an objective nucleic acid sequence screening framework and to leverage the framework for an empirical sensitivity study that measures the impact of ambiguities in regulatory and guidance documentation regarding the control of synthetic nucleic acids and screening of nucleic acid orders.
Methods: Foundational risk levels were constructed using the bioinformatic sequence screening tool UltraSEQ. The risk levels range from high (corresponding to regulated sequences) to low (corresponding to nonregulated sequences of concern) to no-risk. A representative sequence data set (141,651 sequences) was constructed from publicly available synthetically derived sequences, and the percentage of sequences in each risk level was determined, followed by the impact of changing key UltraSEQ parameters.
Results: The results of this study show that no-risk sequences represented 90-92% of sequences, and nonregulated sequences of concern represented 7-9% of the sequences, regardless of the parameters. The parameter with the biggest impact on the number of sequences flagged was the minimum hit homology level, followed by minimum sequence region length, and finally uniqueness of the hit to a select agent sequence.
Conclusion: The results of this empirical study provide a greater understanding for gene synthesis providers, biosafety and biosecurity practitioners, and the scientific community regarding the impact of various interpretations of regulatory and guidance documentation. The risk level framework provides a foundation to build upon for nucleic acid sequence screening as the threat landscape evolves. However, additional development is needed to build tools that connect predictions across sequences and orders to provide contextual risk-based predictions.
Affiliation(s)
- James Diggans
  - Twist Bioscience Corporation, South San Francisco, California, USA
|
47
|
Yakoubi S. Synergistic integration of deep learning with protein docking in cardiovascular disease treatment strategies. IUBMB Life 2024; 76:666-696. [PMID: 38748776 DOI: 10.1002/iub.2819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 03/13/2024] [Indexed: 08/31/2024]
Abstract
This research explores the potential of a tocopherol-based nanoemulsion as a therapeutic agent for cardiovascular diseases (CVD) through an in-depth molecular docking analysis. The study focuses on elucidating the molecular interactions between tocopherol and seven key proteins (1O8a, 4YAY, 4DLI, 1HW9, 2YCW, 1BO9 and 1CX2) that play pivotal roles in CVD development. Through rigorous in silico docking investigations, the binding affinities, inhibitory potentials and interaction patterns of tocopherol with these target proteins were assessed. The findings revealed significant interactions, particularly with 4YAY, displaying a robust binding energy of -6.39 kcal/mol and a promising Ki value of 20.84 μM. Notable interactions were also observed with 1HW9, 4DLI, 2YCW and 1CX2, further indicating tocopherol's potential therapeutic relevance. In contrast, no interaction was observed with 1BO9. Furthermore, the common residues of 4YAY bound to tocopherol were examined, highlighting key intermolecular hydrophobic bonds that contribute to the interaction's stability. Tocopherol complies with the pharmacokinetic rules (Lipinski's and Veber's) for oral bioavailability and was found to be safe, non-toxic and non-carcinogenic. Deep learning-based protein language models ESM1-b and ProtT5 were then leveraged for input encodings to predict interaction sites between the 4YAY protein and tocopherol, and highly accurate predictions of these critical protein-ligand interactions were achieved. This study not only advances the understanding of these interactions but also highlights deep learning's immense potential in molecular biology and drug discovery. It underscores tocopherol's promise as a cardiovascular disease management candidate, shedding light on its molecular interactions and compatibility with biomolecule-like characteristics.
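The reported binding energy and Ki for 4YAY can be related with a short calculation. The sketch below assumes the standard thermodynamic relation ΔG = RT ln(Ki) at 298.15 K, which many docking programs use to estimate an inhibition constant; the abstract does not state the formula or temperature actually used, so both are assumptions, and the function name `ki_from_binding_energy` is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K); the temperature default below is an assumption,
# since the abstract does not state the conditions used.
R_KCAL = 1.987e-3


def ki_from_binding_energy(delta_g_kcal_per_mol, temperature_k=298.15):
    """Estimate Ki in mol/L from a binding free energy, assuming dG = R*T*ln(Ki)."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL * temperature_k))


# Binding energy reported for tocopherol with 4YAY
ki_molar = ki_from_binding_energy(-6.39)
print(f"Estimated Ki: {ki_molar * 1e6:.1f} uM")
```

Under these assumptions the estimate comes out on the order of 20 μM, in line with the reported 20.84 μM; a small difference would be expected if the published energy was rounded to two decimals.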
Affiliation(s)
- Sana Yakoubi
  - Faculty of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
  - Alliance for Research on the Mediterranean North Africa (ARENA), University of Tsukuba, Ibaraki, Japan
  - University of Tunis El Manar, Tunis, Tunisia
|
48
|
Yang S, Cheng P, Liu Y, Feng D, Wang S. Exploring the Knowledge of an Outstanding Protein to Protein Interaction Transformer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1287-1298. [PMID: 38536676 DOI: 10.1109/tcbb.2024.3381825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
Protein-to-protein interaction (PPI) prediction aims to predict whether two given proteins interact or not. Compared with traditional experimental methods of high cost and low efficiency, the current deep learning based approach makes it possible to discover massive potential PPIs from large-scale databases. However, deep PPI prediction models perform poorly on unseen species, as their proteins are not in the training set. To address this issue, the paper first proposes PPITrans, a Transformer based PPI prediction model that exploits a language model pre-trained on proteins to conduct binary PPI prediction. To validate the effectiveness on unseen species, PPITrans is trained with Human PPIs and tested on PPIs of other species. Experimental results show that PPITrans significantly outperforms the previous state-of-the-art on various metrics, especially on PPIs of unseen species. For example, the AUPR improves by 0.339 in absolute terms on Fly PPIs. Aiming to explore the knowledge learned by PPITrans from PPI data, this paper also designs a series of probes belonging to three categories. Their results reveal several interesting findings: for example, although PPITrans cannot capture the spatial structure of proteins, it can obtain knowledge of PPI type and binding affinity, learning more than binary PPI.
|
49
|
Santos AS, Costa VAF, Freitas VAQ, Dos Anjos LRB, de Almeida Santos ES, Arantes TD, Costa CR, de Sene Amâncio Zara AL, do Rosário Rodrigues Silva M, Neves BJ. Drug to genome to drug: a computational large-scale chemogenomics screening for novel drug candidates against sporotrichosis. Braz J Microbiol 2024; 55:2655-2667. [PMID: 38888692 PMCID: PMC11405749 DOI: 10.1007/s42770-024-01406-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 05/28/2024] [Indexed: 06/20/2024] Open
Abstract
Sporotrichosis is recognized as the predominant subcutaneous mycosis in South America, attributed to pathogenic species within the Sporothrix genus. Notably, in Brazil, Sporothrix brasiliensis emerges as the principal species, exhibiting significant sapronotic, zoonotic and enzootic epidemic potential. Consequently, the discovery of novel therapeutic agents for the treatment of sporotrichosis is imperative. The present study is dedicated to the repositioning of pharmaceuticals for sporotrichosis therapy. To achieve this goal, we designed a pipeline with the following steps: (a) compilation and preparation of Sporothrix genome data; (b) identification of orthologous proteins among the species; (c) identification of homologous proteins in publicly available drug-target databases; (d) selection of Sporothrix essential targets using validated genes from Saccharomyces cerevisiae; (e) molecular modeling studies; and (f) experimental validation of selected candidates. Based on this approach, we were able to prioritize eight drugs for in vitro experimental validation. Among the evaluated compounds, everolimus and bifonazole demonstrated minimum inhibitory concentration (MIC) values of 0.5 µg/mL and 4.0 µg/mL, respectively. Subsequent molecular docking studies suggest that bifonazole and everolimus may target specific proteins within S. brasiliensis, namely sterol 14-α-demethylase and serine/threonine-protein kinase TOR, respectively. These findings shed light on the potential binding affinities and binding modes of bifonazole and everolimus with their probable targets, providing a preliminary understanding of the antifungal mechanism of action of these compounds. In conclusion, our research advances the understanding of the therapeutic potential of bifonazole and everolimus, supporting their further investigation as antifungal agents for sporotrichosis in prospective hit-to-lead and preclinical investigations.
Affiliation(s)
- Andressa Santana Santos
  - Institute of Tropical Pathology and Public Health, Federal University of Goiás, Goiânia, Goiás, Brazil
  - Laboratory of Cheminformatics, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Goiás, Brazil
- Laura Raniere Borges Dos Anjos
  - Institute of Tropical Pathology and Public Health, Federal University of Goiás, Goiânia, Goiás, Brazil
  - Laboratory of Cheminformatics, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Goiás, Brazil
- Thales Domingos Arantes
  - Institute of Tropical Pathology and Public Health, Federal University of Goiás, Goiânia, Goiás, Brazil
- Carolina Rodrigues Costa
  - Institute of Tropical Pathology and Public Health, Federal University of Goiás, Goiânia, Goiás, Brazil
- Ana Laura de Sene Amâncio Zara
  - Postgraduate Program in Health Technology Assistance and Assessment (PPG-AAS), Faculty of Pharmacy, Federal University of Goiás, Goiânia, Goiás, Brazil
- Bruno Junior Neves
  - Laboratory of Cheminformatics, Faculty of Pharmacy, Federal University of Goiás, Goiânia, Goiás, Brazil.
|
50
|
Walsh C, Vanderburgh C, Grant L, Katz E, Kliebenstein DJ, Fierer N. Microbial terroir: associations between soil microbiomes and the flavor chemistry of mustard (Brassica juncea). The New Phytologist 2024; 243:1951-1965. [PMID: 38553428 DOI: 10.1111/nph.19708] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/16/2023] [Accepted: 03/05/2024] [Indexed: 08/02/2024]
Abstract
Here, we characterized the independent role of soil microbiomes (bacterial and fungal communities) in determining the flavor chemistry of harvested mustard seed (Brassica juncea). Given the known impacts of soil microbial communities on various plant characteristics, we hypothesized that differences in rhizosphere microbiomes would result in differences in seed flavor chemistry (glucosinolate content). In a glasshouse study, we introduced distinct soil microbial communities to mustard plants growing in an otherwise consistent environment. At the end of the plant life cycle, we characterized the rhizosphere and root microbiomes and harvested the resulting mustard seeds for chemical characterization. Specifically, we measured the concentrations of glucosinolates, secondary metabolites known to create spicy and bitter flavors. We examined associations between rhizosphere microbial taxa or genes and seed flavor chemistry. We identified links between the rhizosphere microbial community composition and the concentration of the main glucosinolate, allyl, in seeds. We further identified specific rhizosphere taxa predictive of seed allyl concentration and identified bacterial functional genes, namely genes for sulfur metabolism, which could partly explain the observed associations. Together, this work offers insight into the potential influence of the belowground microbiome on the flavor of harvested crops.
Affiliation(s)
- Corinne Walsh
  - Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO, 80309, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, 80309, USA
- Caihong Vanderburgh
  - Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO, 80309, USA
- Lady Grant
  - Department of Soil and Crop Sciences, Colorado State University, Fort Collins, CO, 80523, USA
- Ella Katz
  - Department of Plant Sciences, University of California Davis, Davis, CA, 95616, USA
- Noah Fierer
  - Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, CO, 80309, USA
  - Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, 80309, USA
|