1. Bizzotto E, Fraulini S, Zampieri G, Orellana E, Treu L, Campanaro S. MICROPHERRET: MICRObial PHEnotypic tRait ClassifieR using Machine lEarning Techniques. Environmental Microbiome 2024; 19:58. [PMID: 39113074 PMCID: PMC11308548 DOI: 10.1186/s40793-024-00600-6] [Received: 03/04/2024] [Accepted: 07/24/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND: In recent years, there has been a rapid increase in the number of microbial genomes reconstructed through shotgun sequencing, and obtained by newly developed approaches including metagenomic binning and single-cell sequencing. However, our ability to functionally characterize these genomes by experimental assays is orders of magnitude less efficient. Consequently, there is a pressing need for the development of swift and automated strategies for the functional classification of microbial genomes. RESULTS: The present work leverages a suite of supervised machine learning algorithms to establish a range of 86 metabolic and other ecological functions, such as methanotrophy and plastic degradation, starting from widely obtainable microbial genome annotations. Tests performed on independent datasets demonstrated robust performance across complete, fragmented, and incomplete genomes above a 70% completeness level for most of the considered functions. Application of the algorithms to the Biogas Microbiome database yielded predictions broadly consistent with current biological knowledge and correctly detecting functionally-related nuances of archaeal genomes. Finally, a case study focused on acetoclastic methanogenesis demonstrated how the developed machine learning models can be refined or expanded with models describing novel functions of interest. CONCLUSIONS: The resulting tool, MICROPHERRET, incorporates a total of 86 models, one for each tested functional class, and can be applied to high-quality microbial genomes as well as to low-quality genomes derived from metagenomics and single-cell sequencing. MICROPHERRET can thus aid in understanding the functional role of newly generated genomes within their micro-ecological context.
Affiliation(s)
- Edoardo Bizzotto
  - Department of Biology, University of Padova, Padova, 35131, Italy
- Sofia Fraulini
  - Department of Biology, University of Padova, Padova, 35131, Italy
- Guido Zampieri
  - Department of Biology, University of Padova, Padova, 35131, Italy
- Esteban Orellana
  - Department of Biology, University of Padova, Padova, 35131, Italy
- Laura Treu
  - Department of Biology, University of Padova, Padova, 35131, Italy
2. Cai G, Xu J, Zhang C, Jiang J, Chen G, Chen J, Liu Q, Xu G, Lan Y. Identifying biomarkers related to motor function in chronic stroke: A fNIRS and TMS study. CNS Neurosci Ther 2024; 30:e14889. [PMID: 39073240 DOI: 10.1111/cns.14889] [Received: 02/03/2024] [Revised: 06/07/2024] [Accepted: 07/17/2024] [Indexed: 07/30/2024] Open
Abstract
BACKGROUND: Upper limb motor impairment commonly occurs after stroke, impairing quality of life. Brain network reorganization likely differs between subgroups with differing impairment severity. This study explored differences in functional connectivity (FC) and corticospinal tract (CST) integrity between patients with mild/moderate versus severe hemiplegia poststroke to clarify the neural correlates underlying motor deficits. METHOD: Sixty chronic stroke patients with upper limb motor impairment were categorized into mild/moderate and severe groups based on Fugl-Meyer scores. Resting-state FC was assessed using functional near-infrared spectroscopy (fNIRS) to compare connectivity patterns between groups across motor regions. CST integrity was evaluated by inducing motor evoked potentials (MEP) via transcranial magnetic stimulation. RESULTS: Compared to the mild/moderate group, the severe group exhibited heightened premotor cortex-primary motor cortex (PMC-M1) connectivity (t = 4.56, p < 0.01). Absence of MEP was also more frequent in the severe group (χ² = 12.31, p = 0.01). Bayesian models effectively distinguished subgroups and identified the PMC-M1 connection as highly contributory (accuracy = 91.30%, area under the receiver operating characteristic curve [AUC] = 0.86). CONCLUSION: Distinct patterns of connectivity and corticospinal integrity exist between stroke subgroups with differing impairments. Strengthened connectivity potentially indicates recruitment of additional motor resources to compensate for damage. These findings elucidate the neural correlates underlying motor deficits poststroke and could guide personalized, network-based therapies targeting predictive biomarkers to improve rehabilitation outcomes.
Affiliation(s)
- Guiyuan Cai
  - Department of Rehabilitation Medicine, School of Medicine, The Second Affiliated Hospital, South China University of Technology, Guangzhou, China
  - Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Jiayue Xu
  - Department of Rehabilitation Medicine, School of Medicine, The Second Affiliated Hospital, South China University of Technology, Guangzhou, China
  - Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Cailing Zhang
  - Department of Rehabilitation Medicine, School of Medicine, The Second Affiliated Hospital, South China University of Technology, Guangzhou, China
  - Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Junbo Jiang
  - Department of Rehabilitation Medicine, School of Medicine, The Second Affiliated Hospital, South China University of Technology, Guangzhou, China
  - Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
- Gengbin Chen
  - Postgraduate Research Institute, Guangzhou Sport University, Guangzhou, China
- Jialin Chen
  - Postgraduate Research Institute, Guangzhou Sport University, Guangzhou, China
- Quan Liu
  - Postgraduate Research Institute, Guangzhou Sport University, Guangzhou, China
- Guangqing Xu
  - Department of Rehabilitation Medicine, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
- Yue Lan
  - Department of Rehabilitation Medicine, School of Medicine, The Second Affiliated Hospital, South China University of Technology, Guangzhou, China
  - Guangzhou First People's Hospital, Guangzhou Medical University, Guangzhou, China
  - Guangzhou Key Laboratory of Aging Frailty and Neurorehabilitation, Guangzhou, China
3. Kaur J, Verma H, Kaur J, Lata P, Dhingra GG, Lal R. In Silico Analysis of the Phylogenetic and Physiological Characteristics of Sphingobium indicum B90A: A Hexachlorocyclohexane-Degrading Bacterium. Curr Microbiol 2024; 81:233. [PMID: 38904756 DOI: 10.1007/s00284-024-03762-1] [Received: 03/30/2024] [Accepted: 05/27/2024] [Indexed: 06/22/2024]
Abstract
The study focuses on the in silico genomic characterization of Sphingobium indicum B90A, revealing a wealth of genes involved in stress response, carbon monoxide oxidation, β-carotene biosynthesis, heavy metal resistance, and aromatic compound degradation, suggesting its potential as a bioremediation agent. Furthermore, genomic adaptations among nine Sphingomonad strains were explored, highlighting shared core genes via pangenome analysis, including those related to the shikimate pathway and heavy metal resistance. The majority of genes associated with aromatic compound degradation, heavy metal resistance, and stress response were found within genomic islands across all strains. Sphingobium indicum UT26S exhibited the highest number of genomic islands, while Sphingopyxis alaskensis RB2256 had the maximum fraction of its genome covered by genomic islands. The distribution of lin genes varied among the strains, indicating diverse genetic responses to environmental pressures. Additionally, in silico evidence of horizontal gene transfer (HGT) between plasmids pSRL3 and pISP3 of the Sphingobium and Sphingomonas genera, respectively, has been provided. The manuscript offers novel insights into strain B90A, highlighting its role in horizontal gene transfer and refining evolutionary relationships among Sphingomonad strains. The discovery of stress response genes and the czcABCD operon emphasizes the potential of Sphingomonads in consortia development, supported by genomic island analysis.
Affiliation(s)
- Jasvinder Kaur
  - Department of Zoology, Gargi College, Siri Fort Road, New Delhi, 110049, India
- Helianthous Verma
  - Department of Zoology, Ramjas College, University of Delhi, New Delhi, 110007, India
- Jaspreet Kaur
  - Department of Zoology, Maitreyi College, University of Delhi, New Delhi, 110021, India
- Pushp Lata
  - Department of Zoology, University of Delhi, New Delhi, 110007, India
- Gauri Garg Dhingra
  - Department of Zoology, Kirori Mal College, University of Delhi, New Delhi, 110007, India
- Rup Lal
  - Acharya Narendra Dev College, University of Delhi, New Delhi, 110019, India
4. Olenginski LT, Spradlin SF, Batey RT. Flipping the script: Understanding riboswitches from an alternative perspective. J Biol Chem 2024; 300:105730. [PMID: 38336293 PMCID: PMC10907184 DOI: 10.1016/j.jbc.2024.105730] [Received: 09/30/2023] [Revised: 01/14/2024] [Accepted: 01/19/2024] [Indexed: 02/12/2024] Open
Abstract
Riboswitches are broadly distributed regulatory elements most frequently found in the 5'-leader sequence of bacterial mRNAs that regulate gene expression in response to the binding of a small molecule effector. The occupancy status of the ligand-binding aptamer domain manipulates downstream information in the message that instructs the expression machinery. Currently, there are over 55 validated riboswitch classes, where each class is defined based on the identity of the ligand it binds and/or sequence and structure conservation patterns within the aptamer domain. This classification reflects an "aptamer-centric" perspective that dominates our understanding of riboswitches. In this review, we propose a conceptual framework that groups riboswitches based on the mechanism by which RNA manipulates information directly instructing the expression machinery. This scheme does not replace the established aptamer domain-based classification of riboswitches but rather serves to facilitate hypothesis-driven investigation of riboswitch regulatory mechanisms. Based on current bioinformatic, structural, and biochemical studies of a broad spectrum of riboswitches, we propose three major mechanistic groups: (1) "direct occlusion", (2) "interdomain docking", and (3) "strand exchange". We discuss the defining features of each group, present representative examples of riboswitches from each group, and illustrate how these RNAs couple small molecule binding to gene regulation. While mechanistic studies of the occlusion and docking groups have yielded compelling models for how these riboswitches function, much less is known about strand exchange processes. To conclude, we outline the limitations of our mechanism-based conceptual framework and discuss how critical information within riboswitch expression platforms can inform gene regulation.
Affiliation(s)
- Robert T Batey
  - Department of Biochemistry, University of Colorado, Boulder, Colorado, USA
5. Hallee L, Rafailidis N, Gleghorn JP. cdsBERT - Extending Protein Language Models with Codon Awareness. bioRxiv 2023:2023.09.15.558027. [PMID: 37745387 PMCID: PMC10516008 DOI: 10.1101/2023.09.15.558027] [Indexed: 09/26/2023]
Abstract
Recent advancements in Protein Language Models (pLMs) have enabled high-throughput analysis of proteins through primary sequence alone. At the same time, newfound evidence illustrates that codon usage bias is remarkably predictive and can even change the final structure of a protein. Here, we explore these findings by extending the traditional vocabulary of pLMs from amino acids to codons to encapsulate more information inside CoDing Sequences (CDS). We build upon traditional transfer learning techniques with a novel pipeline of token embedding matrix seeding, masked language modeling, and student-teacher knowledge distillation, called MELD. This transformed the pretrained ProtBERT into cdsBERT; a pLM with a codon vocabulary trained on a massive corpus of CDS. Interestingly, cdsBERT variants produced a highly biochemically relevant latent space, outperforming their amino acid-based counterparts on enzyme commission number prediction. Further analysis revealed that synonymous codon token embeddings moved distinctly in the embedding space, showcasing unique additions of information across broad phylogeny inside these traditionally "silent" mutations. This embedding movement correlated significantly with average usage bias across phylogeny. Future fine-tuned organism-specific codon pLMs may potentially have a more significant increase in codon usage fidelity. This work enables an exciting potential in using the codon vocabulary to improve current state-of-the-art structure and function prediction that necessitates the creation of a codon pLM foundation model alongside the addition of high-quality CDS to large-scale protein databases.
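To make the vocabulary change described in the abstract concrete, the sketch below tokenizes a coding sequence in two ways: as codons and as the amino acids those codons encode. It is an illustration only and is not taken from cdsBERT; the function names and the two example sequences are hypothetical, and the translation table is the standard genetic code. It shows that two synonymous sequences are indistinguishable under an amino acid vocabulary but distinct under a codon vocabulary.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G
# ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_tokens(cds: str) -> list[str]:
    """Split a coding sequence into codon tokens (hypothetical helper)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def amino_acid_tokens(cds: str) -> list[str]:
    """Translate each codon to its amino acid (hypothetical helper)."""
    return [CODON_TABLE[codon] for codon in codon_tokens(cds)]


# Two made-up sequences that differ only by synonymous codons.
cds_a = "ATGCTGAAA"
cds_b = "ATGTTAAAG"

same_protein = amino_acid_tokens(cds_a) == amino_acid_tokens(cds_b)
same_codons = codon_tokens(cds_a) == codon_tokens(cds_b)
print(same_protein, same_codons)
```

With an amino acid vocabulary both sequences map to the same token list, so any difference in codon usage is invisible to the model; the codon vocabulary keeps that difference available as input.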
Affiliation(s)
- Logan Hallee
  - Center for Bioinformatics and Computational Biology, University of Delaware