1. Bao LX, Luo ZM, Zhu XL, Xu YY. Automated identification of protein expression intensity and classification of protein cellular locations in mouse brain regions from immunofluorescence images. Med Biol Eng Comput 2024; 62:1105-1119. [PMID: 38150111 DOI: 10.1007/s11517-023-02985-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 11/28/2023] [Indexed: 12/28/2023]
Abstract
Knowledge of protein expression in mammalian brains at regional and cellular levels can facilitate understanding of protein functions and associated diseases. As the mouse brain is a typical mammalian brain considering cell type and structure, several studies have been conducted to analyze protein expression in mouse brains. However, labeling protein expression using biotechnology is costly and time-consuming. Therefore, automated models that can accurately recognize protein expression are needed. Here, we constructed machine learning models to automatically annotate the protein expression intensity and cellular location in different mouse brain regions from immunofluorescence images. The brain regions and sub-regions were segmented through learning image features using an autoencoder and then performing K-means clustering and registration to align with the anatomical references. The protein expression intensities for those segmented structures were computed on the basis of the statistics of the image pixels, and patch-based weakly supervised methods and multi-instance learning were used to classify the cellular locations. Results demonstrated that the models achieved high accuracy in the expression intensity estimation, and the F1 score of the cellular location prediction was 74.5%. This work established an automated pipeline for analyzing mouse brain images and provided a foundation for further study of protein expression and functions.
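The intensity step described above (statistics of image pixels within each segmented structure) can be illustrated with a minimal sketch. The abstract does not say which statistics were used, so the mean and median here are assumptions, as are the function name `region_intensity_stats`, the region names, and the toy pixel values.

```python
from statistics import mean, median


def region_intensity_stats(pixels, labels):
    """Group pixel intensities by region label and summarise each region.

    The choice of mean/median is an assumption for illustration only.
    """
    by_region = {}
    for value, label in zip(pixels, labels):
        by_region.setdefault(label, []).append(value)
    return {
        label: {"mean": mean(values), "median": median(values), "n": len(values)}
        for label, values in by_region.items()
    }


# Toy flattened image: intensities plus the (hypothetical) region of each pixel.
pixels = [10, 12, 14, 200, 220, 240, 0, 0]
labels = [
    "cortex", "cortex", "cortex",
    "hippocampus", "hippocampus", "hippocampus",
    "background", "background",
]
stats = region_intensity_stats(pixels, labels)
```

In a real pipeline the labels would come from the segmentation and registration steps rather than being written by hand.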
Affiliation(s)
- Lin-Xia Bao
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Zhuo-Ming Luo
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Xi-Liang Zhu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
- Ying-Ying Xu
  - School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou, 510515, China
  - Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510623, China
2. Ferreira MADM, Silveira WBD, Nikoloski Z. Protein constraints in genome-scale metabolic models: Data integration, parameter estimation, and prediction of metabolic phenotypes. Biotechnol Bioeng 2024; 121:915-930. [PMID: 38178617 DOI: 10.1002/bit.28650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 10/24/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
Genome-scale metabolic models provide a valuable resource to study metabolism and cell physiology. These models are employed with approaches from the constraint-based modeling framework to predict metabolic and physiological phenotypes. The prediction performance of genome-scale metabolic models can be improved by including protein constraints. The resulting protein-constrained models consider data on turnover numbers (kcat) and facilitate the integration of protein abundances. In this systematic review, we present and discuss the current state of the art regarding the estimation of kinetic parameters used in protein-constrained models. We also highlight how data-driven and constraint-based approaches can aid the estimation of turnover numbers and their usage in improving predictions of cellular phenotypes. Finally, we identify standing challenges in protein-constrained metabolic models and provide a perspective regarding future approaches to improve the predictive performance.
Affiliation(s)
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
  - Systems Biology and Mathematical Modeling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
3. Moura Ferreira MAD, Wendering P, Arend M, Batista da Silveira W, Nikoloski Z. Accurate prediction of in vivo protein abundances by coupling constraint-based modelling and machine learning. Metab Eng 2023; 80:184-192. [PMID: 37802292 DOI: 10.1016/j.ymben.2023.09.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/28/2023] [Revised: 09/10/2023] [Accepted: 09/25/2023] [Indexed: 10/08/2023]
Abstract
Quantification of how different environmental cues affect protein allocation can provide important insights for understanding cell physiology. While absolute quantification of proteins can be obtained by resource-intensive mass-spectrometry-based technologies, prediction of protein abundances offers another way to obtain insights into protein allocation. Here we present CAMEL, a framework that couples constraint-based modelling with machine learning to predict protein abundance for any environmental condition. This is achieved by building machine learning models that leverage static features, derived from protein sequences, and condition-dependent features predicted from protein-constrained metabolic models. Our findings demonstrate that CAMEL results in excellent prediction of protein allocation in E. coli (average Pearson correlation of at least 0.9), and moderate performance in S. cerevisiae (average Pearson correlation of at least 0.5). Therefore, CAMEL outperformed contending approaches without using molecular read-outs from unseen conditions and provides a valuable tool for using protein allocation in biotechnological applications.
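The performance figures above are Pearson correlations between predicted and measured abundances. As a minimal sketch of that metric (not the CAMEL code itself), the coefficient can be computed as follows; the function name `pearson` and the toy abundance values are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted abundances for four proteins.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
r = pearson(measured, predicted)
```

Because the toy predictions are an exact linear function of the measurements, `r` is 1 up to floating-point rounding; real predictions would fall below that.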
Affiliation(s)
- Philipp Wendering
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Marius Arend
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany
4. Ferreira MADM, da Silveira WB, Nikoloski Z. PARROT: Prediction of enzyme abundances using protein-constrained metabolic models. PLoS Comput Biol 2023; 19:e1011549. [PMID: 37856550 PMCID: PMC10617714 DOI: 10.1371/journal.pcbi.1011549] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/23/2023] [Revised: 10/31/2023] [Accepted: 09/29/2023] [Indexed: 10/21/2023] Open
Abstract
Protein allocation determines the activity of cellular pathways and affects growth across all organisms. Therefore, different experimental and machine learning approaches have been developed to quantify and predict, respectively, protein abundances and their allocation to different cellular functions. Yet, despite advances in protein quantification, it remains challenging to predict condition-specific allocation of enzymes in metabolic networks. Here, using protein-constrained metabolic models, we propose a family of constraint-based approaches, termed PARROT, to predict how much of each enzyme is used based on the principle of minimizing the difference between a reference and an alternative growth condition. To this end, PARROT variants model the minimization of enzyme reallocation using four different (combinations of) distance functions. We demonstrate that the PARROT variant that minimizes the Manhattan distance between the enzyme allocation of a reference and an alternative condition outperforms existing approaches based on the parsimonious distribution of fluxes or enzymes for both Escherichia coli and Saccharomyces cerevisiae. Further, we show that the combined minimization of flux and enzyme allocation adjustment leads to inconsistent predictions. Together, our findings indicate that minimization of protein allocation rather than flux redistribution is a governing principle determining steady-state pathway activity for microorganisms grown in alternative growth conditions.
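The Manhattan-distance criterion mentioned above can be sketched in a few lines. This is only an illustration of the distance itself: in the paper the minimization runs over allocations feasible in a protein-constrained metabolic model, whereas here that optimization is replaced by a choice between two hand-written candidates. The enzyme names, allocation values, and function name are hypothetical.

```python
def manhattan_distance(reference, alternative):
    """Sum of absolute per-enzyme differences between two allocations.

    Enzymes missing from one allocation are treated as 0.0 (an assumption).
    """
    enzymes = reference.keys() | alternative.keys()
    return sum(
        abs(reference.get(e, 0.0) - alternative.get(e, 0.0)) for e in enzymes
    )


# Hypothetical enzyme mass fractions in the reference condition.
reference = {"E1": 0.5, "E2": 0.25, "E3": 0.25}

# Two hypothetical allocations for the alternative condition.
candidates = {
    "A": {"E1": 0.5, "E2": 0.5, "E3": 0.0},
    "B": {"E1": 0.0, "E2": 0.5, "E3": 0.5},
}

distances = {
    name: manhattan_distance(reference, alloc)
    for name, alloc in candidates.items()
}
# Pick the candidate that requires the least reallocation.
best = min(distances, key=distances.get)
```

Candidate A moves less protein mass away from the reference than candidate B, so it is the one selected under this criterion.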
Affiliation(s)
- Zoran Nikoloski
  - Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, Germany
  - Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, Germany
5. Wang X, Chen C, Yan J, Xu Y, Pan D, Wang L, Yang M. Druggability of Targets for Diagnostic Radiopharmaceuticals. ACS Pharmacol Transl Sci 2023; 6:1107-1119. [PMID: 37588760 PMCID: PMC10425999 DOI: 10.1021/acsptsci.3c00081] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/14/2023] [Indexed: 08/18/2023]
Abstract
Targets play an indispensable and pivotal role in the development of radiopharmaceuticals. However, the initial stages of drug discovery projects are often plagued by frequent failures due to inadequate information on druggability and suboptimal target selection. In this context, we aim to present a comprehensive review of the factors that influence target druggability for diagnostic radiopharmaceuticals. Specifically, we explore the crucial determinants of target specificity, abundance, localization, and positivity rate and their respective implications. Through a detailed analysis of existing protein targets, we elucidate the significance of each factor. By carefully considering and balancing these factors during the selection of targets, more efficacious and targeted radiopharmaceuticals are expected to be designed for the diagnosis of a wide range of diseases in the future.
Affiliation(s)
- Xinyu Wang
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi 214063, PR China
  - School of Pharmacy, Nanjing Medical University, Nanjing 211166, PR China
- Chongyang Chen
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi 214063, PR China
- Junjie Yan
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi 214063, PR China
  - School of Pharmacy, Nanjing Medical University, Nanjing 211166, PR China
- Yuping Xu
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi 214063, PR China
  - School of Pharmacy, Nanjing Medical University, Nanjing 211166, PR China
- Donghui Pan
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi 214063, PR China
- Lizhen Wang
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi 214063, PR China
- Min Yang
  - NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, Wuxi 214063, PR China
  - School of Pharmacy, Nanjing Medical University, Nanjing 211166, PR China
6. Höllerer S, Jeschek M. Ultradeep characterisation of translational sequence determinants refutes rare-codon hypothesis and unveils quadruplet base pairing of initiator tRNA and transcript. Nucleic Acids Res 2023; 51:2377-2396. [PMID: 36727459 PMCID: PMC10018350 DOI: 10.1093/nar/gkad040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 12/05/2022] [Accepted: 01/13/2023] [Indexed: 02/03/2023] Open
Abstract
Translation is a key determinant of gene expression and an important biotechnological engineering target. In bacteria, 5'-untranslated region (5'-UTR) and coding sequence (CDS) are well-known mRNA parts controlling translation and thus cellular protein levels. However, the complex interaction of 5'-UTR and CDS has so far only been studied for few sequences leading to non-generalisable and partly contradictory conclusions. Herein, we systematically assess the dynamic translation from over 1.2 million 5'-UTR-CDS pairs in Escherichia coli to investigate their collective effect using a new method for ultradeep sequence-function mapping. This allows us to disentangle and precisely quantify effects of various sequence determinants of translation. We find that 5'-UTR and CDS individually account for 53% and 20% of variance in translation, respectively, and show conclusively that, contrary to a common hypothesis, tRNA abundance does not explain expression changes between CDSs with different synonymous codons. Moreover, the obtained large-scale data provide clear experimental evidence for a base-pairing interaction between initiator tRNA and mRNA beyond the anticodon-codon interaction, an effect that is often masked for individual sequences and therefore inaccessible to low-throughput approaches. Our study highlights the indispensability of ultradeep sequence-function mapping to accurately determine the contribution of parts and phenomena involved in gene regulation.
Affiliation(s)
- Simon Höllerer
  - Department of Biosystems Science and Engineering, Swiss Federal Institute of Technology – ETH Zurich, Basel CH-4058, Switzerland
- Markus Jeschek
  - To whom correspondence should be addressed. Tel: +49 941 943 3161; Fax: +49 941 943 2403
7. Korenskaia AE, Matushkin YG, Lashin SA, Klimenko AI. Bioinformatic Assessment of Factors Affecting the Correlation between Protein Abundance and Elongation Efficiency in Prokaryotes. Int J Mol Sci 2022; 23:11996. [PMID: 36233299 PMCID: PMC9570070 DOI: 10.3390/ijms231911996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 09/23/2022] [Accepted: 09/30/2022] [Indexed: 11/07/2022] Open
Abstract
Protein abundance is crucial for the majority of genetically regulated cell functions to act properly in prokaryotic organisms. Therefore, developing bioinformatic methods for assessing the efficiency of different stages of gene expression is of great importance for predicting the actual protein abundance. One of these steps is the evaluation of translation elongation efficiency based on mRNA sequence features, such as codon usage bias and mRNA secondary structure properties. In this study, we have evaluated correlation coefficients between experimentally measured protein abundance and predicted elongation efficiency characteristics for 26 prokaryotes, including non-model organisms, belonging to diverse taxonomic groups. The algorithm for assessing elongation efficiency takes into account not only codon bias, but also the number and energy of secondary structures in mRNA if those demonstrate an impact on the predicted elongation efficiency of the ribosomal protein genes. The results show that, for a number of organisms, secondary structures are a better predictor of protein abundance than codon usage bias. The bioinformatic analysis has revealed several factors associated with the value of the correlation coefficient. The first factor is the elongation efficiency optimization type: organisms whose genomes are optimized for codon usage only have significantly higher correlation coefficients. The second factor is taxonomical identity: bacteria that belong to the class Bacilli tend to have higher correlation coefficients among the analyzed set. The third is growth rate, which is shown to be higher for the organisms with higher correlation coefficients between protein abundance and predicted translation elongation efficiency. The obtained results can be useful for further improvement of methods for protein abundance prediction.
Affiliation(s)
- Aleksandra E. Korenskaia
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
  - Correspondence: Tel.: +7-999-467-7118
- Yury G. Matushkin
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Sergey A. Lashin
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Alexandra I. Klimenko
  - Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
8. Amerifar S, Norouzi M, Ghandi M. A tool for feature extraction from biological sequences. Brief Bioinform 2022; 23:6563937. [PMID: 35383372 DOI: 10.1093/bib/bbac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 11/12/2022] Open
Abstract
With the advances in sequencing technologies, a huge amount of biological data is extracted nowadays. Analyzing this amount of data is beyond the ability of human beings, creating a splendid opportunity for machine learning methods to grow. The methods, however, are practical only when the sequences are converted into feature vectors. Many tools target this task including iLearnPlus, a Python-based tool which supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and Protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool not only supports all features in iLearnPlus but also 30 additional features which exist in the literature. Moreover, our tool is based on R language which makes an alternative for bioinformaticians to transform sequences into feature vectors. We have compared the conversion time of our tool with that of iLearnPlus: we transform the sequences much faster. We convert small nucleotides by a median of 2.8X faster, while we outperform iLearnPlus by a median of 6.3X for large sequences. Finally, in amino acids, our tool achieves a median speedup of 23.9X.
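To make "converting a sequence into a feature vector" concrete, here is a minimal sketch of one common descriptor, normalised k-mer composition. It is written in Python for illustration and is not the authors' R tool or its API; the function name `kmer_features`, its parameters, and the example sequence are assumptions.

```python
from itertools import product


def kmer_features(sequence, k=2, alphabet="ACGT"):
    """Return the normalised k-mer frequency vector of a sequence.

    The vector has len(alphabet) ** k entries, ordered as produced by
    itertools.product. Windows containing symbols outside the alphabet
    are skipped (an assumption made for this sketch).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical DNA fragment: windows are AC, CG, GT, TA, AC.
features = kmer_features("ACGTAC", k=2)
```

The resulting fixed-length vector (16 entries for dinucleotides) is the kind of input a downstream machine learning model expects, regardless of the original sequence length.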
Affiliation(s)
- Sare Amerifar
  - Bioinformatics, Tarbiat Modares University, Jalal Al Ahmad, 14115-111, Tehran, Iran
- Mahammad Norouzi
  - Computer Science, Technical University of Darmstadt, Hochschulstr. 1, 64293, Hesse, Germany
- Mahmoud Ghandi
  - Bioinformatics, Monte Rosa Therapeutics, Summer Street, 02210, Boston, United States