1
|
Odrzywolek K, Karwowska Z, Majta J, Byrski A, Milanowska-Zabel K, Kosciolek T. Deep embeddings to comprehend and visualize microbiome protein space. Sci Rep 2022; 12:10332. [PMID: 35725732 PMCID: PMC9209496 DOI: 10.1038/s41598-022-14055-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Accepted: 05/31/2022] [Indexed: 12/13/2022] Open
Abstract
Understanding the function of microbial proteins is essential to reveal the clinical potential of the microbiome. The application of high-throughput sequencing technologies allows for fast and increasingly cheaper acquisition of data from microbial communities. However, many of the inferred protein sequences are novel and not catalogued, hence the possibility of predicting their function through conventional homology-based approaches is limited, which indicates the need for further research on alignment-free methods. Here, we leverage a deep-learning-based representation of proteins to assess its utility in alignment-free analysis of microbial proteins. We trained a language model on the Unified Human Gastrointestinal Protein catalogue and validated the resulting protein representation on the bacterial part of the SwissProt database. Finally, we present a use case on proteins involved in SCFA metabolism. Results indicate that the deep learning model manages to accurately represent features related to protein structure and function, allowing for alignment-free protein analyses. Technologies that contextualize metagenomic data are a promising direction to deeply understand the microbiome.
Collapse
Affiliation(s)
- Krzysztof Odrzywolek
- Ardigen, Podole 76, 30-394, Krakow, Poland
- Institute of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Mickiewicza 30, 30-059, Krakow, Poland
| | - Zuzanna Karwowska
- Malopolska Centre of Biotechnology, Jagiellonian University, Gronostajowa 7A, 30-387, Krakow, Poland
| | - Jan Majta
- Ardigen, Podole 76, 30-394, Krakow, Poland
- Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387, Krakow, Poland
| | - Aleksander Byrski
- Institute of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Mickiewicza 30, 30-059, Krakow, Poland
| | | | - Tomasz Kosciolek
- Malopolska Centre of Biotechnology, Jagiellonian University, Gronostajowa 7A, 30-387, Krakow, Poland.
| |
Collapse
|