Chojnowski G, Simpkin AJ, Leonardo DA, Seifert-Davila W, Vivas-Ruiz DE, Keegan RM, Rigden DJ. findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM.
IUCrJ 2022;9:86-97. [PMID: 35059213; PMCID: PMC8733886; DOI: 10.1107/s2052252521011088]
[Received: 04/09/2021] [Accepted: 10/22/2021] [Indexed: 05/15/2023]
Abstract
Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well-characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. Its application to the characterization of the crystal structure of an unknown protein purified from snake venom is presented. It is also shown that the approach can be applied successfully to identify protein sequences and to validate sequence assignments in cryo-EM protein structures.
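The abstract does not describe how candidate sequences are scored. The following is therefore only a minimal sketch of the general idea of sequence identification, not the findMySequence implementation. It assumes that per-position residue-type probabilities for a traced fragment are already available (for example, as the output of a neural-network classifier applied to the map). It also assumes a uniform background frequency and ranks candidate sequences by their best ungapped log-odds placement. The functions `window_score`, `best_placement` and `rank_candidates`, and the example profile and sequences, are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequency for all 20 residue types.
BACKGROUND = 1.0 / len(AMINO_ACIDS)
# Floor for residue types absent from a position's probabilities (avoids log(0)).
FLOOR = 1e-6


def window_score(profile, sequence, offset):
    """Sum of per-position log-odds for placing the fragment at `offset`."""
    score = 0.0
    for i, probs in enumerate(profile):
        p = probs.get(sequence[offset + i], FLOOR)
        score += math.log(p / BACKGROUND)
    return score


def best_placement(profile, sequence):
    """Return (score, offset) of the best ungapped placement, or None if too short."""
    n = len(profile)
    if len(sequence) < n:
        return None
    return max(
        (window_score(profile, sequence, off), off)
        for off in range(len(sequence) - n + 1)
    )


def rank_candidates(profile, candidates):
    """Rank named candidate sequences by their best placement score, highest first."""
    results = []
    for name, seq in candidates.items():
        placement = best_placement(profile, seq)
        if placement is not None:
            score, offset = placement
            results.append((score, name, offset))
    results.sort(reverse=True)
    return results


# Hypothetical residue-type probabilities for a 4-residue traced fragment.
profile = [
    {"W": 0.7, "F": 0.2, "Y": 0.1},
    {"H": 0.8, "N": 0.2},
    {"C": 0.9, "S": 0.1},
    {"Y": 0.6, "F": 0.4},
]
candidates = {"seq_a": "MKTWHCYLLA", "seq_b": "MKTAAAALLA"}

ranking = rank_candidates(profile, candidates)
for score, name, offset in ranking:
    print(f"{name}: score={score:.2f}, offset={offset}")
```

In this toy case `seq_a` ranks first, with the fragment placed at offset 3 (`WHCY`). The same score could, in principle, be used to check a given sequence assignment by comparing it against alternatives. A real pipeline would also need gapped alignment, realistic background frequencies and a statistical significance estimate, none of which are modelled here.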