Claverie JM, Sauvaget I, Bougueleret L. Computer generation and statistical analysis of a data bank of protein sequences translated from GenBank. Biochimie 1985;67:437-43. [PMID: 3927990 DOI: 10.1016/s0300-9084(85)80261-3]
Abstract
We describe PGtrans, a new and freely available protein sequence data bank (2625 sequences, 554198 amino acids). This data bank is routinely produced by automatic computer translation of the nucleotide sequence library GenBank. The information needed for the translation process (transcriptional orientation, location of coding regions, splice sites, and pertinent genetic code) is gathered by the translation program through an "intelligent" scanning of the documentary field of each GenBank entry. Inconsistencies resulting in unexpected termination codons are detected and reported, thus allowing the correction of data bank errors. PGtrans is intended as a tool for protein similarity searches. Its reasonable overall size (2 megabytes) makes it suitable for microcomputer environments. Up-to-date amino acid composition data and relative abundances of di-, tri-, and tetrapeptides in proteins of known sequence are presented and discussed.
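
The translation and consistency check described in the abstract can be illustrated with a minimal Python sketch. It is not the PGtrans program: the function names (`translate_cds`, `peptide_counts`), the exon coordinate format (1-based, inclusive, listed in ascending order), and the default use of the standard genetic code are all assumptions made for illustration. The sketch joins the coding segments, reverse-complements them for minus-strand genes, translates codon by codon, and reports any termination codon found before the final position. A second helper counts overlapping peptides of length k, as a stand-in for the di-, tri-, and tetrapeptide statistics.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed default;
# other codes, e.g. mitochondrial, would be passed in as a different table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_cds(seq, exons, strand="+", table=CODON_TABLE):
    """Translate a spliced coding region (hypothetical helper).

    exons: list of (start, end) pairs, 1-based inclusive, ascending order.
    Returns (protein, problems), where problems lists (codon_index, codon)
    for every termination codon that is not the last codon.
    """
    seq = seq.upper()
    coding = "".join(seq[start - 1:end] for start, end in exons)
    if strand == "-":
        coding = reverse_complement(coding)

    n_codons = len(coding) // 3
    protein, problems = [], []
    for i in range(n_codons):
        codon = coding[3 * i:3 * i + 3]
        aa = table.get(codon, "X")  # unknown or ambiguous codons become X
        if aa == "*":
            if i == n_codons - 1:
                continue  # expected terminal stop: not part of the protein
            problems.append((i + 1, codon))  # unexpected internal stop
        protein.append(aa)
    return "".join(protein), problems


def peptide_counts(proteins, k):
    """Count overlapping peptides of length k across a list of proteins."""
    counts = Counter()
    for p in proteins:
        for i in range(len(p) - k + 1):
            counts[p[i:i + k]] += 1
    return counts
```

In this sketch, a call such as `translate_cds("ATGGCCGTAAAATGA", [(1, 6), (10, 15)])` splices out positions 7 to 9 before translating, and a non-empty `problems` list signals an entry whose annotation or sequence may need correction. Relative abundances would follow by dividing each count from `peptide_counts` by the total number of peptides of that length.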