Abstract
The systematic sequencing of the yeast genome has raised the question of the biological significance of the open reading frames (ORFs) it reveals: some of them may be fortuitous. To avoid analysing such fortuitous ORFs, a minimum length of 100 sense codons was adopted. Nevertheless, fortuitous ORFs longer than 100 codons cannot be excluded. In the context of functional analysis, a method for discriminating between fortuitous and biologically active ORFs may therefore be useful. The discrimination method described here is based on multiple criteria: ORF length, codon bias, and both the amino-acid and the dipeptide composition of the corresponding polypeptide. The threshold for each criterion is derived by comparing two learning sets: one drawn from random DNA sequences and the other from known genes. The method was validated on two test sets (one random and one biological) and then applied to the ORFs of chromosomes I, II, III, V, VIII, IX and XI. It predicts that 123 of the 1773 ORFs identified on these chromosomes are fortuitous.
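As an illustration of the length criterion only, the following minimal sketch extracts ORFs of at least 100 sense codons from a DNA string. It is not the discrimination method itself: the function name `find_orfs` is hypothetical, and the sketch assumes that an ORF runs from the first in-frame ATG to the next in-frame stop codon, uses the standard stop codons, and scans the forward strand only.

```python
# Standard stop codons (assumption: standard nuclear genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return (start, end, n_sense_codons) for forward-strand ORFs.

    start is the index of the first in-frame ATG following the previous
    stop codon; end is the index just past the terminating stop codon.
    Only ORFs with at least min_codons sense codons are reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the current candidate start codon
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                # Sense codons run from the ATG up to, but not including, the stop.
                n_sense = (i - start) // 3
                if n_sense >= min_codons:
                    orfs.append((start, i + 3, n_sense))
                start = None
    return orfs
```

A complete scan would also process the reverse complement, and the remaining criteria (codon bias, amino-acid and dipeptide composition) would then be evaluated on each ORF that passes this length filter.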