Abstract
Protein segments that contain few of the 20 possible amino acids, sometimes in tandem repeat arrays, are referred to as containing "simple" or "low-complexity" sequence. Many Plasmodium falciparum proteins are longer than their homologs in other species because they contain such low-complexity segments, which have no known function; these are interspersed among segments of higher complexity to which function can often be ascribed. If there is low complexity at the protein level, there is likely to be low complexity at the corresponding nucleic acid level (departure from equifrequency of the four bases). Thus, low complexity may have been selected primarily at the nucleic acid level, and low complexity at the protein level may be secondary. In this case, the amino acid composition of low-complexity segments should reflect, more than that of high-complexity segments, forces operating at the nucleic acid level, which include GC-pressure and AG-pressure. Consistent with this, for the amino acid-determining first and second codon positions, open reading frames containing low-complexity segments show increased contributions to downward GC-pressure (revealed as a decreased percentage of G+C) and to upward AG-pressure (revealed as an increased percentage of A+G). When not countermanded by high contributions to AG-pressure, low-complexity segments can contribute to base order-dependent fold potential; in this respect, they resemble introns. Thus, in P. falciparum, low-complexity segments appear to be adaptations primarily serving nucleic acid level functions.
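The two composition measures named above, percentage of G+C and percentage of A+G at first and second codon positions, can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names `codon_position_composition` and `base_entropy` are hypothetical, the input is assumed to be an in-frame open reading frame, and Shannon entropy is used here only as one possible way to quantify departure from equifrequency of the four bases.

```python
import math
from collections import Counter


def codon_position_composition(orf, positions=(0, 1)):
    """Return ((G+C)%, (A+G)%) over the chosen codon positions.

    positions uses 0-based offsets within each codon, so the default
    (0, 1) selects the first and second codon positions.
    Assumes the sequence starts in frame; trailing partial codons
    and ambiguous bases (anything other than A, C, G, T) are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    bases = [orf[i + p] for i in range(0, usable, 3) for p in positions]
    counts = Counter(bases)
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    gc = 100.0 * (counts["G"] + counts["C"]) / total
    ag = 100.0 * (counts["A"] + counts["G"]) / total
    return gc, ag


def base_entropy(seq):
    """Shannon entropy of base composition in bits.

    2.0 corresponds to equifrequency of the four bases; lower values
    indicate lower compositional complexity (an illustrative measure,
    assumed here rather than taken from the source).
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # A lysine/asparagine-rich toy repeat: AT-rich and purine-rich
    # at first and second codon positions.
    toy_orf = "AAGAAGAAT"
    print(codon_position_composition(toy_orf))
    print(base_entropy(toy_orf))
```

Comparing these values between open reading frames with and without low-complexity segments is one way to see the direction of the reported trends; the thresholds or windows used to define low-complexity segments are not specified here and would have to be supplied separately.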