Abstract: |
It remains unclear what principles determine whether a protein
sequence/structure adopts a given fold. Local properties, such as the
arrangement of secondary structure elements adjacent in sequence, or global
properties, such as the total number of secondary structure elements, may
constrain the type of fold that a protein can adopt. Such constraints might
be considered "signatures" of a given fold and their identification would be
useful for the classification of protein structure. Inductive Logic Programming
(ILP) has been applied to the automatic identification of such structural
signatures. The signatures generated by ILP can be readily interpreted by a
protein structure expert and tested for accuracy. A previous
application of ILP to this problem indicated that large insertions/deletions in
proteins are an obstacle to learning rules that effectively discriminate between
positive and negative examples of a given fold. Here, we apply an ILP learning
scheme that mitigates this problem by using the structural superposition of
protein domains with similar folds. This was done in three steps. Firstly,
a multiple alignment of domains was generated for each type of fold studied.
Secondly, the alignment was used to determine which secondary structure elements
can be considered equivalent across those domains (the "core" elements of that
fold). Thirdly, an ILP learning experiment was conducted
to learn rules defining a fold in terms of those core elements.
|