Moderated by Erik Sandewall; refereeing chaired by Susanne Biundo.

Erik Sandewall

Logic-Based Modelling of Goal-Directed Behavior

The article mentioned above has been submitted to the Electronic Transactions on Artificial Intelligence, and the present page contains the review discussion. Click for more explanations and for the webpage of theauthor, Erik Sandewall.
Overview of interactions

N:o Question Answer(s) Continued discussion

1 13.4  Rob Miller
13.4  Erik Sandewall

2 Anonymous referee 1
17.7  Erik Sandewall

3 Anonymous referee 2
17.7  Erik Sandewall

4 Anonymous referee 3
17.7  Erik Sandewall
19.8  Anonymous referee 3
19.8  Erik Sandewall

Q1. Rob Miller (13.4):

Dear Erik,
I'm confused about axioms S4, S5 and S8 involving the action test(p) on page 7 of your paper:
S4 seems to state that it is always possible to perform a test to see if a given property is true. But I would have thought that the opposite is nearly always the case - that in most commonsense or robot environments we could almost assume by default that the agent is not in a position to directly test the truth of a given property of the environment. (This is certainly true for me as I type this question at home. I can't check the condition of anything in my office, in the other rooms in my house, etc. without first performing some other actions.) So what is the rationale behind axiom S4? Is the agent perhaps "testing" it's own knowledge, rather than the real environment?
S5 states that test actions are always instantaneous. Why should this be so, at least in a framework where other actions are not instantaneous?
S8 says that, once executed, a test action fails if and only if the corresponding property is false. But this seems to go against the commonsense notion of "testing". For example, when I test to see if I have any new emails (actually by clicking on the "?" button on Netscape) I count this action as successful if the system reports "no new messages on server". I count the action as failed if (as frequently happens with my faulty modem here at home), clicking on "?" crashes my computer. I'm then left with no information as to whether I have new emails or not. So is this not a test action in the sense that you intended?
Regards,
Rob
A1. Erik Sandewall (13.4):

Dear Rob,
The "action" test(p) is classified as an action in the formal sense that it can be combined with other actions using, for example, the sequential operator. In those cases that you describe, where the actual performance of the test takes a nontrivial amount of time compared to other events, I imagine that one would use test(p) merely for the internal decision within the robot and using knowledge that it has already acquired. The corresponding, external action of finding out what is the case and then succeeding or failing would then be represented with something along the lines of findout(p) ; test(p) , where findout(p) is non-instantaneous.
Will test(p) always be immediately preceded by findout(p) in practice? I think of at least two kinds of situations where this will not be the case. When some fluent is being monitored continuously by a lower software level, then the current value of p can safely be assumed to always be available. Also, in those cases where the agent has to plan ahead and make sure in advance that it acquires important information, so that it is there without delay when needed, then again test(p) will not be directly preceded by and tied to findout(p) .
The axioms assume that the argument of test(p) is either true or false. The case where incomplete knowledge is allowed may be handled by any of the well-known methods, for example by having a pair of two propositional fluents p and p' , and where lack of information is represented by making both of them false.
The mnemonic "test" was inspired by an opcode in an assembly language that I learned a long time ago.
Regards, Erik

Q2. Anonymous referee 1:

Generally, the paper has the flavor of a progress report rather than a well-knit article. My understanding is that this perfectly complies with the publication policy of ETAI. However, sometimes such a status of a paper is indicated by adding the qualification "preliminary report" to the title, which the author might consider in the present case as well. The author himself admits that there is still a lot of work to be done. In particular, a number of rather complex axioms are proposed, of which it is claimed that they characterize "obvious properties" (p.5) of relation symbols which have been introduced with a certain meaning in mind. Past experience taught us that it can be dangerous to propose axiomatizations which seem most natural at first glance. I therefore strongly support the author's remark on the need for an "underlying semantics, and a validation of the present approach with respect to that semantics..." (p.16). An essential, most basic property of any useful axiomatization is consistency. Maybe the author could add a result in that direction.
p.7: The purpose of Section 2.6 is not clear. Composite fluents aren't used elsewhere in the paper, and so one wonders, for instance, why logically equivalent composite fluents cannot be treated as equal.
p.8: The operators G and D_s have been introduced as abbreviations for formulas, e.g., G(s, a) <-> H(s, inv(a)) . Why is it, then, that they cannot serve as part of axioms, provided the axiomatization includes the respective definitional formulas?
p.8: The predicate Composite is introduced but (as far as I could see) not used elsewhere in the paper.
A2. Erik Sandewall (17.7):

Generally, the paper has the flavor of a progress report rather than a well-knit article. My understanding is that this perfectly complies with the publication policy of ETAI. However, sometimes such a status of a paper is indicated by adding the qualification "preliminary report" to the title, which the author might consider in the present case as well.

My understanding is that the subtitle "preliminary report" is used in conference publications in order to distinguish an article from a later, revised version of the same article that is to be published elsewhere. The present version of the article is intended to be the final one, except for the corrections that follow from the review comments.

The author himself admits that there is still a lot of work to be done. In particular, a number of rather complex axioms are proposed, of which it is claimed that they characterize "obvious properties" (p.5) of relation symbols which have been introduced with a certain meaning in mind. Past experience taught us that it can be dangerous to propose axiomatizations which seem most natural at first glance. I therefore strongly support the author's remark on the need for an "underlying semantics, and a validation of the present approach with respect to that semantics..." (p.16).

See the first item in the answer to Anonymous Referee 3.

An essential, most basic property of any useful axiomatization is consistency. Maybe the author could add a result in that direction.

I guess this will be a natural corollary of validation results relative to an underlying semantics.

p.7: The purpose of Section 2.6 is not clear. Composite fluents aren't used elsewhere in the paper, and so one wonders, for instance, why logically equivalent composite fluents cannot be treated as equal.

Composite fluents are used in the examples in section 2.5 (although it's possible that order between those two sections ought to be reversed). Treating logically equivalent composite fluents as equal does not have any obvious advantage, and it will complicate things considerably for the occlusion predicate.

p.8: The operators G and D_s have been introduced as abbreviations for formulas, e.g., G(s, a) <-> H(s, inv(a)) . Why is it, then, that they cannot serve as part of axioms, provided the axiomatization includes the respective definitional formulas?

This is for reasons of convenience rather than logical necessity. The tentative specifications in question are easy to follow when they are written in terms of the abbreviations, but not when the abbreviations are expanded. Furthermore, verifying the important properties of the specifications becomes quite messy. All of this becomes clearer if the specifications are written without the use of the abbreviations.

p.8: The predicate Composite is introduced but (as far as I could see) not used elsewhere in the paper.

It is used in axiom S2 on page 5, where it indicates one of the cases where an action does not start executing from the time it is invoked. The reason for making this exception is that an action of the form a;b can not be allowed to start executing at a time when the action a is already executing. Therefore, it is not sufficient to say that an invoked action starts executing unless the same action is already in the midst of execution.

Q3. Anonymous referee 2:

Overall

This is a well-motivated and timely contribution to the field, and, perhaps with some presentational improvements, should be accepted for publication in ETAI. The paper's main original contribution is an axiom-based account of control, focussing on the issue of action failure, retry and replanning. It also addresses the issue of how these axioms can be integrated with logical theories of action.
There follow a number of general questions I would like to raise, some presentational suggestions, and a list of typos.

General questions

Although I'm attracted to the author's work, I am tempted to ask why we should want an axiomatic account of goal-directed behaviour. For example, Axioms G1 to G4 really constitute an algorithm, and perhaps make better sense presented as an algorithm. From my point-of-view, the role of logic in cognitive robotics is to represent the world, not to represent the internal workings of the agent. Perhaps the author would like to address this question, in an electronic discussion and/or in the final version of the paper.
In Section 2.5, the third formula on page 7 entails that the postcondition of an action is false until the action is finished. But consider the example of a compound action WashDishes , which has the effect of making CutleryClean true and CrockeryClean true. Now, it's bound to be the case, assuming one item is washed at a time, that either CutleryClean becomes true before CrockeryClean or the other way around. Either way, one of the action's postconditions becomes true before the whole action is finished.
It's not clear to me what it means for the robot to "state" a formula on page 9. Does this mean it passes that formula to a control module?

Presentational suggestions

I have one serious presentational objection to the paper. I think it would be very much improved if there were some concrete examples showing what inferences can be drawn from the axioms. (By a concrete example, I mean one with meaningful fluent names.) Even a single concrete example illustrating some of the main ideas would be of enormous benefit to the reader's intuition. Otherwise, a hostile reader might think this is all just empty formalism.
Here are some more minor presentational suggestions. At the start of Section 2.6, the author introduces logical connectives for composing fluents. But these have already been used in Section 2.5. Perhaps this material could be reshuffled. Similarly, on page 10, the author declares how the variable symbols a and g will be used, although they have already appeared many times in the paper.

A3. Erik Sandewall (17.7):

Although I'm attracted to the author's work, I am tempted to ask why we should want an axiomatic account of goal-directed behaviour. For example, Axioms G1 to G4 really constitute an algorithm, and perhaps make better sense presented as an algorithm. From my point-of-view, the role of logic in cognitive robotics is to represent the world, not to represent the internal workings of the agent. Perhaps the author would like to address this question, in an electronic discussion and/or in the final version of the paper.

My argument goes in two steps:

Rational agent behavior in a robot needs a precise specification, using logic
Such a specification can gain considerably by being integrated with a logic of actions and change.
First, rational agent behavior is fairly complex in the case of a robot that interacts with the real world. It requires pursuing goals, choosing plans, making appropriate reactions to new information, dealing with uncertainty and temporal constraints, and so on. Even though in the end this behavior is going to be realized by software, it is clear that a high-level specification of that software will be necessary for the actual software engineering as well as for making it possible to validate that software and to prove its properties.
Furthermore, if this software is described in a purely algorithmic style, it is difficult to avoid introducing a lot of administrative machinery. This is exemplified by the article "A Formal Specification of dMARS" by Mark d'Inverno et al. (Proceedings of the ATAL-97 workshop, pages 155-176), where a variant of PRS is specified in the Z notation (which is a software specification language). In my approach, the agent's rational behavior is specified in terms of restrictions: for each scenario, the set of models for the formalization is intended to equal the set of rational behaviors, each interpretation being one possible history in the world. Such a constraint-oriented approach has chances of being much more concise and to the point.
Finally, once we decide that it is appropriate to specify rational robotic behavior in logic, we also have to deal with actions. The robotic behavior consists of actions, so this already defines a connection point. Also, when several agents are involved, the actions of one need to be observed and understood by the others. In the longer perspective, I think we will see a lot more integration between theories for reasoning about actions and for characterizing agent behavior.

In Section 2.5, the third formula on page 7 entails that the postcondition of an action is false until the action is finished. But consider the example of a compound action WashDishes , which has the effect of making CutleryClean true and CrockeryClean true. Now, it's bound to be the case, assuming one item is washed at a time, that either CutleryClean becomes true before CrockeryClean or the other way around. Either way, one of the action's postconditions becomes true before the whole action is finished.

The text preceding the formula was:

As another simple example, consider the case of actions which are described in terms of a precondition, a prevail condition, and a postcondition, where the postcondition is at the same time the termination condition for the action...

The formula in question therefore refers to a limited class of actions, rather than to my approach as a whole. (Besides, in the case you mention, the postcondition of the compound action ought to be the conjunction of the two conditions, that is, CutleryClean_^CrockeryClean . Each action is assumed to have a postcondition, in the singular, in the example).

It's not clear to me what it means for the robot to "state" a formula on page 9. Does this mean it passes that formula to a control module?

Yes. This deductive process, which one may think of as a control module, is modelled as receiving formulae from two sources: both from a higher software layer of the robot itself ("decisions") and from the surrounding world via the sensors ("observations"). Both kinds of "signals" may influence actions of the agent. This is discussed in section 3.5, the last paragraph on page 12.

I have one serious presentational objection to the paper. I think it would be very much improved if there were some concrete examples showing what inferences can be drawn from the axioms. (By a concrete example, I mean one with meaningful fluent names.) Even a single concrete example illustrating some of the main ideas would be of enormous benefit to the reader's intuition. Otherwise, a hostile reader might think this is all just empty formalism.

Such an example is forthcoming.

Here are some more minor presentational suggestions. At the start of Section 2.6, the author introduces logical connectives for composing fluents. But these have already been used in Section 2.5. Perhaps this material could be reshuffled. Similarly, on page 10, the author declares how the variable symbols a and g will be used, although they have already appeared many times in the paper.

This is easily arranged (although there were in fact good reasons for the order in which this material is presently presented).

Q4. Anonymous referee 3:

The paper relates the concept of a "deliberated retry" to the author's earlier work on the logic of actions and change. It is intended for use in an applied project related to controlling an intelligent airborne vehicle. The problems discussed in the paper are important, and its ideas are interesting and original.
On the negative side, the paper does not really prove that its formalism is good for any specific purpose. It does not even make any mathematically precise claim of this kind. It would be good to include a description of goal-directed behavior in a toy domain and prove that some intuitively expected conclusions abour goal-directedness do indeed follow from the given axioms using the entailment methods proposed by the author. This would be more convincing than the "hand-waving" arguments in favor of the proposed approach given in Sec. 4. In the absence of such an example, the paper is reminiscent of the work on actions done in the early days of AI, such as the "frame default" in Reiter's 1980 paper on default logic, or the formalization of the blocks world in McCarthy's 1986 paper on circumscription. The ideas were interesting, but their authors were unable to prove anything about them. As the familiar shooting scenario demonstrated, a nonmonotonic formalization that looks plausible may turn out to be unsatisfactory after all. If the author of this paper tries to check that his theory works for one or two toy examples, he may very well discover bugs that need to be fixed.
It seems to me that Rob Miller was right when he suggested in his message to the author that test(p) is the action of testing the agent's knowledge rather the real environment, and that otherwise the axioms for testing are not convincing. The notation proposed in the paper uses the same symbol p to represent the fluent itself and the agent's knowledge about it, which looks peculiar. Regarding the author's suggestion that the incompleteness of knowledge be represented by distinguishing between p and p' "where the lack of information is represented by making both of them false". I have this question: How would you represent the assertion that p is true but this fact is not known to the agent?
Re Theorem 1: There seems to be an implicit assumption here that the set of intervals in question is finite. Shouldn't it be included in the statement of the theorem?
Re Theorem 2: I am puzzled by the use of the word "conversely" here. It seems to me that both parts say more or less the same thing.
In the 3rd displayed formula on p. 7, conjunction is applied to fluents, which is only explained in the next section, and an interval is used as the first argument of H which, as far as I can see, is not defined at all.
My recommendation is that the paper be accepted for publication in the ETAI after the author makes the changes that he deems appropriate.
A4. Erik Sandewall (17.7):

On the negative side, the paper does not really prove that its formalism is good for any specific purpose. It does not even make any mathematically precise claim of this kind. It would be good to include a description of goal-directed behavior in a toy domain and prove that some intuitively expected conclusions abour goal-directedness do indeed follow from the given axioms using the entailment methods proposed by the author. This would be more convincing than the "hand-waving" arguments in favor of the proposed approach given in Sec. 4.

A proof that correct conclusions are obtained for one or two toy domains does not prove that the formalism is good for any interesting purpose either. As indicated in subsection 5.1 ("Limitations of these results"), it remains to validate the proposed approach using a plausible underlying semantics. This is in line with the methodology for research in this area that I have defended at a number of occasions, beginning in my IJCAI-93 paper. Only so can one identify the range of applicability of the approach in a reliable way.
You might say that the paper ought not to be published until such a validation exists and can be included. This would be an excellent position to take, provided that it were shared by the present community. However, even a quick look at the literature in our field will show that that's not the way things work: Citations to earlier work refer almost exclusively to the approaches proposed by the earlier author, and quite rarely to the theorems that were proved in the earlier paper.
Besides the formal validation, it's the applications that will prove what the formalism is good for, but this would also take too long in the present paper.

In the absence of such an example, the paper is reminiscent of the work on actions done in the early days of AI, such as the "frame default" in Reiter's 1980 paper on default logic, or the formalization of the blocks world in McCarthy's 1986 paper on circumscription. The ideas were interesting, but their authors were unable to prove anything about them. As the familiar shooting scenario demonstrated, a nonmonotonic formalization that looks plausible may turn out to be unsatisfactory after all. If the author of this paper tries to check that his theory works for one or two toy examples, he may very well discover bugs that need to be fixed.

I have of course done a few toy examples for my own sake; the details are too long and boring to merit going into the paper. With respect to the historical comparison: the bugs in those cases were in the entailment method rather than the axiomatization. The approach described here is not dependent on the exact formulation of the axioms, and would not collapse if some of the axioms were to require debugging. Compare subsection 5.1.
Another comparison with the work on the frame problem in the 1980's is more appropriate: those were the days of first exploration of the problem (besides for a small number of articles in the 1970's), and it is quite natural and respectable that one started by looking for approaches that were at least intuitively plausible. After that start it was natural to look for precise and proven properties of those approaches. Ramification and causality have followed suit, and qualification is in its early stages. The characterization of rational robotic behavior is just in its very first stage of development, that is all.

It seems to me that Rob Miller was right when he suggested in his message to the author that test(p) is the action of testing the agent's knowledge rather the real environment, and that otherwise the axioms for testing are not convincing. The notation proposed in the paper uses the same symbol p to represent the fluent itself and the agent's knowledge about it, which looks peculiar.

Well, it wasn't Rob who suggested it; you refer to my answer to his question. Anyway, the full characterization of a rational agent is bound to be fairly extensive, and the distinction between the actual and the believed value of a fluent is one of the aspects of the full problem. I believe it is best to address this complex problem by divide-and-conquer, so several other papers including references [11] and [12] "complement the work reported here", as stated in the second paragraph of section 1.2. Those articles use distinct predicates for actual and estimated feature values, so the problem you mention has indeed been addressed in the context of the present approach and we have shown how to deal with it.
The present paper works with the simplifying assumption that the agent has perfect knowledge of the fluents. This is in particular in the interest of the reader, since it makes for heavy reading to address all problems at once.

Regarding the author's suggestion that the incompleteness of knowledge be represented by distinguishing between p and p' "where the lack of information is represented by making both of them false". I have this question: How would you represent the assertion that p is true but this fact is not known to the agent?

By making both p and p' false; this is a standard technique. However, I don't recognize having mentioned it in this paper?

Re Theorem 1: There seems to be an implicit assumption here that the set of intervals in question is finite. Shouldn't it be included in the statement of the theorem?

No, the proof does not require that assumption.

Re Theorem 2: I am puzzled by the use of the word "conversely" here. It seems to me that both parts say more or less the same thing.

The theorem has the form "In all the models for the axioms, P. Conversely, Q". It is correct that Q follows from P even without the use of those axioms, and the second part of the theorem is merely for showing the conclusion from another point of view. Maybe it would be better to write it as a separate comment.

In the 3rd displayed formula on p. 7, conjunction is applied to fluents, which is only explained in the next section, and an interval is used as the first argument of H which, as far as I can see, is not defined at all.

These bugs will be corrected.

C4-1. Anonymous referee 3 (19.8):

Thank you for your reply to my comments. One point is still not clear to me. You claim that, in Theorem 1, there is no need to assume that the set of intervals is finite. The following seems to be a counterexample:

s_i = 1/(2*i+1), t_i = 1/(2*i)

where i=1,2,... Please explain.

C4-2. Erik Sandewall (19.8):

Theorem 1 states that under the axioms and for some ordering of the intervals, s_i < s_i+1 and t_i < s_i+1 for all i . In your example, there is an infinite sequence of intervals that has a Zeno property and proceeds to the limit from above. This infinite sequence of intervals therefore has to be renumbered so that

s_-i = 1/(2*i+1), t_-i = 1/(2*i)

In other words, the numbering is reversed and you have to consider i = -1, -2, ... With this ordering the consequent of the theorem holds.
Your example also shows that if there is an infinite sequence of intervals, the theorem does not hold in the limit as i tends to infinity. However, this is also not claimed.
Your example does however remind me that if one is going to be very technical, it might be better to phrase the theorem so that it states that if interval i precedes interval j under the ordering, then s_i < s_j and t_i < s_j . In this way one does not give the impression that a numbering of the intervals always exists using natural numbers. Such a numbering does exist in your example, but if one combines several Zeno sequences then it doesn't. Of course all of this is very remote from the situations that arise in the context we're addressing.

Background: Review Protocol Pages and the ETAI
This Review Protocol Page (RPP) is a part of the webpage structure for the Electronic Transactions on Artificial Intelligence, or ETAI. The ETAI is an electronic journal that uses the Internet medium not merely for distributing the articles, but also for a novel, two-stage review procedure. The first review phase is open and allows the peer community to ask questions to the author and to create a discussion about the contribution. The second phase - called refereeing in the ETAI - is like conventional journal refereeing except that the major part of the required feedback is supposed to have occurred already in the first, review phase.
The referees make a recommendation whether the article is to be accepted or declined, as usual. The article and the discussion remain on-line regardless of whether the article was accepted or not. Additional questions and discussion after the acceptance decision are welcomed.
The Review Protocol Page is used as a working structure for the entire reviewing process. During the first (review) phase it accumulates the successive debate contributions. If the referees make specific comments about the article in the refereeing phase, then those comments are posted on the RPP as well, but without indicating the identity of the referee. (In many cases the referees may return simply an " accept" or " decline" recommendation, namely if sufficient feedback has been obtained already in the review phase).