Moderated by Erik Sandewall.
Tom Costello and John McCarthy: Useful Counterfactuals
The article mentioned above has been submitted to the Electronic Transactions on Artificial Intelligence, and the present page contains the review discussion.
Overview of interactions
Q1. Judea Pearl (27.6):
The title of this paper, "useful counterfactuals", seems to suggest (1) that plain ordinary counterfactuals are useless, (2) that the paper will teach us how to discriminate useful from useless counterfactuals and (3) that the paper will teach us how useful counterfactuals can be interpreted and put into use. I will start by arguing that all (true) counterfactuals are useful. This implies that (2) is not needed. Finally, I will discuss whether the paper is specific enough to accomplish (3).
The word "counterfactual" connotes contradiction and/or metaphysical speculation but, in fact, counterfactuals are neither contradictory nor metaphysical. Counterfactuals carry as clear an empirical message as any scientific laws, and are fundamental to them. The essence of any scientific law lies in the claim that certain relationships among variables remain invariant when the values of those variables change relative to our immediate observations. Counterfactuals, likewise, tell us what remains invariant when the world undergoes change, so they are no different from any other scientific knowledge. Every scientific law can be expressed counterfactually; for example, Ohm's law can be stated: "had the current in the resistor been I' instead of I, the voltage would have been V' (= I'R) instead of V." Thus, to say that counterfactuals are useful amounts to saying that scientific knowledge is useful. (In my IJCAI-99 paper I discussed how even a hindsight sentence such as "had I bet differently, I would have won a dollar", which refers to a non-repeatable circumstance, conveys useful knowledge. The paper is available on www.cs.ucla.edu/~judea/.)
Costello and McCarthy list several usages of counterfactuals, including the conveyance of facts, learning, and prediction, yet when we examine those usages, we find that none is distinct to counterfactuals, and each may equally be served by what we normally call "domain knowledge".
Indeed, none of the sentences on pages 2-3 would turn false if we replace the word "counterfactual" with the word "knowledge". The distinct puzzling questions about counterfactuals are (1) why humans resort to this subjunctive mode of expression in conveying simple chunks of knowledge, and (2) how we should interpret counterfactual sentences so as to extract the knowledge they convey. The paper does not touch on question (1) and the answer it provides to question (2) is not presented in a form that is transportable beyond the example in which it is embedded. The paper begins by emphasizing two notions,
By "approximate theories" the authors mean: "not complete theories of the world". I agree with the ubiquity of such theories, but I fail to see what makes approximate theories particularly akin to counterfactuals. It seems to me that EVERY theory is approximate, or else we would call it a "model", i.e., a mathematical object that assigns a truth value to every sentence of interest. Moreover, I fail to see what is wrong with the traditional approach of first defining the truth of a sentence in a model (i.e., "complete theory") and then, if we do not have a complete specification of a model, saying that the sentence is true in a theory just in case it is true in every model of the theory. Do the authors suggest that this approach should be abandoned when it comes to counterfactuals? If so, why? And if not, I would have liked the authors to tell us what they consider to be an appropriate MODEL for counterfactual sentences. In other words, what kind of things must we specify before we can assign a truth value to a counterfactual sentence q > p, where p and q are arbitrary propositions? I have not found the answer in this paper, and this made the reading very difficult.
The second notion emphasized in the paper is "Cartesian space", which is a space of points defined by coordinates, such that we can always change one coordinate while keeping the others unchanged. When we can do that to every point (X, Y), it makes sense to say, "if X were 3, we would be closer to the origin", because it is then possible to infer the final location from the initial one. This Cartesian metaphor corresponds to what philosophers call "ceteris paribus" (keeping everything else constant), first suggested by J. S. Mill (1843) as a key element in understanding counterfactuals. Given this basic intuition, I concur with the authors that some form of ceteris paribus must govern the interpretation of counterfactuals no matter what formalism we use. However, this is only the first step.
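The Cartesian reading of "if X were 3, we would be closer to the origin" can be illustrated with a small sketch. It is an illustration only: the dictionary representation of a point, the function names `counterfactual` and `distance_to_origin`, and the sample point (5, 4) are assumptions introduced here and do not come from the paper under review.

```python
import math

def counterfactual(point, **changes):
    """Return a copy of `point` with the named coordinates replaced.

    Every coordinate that is not named keeps its actual value
    (the ceteris paribus assumption).
    """
    unknown = set(changes) - set(point)
    if unknown:
        raise KeyError(f"not a coordinate of this frame: {sorted(unknown)}")
    return {**point, **changes}

def distance_to_origin(point):
    """Euclidean distance of a two-coordinate point from the origin."""
    return math.hypot(*point.values())

# Hypothetical actual state: X = 5, Y = 4.
actual = {"X": 5.0, "Y": 4.0}

# "If X were 3": change X, keep Y as it is.
hypothetical = counterfactual(actual, X=3.0)

# The counterfactual "we would be closer to the origin" is read off
# by comparing the hypothetical point with the actual one.
closer = distance_to_origin(hypothetical) < distance_to_origin(actual)
```

The sketch only works because the frame is given in advance: the coordinates X and Y are fixed before the question is asked, and nothing in the code says where such a frame would come from in a less tidy domain.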
The remaining steps are to decide WHERE the Cartesian space is to be found in any given story, how to represent the points in that abstract space, how to compute the coordinate change dictated by a given counterfactual q > p and, finally, how to compute the ramifications of this coordinate change on other propositions. I felt that the paper leaves these questions either unanswered or implicit in the formulation of the skiing example. If the latter is the case, I suggest that the authors cast the answers in the form of generic principles, to permit their transportation across domains. I was unable to understand the skiing example; it is too involved and too skiing-domain-sensitive for me to master. I strongly recommend that the authors choose another example, one that leaves no ambiguities as to what theory resides in the mind of each speaker, and what the right answer is in each theory. In my IJCAI-99 paper I chose, for example, a firing squad scenario, where the entire theory can be communicated unequivocally to skiers and non-skiers alike, and where the truth value of every counterfactual sentence is obvious to all readers (e.g., that if rifleman-1 had not shot, the prisoner would still be dead). I would be glad to comment on the authors' proposed axiomatization, once it is cast in a more familiar domain, and if the authors demonstrate explicitly how counterfactual sentences can be evaluated IN GENERAL. Examples of explicit demonstrations can be found in Charles Ortiz's AIJ-9 paper, using a domino-tiles example, and in my IJCAI-99 paper (using the firing squad scenario). I can comment on section 8, entitled Bayesian Network, with which I am somewhat more familiar. 
First, the title may be confusing; Bayesian networks cannot support counterfactual reasoning, for reasons described in Balke and Pearl 1994a,b (see also Causality 2000, pp. 33-37). The authors probably meant to discuss probabilistic Structural Equation Models (SEM), of which Bayesian networks are an abstraction. An SEM is defined as a set of deterministic functions, while a Bayesian Network is defined as a set of conditional probability constraints.
Costello and McCarthy identify two differences and one commonality between their formulation and SEM. I will explain why I find these two differences to be illusionary and the commonality to be tangential. I will then discuss what I consider to be the essential difference between the two formulations.
Illusionary Difference 1. Costello and McCarthy write:
To illustrate, suppose our approximate theory contains just one (causal) rule: "If A then B". Suppose further A and B are true. Question: Is the counterfactual "not-A > not-B" true? Answer: we don't know. We need to complete the rule "If A then B" into a function, to specify what happens to B when A is false. This completion, whether done explicitly, or by minimization, or by some other principle, turns our theory into a collection of functions, a collection which I called a "structural causal model".
Illusionary Difference 2. The paper states: "The other major difference is that Bayesian networks focus on the probability distribution of certain variables, rather than on facts in general." I would like to believe that the authors did not mean it literally, because this kind of pseudo-difference was used by some AI-ers in the 1980's as an excuse for not reading the probabilistic literature. Those who venture to read that literature would discover quickly that probabilistic reasoning (in SEM) proceeds in two steps: first, reasoning about "facts in general" in a deterministic theory, and second, computing the probability of those "facts in general" when we have additional knowledge on how likely the background (or "frame") facts are. In my IJCAI-99 paper, for example, I first demonstrate how to compute the truth value of the deterministic counterfactual "if rifleman-1 had not shot, the prisoner would still be dead", and ONLY THEN do I go on to compute the probability of this sentence, assuming that rifleman-1 is somewhat likely to pull the trigger out of nervousness. The same sequence is followed in Balke and Pearl (UAI-1995), in Galles and Pearl (1997, 1998), and in my new book Causality, chapter 7 (partly on www.cs.ucla.edu/~judea/). Thus, probabilities are options, not barriers, to students of counterfactuals.
Tangential commonality. Costello and McCarthy write that their approach "can be seen to be similar to modeling systems with structured equations ... or Bayesian networks ..." In their Theorem 4, they prove that a counterfactual sentence "is true in a causal model M if and only if [it] is true in the Cartesian frame MF..." What Theorem 4 states is that the evaluation of counterfactual sentences in a causal model M (according to the SEM formalism) involves a Cartesian-product-like assumption, and that one can identify the Cartesian space with the set of functions F. This is correct: structural models indeed assume ceteris paribus relative to the set of equations -- when we change one equation, the others remain intact. The reason I consider this commonality to be tangential is that I (and most people I know) take for granted the invocation of ceteris paribus in counterfactual analysis. The interesting question, in my opinion, is not whether a ceteris paribus assumption is present in a given theory of counterfactuals -- such presence is inevitable -- but rather, what space we should apply ceteris paribus to, and how. Balke, Galles and Pearl make a specific commitment in this regard. They claim that ceteris paribus should be applied to the space of MECHANISMS (read: functions), and NOT to the space of propositions, not to the space of variables, and not to some other space that one can dream up. Thus, if Costello and McCarthy buy this commitment, they can safely claim that their approach "can be seen to be similar to modeling systems with structured equations ...". But the mere existence of ceteris paribus (or Cartesian product space) someplace in a system does not make their approach similar to that system.
And this brings me to the main point of my comments: Have Costello and McCarthy made this commitment (i.e., to identify the Cartesian space with a set of mechanisms)? Incidentally, is anyone on this Newsletter prepared to make this commitment? A word of caution to those who answer YES or WHY NOT: the commitment to mechanisms does not come cheap.
First, it requires that we proclaim certain sentences as "mechanisms", and that we assign to those sentences a different status and a different syntactic representation than that assigned to other sentences (e.g., facts, observations, assumptions, implications). It also requires a one-to-one correspondence between mechanisms and variables (see my IJCAI-99 paper, Sections 4.4-4.5, on www.cs.ucla.edu/~judea/). I have not seen these two elements in the authors' analysis of the skiing example, but I may have overlooked them, given my ignorance of skiing instructions.
The reason I emphasize this commitment to mechanisms is that I do not believe counterfactual reasoning (or causal reasoning in general) is feasible without it. The puzzle with a counterfactual sentence, say q > p, is that it involves a relationship between two PROPOSITIONS, q and p, not between an action and a proposition, and yet we treat q as an ACTION. How? In order to change the actual world to satisfy q we need to translate q into some action and to decide which mechanisms are to be altered by that action. Every theory of counterfactuals ought to explicate how these decisions are made in the representational scheme employed. It is quite possible that Costello and McCarthy's theory embeds these decisions implicitly in their analysis of the skiing example (which I missed). If they did, I believe they should make them formal, general and explicit.
Here is an example of some general principles that people have proposed for counterfactual analysis. Balke, Galles and Pearl (IJCAI-99) identified 3 necessary steps in the evaluation of a counterfactual sentence: 1. Abduction, 2. Action and 3. Prediction. For example, to evaluate the sentence "if rifleman-1 had not shot, the prisoner would still be dead", we must execute the following three steps:
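A hedged sketch of how the three named steps (abduction, action, prediction) might be carried out for the firing squad sentence follows. The causal model used here (a court order, a captain who relays it, two riflemen who obey the captain, and a prisoner who dies if either rifleman shoots) is an assumption made for illustration; the variable names and the helpers `solve` and `evaluate_counterfactual` are hypothetical and are not taken from either paper.

```python
def solve(court_order, mechanisms):
    """Evaluate every variable from the background fact, in causal order."""
    values = {"court_order": court_order}
    for name, mechanism in mechanisms.items():
        values[name] = mechanism(values)
    return values

# Assumed model: one mechanism (function) per variable, in causal order.
MECHANISMS = {
    "captain":    lambda v: v["court_order"],
    "rifleman_1": lambda v: v["captain"],
    "rifleman_2": lambda v: v["captain"],
    "dead":       lambda v: v["rifleman_1"] or v["rifleman_2"],
}

def evaluate_counterfactual(evidence, intervention, query):
    """Return True if `query` holds under `intervention`, given `evidence`."""
    # 1. Abduction: keep the background values consistent with the evidence.
    backgrounds = [
        u for u in (False, True)
        if all(solve(u, MECHANISMS)[k] == val for k, val in evidence.items())
    ]
    # 2. Action: replace the mechanism of each intervened variable by a
    #    constant, leaving all other mechanisms intact.
    modified = dict(MECHANISMS)
    for name, forced in intervention.items():
        modified[name] = lambda v, forced=forced: forced
    # 3. Prediction: the query must hold for every surviving background.
    #    (If no background survives, the result is vacuously True.)
    return all(solve(u, modified)[query] for u in backgrounds)

# "If rifleman-1 had not shot, the prisoner would still be dead."
still_dead = evaluate_counterfactual(
    evidence={"dead": True},
    intervention={"rifleman_1": False},
    query="dead",
)
```

In this toy model the action step touches exactly one mechanism, the one that determines `rifleman_1`. Deciding which mechanism a counterfactual antecedent should alter is trivial here only because each variable was given its own function from the start.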
If Costello and McCarthy consider these steps NOT necessary for the evaluation of counterfactuals, I would invite them to posit alternative generic steps which they do consider necessary. (In this case, I would also challenge them to evaluate the sentence above with their alternative steps -- we must be concrete.) If they do consider these steps to be necessary, then I would ask them to identify where in the Costello-McCarthy formalization we find traces of these steps, and how we should go about deciding (in step (2)) what changes to make to the theory so as to accommodate the counterfactual antecedent "had rifleman-1 not shot". Have I left out another possibility? Yes, that Costello and McCarthy consider these three steps to be necessary but not sufficient. In this case, the readers of this paper would want to learn what additional principles they deem necessary, and in what kind of theories the need for the new principles will become urgent.
The difficulty I had in reading this paper stemmed from not knowing where the authors stand on these possibilities and, consequently, I could not see the principles that the paper advocates for the analysis of counterfactuals. I hope the authors will provide this information in a revised version.
Judea Pearl
PS. Transcript and slides of my IJCAI-99 lecture are now available on http://www.cs.ucla.edu/~sunshine/pres/ijcai99.htm
A1. Tom Costello and John McCarthy (27.6):
1. Judea Pearl considers all counterfactuals as useful and therefore has no use for our singling out useful counterfactuals. We thought our distinction was clear, but we have added some material to the article to make it clearer. An example of a not demonstrably useful counterfactual is "If Caesar was in charge in Korea, he would have used catapults". It is difficult to see how the truth of this might teach us something. Pearl's example of the firing squad is probably (at least 0.8) useless in most Americans' daily lives. No use of this counterfactual was offered. We wrote that a counterfactual is useful if believing it can affect behavior. Our example was "If another car had come over the hill when you passed, there would have been a head-on collision." If the driver believes it, he will be more conservative about passing in the future. While some theory is involved in accepting the counterfactual, it is basically about a single experience and an associated almost experience (i.e. the collision). Contrast this counterfactual with Pearl's examples about the firing squad, e.g. "If A had not fired the prisoner would have died anyway." Pearl offers no suggestion about how believing it would help design better firing squads or would help the victim escape. Indeed the counterfactual is entirely derived from theory - no specific experience plays any role. Useful counterfactuals like the car example have another property. They have non-counterfactual consequences, e.g. "Passing under the conditions of the example is unsafe." This is in contrast with David Lewis's theories, where counterfactuals have no non-tautological non-counterfactual consequences. We have modified the article to emphasize this aspect of useful counterfactuals.
2. Pearl considers our skiing example exotic. We admit to more experience with skiing than with firing squads. We doubt Pearl's experience is otherwise. More to the point, the counterfactual sentences about skiing have many useful non-counterfactual consequences.
3. Very likely, we should have compared our useful common sense counterfactuals with those involved in stating scientific laws. However, so far as we know, the literature relating them does not describe drawing non-counterfactual conclusions from the counterfactuals themselves. It concerns the interpretation of counterfactuals rather than their use. Maybe the defenders of historical counterfactuals discuss non-counterfactual consequences.
4. Pearl asks why humans resort to counterfactuals. We think it is because of their useful non-counterfactual consequences.
5. Pearl suggests that the two differences we point out between Structured Equational Models and Cartesian Counterfactuals are illusionary. We agree that they are not major technical differences, but are mainly differences in subject matter and presentation. Cartesian Counterfactuals are applied to commonsense theories expressed in First Order Logic, in contrast to SEMs, which are applied to other domains and are expressed in equations. Pearl also suggests that the theorem we give is tangential. We agree that it does not address the underlying question of where Cartesian Frames or Causal Mechanisms come from.
Tom Costello and John McCarthy
Q2. Graham White (31.12):
Question 1. Costello and McCarthy, 10.1: "Theories given by differential equations give some clearcut examples. The solutions are determined by boundary conditions. If the theory includes both the differential equations and the boundary conditions, the most obvious counterfactual is to keep the differential equations and change the boundary conditions to a different set of admissible boundary conditions. A simple example is provided by the equations of celestial mechanics regarding the planets as point masses."
Here is a question. Consider a classical, deterministic dynamical system. In such a system, the future evolution is given by the positions, and velocities, of its components at a particular time. The space of all position-velocity pairs -- usually known as phase space -- will be our "space of possible states" (Costello and McCarthy 3.2). So the question is: can we set up coordinates on phase space, which are coordinates in McCarthy and Costello's sense (that is, which allow for their sort of counterfactual reasoning) and for which time is one of the coordinates?
Suppose so. Then time would be one of the coordinates, and would thus vary along trajectories of the system. The other coordinates would, by the orthogonality requirement, differentiate between trajectories. McCarthy and Costello's orthogonality requirements would thus amount to the following: i) if we change the time coordinate, but keep the other coordinates fixed, then we move along a fixed trajectory, and ii) if we fix the time coordinate, but change the others, then we vary the trajectory. Let us call the other coordinates the trajectory coordinates. We can regard the trajectory coordinates as functions on phase space.
The trajectory coordinates, by requirement i), would have to be constant along trajectories: by requirement ii) -- and by the fact that they are genuinely coordinates -- there would have to be enough of them so that any two distinct trajectories differed in at least one of the trajectory coordinates. Now functions like these trajectory coordinates are well known in classical mechanics: they arose in the study of a similarly natural problem in dynamics, namely the problem of finding closed-form solutions to systems of differential equations. Functions on the phase space which satisfy requirement i) are known as integrals of a dynamical system ([1], pp. 79ff.; cf. [2]). Integrals are, however, very rare: most systems do not have enough of them to differentiate between trajectories. Consider the example which Costello and McCarthy use, that of "the equations of celestial mechanics regarding the planets as point masses." Even with only three particles, the equations do not have enough integrals: the only constants of the motion are the momentum, and angular momentum, of the system, together with the total energy. Trajectories with fixed values of these quantities lie in seven-dimensional subspaces of the phase space ([1], pp. 98ff.): there is still far more variation possible than can be described by values of the integrals. (The fact that integrals are hard to find can be explained by Noether's theorem, which relates integrals to symmetries of the dynamical system: most systems are not symmetrical enough to have many integrals, and it is only those which are exceptionally symmetrical which are completely integrable.)
So, for most dynamical systems, we cannot find a set of Costello-McCarthy coordinates in which time is a coordinate. This means that (strangely enough) we cannot apply their approach to the semantics of assertions like "if the ball is here now, it will be there in a second": that is, assertions which, on the basis of observations of a state of the system, predict another state of the system which differs only in distance along the trajectory, all other things being fixed.
Question 2a. Does Costello and McCarthy's theory have a semantics, in Quine's sense? Quine, in [3], outlines an approach to semantics which is presented as a story about language acquisition (either by an infant, learning from the adults around her, or by an anthropologist, learning from a strange tribe); the semantics, however, tells us about more than merely language acquisition. Quine distinguishes between two sorts of terms: there are those, such as 'water' and 'red', which are purely phenomenal and do not involve any grasp of conditions of identity. Correct application involves merely being able to respond to the appropriate set of stimuli [3, p. 7]. On the other hand, there are terms, such as 'apple', the use of which involves a grasp of conditions of identity: "To learn 'apple' it is not sufficient to learn how much of what goes on counts as apple; we must learn how much counts as an apple, and how much as another" [3, p. 8]. Terms such as these, says Quine, are those "whose ontological involvement runs deep" [3, p. 8], the terms from the use of which a speaker's implicit ontological commitments can be recovered. Mastery of these terms involves being able to "ask whether this is the same so-and-so as that, and whether one so-and-so is present or two." [3, p. 2] Crucial among these abilities is being able to re-identify the same object under two different descriptions. How does this apply to Costello and McCarthy's talk of the possible states in their "space of possible states"?
Well, if we were to encounter the same state again, under another description, we may well do so using another descriptive vocabulary, which would put another coordinate system on the space of possible states. So -- if we follow Quine's line of reasoning -- we seem to be asking whether there are any allowable coordinate transformations on the space of possible states. Costello and McCarthy argue, it is true, that some coordinate transformations on this space are inadmissible (because they alter the counterfactual properties of the states in this space): but, if we are to follow Quine, there must also be admissible transformations. For if not, then we cannot really be talking of objects when we talk of these states: they would have no coherent conditions of identity.
Question 2b. Related to Question 2a is the following, more practical question: it uses an example of Davidson [6], who gets it from Austin [7]. It is best given by an extended quote. "'I didn't know that it was loaded' belongs to one standard pattern of excuse. I do not deny that I pointed the gun and pulled the trigger, nor that I shot the victim. My ignorance explains how it happens that I pointed the gun and pulled the trigger intentionally, but did not shoot the victim intentionally. ... What is the relation between my pointing the gun and pulling the trigger, and my shooting the victim? The natural ... answer is that the relation is that of identity. The logic of this sort of excuse includes, it seems, at least this much structure: I am accused of doing b, which is deplorable. I admit I did a, which is excusable. My excuse for doing b rests upon my claim that I did not know that a = b." [6, p. 109] Thus: an important part of commonsense reasoning is to be able to describe the same events using different vocabularies.
Our descriptions of those events may, of course, apprehend certain features or other of them: but there is a notion of identity of events, which we seem to use in everyday reasoning, which transcends difference of vocabulary. So how can we accommodate this sort of reasoning in Costello and McCarthy's framework?
Question 3a. Einstein writes "If, relative to K, K' is a uniformly moving coordinate system devoid of rotation, then natural phenomena run their course with respect to K' according to exactly the same general laws as with respect to K". [5, p. 18] Principles like these -- which assert that the laws of physics are invariant under transformations of coordinates -- are, according to the story which Einstein tells, fundamental to modern physics: progress in physical understanding is accompanied by the enlargement of the group of transformations which the laws of physics are invariant under (first Galilean transformations, then Lorentz transformations, then, with general relativity, curvilinear coordinate transformations which preserve the Lorentz metric). Is a similar outlook possible with Costello and McCarthy's framework? Are we allowed to speak of coordinate transformations? Would an increase in our understanding of the commonsense laws of nature be accompanied by a parallel increase in the number of allowed coordinate transformations?
Question 3b. This same point has practical consequences: in physics, we want to change coordinates, either because we can solve the equations more easily thereby, or because the new coordinate system illuminates some feature of the situation which we want to concentrate on.
Thus, the study of celestial mechanics was accompanied by the intensive use of coordinate transformations: partly these were to permit solutions (one would try to find coordinate systems in which the motions could be expressed as sums of trigonometric functions) and partly they were for conceptual reasons (one wanted to investigate the stability of the solar system, and thus one would try to find coordinate systems which separated periodic changes from the so-called "secular", non-periodic changes). [8, pp. 15ff.] If Costello and McCarthy's counterfactuals are to be "useful", are we allowed, in searching for solutions, to change our vocabulary to one which is useful for us -- either because it permits an easier solution, or because it lays bare some feature of the situation which we want to concentrate on?
[1] V.I. Arnol'd, V.V. Kozlov, and A.I. Neistadt, "Dynamical Systems III: Mathematical Aspects of Classical and Celestial Mechanics", vol. 3 of the Encyclopaedia of Mathematical Sciences, Springer 1988.
[2] G. White, "Lewis, Causality and Possible Worlds", Dialectica 54 (2000), pp. 133-137.
[3] W.v.O. Quine, "Speaking of Objects", in [4, pp. 1-25].
[4] W.v.O. Quine, Ontological Relativity and Other Essays (Columbia 1969).
[5] A. Einstein, Relativity: The Special and the General Theory (London 1962).
[6] D. Davidson, "The Logical Form of Action Sentences", in D. Davidson, Essays on Actions and Events (Oxford 1980), pp. 105-122.
[7] J.L. Austin, "A Plea for Excuses".
[8] J. Barrow-Green, Poincare and the Three Body Problem (American Mathematical Society 1997).
A2. Tom Costello and John McCarthy (4.4.2001):
Because of the length of Graham White's comments on our "Useful Counterfactuals", we decided to summarize those aspects of his questions for which we have possibly interesting answers.
1. White asks about the applicability of Cartesian counterfactuals to celestial mechanics. As we said in the paper and as White expounds, one can take the initial phase space co-ordinates (positions and velocities) of the planets together with time. Counterfactuals can be formulated in terms of varying these co-ordinates. I don't see that our theory has much to add in this case. White goes on to ask about integrals of the motion, pointing out that there aren't enough to make a co-ordinate system when there are more than two bodies. Nevertheless, it may be possible to formulate some interesting counterfactuals in terms of approximate integrals of the motion. Thus we may ask, "Would the solar system be unstable if Jupiter had 50 percent more angular momentum?", and there may be some sense in which this question can be answered.
2a. White asks if our theory has a semantics in Quine's sense. We haven't looked. In particular, we haven't looked at conditions of identity.
3a. White asks about invariance under co-ordinate transformations. Many common sense theories are effectively invariant under time translations, e.g. the effects of actions in blocks world theories don't depend on when the action was performed. Some common sense theories also have some kinds of spatial invariance. In common sense theories of trading, total money may be conserved and in common sense physics, total mass is conserved. Nothing as high-powered as Noether's theorem is involved. Maybe a Hamiltonian is required for that. Almost no common sense reasoning involves invariance under change to a relatively moving co-ordinate system, first of all because many common sense theories don't use co-ordinate systems in which relative motion is formulatable. Moreover, common sense physics gives the local earth a privileged position, since lots of the objects aren't systematically moving relative to one another. Invariance of physical laws relative to such transformations was the big discovery of Galileo and Newton.
3b. White asks whether changes of vocabulary to make reasoning more convenient are allowed. Yes.