Thoughts About A Science Of Evidence

Author: David A. Schum
Publisher: www.ucl.ac.uk
Category: Western Philosophy
Language: English

This book is corrected and edited by Al-Hassanain (p) Institute for Islamic Heritage and Thought

Bayes' Rule and the Force of Evidence

Reasoning from evidence is a dynamic process in which we revise our existing or prior beliefs about hypotheses or propositions on the basis of relevant evidence to form new or posterior beliefs about these hypotheses. There is a consequence of three basic probability rules, called Bayes' rule, that tells us how this dynamic process should occur when our beliefs are expressed probabilistically. These rules, which most of us learn in school, are:

Probabilities are positive numbers or zero [i.e. there are no negative probabilities];

The probability of a "sure" event [one certain to happen] is 1.0;

If two or more events cannot happen together [i.e. they are mutually exclusive], the probability that one or the other of these events occurs is equal to the sum of their separate probabilities.

All probabilities are dependent on, or conditional upon, what else we know or find out. Suppose there is an event E whose probability we are interested in determining. But we also learn that event F has occurred; so we have an interest in determining the probability of E, given or conditional on F. Conditional probabilities obey the same three rules given above. There is a consequence of the rules for conditional probabilities, called Bayes' rule, that tells us how much, and in what direction, we should revise our prior beliefs about some hypothesis based on new evidence we obtain. The result of this determination is what is called a posterior probability. This rule originated in the work of Thomas Bayes [1702 - 1761], a dissenting clergyman who lived in Tunbridge Wells. Those interested in finding out more about Bayes can consult a recent biography of him[136]. Bayes' rule has been called the first mathematical canon for inductive reasoning.
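
Stated compactly [this is the standard textbook form of the rule, supplied here for reference; H is a hypothesis and E an item of evidence]:

    P(H | E) = P(E | H) × P(H) / P(E)

where P(H) is the prior probability of H, P(E | H) is the likelihood of the evidence given H, and P(H | E) is the posterior probability of H given the evidence.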

Now, there are terms in Bayes' rule called likelihoods that will tell us by how much, and in what direction, we should revise our prior beliefs into posterior beliefs, given some new evidence. We often consider ratios of these likelihoods. In any case, these terms are indications of the inferential force of the evidence we have obtained. Figure 5 is a picture of what these likelihoods express. Suppose defendant Dick is on trial for shooting victim Vick. Here is an item of evidence received during Dick's trial: Dick owned the revolver used in the shooting of Vick.

The force of this evidence on whether Dick shot Vick is given by the ratio of Likelihood 1 to Likelihood 2. If you believe Likelihood 1 is greater than Likelihood 2, you are saying that this evidence favours the proposition that Dick shot Vick by an amount indicated by the size of this ratio. Likelihood ratios express both the inferential force and the direction of evidence. If you say that this ratio is 5, you are saying that this evidence is five times more likely if Dick shot Vick than if he did not shoot Vick. Directionally, this evidence points to Dick shooting Vick. Bayes' rule would say that you are entitled to increase your prior odds that Dick shot Vick by a factor of 5. Thus, in accordance with FRE 401, discussed above concerning relevance, this evidence would indeed be relevant since it allowed you to change your belief in the probability of a material or consequential issue. This is what Richard Lempert observed to be a virtue of the way Bayes' rule grades the inferential force of evidence[137].
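
A minimal sketch of this updating, in Python, using the odds form of Bayes' rule [the prior probability of 0.10 below is an assumed value, chosen only to make the arithmetic visible; nothing in the example fixes it]:

    # Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio.
    def posterior_from_lr(prior_prob, likelihood_ratio):
        prior_odds = prior_prob / (1.0 - prior_prob)
        posterior_odds = prior_odds * likelihood_ratio
        return posterior_odds / (1.0 + posterior_odds)

    # Likelihood ratio of 5, as in the Dick and Vick example; assumed prior of 0.10.
    print(round(posterior_from_lr(0.10, 5.0), 3))  # 0.357

The prior odds of 0.10/0.90 are multiplied by 5, and the posterior probability of Dick having shot Vick rises from 0.10 to about 0.36.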

For many years now I have studied likelihood ratio formulations for the force of every form and combination of evidence I could think of. I have reviewed many of these studies in another work[138]. Bayes' rule is a marvellous device for capturing a very wide array of evidential subtleties or complexities for study and analysis. Likelihood ratios can be expressed for collections of evidence and not only for individual items as shown in Figure 5. Bayes' rule incorporates a property called conditional dependence that is the finest property I know of for capturing evidential subtleties or complexities. I will return to Bayes' rule again when I discuss the discovery of evidence. We normally view probability theories as being involved just in the inductive justification of hypotheses. But application of this rule can prompt us to ask questions we may never have thought of asking. These questions may open up new lines of inquiry and new lines of evidence. But as rich as it is, Bayes' rule does not say all there is to say about the force, weight, or strength of evidence.

Evidential Support and Evidential Weight: Nonadditive Beliefs

The formal system leading to Bayes' rule rests on axioms taken by many to be self-evident. The person who first formulated the three axioms I mentioned above was the Russian mathematician A. N. Kolmogorov[139]. In his works Kolmogorov makes clear that his axioms assume situations involving replicable events in which probabilities can be determined by counting. The two basic examples are aleatory probabilities in games of chance and relative frequencies in statistics. But there are many situations in which we have doubt or uncertainty about events that are the result of processes that cannot be repeated and in which no counting is possible. I refer here to unique, singular, or one-of-a-kind events. These situations are very common in a variety of contexts such as history, law, intelligence analysis, and everyday experience. I have my own belief about the probability that Nicola Sacco was guilty of killing Berardelli, but I cannot play the world over again 1000 times to observe the number of occasions on which he did it. Various attempts have been made to apply probabilistic concepts in these non-enumerative situations.

Many persons take the view that probabilities can be epistemic, subjective, or judgmental in form and rest on whatever information we happen to have that we believe to be relevant to an assessment of the probability of interest. Applications of Bayes' rule require at least one epistemic probability; we need an initial prior probability in order to get the dynamic probability revision process started. Regarding some hypothesis or proposition H, we need to assess how likely H is before we begin to gather relevant evidence. Many persons have no hesitation in supplying epistemic judgments of prior probabilities and other ingredients of Bayes' rule, including likelihoods, provided that these probabilities conform to the Kolmogorov axioms. As I mentioned at the close of Section 3.2, this is what led Professor Mario Bunge to refer to colleagues who are willing to assess the prior probability ingredients of Bayes' rule in the form of epistemic or subjective judgments as "charlatans" engaged in "pseudoscience". In short, Bunge and others reject any view of probability as making sense when we have nothing to count. Bunge would really come unstuck if he read the works of the person whose views of the force of evidence I now mention.

Professor Glenn Shafer [Rutgers University] has given very careful thought to the epistemic or judgmental probabilities necessary in situations in which we have nothing to count[140]. He begins by denying the self-evident nature of the third of Kolmogorov's axioms; it is called the additivity axiom. Recall that this axiom says that if events E and F cannot occur together, then the probability that one or the other occurs is always equal to the sum of their separate probabilities. But there is an added consequence of this axiom. When we have mutually exclusive events that are also exhaustive [one or the other must occur], then the sum of their probabilities is 1.0, in accordance with the second axiom for "sure" events. Thus, if we have two hypotheses H and not-H, their probabilities must sum to 1.0, a priori or a posteriori, given any evidence. In short, if you believe the probability of H is p, you must also believe the probability of not-H is (1 - p). Thus, if you increase the probability of H, you must decrease the probability of not-H. Shafer says this is an unfortunate property in many situations involving epistemic probability judgments. He offers several reasons why this additivity property causes trouble.

Shafer is well aware of some important ideas that have been around for a long time, such as Jakob Bernoulli's distinction between "mixed" and "pure" evidence that he described in his 1713 treatise Ars Conjectandi. Mixed evidence offers some degree of support to every hypothesis being considered. But pure evidence says nothing about certain hypotheses and offers them no support at all. As an example of pure evidence, suppose that Tom, Dick, and Harry are suspects in the theft of a valuable object from the home of its owner. There were no signs that the house had been broken into. Tom is found with a key to this house. This would be pure evidence since it offers support for Tom's having stolen the object; but it says nothing about Dick or Harry.

Shafer says we need a different measure of the support that evidence may provide hypotheses. So, he defines a measure of evidential support, S, which he equates to the inferential weight of evidence. Like ordinary probabilities, 0 ≤ S ≤ 1.0. But S has a different meaning than do the likelihoods discussed above that indicate the force of evidence in Bayes' rule. When S = 0, all this means is that the evidence provides no support to some hypothesis. But when a likelihood has zero value, this means that the hypothesis is impossible. It says the probability of the evidence we have is zero, given this hypothesis. Bayes' rule then assigns zero probability to this hypothesis. So zero has entirely different meanings on the ordinary probability scale and on Shafer's S scale. On the ordinary probability scale, zero indicates impossibility or disbelief. On the S scale, S = 0 means lack of support or lack of belief. Our belief in hypothesis H can be revised away from zero when we do have evidence that supports it to some degree. But we cannot revise away from zero the probability of a hypothesis that has been determined to be impossible. Disbelief and lack of belief are different judgmental conditions.

There is a very important consequence associated with the manner in which Shafer's support S is assigned. We are allowed to withhold support from hypotheses in various ways when we cannot decide what the evidence means. As I will illustrate in an example, this characteristic of Shafer's system leads to conditions in which our beliefs are nonadditive, which they cannot be under Bayes' rule. Here is how S is assigned. Suppose we have some number n of hypotheses that are disjoint or mutually exclusive, but not necessarily exhaustive. We might think of others later on, or revise the ones we are considering at the moment. All the set of n hypotheses represents is how we see our inferential situation at the moment. Shafer calls this collection of n hypotheses our frame of discernment, F. We do not assign S just to these n hypotheses by themselves, as we must do in assigning likelihoods in Bayes' rule; instead we assign S to subsets of these hypotheses in our frame F. When there are n hypotheses in F, there are 2^n possible subsets of hypotheses in our frame. The set of all 2^n subsets is called a power set. Here is the simplest case, in which we have two mutually exclusive hypotheses that are also exhaustive, H and not-H. The power set of hypotheses in our frame F consists of: {H}, {not-H}, {H, not-H} and Ø, where Ø is the set containing none of them [Ø is called the "empty set"]. Also, read the set {H, not-H} as: "either H or not-H". We are allowed to assign S in any way we please across the non-empty subsets of a power set, except that the assignments must sum to 1.0, with the additional provision that Ø always gets S = 0.
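
A minimal sketch of these two constraints in Python [the function names and the representation of subsets as frozensets are mine, supplied only for illustration]:

    from itertools import chain, combinations

    def power_set(frame):
        """All 2^n subsets of a frame of discernment, including the empty set."""
        items = list(frame)
        return [frozenset(s) for s in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1))]

    def is_valid_support(S):
        """Shafer's constraints: S(empty set) = 0, and the assignments sum to 1.0."""
        return S.get(frozenset(), 0) == 0 and abs(sum(S.values()) - 1.0) < 1e-9

    print(len(power_set({"H", "not-H"})))  # 4 subsets: empty, {H}, {not-H}, {H, not-H}
    # Support may be withheld by assigning it to {H, not-H} rather than to {H} or {not-H}:
    print(is_valid_support({frozenset({"H"}): 0.6, frozenset({"H", "not-H"}): 0.4}))  # True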

Here is an example of support assignment that involves a very good instance in which being indecisive about what evidence means, and being able to reflect our indecision in our beliefs, is a major virtue of Shafer's evidential reasoning system. This example involves William Twining's favourite law case, Rex v. Bywaters and Thompson, which was tried at the Old Bailey on December 6 - 11, 1922[141]. Edith Thompson was charged with either conspiring with Freddy Bywaters to kill her husband Percy Thompson on the particular occasion when he did it [October 3, 1922], or inciting Freddy to kill Percy whenever an occasion presented itself. A classic love triangle appears in this case. Freddy boarded in the Thompsons' home, but he was frequently away; he worked aboard ships. Freddy and Edith became lovers and carried on their affair until the time of Percy's death. Edith and Freddy corresponded daily when Freddy was away, either through the mails or by what were then called Marconigrams. Freddy kept all of the correspondence he received from Edith, but Edith kept none of the letters she received from Freddy.

I have never encountered finer examples of ambiguous evidence than the letters Edith wrote to Freddy. What is clear is that these letters appear to have convinced the twelve male jurors of her guilt. She was hanged on January 9, 1923 at Holloway; Freddy was hanged the same day at Pentonville. Some of these letters mention poisons of various sorts, some mention broken glass, and others contain comments suggesting that Edith had tried to kill Percy herself. Other letters seem to give the impression that Edith and Freddy had made plans to do away with Percy. But the Shakespearean scholar Professor Rene Weis [also at UCL] puts a different interpretation on her letters in a very careful analysis of Edith's case[142]. Twining and Weis agree that Edith was innocent, but they do so from different standpoints and using different methods[143]. Twining uses this case to give examples of the truly complex situations in which Wigmore's argument structuring methods can be employed.

Using Shafer's method for assigning evidential support, or weight, here is how I view Edith's letters as supporting her being guilty, G, or not guilty, not-G, as she was charged. Let SL represent the support I have assigned on the basis of the entire collection of her letters to Freddy. The power set of these hypotheses is: {G}, {not-G}, {G, not-G} and Ø.

         {G}     {not-G}     {G, not-G}     Ø
    SL:  0.3     0.2         0.5            0

Here is what my S assignment means. I think the letter evidence supports her guilt to degree 0.3, and her being not guilty to degree 0.2. But I am undecided to degree 0.5 about what this letter evidence says, and so I assign this amount to the set {G, not-G}, because I cannot tell whether this ambiguous evidence specifically supports G or not-G. This portion of S represents the amount of support I have withheld from either {G} or {not-G}.

The above assignment of support corresponds to my beliefs [Bel] in a way that Shafer's system allows: I have Bel{G} = 0.3 and Bel{not-G} = 0.2. My beliefs in this case are nonadditive, since Bel{G} + Bel{not-G} = 0.3 + 0.2 = 0.5, which is less than 1.0. If I had used a Bayesian approach, I would be required to say that Bel{G} + Bel{not-G} = 1.0, since G and not-G are mutually exclusive and exhaustive. In short, Bayes' rule does not allow me to be indecisive about what I think the evidence means.
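
A minimal sketch of this computation, continuing the Python code above [the support values are those of the example; the function name is mine]:

    # Support assignment for Edith's letters, from the example above.
    S_L = {frozenset({"G"}): 0.3,
           frozenset({"not-G"}): 0.2,
           frozenset({"G", "not-G"}): 0.5}

    def belief(A, S):
        """Bel(A): the total support committed to A, i.e. to A and its subsets."""
        return sum(s for subset, s in S.items() if subset <= A)

    print(belief(frozenset({"G"}), S_L))      # 0.3
    print(belief(frozenset({"not-G"}), S_L))  # 0.2
    # Bel{G} + Bel{not-G} = 0.5 < 1.0: the beliefs are nonadditive.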

Shafer's system, often called a system of belief functions, is very useful in capturing elements of our probabilistic beliefs that are difficult, or impossible, to capture with ordinary probabilities. Because one of the Kolmogorov axioms is violated, Bayes' rule does not appear in Shafer's belief function system. It is replaced by what is called Dempster's rule, which allows us to combine support assessments S for successions of evidence. This rule allows us to calculate what is called the orthogonal sum of S assignments for different items of evidence. This system has found application in a number of important contexts in which epistemic judgmental assessments are necessary in the reasoning tasks at hand.
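
A minimal sketch of Dempster's rule for two support assignments over the same frame, using the S_L assignment from the sketch above [the second assignment S_2 is hypothetical, invented only to show the combination; the statement of the rule itself is standard]:

    def dempster_combine(S1, S2):
        """Orthogonal sum: multiply masses, keep non-empty intersections, renormalise."""
        combined, conflict = {}, 0.0
        for a, s_a in S1.items():
            for b, s_b in S2.items():
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + s_a * s_b
                else:
                    conflict += s_a * s_b  # mass lost to contradictory combinations
        return {subset: s / (1.0 - conflict) for subset, s in combined.items()}

    # A hypothetical second item of evidence bearing on G and not-G:
    S_2 = {frozenset({"G"}): 0.1,
           frozenset({"not-G"}): 0.3,
           frozenset({"G", "not-G"}): 0.6}
    print(dempster_combine(S_L, S_2))  # combined support from the letters and S_2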

Evidential Completeness and the Weight of Evidence in Baconian Probability

Francis Bacon [1561 - 1626] is usually credited with being the first to argue that we can never justify hypotheses about how nature works just by compiling instances that are favourable to them. What he argued was that negative instances are at least as informative as positive instances. In fact, what we should do in testing hypotheses is to perform experiments designed to eliminate possible hypotheses. The hypothesis or hypotheses that resist our best efforts at elimination are the ones in which we should have the most confidence. This view has been called eliminative induction. But Bacon was never specific about what eliminative methods could be employed. As I noted in Section 3.1, John Stuart Mill is usually credited with being the first to identify methods designed to eliminate possible causes for the effects we observe in nature. But such methods were known much earlier to the four Oxford scholars I mentioned.

But there is another important element in the eliminative testing of hypotheses. The tests we perform must be variative in the sense that we must establish the array of conditions under which we may expect a hypothesis to remain valid. We cannot do this by performing the same test over and over again. The only thing this repetitive testing would accomplish is to increase our confidence in the reliability of this single test's results. The more varied the conditions under which some hypothesis holds, the more confidence we can place in it. But this variative testing raises another important question, namely: how complete has been our eliminative testing of our hypotheses? There may be other important tests of our hypotheses that we have not performed whose results might serve to eliminate hypotheses we are still considering.

Neither Bacon, Mill, Popper, nor anyone else was successful in relating problems associated with the eliminative and variative testing of hypotheses to ordinary probabilistic concepts. The first person to study this relation was L. Jonathan Cohen [now emeritus, Queen's College, Oxford]. In a work that had a great influence on probabilistic thinking in law and philosophy, Cohen was the first to generate a theory of probability expressly congenial to the eliminative and variative testing of hypotheses[144]. He refers to this theory as Baconian probability to acknowledge its roots in the works of Francis Bacon. On occasion, he also calls it a theory of inductive probability. In his works Cohen takes a decidedly ecumenical [or "polycriterial", as he calls it] view of probability in evidence-based reasoning. He allows that conventional views of probability make perfect sense in some but not all situations. He further argues that conventional views of evidence-based reasoning, such as Bayes' rule, overlook how much evidence has been considered and how complete is its coverage of matters believed to be relevant in the inference at hand. Eliminative and variative inference requires special considerations. In fact, evidential completeness, in Cohen's view, is the major factor associated with the weight of evidence.

In Figure 6 below is a diagram I have used to illustrate some of Cohen's key ideas in Baconian probability. I have tried my best to generate interest in the importance of Cohen's views among persons in a variety of contexts who should be aware of his ideas regarding evidential completeness. I have gone to great lengths in some contexts, but not always with any great success[145]. Two basic questions arise in Cohen's views about the weight of evidence: (i) How much uncounteracted favourable evidence do we have on some hypothesis, arising in answer to relevant questions we have asked? And (ii) how many relevant questions, that we know about, remain for which we have no evidential answers? In short, the weight of evidence in Cohen's view depends not only on answers to questions we have asked, but also upon how many questions remain unanswered. Cohen's Baconian views about the weight and amount of evidence bring to mind ideas expressed by John M. Keynes in his very influential treatise on probability[146]. Keynes's ideas about the amount and the weight of evidence have often been misunderstood. Cohen has written on various questions that have arisen regarding the views of Keynes on the weight of evidence[147].
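
As a crude sketch of this bookkeeping [my illustration of Cohen's two questions, not his formalism; Baconian weight is an ordinal grading, and the sample questions are invented]:

    # Relevant questions bearing on a hypothesis, and whether evidence answers them.
    questions = {
        "Was the witness in a position to observe the event?": "answered favourably",
        "Is the testing instrument reliable under field conditions?": "unanswered",
        "Could the effect have another, uneliminated cause?": "unanswered",
    }

    answered = sum(1 for status in questions.values() if status != "unanswered")
    print(f"{answered} of {len(questions)} relevant questions answered")
    # A high Bayesian posterior computed only on the answered questions says
    # nothing about the two questions that remain unanswered.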

Here are some details of the cover story surrounding Figure 6. Some time ago we were asked to assess which of three hypotheses, H1, H2 and H3, is most likely, because the answer will have an important bearing on a decision we must make. Initial evidence pointed very strongly to H1 being true, so we took an action based on H1. What we are now doing is engaging in a post mortem analysis, trying to see what went wrong; H3 happened to occur and our decision miscarried. Our decision produced a disastrous result. Someone says: "How could we have gone wrong? We used Bayes' rule to aggregate our assessments of likelihoods for the evidence we had, and we all agreed that the prior probabilities we were using made perfect sense. Bayes' rule said that the posterior probability of H1 was 0.997 based on the evidence we incorporated in our inference".

If Jonathan Cohen happened to be present during our post mortem, here is what he might have said: "You were out on an inferential limb that was much longer and more slender than you believed it to be, just based on the answers your existing evidence provided. How many relevant questions do you now realise to have been unanswered in your analysis?" We begin to make a list of questions we believe are also relevant that we did not attempt to answer; this list grows quite large. It also contains questions we knew about at the time of our analysis. However, we believed the evidence we did take account of was sufficiently strong that we did not hesitate to conclude that H1 was true. Here is a picture of the actual inferential limb we were on.

Jonathan Cohen goes on to explain the two parts of this inferential limb on which we found ourselves. He says: "The strong part consists of the evidence you had that was favourably relevant to H1. The weak part consists of relevant questions that remained unanswered. What you did in concluding that H1 was true was to assume essentially that the answers to all of the questions that you did not ask would have been favourable to H1. The problem is that a very high Bayesian posterior probability is not a good indicator of the weight of evidence because it does not grade the completeness or sufficiency of evidence".

In another work I have compared Baconian and Bayesian approaches when we encounter chains of reasoning in arguments we construct[148]. There is nothing incompatible about these two approaches to evidence-based reasoning. The reason is that they each respond to different, but equally important, considerations. Bayes' rule provides very useful measures of how strong the evidence you do have is, but Cohen's Baconian probabilities allow us to grade the completeness of our evidence. I ended up concluding that both forms of hedging conclusions would be necessary on many occasions.

Verbal Assessments of the Force of Evidence: Fuzzy Probabilities

In so many situations we talk about the force of evidence, and express the strength of our conclusions, in words rather than in numbers. There are no better examples than those occurring in the field of law. Forensic standards of proof such as "beyond reasonable doubt", "clear and convincing evidence", "probable cause", and so on, are verbal assessments that seem to defy efforts to translate them into numerical probabilities. In his analysis of what we now call inference networks, Wigmore understood perfectly well that the arrows linking evidence and probanda, such as those illustrated in Figure 2A, are probabilistic in nature. But he always used words rather than numbers to indicate the force with which one element of an argument is linked to others[149]. He used terms such as "strong force", "weak force", and "provisional force" to indicate the strength of these linkages. The use of words rather than numbers to indicate the force of evidence appears in many other contexts, especially when there is no attempt to employ and combine any of the views of evidential force described above.

There are algorithms for combining numerical probabilities, such as Bayes' rule and Dempster's rule, but how do we combine assessments of the force of evidence that are given in words? Wigmore gave no hint about how we should combine his verbal assignments of evidential force in order to grade the force of an overall mass of evidence. Verbal assessments of probabilities in grading the force of evidence, and in stating the strength of an overall conclusion, are today referred to as fuzzy probabilities, in part to acknowledge their imprecision. But thanks to the work of Lotfi Zadeh and his many colleagues worldwide, there is a logic that underlies the expression and combination of verbal or fuzzy probabilities[150]. This system of fuzzy logic and probabilities has found wide acceptance in many situations in which persons must perform a variety of tasks based on fuzzy or imprecise ingredients. But it does have its detractors[151].
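
One common device, sketched minimally below, is to map verbal terms onto numeric intervals and combine links in a chain of reasoning with the fuzzy conjunction (minimum) operator [the vocabulary, the intervals, and the chain example are all assumed for illustration; they are not Zadeh's or Wigmore's own assignments]:

    # Assumed mapping from Wigmore-style verbal terms to numeric intervals.
    VERBAL_SCALE = {
        "weak force": (0.1, 0.4),
        "provisional force": (0.3, 0.6),
        "strong force": (0.6, 0.9),
    }

    def combine_chain(terms):
        """Fuzzy AND across links in a reasoning chain: take minima of the bounds."""
        lows, highs = zip(*(VERBAL_SCALE[t] for t in terms))
        return (min(lows), min(highs))

    # A two-link chain, each link assessed verbally; under the min operator the
    # chain is only as strong as its weakest link.
    print(combine_chain(["strong force", "weak force"]))  # (0.1, 0.4)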

I have now completed my comments on the essential properties or credentials of evidence: relevance, credibility, and inferential force, weight, or strength. I have taken some care in discussing these properties in order to illustrate how study of them involves the classificatory, comparative, and quantitative concepts that both Poincaré and Carnap said were involved in science. I next comment on the uses of evidence and will show how these same concepts arise.

4.3 On the Uses of Evidence

We all use evidence every day of our lives in connection with our inferences and decisions, whatever their substance and objectives might be. William Twining has provided a characterization of evidence that seems to cover the use of evidence in any context you can think of. He says[152]:

'Evidence' is a word of relation used in the context of argumentation (A is evidence of B). In that context information has a potential role as relevant evidence if it tends to support or tends to negate, directly or indirectly, a hypothesis or probandum. One draws inferences from evidence in order to prove or disprove a hypothesis or probandum. The framework is argument, the process is proof, the engine is inferential reasoning.

I am going to provide two examples of uses of evidence. The first will illustrate Poincaré's assertion that science relies upon classifications and is the study of relations, some of which can be expressed in quantitative terms. The second involves Carnap's comparative and quantitative concepts and their importance in science and in our everyday lives.