CHAPTER I-2 STATISTICAL INFERENCE AND RANDOM SAMPLING

Continuity and sameness is the first fundamental concept in inference in general, as discussed in Chapter I-1. Random sampling is the second of the great concepts in inference, and it distinguishes probabilistic statistical inference from non-statistical inference as well as from non-probabilistic inference based on statistical data.

When the data of interest are not the result of random sampling, a sample drawn at random is the ideal to which the actual sample is compared. And the properties of a randomly-drawn sample are utilized on the assumption that the actual sample is sufficiently close to the ideal.<1>

The usual goal of a statistical inference is a decision about which of two or more hypotheses one will thereafter choose to believe and act upon. The strategy is to consider the behavior of a given universe in terms of the samples it is likely to produce, and if the observed sample is not a likely outcome we then proceed as if the sample did not in fact come from that universe. (The previous sentence is a restatement in somewhat different form of the core of statistical analysis.)

At a more technical level now: Probably the most important task of statistical inference is to determine the existence (or extent) of sameness when intuition alone does not provide a satisfactory answer. Two common cases are a) the extent of overlap between two distributions, and b) the probability that a sample should be said to be the same as a universe in the sense of having been drawn from it. The statistical inference may be thought of as an operational specification that makes more precise a previously-vague notion about sameness.

Let's begin the discussion with a simple though unrealistic situation. Your friend Arista a) looks into a cardboard carton, b) reaches in, c) pulls out her hand, and d) shows you a green ball. What might you reasonably infer?

You might at least be fairly sure that the green ball came from the carton, though you recognize that Arista might have had it concealed in her hand when she reached into the carton. But there is not much more you might reasonably conclude at this point except that there was at least one green ball in the carton to start with. There could be no more balls; there could be many green balls and no others; there could be a thousand red balls and just one green ball; and there could be one green ball, a hundred balls of different colors, and two pounds of mud - given that she looked in first, it is not improbable that she picked out the only green ball among other material of different sorts.

There is not much you could say with confidence about the likelihood of yourself reaching into the same carton with your eyes closed and pulling out a single green ball. To use other language (which some philosophers might say is not appropriate here because the situation is too specific), there is little basis for induction about the contents of the box. Nor is the situation very different if your friend three times in a row reaches in and then hands you a green ball each time.

So far we have put our question rather vaguely. Let us frame a more precise inquiry: What do we predict about the next item(s) we might draw from the carton? If we assume - based on who-knows-what information or notions - that another ball will emerge, we could simply use the principle of sameness and (until we see a ball of another color) predict that the next ball will be green, whether one or three or 100 balls is (are) drawn.
But now what if Arista pulls out nine green balls and one red ball? The principle of sameness cannot be applied as simply as before. Based on the most recent ball alone, we would predict that the next one will be red. But taking into account all the balls we have seen, the next will "probably" be green. We have no solid basis on which to go further. There cannot be any "solution" to the "problem" of reaching a general conclusion on the basis of these specific pieces of evidence.

Now consider what you might conclude if you were told that a single green ball had been drawn with a random sampling procedure from a box containing nothing but balls. Knowledge that the sample was drawn randomly from a given universe is grounds for belief that one knows much more than if the sample were not drawn randomly.

First, you would be sure - if you had reasonable basis to believe that the sampling really was random, which is not easy to guarantee - that the ball came from the box. Second, you would guess that the proportion of green balls is not very small, because if there are only a few green balls and many other-colored balls, it would be unusual - that is, the event would have a low probability - to draw a green ball. Not impossible, but unlikely. And we can compute the likelihood of drawing a green ball - or any other combination of colors - for different assumed compositions within the box. So the knowledge that the sampling process is random greatly increases our ability - or our confidence in our ability - to infer the contents of the box.

Let us note well the strategy of the previous paragraph: Ask about the probability that one or another of the various possible contents of the box (the "universe") would produce the observed sample, on the assumption that the sample was drawn randomly. This is the central strategy of all statistical inference, though I do not find it so stated elsewhere. We shall come back to this idea shortly.

There are several kinds of questions one might ask about the contents of the box. One general category includes questions about our best guesses of the box's contents - that is, questions of estimation; another category includes questions about our surety of that description, and our surety that the contents are similar to or different from the contents of other boxes. The estimation questions can be subtle and unexpected (Savage, 1954/1972, Chapter 15), but do not cause major controversy about the foundations of statistics. Hence I shall merely mention that the method of moments and the method of maximum likelihood serve most of our needs, and often agree in their conclusions; furthermore, we often know when the former may be inappropriate. So we can quickly move on to questions about the extent of surety in our estimations.

Consider your reaction if the sampling produces 10 green balls in a row, or 9 out of 10. If you had no other information (a very important assumption that we will leave aside for now), your best guess would be that the box contains all green balls, or a proportion of 9 of 10, in the two cases respectively. This estimation process seems natural enough.

You would be surprised if someone told you that instead of the box containing the proportion in the sample, it contained just half green balls. How surprised? Intuitively, the extent of your surprise would depend on the likelihood that a half-green "universe" would produce 10 or 9 green balls out of 10. This surprise is a key element in the logic of the hypothesis-testing branch of statistical inference.
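As an aside that is not part of the original text, here is a small Python sketch of that "surprise" computation: how often a half-green universe would yield 9 or more greens in 10 random draws. The book's own programs use RESAMPLING STATS (see endnote 3); the function names and the 100,000 trials below are my own choices, and the exact binomial formula is included only as a check on the simulation.

import random
from math import comb

def simulate(p_green, n_draws=10, threshold=9, trials=100_000):
    # Estimate P(at least `threshold` greens in `n_draws`) from a universe
    # whose proportion of green balls is `p_green`, by repeated random sampling.
    hits = 0
    for _ in range(trials):
        greens = sum(random.random() < p_green for _ in range(n_draws))
        if greens >= threshold:
            hits += 1
    return hits / trials

def exact(p_green, n_draws=10, threshold=9):
    # The same probability from the binomial formula.
    return sum(comb(n_draws, k) * p_green**k * (1 - p_green)**(n_draws - k)
               for k in range(threshold, n_draws + 1))

print(simulate(0.5))   # roughly 0.011
print(exact(0.5))      # 11/1024, about 0.0107

Either route puts the chance at about one percent, which is why 9 or 10 greens from a supposedly half-green box would indeed be surprising.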
We learn more about the likely contents of the box by asking about the probability that various specific populations of balls within the box would produce the particular sample that we received. That is, we can ask how likely a collection of 25 percent green balls is to produce (say) 9 of 10 greens, and how likely collections of 50 percent green, 75 percent green, 90 percent green (and any other collections of interest) are to produce the observed sample. That is, we ask about the consistency between any particular hypothesized collection within the box and the sample we observe. And it is reasonable to believe that those universes which have greater consistency with the observed sample - that is, those universes that are more likely to produce the observed sample - are more likely to be in the box than other universes.

What we have just done (to repeat, as I shall repeat many times) is the basic strategy of statistical investigation. If we observe 9 of 10 green balls, we then determine that universes with (say) 9/10 and 10/10 green balls are more consistent with the observed evidence than are universes of 0/10 and 1/10 green balls. So by this process of considering specific universes that the box might contain, we make possible more specific inferences about the box's contents based on the sample evidence than we could without this process.

Please notice the role of the concept of probability and the actual assessment of probabilities here: By one technical means or another (either resampling or formulas), we assess the probabilities that a particular universe will produce the observed sample, and other samples as well.

It is of the highest importance to recognize that without additional knowledge (or assumption) one cannot make any statements about the probability of the sample having come from any particular universe, on the basis of the sample evidence. (Better read that last sentence again.) We can only speak about the probability that a particular universe will produce (in contrast to did produce) the observed sample, a very different matter. This issue will arise again very sharply in the context of confidence intervals.

Let us generalize the steps in statistical inference:

1. Frame the original question as: What is the chance that the observed sample s came from population S? That is, what is the probability of (If s then S)?

2. Proceed to this question: What kinds of samples does the postulated<2> universe S produce, with which probability? That is, what is the probability of this particular s coming from S? That is, what is p(s|S)?

3. Actually investigate the behavior of S with respect to s and other samples. One can do this in two ways: a. One can use the calculus of probability, perhaps resorting to Monte Carlo methods if an appropriate formula does not exist. Or, b. One can use resampling (in the larger sense); the domain of resampling is meant here to equal all Monte Carlo experimentation except for the use of Monte Carlo methods for i) approximations, ii) investigation of complex functions in statistics and other theoretical mathematics, and iii) uses elsewhere in science. Resampling in its more restricted sense includes i) the bootstrap, ii) permutation tests, and iii) other non-parametric simulation methods of statistics.

4. Interpret the probabilities that result from step 3 in terms of i) acceptance or rejection of hypotheses, ii) surety of conclusions, or iii) inputs to decision theory.
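To make steps 2 and 3 concrete, here is a brief Python sketch (my own illustration, not the author's) that estimates p(s|S) by resampling for several hypothesized universes S, taking s to be the observed sample of 9 greens in 10 draws; the universes tried are simply the proportions mentioned earlier.

import random

def p_sample_given_universe(p_green, greens_observed=9, n_draws=10, trials=100_000):
    # Estimate the probability that a universe with proportion `p_green` of
    # green balls produces exactly `greens_observed` greens in `n_draws`.
    hits = 0
    for _ in range(trials):
        greens = sum(random.random() < p_green for _ in range(n_draws))
        if greens == greens_observed:
            hits += 1
    return hits / trials

for p in (0.25, 0.50, 0.75, 0.90):
    print(f"universe {p:.0%} green: estimated p(s|S) = {p_sample_given_universe(p):.4f}")

In line with the argument above, the 75 percent and 90 percent green universes turn out to be far more consistent with the observed sample than the 25 percent or 50 percent universes.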
Here is the short definition of statistical inference: The selection of a probabilistic model that might resemble the process you wish to investigate, the investigation of that model's behavior, and the interpretation of the results. We will get even more specific about the procedure when we discuss the canonical procedures for hypothesis testing and for the finding of confidence intervals in the chapters on those subjects.

The discussion so far has been in the spirit of what is known as hypothesis testing. The result of a hypothesis test is a decision about whether or not one believes that the sample is likely to have come from the "benchmark [postulated] universe" S. The logic is that if the probability of such a sample coming from that universe is low, we will then choose to believe the alternative - to wit, that the sample came from the universe that resembles the sample. The underlying idea is that if an event would be very surprising if it had really happened - as it would be very surprising if the dog had really eaten the homework - we are inclined not to believe that it happened. (This logic will be explored further in Chapter 00 on hypothesis testing.)

We have so far assumed that our only relevant knowledge is the sample. And though we almost never lack some additional information, this can be a sensible way to proceed when we wish to suppress any other information or speculation. This suppression is controversial; those known as Bayesians or subjectivists want us to take into account all the information we have. But even they would not dispute suppressing information in certain cases - such as a teacher who does not want to know students' SAT scores because s/he might want to avoid the possibility of unconsciously being affected by those scores, or an employer who wants not to know the potential employee's ethnic or racial background even though it might improve the hiring process, or a sports coach who refuses to pick the starting team each year until the players have competed for the positions. If the Bayesians will admit the reasonability of suppressing information in at least some situations, it will be a major step in accommodation and in bringing all views into greater harmony. (More about this topic in Chapter 00.)

Now consider a variant on the green-ball situation discussed above. Assume that you are told that there is (say) an equal probability that the sample of nine green balls and one red ball was drawn from each of two specified universes - for example, two urns of balls, one with 50 percent green balls and the other with 80 percent green balls. On the basis of your sample you can then say how probable it is that the sample came from one or the other. You proceed by computing the probabilities (often called the likelihoods in this situation) that each of those two universes would individually produce the observed sample - probabilities that you could arrive at with resampling, with Pascal's Triangle, with a table of binomial probabilities, with the Normal approximation and the Z distribution, or with yet other devices. Those probabilities are .01 and .27, and the ratio of the two is between .03 and .04. That is, fair betting odds are about 1 to 27.<3>

Actual situations that fit this Neyman-Pearson model are not frequently found. Let us consider a genetics problem on this model. Plant A produces 3/4 black seeds and 1/4 reds; plant B produces all reds. You get a red seed. Which plant would you guess produced it? You surely would guess plant B.
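For readers who want to check the numbers just quoted, here is a short Python sketch (an addition of mine, not the author's) that computes the two likelihoods from the exact binomial formula rather than from resampling, Pascal's Triangle, or a table.

from math import comb

def likelihood(p_green, greens=9, n=10):
    # P(exactly `greens` green balls in `n` draws | universe with proportion p_green).
    return comb(n, greens) * p_green**greens * (1 - p_green)**(n - greens)

like_half = likelihood(0.5)     # about .01
like_eighty = likelihood(0.8)   # about .27
print(like_half, like_eighty, like_half / like_eighty)   # ratio between .03 and .04
# With equal priors, the probability that the sample came from the 50 percent
# urn is the first likelihood divided by the sum of the two:
print(like_half / (like_half + like_eighty))   # about .035, odds of roughly 1 to 27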
Now, how about a sample of 9 reds and a black, drawn from plant A or from plant C, the latter producing 50 percent reds on average? Before answering that, let us put the earlier question - about the single red seed and plants A and B - more precisely: What betting odds would you give that the one red seed came from plant B? Let us reason this way: If you do this again and again, drawing equally often from each plant, 4 of 5 of the red seeds you see will come from B. Therefore, reasonable (or "fair") odds are 4 to 1, because this is in accord with the ratios with which red seeds are produced by the two plants - 4/4 to 1/4.

Now back to the sample of 9 reds and a black, and plants A and C. It would make sense that the appropriate odds would be derived from the probabilities of the two plants producing that particular sample, probabilities of the kind we computed above.

Now let us move to a bit more complex problem: Consider two urns - urn G with 2 red balls and 1 black ball, and urn H with 100 red and 100 black balls. Someone flips a coin to decide which urn will be drawn from, reaches into that urn, and chooses two balls without replacing the first one before drawing the second. Both are red. What are the odds that the sample came from urn G? Clearly, the answer should derive from the probabilities that the two urns would produce the observed sample.<4>

Let's restate the central issue. One can assess the probability that a particular plant which produces on average 1 red and 3 black seeds will produce one red seed, or 5 reds among a sample of 10. But without further assumptions - such as the assumption above that the possibilities are limited to two specific universes - one cannot say how likely a given red seed is to have come from a given plant, even if we know that that plant produces only reds. (For example, it may have come from other plants producing only red seeds.) When we limit the possibilities to two universes (or to a larger set of specified universes) we are able to put a probability on one hypothesis or another. But to repeat, in many or most cases, one cannot reasonably assume it is one or the other. And then we cannot state any odds that the sample came from a particular universe. This is a very difficult point to grasp, experience shows, but a crucial one. (It is the sort of subtle issue that makes statistics so difficult.)

The additional assumptions necessary to talk about the probability that the red seed came from a given plant are the stuff of statistical inference. And they must be combined with such "objective" probabilistic assessments as the likelihood that a 1-red-3-black plant will produce one red, or 5 reds of 10.

Now let us move one step further. Instead of stating as a fact under our control that there is a .5 chance of the sample being drawn from each of the two urns in the problem above, let us assume that we do not know the probability of each urn being picked, but instead we estimate a probability of .5 for each urn, based on a variety of other information, all of it uncertain. But though the facts are now different, the most reasonable estimate of the odds that the observed sample was drawn from one or the other urn will still be the same - because in both cases we were working with a "prior probability" of .5. (The term "prior probability" is Bayesian.) And when we view the situation this way, the Neyman-Pearson model may be seen perfectly well in a Bayesian framework.

Now let us go a step further by allowing the universes from which the sample may have come to have different assumed probabilities as well as different compositions. That is, we now consider prior probabilities other than .5.
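Here is a worked sketch, not in the original text, of the urn G and urn H problem together with the effect of a prior probability. The coin flip gives each urn a prior of .5; the alternative prior of .1 at the end is a number of my own choosing, added only to show how a prior other than .5 changes the answer.

from fractions import Fraction

# Chance that each urn yields two red balls when two are drawn without replacement.
p_two_red_G = Fraction(2, 3) * Fraction(1, 2)         # urn G: 2 red, 1 black
p_two_red_H = Fraction(100, 200) * Fraction(99, 199)  # urn H: 100 red, 100 black

def posterior_G(prior_G):
    # P(urn G | both balls red), by Bayes' rule, for a given prior on urn G.
    prior_H = 1 - prior_G
    joint_G = prior_G * p_two_red_G
    joint_H = prior_H * p_two_red_H
    return joint_G / (joint_G + joint_H)

print(float(p_two_red_G), float(p_two_red_H))   # about 0.333 and 0.249
print(float(posterior_G(Fraction(1, 2))))       # coin-flip prior: about 0.57
print(float(posterior_G(Fraction(1, 10))))      # a skeptical prior of .1: about 0.13

With the coin-flip prior the posterior probability for urn G is about .57 - odds of roughly 4 to 3 - and, as the text argues, that answer is the same whether the .5 prior is a stated fact or merely our best estimate, while a different prior shifts it.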
It was the contribution of Thomas Bayes to show how to incorporate formally into a computation the "prior" information (which we may choose to call speculation or belief) about the probabilities of drawing from the urns so as to derive a "posterior" probability. But in some or many cases, it is not possible to specify anything further about the "prior distribution" - not even to assume that all possibilities over a given range are of equal probability - and in such a case, you cannot make any reasonable statement about the probability of one or another population based on the sample alone. (People known as "strict Bayesians" say that it is always possible to make meaningful statements about the prior distributions. Whether one can or cannot do so in a particular case seems to me an issue of judgment, however.)

How do we decide which universe(s) to investigate for the likelihood of producing the observed sample, as well as producing samples that are even less likely, in the sense of being more surprising? That judgment depends upon the purpose of your analysis, upon your point of view of how statistics ought to be done, and upon some other factors. This decision is discussed in Section 00.

It should be noted that the logic described so far applies in exactly the same fashion whether we do our work estimating probabilities with the resampling method or with conventional methods. We can figure the probability of nine or more green balls from a universe of (say) p = .7 with either approach.

So far we have discussed the comparison of various hypotheses and possible universes. We must also mention where the consideration of the reliability of estimates comes in. This leads to the concept of confidence limits, which will be discussed in Chapter 00.

Samples Whose Observations May Have More Than Two Values

So far we have discussed samples and universes that we can characterize as proportions of elements which can have only one of two characteristics - green or other, in this case, which is equivalent to "1" or "0". This expositional choice has been solely for clarity. All the ideas discussed above pertain just as well to samples whose observations may have more than two values, and that may be either discrete or continuous.

SUMMARY AND CONCLUSIONS

A statistical question asks about the probabilities of possible generating universes in light of the evidence of a sample. In every case, the statistical answer comes from considering the behavior of particular specified universes in relation to the sample evidence and to the behavior of other possible universes. That is, a statistical problem is an exercise in postulating universes of interest and interpreting the probabilistic distributions of results of those universes. The preceding sentence is the key operational idea in statistical inference, though I do not seem to find a statement like this one in the literature.

Different sorts of realistic contexts call for different ways of framing the inquiry. For each of the established models there are types of problems that that model fits better than do the other models, and other types of problems for which the model is quite inappropriate. Limiting the domain of application in this fashion, together with using the operational definition of probability discussed in Chapter 00, removes the apparent conflicts between the Fisherian, Neyman-Pearson, and Bayesian models of statistical inference.
Fundamental wisdom in statistics, as in all other contexts, is to carry and use a large tool kit rather than applying only a hammer, screwdriver, or wrench no matter what problem is at hand. (Philosopher Abraham Kaplan once stated Kaplan's Law of scientific method: Give a small boy a hammer and there is nothing that he will encounter that does not require pounding.) Studying the text of a poem statistically to infer whether Shakespeare or Bacon is the more likely author is quite different from inferring whether bioengineer Smythe can produce an increase in the proportion of female calves, and both are different from decisions about whether to remove a basketball player from the game or whether to produce a new product.

Some key points:

1) In statistical inference as in all sound thinking, one's purpose is central. All judgments should be made relative to that purpose, and in light of costs and benefits. (This is the spirit of the Neyman-Pearson approach.)

2) One cannot avoid making judgments; the process of statistical inference cannot ever be perfectly routinized or objectified. Even in science, fitting a model to experience requires judgment.

3) The best ways to infer are different in different situations - economics, psychology, history, business, medicine, engineering, physics, and so on.

4) Different tools must be used when the situations call for them - sequential vs. fixed sampling, Neyman-Pearson vs. Fisher, and so on.

5) In statistical inference it is wise not to argue about the proper conclusion when the data and procedures are ambiguous. Instead, whenever doing so is possible, one should go back and get more data, hence lessening the importance of the efficiency of statistical tests. In some cases one cannot easily get more data, or even conduct an experiment, as in biostatistics with cancer patients. And with respect to the past one cannot produce more historical data. But one can gather more and different kinds of data, e.g. the history of research on smoking and lung cancer.

ENDNOTES

<1>: In the course of editing the first two editions of my text on research methods, my friend the late Hanan Selvin never ceased to brace me on writing about a "randomly drawn sample" rather than a random sample, because randomness refers to the process rather than to the outcome. I still slip occasionally into the lazy term, however. When I do so, please note that it is a mistake.

<2>: The postulated universe S bears some likeness to the Kantian-Einsteinian model created by the researcher against which to test the observed data. But instead of deriving from theory or insight or hunch or whatever, in statistical inference the model derives from the sample (plus perhaps a Bayesian prior distribution, about which more shortly). Another difference from the original "scientific" model is that the postulated universe S has no causal connection to the sample except through the process of sampling. Statistical inference resembles the scientific model in that it is assumed not to be a perfect picture of nature. But unlike a scientific model, in the case of a finite universe we assume that larger and larger samples can approach the actual universe.

<3>: Using RESAMPLING STATS, a program to find the probabilities is as follows. Ask: What is the probability of drawing a sample of nine green and one red ball from a) a 50/50 universe, and b) a universe that is 80% green, 20% red?
REPEAT 15000
GENERATE 10 1,2 a          Let 1 = green, 2 = red
COUNT a =1 b               Count the green balls in the sample
SCORE b z-one
END
COUNT z-one =9 k-one
DIVIDE k-one 15000 kk-one

REPEAT 15000
GENERATE 10 1,10 a         Let 1-8 = green, 9-10 = red
COUNT a <=8 b              Count the green balls in the sample
SCORE b z-two
END
COUNT z-two =9 k-two
DIVIDE k-two 15000 kk-two

DIVIDE kk-one kk-two k
PRINT kk-one kk-two k

kk-one = 0.0092
kk-two = 0.27247
k = 0.033766

[source: program redball.sta]

<4>: Just for fun, how about if the first ball drawn is thrown back after being examined? What are the appropriate odds now?
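For readers without RESAMPLING STATS, the following Python sketch (my addition, not the author's) mirrors the logic of the redball.sta program in endnote 3; the function name and the 15,000 trials simply echo the program above.

import random

def chance_of_nine_greens(p_green, trials=15_000):
    # Estimate the probability of exactly 9 greens (and so 1 red) in 10 draws
    # from a universe whose proportion of green balls is p_green.
    hits = 0
    for _ in range(trials):
        greens = sum(random.random() < p_green for _ in range(10))
        if greens == 9:
            hits += 1
    return hits / trials

kk_one = chance_of_nine_greens(0.5)   # about .01
kk_two = chance_of_nine_greens(0.8)   # about .27
print(kk_one, kk_two, kk_one / kk_two)   # ratio about .03 to .04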