Chapter 4: Generating Functions - Auckland

6m ago
12 Views
1 Downloads
254.80 KB
33 Pages
Last View : 1m ago
Last Download : 5m ago
Upload by : Jamie Paz
Transcription

74Chapter 4: Generating FunctionsThis chapter looks at Probability Generating Functions (PGFs) for discreterandom variables. PGFs are useful tools for dealing with sums and limits ofrandom variables. For some stochastic processes, they also have a special rolein telling us whether a process will ever reach a particular state.By the end of this chapter, you should be able to: find the sum of Geometric, Binomial, and Exponential series; know the definition of the PGF, and use it to calculate the mean, variance,and probabilities; calculate the PGF for Geometric, Binomial, and Poisson distributions; calculate the PGF for a randomly stopped sum; calculate the PGF for first reaching times in the random walk; use the PGF to determine whether a process will ever reach a given state.4.1 Common sums1. Geometric Series231 r r r . Xx 0This formula proves thatP(X x) p(1 p)xP rx 1,1 r x) 1 when X Geometric(p): XP(X x) p(1 p)xx 0 P(X Xx 0when r 1.x 0 p Xx 0 (1 p)xp1 (1 p) 1.(because 1 p 1)

752. Binomial TheoremFor any p, q R, and integer n,(p q)n n Xnxx 0px q n x . nn!Note that(n Cr button on calculator.) x(n x)! x!PnThe Binomial Theoremprovesthatx 0 P(X x) 1 when X Binomial(n, p): n xP(X x) p (1 p)n x for x 0, 1, . . . , n, soxnn XXn xP(X x) p (1 p)n xxx 0x 0 n p (1 p) 1n 1.3. Exponential Power Series XλxFor any λ R,This proves thatP x 0 P(Xx 0x! eλ . x) 1 when X Poisson(λ):λx λP(X x) e for x 0, 1, 2, . . ., sox! XXXλx λλx λP(X x) e ex!x!x 0x 0x 0 e λ eλ 1.Note: Another useful identity is:eλ limn λ1 n nfor λ R.

764.2 Probability Generating FunctionsThe probability generating function (PGF) is a useful tool for dealingwith discrete random variables taking values 0, 1, 2, . . . Its particular strengthis that it gives us an easy way of characterizing the distribution of X Y whenX and Y are independent. In general it is difficult to find the distribution ofa sum using the traditional probability function. The PGF transforms a suminto a product and enables it to be handled much more easily.Sums of random variables are particularly important in the study of stochasticprocesses, because many stochastic processes are formed from the sum of asequence of repeating steps: for example, the Gambler’s Ruin from Section 2.7.The name probability generating function also gives us another clue to the roleof the PGF. The PGF can be used to generate all the probabilities of thedistribution. This is generally tedious and is not often an efficient way ofcalculating probabilities. However, the fact that it can be done demonstratesthat the PGF tells us everything there is to know about the distribution.Definition: Let X be a discrete random variable taking values in the non-negativeintegers {0, 1, 2, . . .}. The probability generating function (PGF) of X isGX (s) E(sX ), for all s R for which the sum converges.Calculating the probability generating functionXGX (s) E s Xsx P(X x).x 0Properties of the PGF:1. GX (0) P(X 0): GX (0) 00 P(X 0) 01 P(X 1) 02 P(X 2) . . .GX (0) P(X 0).

772. GX (1) 1 :GX (1) Xx1 P(X x) x 0 XP(X x) 1.x 0Example 1: Binomial Distribution n x n xLet X Binomial(n, p), so P(X x) p qfor x 0, 1, . . . , n.xnX n x n xsxp qGX (s) xx 0n Xn (ps)x q n xxx 0 (ps q)nby the Binomial Theorem: true for all s.Thus GX (s) (ps q)n for all s R.X Bin(n 4, p 0.2)150100050G(s)GX (0) (p 0 q)n qn P(X 0).200Check GX (0): 20Check GX (1):GX (1) (p 1 q)n (1)n 1. 100s10

78Example 2: Poisson Distributionλx λLet X Poisson(λ), so P(X x) e for x 0, 1, 2, . . .x!GX (s) Xx 0xx λ λsx! λe e X(λ s)xx 0 e λ e(λs)ThusGX (s) eλ(s 1)x!for all s R.for all s R.3001020G(s)4050X Poisson(4) 1.0 0.50.00.51.01.52.0sExample 3: Geometric Distribution5Let X Geometric(p), so P(X x) p(1 p)x pq x for x 0, 1, 2, . . .,where q 1 p. XX Geom(0.8)to infinityG(s)GX (s) sx pq x1(qs)x0 p X234x 0x 0 Thus 5p1 qsGX (s) for all s such that qs 1.p1 qs1qfor s .05s

794.3 Using the probability generating function to calculate probabilitiesThe probability generating function gets its name because the power series canbe expanded and differentiated to reveal the individual probabilities. Thus,given only the PGF GX (s) E(sX ), we can recover all probabilities P(X x).For shorthand, write px P(X x). ThenXGX (s) E(s ) Xpx sx p0 p1s p2s2 p3 s3 p4s4 . . .x 0Thus p0 P(X 0) GX (0).First derivative:G′X (s) p1 2p2s 3p3s2 4p4s3 . . .Thus p1 P(X 1) G′X (0).Second derivative: G′′X (s) 2p2 (3 2)p3s (4 3)p4s2 . . .1Thus p2 P(X 2) G′′X (0).2Third derivative:G′′′X (s) (3 2 1)p3 (4 3 2)p4 s . . .Thus p3 P(X 3) 1 ′′′G (0).3! XIn general:pn P(X n) n 11 d(n)(GX (s))GX (0) n!n! dsn.s 0

80sExample: Let X be a discrete random variable with PGF GX (s) (2 3s2 ).5Find the distribution of X.3 3s :59 2s :518G′′X (s) s :518G′′′:X (s) 52GX (s) s 52G′X (s) 5GX (0) P(X 0) 0.2G′X (0) P(X 1) .51 ′′G (0) P(X 2) 0.2 X1 ′′′3GX (0) P(X 3) .3!51 (r)G (s) P(X r) 0 r 4.r! X(r)GX (s) 0 r 4 :ThusX 1 with probability 2/5,3 with probability 3/5.Uniqueness of the PGF 1(n)The formula pn P(X n) GX (0) shows that the whole sequence ofn!probabilities p0 , p1, p2, . . . is determined by the values of the PGF and its derivatives at s 0. It follows that the PGF specifies a unique set of probabilities.Fact: If two power series agree on any interval containing 0, however small, thenall terms of the two series are equal.P Pnnas,B(s) Formally: let A(s) and B(s) be PGFs with A(s) nn 0 bn s .n 0If there exists some R′ 0 such that A(s) B(s) for all R′ s R′ , thenan bn for all n.Practical use: If we can show that two random variables have the same PGF insome interval containing 0, then we have shown that the two random variableshave the same distribution.Another way of expressing this is to say that the PGF of X tells us everythingthere is to know about the distribution of X .

814.4 Expectation and moments from the PGFAs well as calculating probabilities, we can also use the PGF to calculate themoments of the distribution of X. The moments of a distribution are the mean,variance, etc.Theorem 4.4: Let X be a discrete random variable with PGF GX (s). Then:1. E(X) G′X (1).nodk GX (s)(k)2. E X(X 1)(X 2) . . . (X k 1) GX (1) dsk(This is the k th factorial moment of X .).s 1Proof: (Sketch: see Section 4.8 for more details)sx px ,G′X (s) xsx 1 pxG(s)so6x 0 Xx 0G′X (1) xpx E(X)x 00 X4GX (s) X Poisson(4)21. X0.00.51.0s2.(k)GX (s) Xdk GX (s)x(x 1)(x 2) . . . (x k 1)sx k px kds(k)so GX (1) x k Xx kx(x 1)(x 2) . . . (x k 1)pxno E X(X 1)(X 2) . . . (X k 1) . 1.5

82Example: Let X Poisson(λ). The PGF of X is GX (s) eλ(s 1) . Find E(X)and Var(X).X Poisson(4)2G(s)G′X (s) λeλ(s 1)0 E(X) G′X (1) λ.For the variance, considerSo46Solution:0.0noE X(X 1) G′′X (1) λ2 eλ(s 1) s 1 λ2 .0.51.01.5sVar(X) E(X 2) (EX)2no E X(X 1) EX (EX)2 λ2 λ λ2 λ.4.5 Probability generating function for a sum of independent r.v.sOne of the PGF’s greatest strengths is that it turns a sum into a product: (X1 X2 )X1 X2E s E s s.This makes the PGF useful for finding the probabilities and moments of a sumof independent random variables.Theorem 4.5: Suppose that X1 , . . . , Xn are independent random variables, andlet Y X1 . . . Xn . ThenGY (s) nYi 1GXi (s).

83Proof:GY (s) E(s(X1 . Xn ) ) E(sX1 sX2 . . . sXn ) E(sX1 )E(sX2 ) . . . E(sXn )(because X1 , . . . , Xn are independent) nYGXi (s).as required. i 1Example: Suppose that X and Y are independent with X Poisson(λ) andY Poisson(µ). Find the distribution of X Y .Solution:GX Y (s) GX (s) · GY (s) eλ(s 1) eµ(s 1) e(λ µ)(s 1) .But this is the PGF of the Poisson(λ µ) distribution. So, by the uniqueness ofPGFs, X Y Poisson(λ µ).4.6 Randomly stopped sumRemember the randomly stopped sum model fromSection 3.4. A random number N of events occur,and each event i has associated with it a cost orreward Xi . The question is to find the distributionof the total cost or reward: TN X1 X2 . . . XN .TN is called a randomly stopped sum because it has a random number of terms.Example: Cash machine model. N customers arrive during the day. Customer iwithdraws amount Xi . The total amount withdrawn during the day is TN X1 . . . XN .

84In Chapter 3, we used the Laws of Total Expectation and Variance to showthat E(TN ) µ E(N ) and Var(TN ) σ 2 E(N ) µ2 Var(N ), where µ E(Xi)and σ 2 Var(Xi).In this chapter we will now use probability generating functions to investigatethe whole distribution of TN .Theorem 4.6: Let X1 , X2, . . . be a sequence of independent and identically distributed random variables with common PGF GX . Let N be a random variable,Pindependent of the Xi ’s, with PGF GN , and let TN X1 . . . XN Ni 1 Xi .Then the PGF of TN is: GTN (s) GN GX (s) .Proof: GTN (s) E(sTN ) E sX1 . XN on X1 . XNN EN E snX1 EN E snXN o.sX1 EN E sn N.sX1 EN E snXN EN (GX (s)) GN GX (s) oXN.E sNo(conditional expectation)(Xi ’s are indept of N ) o(Xi ’s are indept of each other)(by definition of GN ).

85Example: Let X1 , X2 , . . . and N be as above. Find the mean of TN .E(TN ) G′TN (1) dGN (GX (s))dss 1 G′N (GX (s)) · G′X (s)s 1 G′N (1) · G′X (1)Note: GX (1) 1 for any r.v. X E(N ) · E(X1),— same answer as in Chapter 3.Example: Heron goes fishingMy aunt was asked by her neighbours to feed the prizegoldfish in their garden pond while they were on holiday.Although my aunt dutifully went and fed them every day,she never saw a single fish for the whole three weeks. Itturned out that all the fish had been eaten by a heronwhen she wasn’t looking!Let N be the number of times the heron visits the pondduring the neighbours’ absence. Suppose that N Geometric(1 θ),so P(N n) (1 θ)θn, for n 0, 1, 2, . . . When the heron visits the pondit has probability p of catching a prize goldfish, independently of what happenson any other visit. (This assumes that there are infinitely many goldfish to becaught!) Find the distribution ofT total number of goldfish caught.Solution:LetXi 1 if heron catches a fish on visit i,0 otherwise.Then T X1 X2 . . . XN (randomly stopped sum), soGT (s) GN (GX (s)).

86NowGX (s) E(sX ) s0 P(X 0) s1 P(X 1) 1 p ps.Also,GN (r) Xnr P(N n) Xn 0n 0rn (1 θ)θn (1 θ) 1 θ.1 θr X(θr)nn 0(r 1/θ).SoGT (s) 1 θ1 θ GX (s)(putting r GX (s)),giving:GT (s) 1 θ1 θ(1 p ps) 1 θ1 θ θp θps[could this be Geometric? GT (s) 1 θ(1 θ θp) θps 1 θ1 θ θp (1 θ θp) θps1 θ θp1 πfor some π?]1 πs

87 1 θ θp θp1 θ θp θps1 1 θ θp θp1 1 θ θp . θps1 1 θ θp θpThis is the PGF of the Geometric 1 1 θ θpness of PGFs, we have: distribution, so by unique- 1 θ.T Geometric1 θ θpWhy did we need to use the PGF?We could have solved the heron problem without using the PGF, but it is muchmore difficult. PGFs are very useful for dealing with sums of random variables,which are difficult to tackle using the standard probability function.Here are the first few steps of solving the heron problem without the PGF.Recall the problem: Let N Geometric(1 θ), so P(N n) (1 θ)θn; Let X1, X2 , . . . be independent of each other and of N , with Xi Binomial(1, p)(remember Xi 1 with probability p, and 0 otherwise); Let T X1 . . . XN be the randomly stopped sum; Find the distribution of T .

88Without using the PGF, we would tackle this by looking for an expression forP(T t) for any t. Once we have obtained that expression, we might be ableto see that T has a distribution we recognise (e.g. Geometric), or otherwise wewould just state that T is defined by the probability function we have obtained.To find P(T t), we have to partition over different values of N :P(T t) Xn 0P(T t N n)P(N n).( )Here, we are lucky that we can write down the distribution of T N n: if N n is fixed, then T X1 . . . Xn is a sum of n independentBinomial(1, p) random variables, so (T N n) Binomial(n, p).For most distributions of X, it would be difficult or impossible to write down thedistribution of X1 . . . Xn :we would have to use an expression likeP(X1 . . . XN t N n) t t xXX1x1 0 x2 0.t (x1 . xn 2 ) nXP(X1 x1) xn 1 0oP(X2 x2 ) . . . P(Xn 1 xn 1) P[Xn t (x1 . . . xn 1)] .Back to the heron problem: we are lucky in this case that we know the distribution of (T N n) is Binomial(N n, p), so n tP(T t N n) p (1 p)n t for t 0, 1, . . . , n.tContinuing from ( ):P(T t) Xn 0P(T t N n)P(N n)

89 Xnn ttpt (1 p)n t(1 θ)θn p (1 θ)1 p .? t X hinnθ(1 p)tn t( )As it happens, we can evaluate the sum in ( ) using the fact that NegativeBinomial probabilities sum to 1. You can try this if you like, but it is quitetricky. [Hint: use the Negative Binomial (t 1, 1 θ(1 p)) distribution.] 1 θ, butOverall, we obtain the same answer that T Geometric1 θ θphopefully you can see why the PGF is so useful.Without the PGF, we have two major difficulties:1. Writing down P(T t N n);2. Evaluating the sum over n in ( ).For a general problem, both of these steps might be too difficult to do withouta computer. The PGF has none of these difficulties, and even if GT (s) does notsimplify readily, it still tells us everything there is to know about the distributionof T .4.7 Summary: Properties of the PGFDefinition:GX (s) E(sX )Used for:Discrete r.v.s with values 0, 1, 2, . . .no(k)′E X(X 1) . . . (X k 1) GX (1)E(X) GX (1)Moments:Probabilities:Sums:1 (n)G (0)n! XGX Y (s) GX (s)GY (s) for independent X, YP(X n)

904.8 Convergence of PGFsWe have been using PGFs throughout this chapter without paying much attention to their mathematical properties. For example, are we sure that thePxpower series GX (s) x 0 s P(X x) converges? Can we differentiate andintegrate the infinite power series term by term as we did in Section 4.4? Whenwe said in Section 4.4 that E(X) G′X (1), can we be sure that GX (1) and itsderivative G′X (1) even exist?This technical section introduces the radius of convergence ing from state i? Is there a chance that we will never reach state j, starting from state i?In Chapter 3 we saw how to find expected reaching times: the expectednumber of steps taken to reach a particular state. We used the law of totalexpectation and first-step analysis (Section 3.5).However, the expected or average reaching time doesn’t tell the whole story.Think back to the model for gene spread in Section 3.7. If there is just oneanimal out of 100 with the harmful allele, the expected number of generations tofixation is quite large at 10.5: even though the allele will usually die out after oneor two generations. The high average is caused by a small chance that the allelewill take hold and grow, requiring a very large number of generations before iteither dies out or saturates the population. In most stochastic processes, theaverage is of limited use by itself, without having some idea about the varianceand skew of the distribution.With our tool of PGFs, we can characterise the whole distribution of the timeT taken to reach a particular state, by finding its PGF. This will give us themean, variance, and skew by differentiation. In principle the PGF could evengive us the full set of probabilities, P(T t) for all possible t 0, 1, 2, . . .,though in practice it may be computationally infeasible to find more than thefirst few probabilities by repeated differentiation.However, there is a new and very useful piece of information that the PGF cantell us quickly and easily:what is the probability that we NEVER reach state j , starting from state i?For example, imagine that the random walk represents the share value for aninvestment. The current share price is i dollars, and we might decide to sellwhen it reaches j dollars. Knowing how long this might take, and whether thereis a chance we will never succeed, is fundamental to managing our investment.

97To tackle this problem, we define the random variable T to be the time taken(number of steps) to reach state j, starting from state i. We find the PGF ofT , and then use the PGF to discover P(T ). If P(T ) 0, there is apositive chance that we will NEVER reach state j, starting from state i.We will see how to determine the probability of never reaching our goal inSection 4.11. First we will see how to calculate the PGF of a reaching time Tin the random walk.Finding the PGF of a reaching time in the random walk1/2 21/21/2 11/21/201/21/211/21/221/231/21/21/2Define Tij to be the number of steps taken to reach state j , starting at state i.Tij is called the first reaching time from state i to state j .We will focus on T01 number of steps to get from state 0 to state 1. Problem: Let H(s) E sT01 be the PGF of T01. Find H(s).Arrived!

98Solution:Let Yn be the step taken at time n: up or down. For the symmetric random walk,1 with probability 0.5,Yn 1 with probability 0.5,and Y1 , Y2, . . . are independent.Recall Tij number of steps to get from state i to state j for any i, j ,and H(s) E sT01 is the PGF required. Use first-step analysis, partitioning over the first step Y1 : H(s) E sT01 E sT01 Y1 1 P(Y1 1) E sT01 Y1 1 P(Y1 1) o 1nT01T01 E s Y1 1 E s Y1 1. 2Now if Y1 1, then T01 1 definitely, so E sT01 Y1 1 s1 s. If Y1 1, then T01 1 T 1,1: one step from state 0 to state 1, then T 1,1 steps from state 1 to state 1.But T 1,1 T 1,0 T01, because the process must pass through 0 to get from 1to 1.Now T 1,0 and T01 are independent (Markov property). Also, they have thesame distribution because the process is translation invariant (i.e. all states arethe same):1/21/2 1 21/21/2101/21/21/21/21/221/21/231/21/2

99ThusE sT01 Y1 1 E s1 T 1,1 E s1 T 1,0 T0,1 sE sT 1,0 E sT01by independences(H(s))2 because identically distributed. ThusH(s) 1 s s(H(s))22by .This is a quadratic in H(s):11s(H(s))2 H(s) s 022 H(s) 1 q1 4 21 s 12 ss 1 1 s2.sWhich root? We know that P(T01 0) 0, because it must take at least one stepto go from 0 to 1. With the positive root, lims 0 H(0) lims 0we take the negative root instead.Thus H(s) 1 1 s2.sCheck this has lims 0 H(s) 0 by L’Hospital’s Rule: f (s)lims 0 g(s) f ′ (s) lims 0 g ′ (s))(12 1/2(1 s) 2s lim 2s 01 0.2s ; so

100Notation for quick solutions of first-step analysis for finding PGFsAs with first-step analysis for finding hitting probabilities and expected reachingtimes, setting up a good notation is extremely important. Here is a goodnotation for finding H(s) E sT01 .Let T T01. Seek H(s) E(sT ).NowT (with probability 1/2,11 T ′ T ′′ with probability 1/2,where T ′ T ′′ T and T ′ , T ′′ are independent.Taking expectations:(E s1H(s) (s H(s) (s H(s) 1s2 21 sH(s)2.H(s) E(sT ) w. p. 1/2 ′′′w. p. 1/2E s1 T Tw. p. 1/2′sE sT E sT ′′ w. p. 1/2(by independence of T ′ and T ′′ )w. p. 1/2sH(s)H(s) w. p. 1/2(because T ′ T ′′ T )

101Thus:sH(s)2 2H(s) s 0.Solve the quadratic and select the correct root as before, to getH(s) 1 1 s2sfor s 1.4.10 Defective random variablesA random variable is said to be defective if it can take the value .In stochastic processes, a reaching time Tij is defective if there is a chance thatwe NEVER reach state j , starting from state i.The probability that we never reach state j, starting from state i, is the sameas the probability that the time taken is infinite: Tij :P(Tij ) P(we NEVER reach state j , starting from state i).In other cases, we will always reach state j eventually, starting from state i.In that case, Tij can not take the value :P(Tij ) 0if we are CERTAIN to reach state j , starting from state i.Definition: A random variable T is defective, or improper, if it can take the value . That is,T is defective if P(T ) 0.

102Thinking ofP P(T t) as 1 P(T )PAlthough it seems strange, when we write t 0 P(T t), we are not includingthe value t .PThe sum t 0 continues without ever stopping: at no point can we say we have‘finished’ all the finite values of t so we will now add on t . We simplyPnever get to t when we take t 0 .t 0For a defective random variable T , this means that XP(T t) 1,t 0because we are missing the positive value of P(T ).All probabilities of T must still sum to 1, so we have1 Xt 0in other words Xt 0P(T t) P(T ),P(T t) 1 P(T ).PGFs for defective random variablesWhen T is defective, the PGF of T is defined as the power seriesH(s) Xt 0P(T t)stfor s 1.The term for P(T )s is missed out. The PGF is defined as the generatingfunction of the probabilities for finite values only.

103Because H(s) is a power series satisfying the conditions of Abel’s Theorem, weknow that: H(s) is left-continuous at s 1, i.e. lims 1 H(s) H(1).This is different from the behaviour of E(sT ), if T is defective: E(sT ) H(s) for s 1 because the missing term is zero: i.e. becauses 0 when s 1. E(sT ) is NOT left-continuous at s 1. There is a sudden leap (discontinuity) at s 1 because s 0 as s 1, but s 1 when s 1.Thus H(s) does NOT represent E(sT ) at s 1. It is as if H(s) is a ‘train’ thatE(sT ) rides on between 1 s 1. At

74 Chapter 4: Generating Functions This chapter looks at Probability Generating Functions (PGFs) for discrete random variables.