Summary


Problem

An NBA best-of-7 series is played between the Boston Celtics and the LA Lakers. The games are played in the following locations: LA, LA, LA, BOS, BOS, BOS, BOS. Each team has a 55% chance of winning its home games. Which team has the advantage?
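A brute-force check (a sketch: the schedule and the 55% home-win figure come from the problem statement; playing out all 7 games even after a clinch is an implementation convenience that does not change which team reaches 4 wins first):

```python
from itertools import product

# Home team wins each game with probability 0.55 (from the problem statement).
# Schedule: games 1-3 in LA, games 4-7 in Boston.
P_HOME = 0.55
schedule = ["LA", "LA", "LA", "BOS", "BOS", "BOS", "BOS"]

# Playing out all 7 games regardless of an early clinch does not change which
# team reaches 4 wins first, so we can enumerate all 2^7 outcomes exactly.
p_boston = 0.0
for outcome in product(["LA", "BOS"], repeat=7):   # winner of each game
    prob = 1.0
    for site, winner in zip(schedule, outcome):
        prob *= P_HOME if winner == site else (1 - P_HOME)
    if outcome.count("BOS") >= 4:
        p_boston += prob

print(f"P(Boston wins series) = {p_boston:.4f}")
print(f"P(LA wins series)     = {1 - p_boston:.4f}")
```

Boston hosts 4 of the 7 games, and the computation confirms the slight edge this gives them.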

Definition

A function $X:\Omega\to\mathbb{R}$ is called a random variable.

Example

Roll a die twice. $\Omega=\{(i,j)\mid i\in\{1,\dots,6\},\ j\in\{1,\dots,6\}\}$. Let $X$ = number on the first die, so $X((i,j))=i$; $Y$ = number on the second die, so $Y((i,j))=j$; $Z$ = sum of the two, so $Z((i,j))=i+j$.

Note that $X+Y=Z$ (that is, $X(\omega)+Y(\omega)=Z(\omega)$ for all $\omega\in\Omega$).

Consider a random variable 𝑋. Fix a number 𝑥. We can consider the following event:

$$\{X=x\}=\{\omega\in\Omega\mid X(\omega)=x\}=X^{-1}(x)$$

$$\mathbb{P}(X=x)=\mathbb{P}(\{\omega\in\Omega\mid X(\omega)=x\})=\sum_{\omega\in\Omega\ \text{s.t.}\ X(\omega)=x}\mathbb{P}(\{\omega\})$$

We can replace $x$ with any subset $A\subseteq\mathbb{R}$.

$$\mathbb{P}(X\in A)=\mathbb{P}(\{\omega\in\Omega\mid X(\omega)\in A\})=\sum_{\omega\in\Omega\ \text{s.t.}\ X(\omega)\in A}\mathbb{P}(\{\omega\})$$

Example

We have a machine that dispenses, at random, a banana ($\mathbb{P}=\tfrac{1}{10}$), an apple ($\mathbb{P}=\tfrac{5}{10}$), or an orange ($\mathbb{P}=\tfrac{4}{10}$). Every fruit has a price: $S(\text{banana})=1$, $S(\text{apple})=0.5$, $S(\text{orange})=1$.

$S$ is a map from $\Omega=\{\text{banana},\text{apple},\text{orange}\}$ to $\mathbb{R}$ that is deterministic (fixed), but the actual price you pay is random because the fruit you get is random.

What is the probability that we sell a fruit at price 1? $\mathbb{P}(S=1)=\mathbb{P}(\{\text{banana},\text{orange}\})=\mathbb{P}(\text{banana})+\mathbb{P}(\text{orange})=\tfrac{1}{10}+\tfrac{4}{10}=0.5$.
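The p.m.f. of the price can be computed mechanically by pushing the distribution on $\Omega$ forward through $S$ (a sketch; the dictionary names are illustrative):

```python
from fractions import Fraction

# Distribution of the fruit the machine dispenses (from the example).
p_fruit = {"banana": Fraction(1, 10), "apple": Fraction(5, 10), "orange": Fraction(4, 10)}
# Deterministic price map S : Omega -> R.
S = {"banana": 1.0, "apple": 0.5, "orange": 1.0}

# p.m.f. of the price: sum the probabilities of all fruits with a given price.
pmf_S = {}
for fruit, p in p_fruit.items():
    pmf_S[S[fruit]] = pmf_S.get(S[fruit], Fraction(0)) + p

for price, prob in sorted(pmf_S.items()):
    print(f"P(S = {price}) = {prob}")
```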

If we have two RVs $X$ and $Y$, and two sets $A,B\subseteq\mathbb{R}$, we can consider the events $\{X\in A\}$ and $\{Y\in B\}$, but also $\{X\in A\}\cap\{Y\in B\}=\{X\in A,\ Y\in B\}=\{\omega\in\Omega\mid X(\omega)\in A,\ Y(\omega)\in B\}$.

Definition

A probability mass function (p.m.f.) of a random variable $X$ is the function $f:\mathbb{R}\to[0,1]$ defined by

$$f(x)=\mathbb{P}(X=x)$$

Definition

The cumulative distribution function (c.d.f.) is defined by

$$F(x)=\mathbb{P}(X\le x)$$

Example

$X$ = # of Hs in $n$ fair coin tosses. We compute the p.m.f. If $x\notin\{0,1,\dots,n\}$, then

$$f(x)=\mathbb{P}(X=x)=0$$

If $0\le x\le n$,

$$f(x)=\mathbb{P}(X=x)=\binom{n}{x}2^{-n}$$
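This closed form can be sanity-checked by enumerating $\Omega=\{H,T\}^n$ directly ($n=6$ is an arbitrary small choice):

```python
from itertools import product
from math import comb

n = 6  # number of fair-coin tosses (small enough to enumerate Omega)

# Brute-force p.m.f. of X = number of heads, directly from Omega = {H,T}^n.
pmf = {x: 0.0 for x in range(n + 1)}
for omega in product("HT", repeat=n):       # each omega has probability 2^-n
    pmf[omega.count("H")] += 0.5 ** n

# Compare with the closed form f(x) = C(n, x) / 2^n.
for x in range(n + 1):
    assert abs(pmf[x] - comb(n, x) / 2 ** n) < 1e-12
print("brute-force p.m.f. matches C(n,x)/2^n; total mass =", sum(pmf.values()))
```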

Definition

If $X:\Omega\to\mathbb{R}$ is a RV taking only integer values, $X$ is a discrete random variable.

Notice that for discrete RVs, we have the property

$$\sum_x f(x)=1$$

This is because $\sum_x f(x)=\sum_x\mathbb{P}(X=x)=1$, since $(\{X=x\})_{x\in\mathbb{Z}}$ is a partition of $\Omega$ (every $\omega$ maps to exactly one integer, so the events are disjoint and cover all of $\Omega$).

Note

If $X$ has p.m.f. $f$, then we often write $X\sim f$.

Independence for Discrete Random Variables

Definition

A collection of random variables $X_1,\dots,X_n$ is called independent if

$$\mathbb{P}(X_1=x_1,\dots,X_n=x_n)=\mathbb{P}(X_1=x_1)\cdots\mathbb{P}(X_n=x_n)\quad\forall x_1,\dots,x_n$$
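As a concrete check, the two die rolls from the earlier example satisfy this factorization (a sketch; the variable names are illustrative):

```python
from fractions import Fraction
from itertools import product

# Omega for two die rolls: uniform probability 1/36 on each outcome (i, j).
omega = list(product(range(1, 7), repeat=2))
joint = {(i, j): Fraction(1, 36) for (i, j) in omega}   # P(X=i, Y=j)

# Marginals of X (first die) and Y (second die).
px = {i: sum(v for (a, _), v in joint.items() if a == i) for i in range(1, 7)}
py = {j: sum(v for (_, b), v in joint.items() if b == j) for j in range(1, 7)}

# Independence: the joint p.m.f. factors into the product of the marginals.
assert all(joint[(i, j)] == px[i] * py[j] for (i, j) in omega)
print("X and Y are independent:", px[1], "*", py[1], "=", joint[(1, 1)])
```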

Proposition

If $X_1,\dots,X_n$ are independent RVs, then any subcollection of these RVs is a collection of independent RVs.

Definition

An infinite sequence of RVs $X_1,\dots,X_n,\dots$ is called independent if for every choice of $n$, the RVs $X_1,\dots,X_n$ are independent.

Bernoulli & Binomial RVs

Suppose we have a coin with probability $p$ of turning up heads, where $p\in(0,1)$. Then $\Omega=\{H,T\}$, $\mathbb{P}(H)=p$, $\mathbb{P}(T)=1-p$.

Consider the RV $X:\Omega\to\mathbb{R}$ with $H\mapsto 1$ and $T\mapsto 0$. Then $\mathbb{P}(X=1)=\mathbb{P}(H)=p$ and $\mathbb{P}(X=0)=\mathbb{P}(T)=1-p$.

Definition

A RV $X$ is called a Bernoulli($p$) RV if for $p\in(0,1)$,

$$\mathbb{P}(X=1)=p\quad\text{and}\quad\mathbb{P}(X=0)=1-p\qquad[X\sim\mathrm{Ber}(p)]$$

Now, we toss the $p$-coin $n$ times. Let $X$ = the number of times the coin turns up heads. We showed that

$$\mathbb{P}(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$$

Definition

A RV $X$ is called a Binomial($n,p$) RV if for some $n\in\mathbb{N}$ and $p\in(0,1)$,

$$\mathbb{P}(X=k)=\binom{n}{k}p^k(1-p)^{n-k},\quad k=0,\dots,n\qquad[X\sim\mathrm{Bin}(n,p)]$$
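A sketch verifying that a sum of $n$ independent $\mathrm{Ber}(p)$ variables has exactly this p.m.f. ($n=5$ and $p=0.3$ are arbitrary illustrative choices):

```python
from itertools import product
from math import comb, isclose

n, p = 5, 0.3

# Enumerate Omega = {0,1}^n with independent Bernoulli(p) coordinates and
# tally the p.m.f. of X = X_1 + ... + X_n (the number of heads).
pmf = {k: 0.0 for k in range(n + 1)}
for omega in product([0, 1], repeat=n):
    prob = 1.0
    for bit in omega:
        prob *= p if bit == 1 else 1 - p
    pmf[sum(omega)] += prob

# X must be Binomial(n, p): P(X = k) = C(n, k) p^k (1-p)^(n-k).
for k in range(n + 1):
    assert isclose(pmf[k], comb(n, k) * p**k * (1 - p) ** (n - k))
print("sum of n independent Ber(p) variables is Bin(n, p)")
```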

Infinite Sequences of Coin Tosses

Consider a fair coin, and let $X$ = the number of times we have to toss the coin in order to see the 1st H.

What is $\Omega$? It is an infinite space: we need to consider all possible infinite sequences of coin tosses, e.g. $\omega=TTTHTHTHHT\ldots\in\Omega$.

If we fix $\omega\in\Omega$, what is $\mathbb{P}(\omega)$? We'd get $\mathbb{P}(\omega)=\lim_{n\to\infty}\left(\tfrac{1}{2}\right)^n=0$. But we also want $\mathbb{P}(\Omega)=1$, yet every $\mathbb{P}(\omega)=0$.

The solution to this kind of problem is given using “measure theory”: we define a sub-collection (𝜎-algebra) of subsets of Ω where it makes sense to assign a probability. In any case, it’s possible to define probabilities for any event involving a finite number of coin tosses.

Definition

We say that the RVs $X_1,\dots,X_n,\dots$ are i.i.d. (independent and identically distributed) if they are independent and they all have the same distribution.

Geometric & Poisson Random Variables

Consider an infinite sequence of tosses of a $p$-coin. Let $X$ = the number of tosses needed to see the 1st H. Define $X_i=1$ if the $i$-th toss is H, and $X_i=0$ if the $i$-th toss is T. Then:

$$\mathbb{P}(X=k)=\mathbb{P}(X_1=0,\dots,X_{k-1}=0,\ X_k=1)\overset{\text{indep}}{=}\mathbb{P}(X_1=0)\cdots\mathbb{P}(X_{k-1}=0)\,\mathbb{P}(X_k=1)=(1-p)^{k-1}p$$

Definition

We say that $X$ is a geometric RV of parameter $p\in(0,1)$ if

$$\mathbb{P}(X=k)=(1-p)^{k-1}p,\quad k\ge 1\qquad[X\sim\mathrm{Geo}(p)]$$
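A simulation sketch comparing the empirical distribution of the waiting time with this formula (the seed, $p=0.25$, and the sample size are illustrative choices):

```python
import random

random.seed(0)
p, N = 0.25, 200_000

def first_head(p):
    """Toss a p-coin until the first heads; return how many tosses it took."""
    k = 1
    while random.random() >= p:   # tails with probability 1 - p
        k += 1
    return k

samples = [first_head(p) for _ in range(N)]

# Empirical frequencies should match the Geo(p) p.m.f. (1-p)^(k-1) p.
for k in range(1, 6):
    empirical = samples.count(k) / N
    exact = (1 - p) ** (k - 1) * p
    print(f"k={k}: empirical {empirical:.4f} vs exact {exact:.4f}")
```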

Definition

We say that 𝑋 is a Poisson RV of parameter 𝜆>0 if

$$\mathbb{P}(X=k)=e^{-\lambda}\frac{\lambda^k}{k!},\quad k\ge 0\qquad[X\sim\mathrm{Pois}(\lambda)]$$

Proposition (Poisson Limit Theorem)

Fix $\lambda>0$. For every $n$, take $X_n\sim\mathrm{Bin}\!\left(n,\tfrac{\lambda}{n}\right)$. Then for every fixed $k\ge 0$,

$$\mathbb{P}(X_n=k)\xrightarrow{n\to\infty}e^{-\lambda}\frac{\lambda^k}{k!}$$

Example

Suppose at a call center, in each time interval of one second we receive a call with probability 0.001. We want to know: what is the probability that we receive 𝑘 calls in one hour?

$X$ = # of calls in one hour, $X\sim\mathrm{Bin}(3600,0.001)$. By the Poisson limit theorem, $\mathbb{P}(X=k)\approx e^{-3.6}\frac{(3.6)^k}{k!}$, because $X$ is approximately $\mathrm{Pois}(3.6)$ (here $\lambda=np=3600\cdot 0.001=3.6$).
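Comparing the exact $\mathrm{Bin}(3600,0.001)$ probabilities with the $\mathrm{Pois}(3.6)$ approximation numerically:

```python
from math import comb, exp, factorial

n, q = 3600, 0.001          # one trial per second, one hour of seconds
lam = n * q                 # lambda = 3.6

for k in range(6):
    exact = comb(n, k) * q**k * (1 - q) ** (n - k)      # Bin(3600, 0.001)
    approx = exp(-lam) * lam**k / factorial(k)          # Pois(3.6)
    print(f"k={k}: binomial {exact:.6f}  poisson {approx:.6f}")
```

The two columns agree to several decimal places, as the Poisson limit theorem predicts for large $n$ and small $p$.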

Joint Probability Mass Functions

Given RVs $X_1,\dots,X_n$ defined on the same probability space, we consider the function

$$f(x_1,\dots,x_n)=\mathbb{P}(X_1=x_1,\dots,X_n=x_n)$$

Definition

The function 𝑓(𝑥1,,𝑥𝑛) defined above is called the joint probability mass function (joint p.m.f.).

Definition

The marginal p.m.f. of the RV 𝑋𝑖 is defined as

$$f_i(x)=\mathbb{P}(X_i=x)$$

Note that the marginal can be recovered from the joint p.m.f. by summing out all other variables:

$$f_i(x)=\sum_{x_1,\dots,x_{i-1},x_{i+1},\dots,x_n}f(x_1,\dots,x_{i-1},x,x_{i+1},\dots,x_n)$$

This works because $\{X_i=x\}=\bigcup_{x_1,\dots,x_{i-1},x_{i+1},\dots,x_n}\{X_1=x_1,\dots,X_{i-1}=x_{i-1},\ X_i=x,\ X_{i+1}=x_{i+1},\dots,X_n=x_n\}$, a disjoint union.

Example

Fix $n>0$. Let $(X,Y)$ = a uniform element of $A_n$, where $A_n=\{(x,y)\in\mathbb{Z}_{>0}^2\mid x+y\le n\}$. (For example, if $n=3$, $A_3=\{(1,1),(1,2),(2,1)\}$.)

We want to determine $\mathbb{P}(X=x,Y=y)$, i.e. the joint p.m.f. Since the element is uniform:

$$\mathbb{P}(X=x,Y=y)=\frac{1}{|A_n|}\quad\text{for }(x,y)\in A_n$$

To compute $|A_n|$: for fixed $k$ with $2\le k\le n$, the number of pairs $(x,y)\in A_n$ s.t. $x+y=k$ is $k-1$. Therefore $|A_n|=\sum_{k=2}^{n}(k-1)=\sum_{j=1}^{n-1}j=\frac{n(n-1)}{2}$.

So the joint p.m.f. is:

$$f(x,y)=\begin{cases}\dfrac{2}{n(n-1)}&\text{if }(x,y)\in A_n\\[4pt]0&\text{otherwise}\end{cases}$$

The marginal p.m.f. of 𝑋:

$$f_1(x)=\sum_y f(x,y)=\sum_{y=1}^{n-x}\frac{2}{n(n-1)}=\frac{2(n-x)}{n(n-1)}$$

Similarly, $f_2(y)=\frac{2(n-y)}{n(n-1)}$.
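Both the count $|A_n|$ and the marginals can be verified by brute force ($n=7$ is an arbitrary choice):

```python
from fractions import Fraction

n = 7
# A_n = {(x, y) : x, y >= 1 integers, x + y <= n}, with a uniform joint p.m.f.
A = [(x, y) for x in range(1, n) for y in range(1, n) if x + y <= n]
assert len(A) == n * (n - 1) // 2          # |A_n| = n(n-1)/2

joint = Fraction(1, len(A))
# Marginal of X: sum the joint p.m.f. over y, and compare with 2(n-x)/(n(n-1)).
for x in range(1, n):
    f1 = sum(joint for (a, y) in A if a == x)
    assert f1 == Fraction(2 * (n - x), n * (n - 1))
print("marginal f1(x) = 2(n-x)/(n(n-1)) verified for n =", n)
```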

Proposition

Let $X_1,\dots,X_n$ be RVs with joint p.m.f. $f(x_1,\dots,x_n)$. Assume that

$$f(x_1,\dots,x_n)=g_1(x_1)\cdots g_n(x_n)$$

where each $g_i$ is a p.m.f. Then $X_1,\dots,X_n$ are independent. Moreover, $X_i\sim g_i$ for all $i=1,\dots,n$.

Example

Consider a sequence of tosses of a $p$-coin. Let $X_i$ = # of tosses required to see the $i$-th H after seeing the $(i-1)$-th H. Then:

$$\mathbb{P}(X_1=x_1,\dots,X_n=x_n)=\big[(1-p)^{x_1-1}p\big]\big[(1-p)^{x_2-1}p\big]\cdots\big[(1-p)^{x_n-1}p\big]=f(x_1)\cdots f(x_n)$$

where $f$ is the p.m.f. of a $\mathrm{Geo}(p)$. So $X_i\sim\mathrm{Geo}(p)$ and $(X_i)_{i\ge 1}$ are independent by the above proposition; that is, they are i.i.d. $\mathrm{Geo}(p)$.
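A simulation sketch of these waiting times between successive heads (the seed, $p=0.5$, and the sample size are illustrative choices):

```python
import random

random.seed(1)
p, N = 0.5, 100_000

def gaps(p, m=2):
    """Toss a p-coin until m heads appear; return the m waiting times between heads."""
    out, count = [], 0
    while len(out) < m:
        count += 1
        if random.random() < p:   # heads with probability p
            out.append(count)
            count = 0
    return out

samples = [gaps(p) for _ in range(N)]
geo = lambda k: (1 - p) ** (k - 1) * p

# X_1 and X_2 should each be Geo(p), and the joint should factor (independence).
for k in (1, 2, 3):
    print(f"P(X1={k}) ~ {sum(s[0] == k for s in samples)/N:.3f} (exact {geo(k):.3f})")
pair = sum(s == [1, 2] for s in samples) / N
print(f"P(X1=1, X2=2) ~ {pair:.3f} (exact {geo(1)*geo(2):.3f})")
```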

Conditional Probability Mass Functions

Let $X$ and $Y$ be two RVs, and assume $\mathbb{P}(X=x)>0$ for some fixed $x$.

Definition

The conditional p.m.f. of $Y$ given $X=x$ is defined as

$$g_x(y)\coloneqq\mathbb{P}(Y=y\mid X=x)=\frac{\mathbb{P}(Y=y,\ X=x)}{\mathbb{P}(X=x)}=\frac{f_{X,Y}(x,y)}{f_X(x)}$$

where $X\sim f_X$ and $(X,Y)\sim f_{X,Y}$ (i.e. $\mathbb{P}(X=x)=f_X(x)$ and $\mathbb{P}(X=x,Y=y)=f_{X,Y}(x,y)$).

Example

$X$ = # of Hs in $n$ tosses of a $p$-coin, $Y$ = # of Hs in the first $m$ tosses, where $m\le n$.

Fix $0\le x\le n$ and $y$ with $0\le y\le m$ and $0\le x-y\le n-m$. Note that $Y$ and $X-Y$ are independent (exercise). Then:

$$f_{X,Y}(x,y)=\mathbb{P}(X=x,Y=y)=\mathbb{P}(Y=y,\ X-Y=x-y)=\binom{m}{y}\binom{n-m}{x-y}p^x(1-p)^{n-x}$$

Since $X\sim\mathrm{Bin}(n,p)$, $f_X(x)=\binom{n}{x}p^x(1-p)^{n-x}$. Therefore:

$$g_x(y)=\mathbb{P}(Y=y\mid X=x)=\frac{f_{X,Y}(x,y)}{f_X(x)}=\frac{\binom{m}{y}\binom{n-m}{x-y}}{\binom{n}{x}}$$

for all $y$ s.t. $0\le y\le m$ and $0\le x-y\le n-m$. This p.m.f. is called the Hypergeometric($n,m,x$) distribution. As a free bonus, we get the identity:

$$\sum_{y\,:\;0\le y\le m,\ 0\le x-y\le n-m}\frac{\binom{m}{y}\binom{n-m}{x-y}}{\binom{n}{x}}=1$$
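This identity (a form of Vandermonde's identity) is easy to verify numerically for small $n$ and $m$ (the values below are arbitrary):

```python
from math import comb

# Check sum_y C(m, y) C(n-m, x-y) = C(n, x) over all admissible y,
# which is exactly the statement that the hypergeometric p.m.f. sums to 1.
n, m = 10, 4
for x in range(n + 1):
    total = sum(
        comb(m, y) * comb(n - m, x - y)
        for y in range(m + 1)
        if 0 <= x - y <= n - m
    )
    assert total == comb(n, x)
print("hypergeometric p.m.f. sums to 1 for all x, with n=10, m=4")
```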