From a normal population
Probability that a randomly selected datum is
>1σ above μ is P(z>1)=15.87%
>2σ above μ is P(z>2)= 2.28%
(same for >1σ and >2σ below μ)
From a normal population, a large sample (~n>100)
Probability that a randomly selected datum of the sample is
>1s above x̄ is ≈ 15.87% same as the population case above
>2s above x̄ is ≈ 2.19% almost same
A small sample (n=10) changes this to:
>1s above x̄ is ≈ 15.99% almost same
>2s above x̄ is ≈ 1.16% "badly" underestimated
because x̄ and s are computed from the same sample: an extreme datum pulls x̄ toward itself and inflates s,
so within its own small sample it rarely ends up more than 2s from x̄, and fewer datums appear in the tails.
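The sample-based percentages above can be checked by simulation. A minimal sketch, assuming a standard normal population and Python's built-in random generator; the function name, trial count and seed are illustrative choices, not part of the original notes.

```python
import random

def tail_fractions(n, trials=20000, seed=1):
    """Estimate the share of data lying >1s and >2s above their own sample mean."""
    rng = random.Random(seed)
    above_1s = 0
    above_2s = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        # sample standard deviation s (divisor n-1)
        s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
        above_1s += sum(x > mean + s for x in xs)
        above_2s += sum(x > mean + 2 * s for x in xs)
    total = n * trials
    return above_1s / total, above_2s / total

if __name__ == "__main__":
    f1, f2 = tail_fractions(10)
    print(f"n=10  >1s: {f1:.2%}  >2s: {f2:.2%}")
```

Estimates vary a little with the seed and the number of trials, so compare them with the figures above only to about one decimal place.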
From a normal population
Probability that a sample of size n has at least one datum above a
= 1-P(all below a)
= 1-P(x1<a∩x2<a∩...∩xn<a)
= 1-[P(xi<a)]^n
a=μ+σ → P(x>μ+σ)=.1587 P(x<μ+σ)=.8413
All below: (.8413)^n At least one datum above 1σ: P=1-(.8413)^n
n=1 (.8413)^1 =.8413 1-.8413 = 15.9%
n=5 (.8413)^5 =.4214 1-.4214 = 57.8% Even in a sample of size 5 you can half expect a datum >μ+σ
More than half of these samples will have such a datum.
n=10 (.8413)^10=.1776 1-.1776 = 82.2%
n=20 (.8413)^20=.0316 1-.0316 = 96.8% A sample of 20 is close to certain to have a datum more than 1σ above μ
Ditto for a datum more than 1σ below μ
a=μ+2σ → P(x>μ+2σ)=.0228 P(x<μ+2σ)=.9772
All below: (.9772)^n At least one datum above 2σ: P=1-(.9772)^n
n=1 (.9772)^1 =.9772 1-.9772 = 2.3%
n=5 (.9772)^5 =.8911 1-.8911 = 10.9%
n=10 (.9772)^10=.7940 1-.7940 = 20.6%
n=20 (.9772)^20=.6305 1-.6305 ≈ 37% More than 1/3 chance that the sample will have a datum >2SD above the mean.
Also, more than 1/3 chance that it will have a datum >2SD below the mean.
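The "at least one datum above a" tables can be reproduced directly from the formula 1-[P(xi<a)]^n. A minimal sketch using the standard library's NormalDist; the function name p_at_least_one_above is made up for illustration, and the data are assumed independent.

```python
from statistics import NormalDist

def p_at_least_one_above(k, n):
    """P(at least one of n independent normal data lies more than k sigma above mu)."""
    p_below = NormalDist().cdf(k)  # P(x < mu + k*sigma) for a single datum
    return 1 - p_below ** n        # complement of "all n below"

for k in (1, 2):
    for n in (1, 5, 10, 20):
        print(f"k={k}  n={n:2d}  P = {p_at_least_one_above(k, n):.1%}")
```

Because this uses the unrounded normal CDF rather than .8413 and .9772, the last printed digit may differ slightly from a hand calculation.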
Demo/experiment using Generate random data distributions
normal, μ=100 σ=10 n=5 or 10 whole numbers
Look at the generated numbers, see if/how many >110 or >120 (or <90, <80)
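If no data generator is at hand, the same experiment can be sketched in Python. This is an illustrative stand-in, not the tool named above: the function names, trial count and seed are assumptions, and the counting is done on the unrounded values (rounding to whole numbers first would shift the results slightly).

```python
import random

def one_sample(n, mu=100.0, sigma=10.0, rng=random):
    """Draw one sample of n data from a normal population, shown as whole numbers."""
    return [round(rng.gauss(mu, sigma)) for _ in range(n)]

def share_with_datum_above(cutoff, n, mu=100.0, sigma=10.0, trials=20000, seed=7):
    """Estimate the share of samples of size n containing at least one datum > cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if max(sample) > cutoff:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(one_sample(5))
    print(f"n=5,  some datum >110: {share_with_datum_above(110, 5):.1%}")
    print(f"n=10, some datum >120: {share_with_datum_above(120, 10):.1%}")
```

By symmetry, the cutoffs 90 and 80 on the low side should give about the same shares.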
For a uniform population, P(x<μ+σ)=.7887 instead of .8413
n=1 (.7887)^1 =.7887 1-.7887 = 21.1% 1/5 chance the number is >μ+σ
n=5 (.7887)^5 =.3052 1-.3052 = 69.5%
n=10 (.7887)^10 =.0931 1-.0931 = 90.7%
n=20 (.7887)^20 =.0087 1-.0087 = 99.1% Nearly certain that there will be a number >μ+σ
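Where .7887 comes from: for a uniform population on [0, 1], μ=1/2 and σ=1/√12, so P(x<μ+σ)=1/2+1/√12 (the same value holds for any interval). A short sketch of that arithmetic; the variable and function names are illustrative.

```python
import math

# Uniform population on [0, 1]: mu = 1/2, sigma = 1/sqrt(12),
# so P(x < mu + sigma) = 1/2 + 1/sqrt(12).
p_below_uniform = 0.5 + 1 / math.sqrt(12)

def p_at_least_one_above_uniform(n):
    """P(at least one of n independent uniform data exceeds mu + sigma)."""
    return 1 - p_below_uniform ** n

print(f"P(x < mu + sigma) = {p_below_uniform:.4f}")
for n in (1, 5, 10, 20):
    print(f"n={n:2d}  P = {p_at_least_one_above_uniform(n):.1%}")
```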
For a normal population
For random sample of size n
Probability that a sample mean x̄ is >1σ above μ: P(z>√n)
n=4 z=√4=2 P(z>2)= 2.3%
n=10 z=√10≈3.2 P(z>3.2)= .07%
n=16 z=√16=4 P(z>4)≈ 0% (about .003%)
Practically nil chance that a sample of size 10 or larger has a mean x̄ that is more than 1σ above the population's mean μ.
Same for a sample having a mean more than 1σ below μ
In our μ=100 σ=10 normal population, practically no random sample of 10 or more will have
its x̄ less than 90 or more than 110.
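The tail probabilities P(z>√n) can be computed rather than read from a table. A minimal sketch with the standard library's NormalDist; the function name is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def p_mean_above_one_sigma(n):
    """P(sample mean > mu + sigma) for samples of size n from a normal population.

    The sample mean has SD sigma/sqrt(n), so mu + sigma sits at z = sqrt(n).
    """
    return 1 - NormalDist().cdf(sqrt(n))

for n in (4, 10, 16):
    print(f"n={n:2d}  z={sqrt(n):.2f}  P = {p_mean_above_one_sigma(n):.4%}")
```

The probabilities are tiny but not exactly zero, which is why "practically nil" is the safer wording.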
For a normal population
Sample means are normally distributed: mean=μ SD is SEM=σ/√n
"Empirical rule" for sample means: 68% within 1SEM, 95% within 2SEM, 99.7% within 3SEM
In our μ=100 σ=10 normal population
For n=10, SEM=10/√10≈ 3.2
68% of sample means will be within 1SEM≈3.2 of 100: [96.8,103.2]
(vs.68% of datums will be within 1σ=10 of 100: [90,110])
95% of sample means will be within 2SEM≈6.4 of 100: [93.6,106.4]
(vs.95% of datums will be within 2σ=20 of 100: [80,120])
For n=30, SEM=10/√30≈ 1.8
68% of sample means will be within 1SEM≈1.8 of 100: [98.2,101.8]
95% of sample means will be within 2SEM≈3.6 of 100: [96.4,103.6]
For n=100, SEM=10/√100= 1.0
68% of sample means will be within 1SEM=1.0 of 100: [99,101]
95% of sample means will be within 2SEM=2.0 of 100: [98,102]
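The intervals above all follow the pattern μ ± k·SEM with SEM=σ/√n. A small sketch of that calculation; sem_interval is a hypothetical helper name.

```python
from math import sqrt

def sem_interval(mu, sigma, n, k):
    """Return (low, high) = mu -/+ k standard errors of the mean for sample size n."""
    sem = sigma / sqrt(n)
    return mu - k * sem, mu + k * sem

for n in (10, 30, 100):
    lo1, hi1 = sem_interval(100, 10, n, 1)
    lo2, hi2 = sem_interval(100, 10, n, 2)
    print(f"n={n:3d}  68%: [{lo1:.1f}, {hi1:.1f}]  95%: [{lo2:.1f}, {hi2:.1f}]")
```

The code uses the unrounded SEM, so an endpoint can differ by .1 from a hand calculation that rounds SEM first.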
Run experiments for yourself.
For practically every population (any with a finite σ), if sample size n is large enough (n≥~30),
sample means are approximately normally distributed.
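One way to try this on a clearly non-normal population: draw samples from an exponential population (μ=1, σ=1, strongly skewed) and see how often x̄ lands within 1 or 2 SEM of μ. A minimal sketch; the choice of population, the function name, trial count and seed are all assumptions made for illustration.

```python
import random

def share_within_k_sem(k, n=30, trials=20000, seed=3):
    """Share of sample means within k SEM of mu, exponential population (mu = sigma = 1)."""
    rng = random.Random(seed)
    mu = 1.0
    sigma = 1.0
    sem = sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) < k * sem:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(f"within 1 SEM: {share_within_k_sem(1):.1%}")
    print(f"within 2 SEM: {share_within_k_sem(2):.1%}")
```

The shares should come out close to the 68% and 95% of the empirical rule, even though single data from this population are far from normal.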
For the standard normal population (μ=0 σ=1) SEM=1/√n
For n=10, SEM=1/√10≈ .3162
68% of sample means x̄ will be within .3162 of μ=0
95% .6324
99.7% .9486
For n=30, SEM=1/√30≈ .1826
68% of sample means x̄ will be within .1826 of μ=0
95% .3652
99.7% .5478
For n=100, SEM=1/√100= .1
68% of sample means x̄ will be within .1 of μ=0
95% .2
99.7% .3