Why is $\bar X$ an unbiased estimator of $\mu$?

Let's consider a population of Boolean values $\{0, 1\}$. In the population, the mean (i.e., the frequency of 1s) is $\mu$. We take a sample of size $n$, whose mean $\bar x$ is

$$\bar x = \frac{\sum_{i=1}^n x_i}{n}$$

and sample variance

$$s^2 = \frac{\sum_{i=1}^n (x_i - \bar x)^2}{n-1}$$

I would like to estimate the parameter $D=\mu (1-\mu)$. It appears from the small simulation below (coded in R) that an unbiased estimator of $D$ is

$$\hat D = \bar x(1-\bar x) + \frac{s^2}{n}$$

Can you help me figure out why this is true?


nbtrials = 5000
pop = 0:1                                  # Boolean population {0, 1}; true mean mu = 0.5
sampleSize = 10

out = numeric(nbtrials)
for (trial in 1:nbtrials)
{
    s = sample(pop, size = sampleSize, replace = TRUE)      # i.i.d. draws, each 0 or 1 with probability 1/2
    xbar = sum(s) / sampleSize                              # sample mean
    out[trial] = xbar * (1 - xbar) + var(s) / sampleSize    # estimate of D for this trial
}
xbar = sum(pop) / length(pop)                               # population mean mu = 0.5
print(paste("True value of D = ", xbar * (1 - xbar)))
print(paste("Average estimated value of D = ", mean(out)))

Theorem

Let $X_1, X_2, \ldots, X_n$ form a random sample from a population with mean $\mu$ and variance $\sigma^2$.

Then:

$\displaystyle \bar X = \frac{1}{n} \sum_{i=1}^n X_i$

is an unbiased estimator of $\mu$.

Proof

To show that $\bar X$ is an unbiased estimator of $\mu$, we must show that:

$E(\bar X) = \mu$

We have:

\begin{align}
E(\bar X) &= E\left(\frac{1}{n} \sum_{i=1}^n X_i\right) \\
&= \frac{1}{n} \sum_{i=1}^n E(X_i) && \text{linearity of expectation} \\
&= \frac{1}{n} \sum_{i=1}^n \mu && \text{since } E(X_i) = \mu \\
&= \frac{n}{n} \mu && \text{since } \sum_{i=1}^n 1 = n \\
&= \mu
\end{align}

So $\bar X$ is an unbiased estimator of $\mu$.

$\blacksquare$
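
As a quick empirical check of the theorem, here is a minimal R sketch (the distribution, mean, and sample size below are arbitrary choices for illustration):

set.seed(1)
mu = 3.7                                                            # true population mean (arbitrary choice)
nbtrials = 100000
xbars = replicate(nbtrials, mean(rnorm(10, mean = mu, sd = 2)))     # sample mean of n = 10 draws per trial
print(paste("True mu =", mu))
print(paste("Average of xbar over trials =", mean(xbars)))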

If \(X_i\) are normally distributed random variables with mean \(\mu\) and variance \(\sigma^2\), then:

\(\hat{\mu}=\dfrac{\sum X_i}{n}=\bar{X}\) and \(\hat{\sigma}^2=\dfrac{\sum(X_i-\bar{X})^2}{n}\)

are the maximum likelihood estimators of \(\mu\) and \(\sigma^2\), respectively. Are the MLEs unbiased for their respective parameters?
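
(Before checking unbiasedness, here is a minimal R sketch of computing both MLEs from a simulated normal sample; the parameter values are arbitrary choices.)

x = rnorm(50, mean = 10, sd = 3)        # simulated sample (arbitrary parameters)
mu.hat = mean(x)                        # MLE of mu: the sample mean
sigma2.hat = mean((x - mean(x))^2)      # MLE of sigma^2: denominator n, not n - 1
# R's var(x) uses n - 1, so sigma2.hat = var(x) * (length(x) - 1) / length(x)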

Answer

Recall that if \(X_i\) is a normally distributed random variable with mean \(\mu\) and variance \(\sigma^2\), then \(E(X_i)=\mu\) and \(\text{Var}(X_i)=\sigma^2\). Therefore:

\(E(\bar{X})=E\left(\dfrac{1}{n}\sum\limits_{i=1}^nX_i\right)=\dfrac{1}{n}\sum\limits_{i=1}^nE(X_i)=\dfrac{1}{n}\sum\limits_{i=1}^n\mu=\dfrac{1}{n}(n\mu)=\mu\)

The first equality holds because we've merely replaced \(\bar{X}\) with its definition. The second equality holds by the rules of expectation for a linear combination. The third equality holds because \(E(X_i)=\mu\). The fourth equality holds because when you add the value \(\mu\) up \(n\) times, you get \(n\mu\). And, of course, the last equality is simple algebra.

In summary, we have shown that:

\(E(\bar{X})=\mu\)

Therefore, the maximum likelihood estimator of \(\mu\) is unbiased. Now, let's check the maximum likelihood estimator of \(\sigma^2\). First, note that we can rewrite the formula for the MLE as:

\(\hat{\sigma}^2=\left(\dfrac{1}{n}\sum\limits_{i=1}^nX_i^2\right)-\bar{X}^2\)

because:

\(\displaystyle{\begin{aligned}
\hat{\sigma}^{2}=\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2} &=\frac{1}{n} \sum_{i=1}^{n}\left(x_{i}^{2}-2 x_{i} \bar{x}+\bar{x}^{2}\right) \\
&=\frac{1}{n} \sum_{i=1}^{n} x_{i}^{2}-2 \bar{x} \cdot \underbrace{\frac{1}{n} \sum_{i=1}^{n} x_{i}}_{\bar{x}} + \frac{1}{n}\left(n \bar{x}^{2}\right) \\
&=\frac{1}{n} \sum_{i=1}^{n} x_{i}^{2}-\bar{x}^{2}
\end{aligned}}\)
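
A quick numerical sanity check of this algebraic identity in R (the sample below is an arbitrary choice):

x = rnorm(25)                           # arbitrary sample
lhs = mean((x - mean(x))^2)             # (1/n) * sum((x_i - xbar)^2)
rhs = mean(x^2) - mean(x)^2             # (1/n) * sum(x_i^2) - xbar^2
print(all.equal(lhs, rhs))              # TRUE (up to floating-point tolerance)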

Then, taking the expectation of the MLE, we get:

\(E(\hat{\sigma}^2)=\dfrac{(n-1)\sigma^2}{n}\)

as illustrated here:

\begin{align}
E(\hat{\sigma}^2) &= E\left[\dfrac{1}{n}\sum\limits_{i=1}^nX_i^2-\bar{X}^2\right]=\left[\dfrac{1}{n}\sum\limits_{i=1}^nE(X_i^2)\right]-E(\bar{X}^2)\\
&= \dfrac{1}{n}\sum\limits_{i=1}^n(\sigma^2+\mu^2)-\left(\dfrac{\sigma^2}{n}+\mu^2\right)\\
&= \dfrac{1}{n}(n\sigma^2+n\mu^2)-\dfrac{\sigma^2}{n}-\mu^2\\
&= \sigma^2-\dfrac{\sigma^2}{n}=\dfrac{n\sigma^2-\sigma^2}{n}=\dfrac{(n-1)\sigma^2}{n}
\end{align}

The first equality holds from the rewritten form of the MLE. The second equality holds from the properties of expectation. The third equality holds from manipulating the alternative formulas for the variance, namely:

\(Var(X)=\sigma^2=E(X^2)-\mu^2\) and \(Var(\bar{X})=\dfrac{\sigma^2}{n}=E(\bar{X}^2)-\mu^2\)

The remaining equalities hold from simple algebraic manipulation. Now, because we have shown:

\(E(\hat{\sigma}^2) \neq \sigma^2\)

the maximum likelihood estimator of \(\sigma^2\) is a biased estimator.
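
A short simulation illustrates the bias (a sketch, with arbitrary choices of $\mu = 0$, $\sigma^2 = 4$, and $n = 10$):

set.seed(2)
n = 10
sigma2 = 4
nbtrials = 100000
mle = numeric(nbtrials)
for (trial in 1:nbtrials)
{
    x = rnorm(n, mean = 0, sd = sqrt(sigma2))
    mle[trial] = mean((x - mean(x))^2)          # MLE of sigma^2 for this trial
}
print(paste("Average MLE of sigma^2 =", mean(mle)))
print(paste("(n-1)/n * sigma^2      =", (n - 1) / n * sigma2))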

What does it mean to say that $\bar X$ is an unbiased estimator of $\mu$?

An estimator of a given parameter is said to be unbiased if its expected value is equal to the true value of the parameter. In other words, an estimator is unbiased if it produces parameter estimates that are on average correct.

Is $\bar X$ the best estimator of $\mu$?

Certainly, our intuition tells us that the best estimator for the population mean $\mu$ should be $\bar x$, and the best estimator for the population proportion $p$ should be $\hat p$.

How do you prove something is an unbiased estimator?

In order for an estimator to be unbiased, its expected value must exactly equal the value of the population parameter. The bias of an estimator is the difference between the expected value of the estimator and the actual parameter value. Thus, if this difference is non-zero, then the estimator has bias.
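
For example, from the calculation above, the bias of the MLE $\hat{\sigma}^2$ is $E(\hat{\sigma}^2) - \sigma^2 = \dfrac{(n-1)\sigma^2}{n} - \sigma^2 = -\dfrac{\sigma^2}{n}$, which is non-zero (though it vanishes as $n$ grows), so $\hat{\sigma}^2$ is biased.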

Which statistic is the best unbiased estimator for μ?

The best unbiased estimator for $\mu$ is the sample mean $\bar x$. For the same sample statistics, the highest level of confidence offered (here, 90%) produces the widest confidence interval, because as the level of confidence increases, $z_c$ increases.