[통계] Estimation (3) - CRLB (Cramer-Rao Lower Bound)

5 minute read

앞서 우리는 Estimator를 찾을 때, Bias가 0이고, 최소 Variance를 가지는 Estimator를 찾는 것 즉, MVUE 를 찾는 것이 우리의 목표라는 것을 알아보았다. MVUE를 찾는 방법에 여러가지가 있다는 데, 그 중 CRLB (Cramer-Rao Lower Bound)를 결정하여 Estimator가 이를 만족하는지 확인하는 방법을 알아보겠다. 하지만, CRLB와 같은 variance를 가지는 Estimator가 존재하지 않을 수가 있다는 것을 기억하자.

[1] Why CRLB?

다양한 variance bound가 존재하지만, CRLB가 가장 구하기 쉽다.
CRLB와 같은 variance를 가지는 estimator가 존재하는 즉시 MVUE를 결정할 수 있다.
이런 estimator가 존재하지 않는다면, 다음 두 가지 방법으로 MVUE를 찾을 수 있다.
1. Unbiased estimator가 linear하다고 제한하여 MVUE를 찾는다.
2. RBLS를 이용해 Unbiased estimator 중 a better estimator를 찾을 수 있다.
3. MLE (Maximum Likelihood Estimation)를 이용해 CRLB에 근사하는(in approximate sense) estimator를 찾을 수 있다.

[2] Estimator Accuracy Considerations (Estimator의 정확도)

우리가 얻을 수 있는 모든 정보는 observed data에 담겨있고, 이 데이터들이 PDF를 형성하고 있다. 따라서, estimation accuracy는 PDF로부터 직접적인 영향을 받는다. 다시 말해, PDF가 unknown parameter에 영향을 많이 받을수록, 이 parameter를 더 잘 estimation할 수 있다.

Likelihood function (가능도 함수)

PDF가 observation $\textbf{x}$가 고정되어 있고 unknown parameter 에 대한 함수로 보여질 때, likelihood function이라고 지칭한다.
Estimator accuracy는 variance가 감소할수록 향상된다.
단일 샘플이 관측된 예시 $x[0] = A + w[0], \text{ where } w[0] \sim \mathcal{N}(0,\sigma^2)$ 그림에서 보면 (a)의 경우가 (b)의 경우보다 variance가 더 작기 때문에 $x[0]=3$ 데이터에 더욱 dependent하여 정확한 estimator라는 것을 볼 수 있다. 즉, $A>4$ 일 때의 likelihood 값이 거의 미미하기 때문에 “Highly unlikely” 하다고 볼 수 있다.

그림. 1: PDF dependent on UNK parameter

Curvature of the log-likelihood (Fisher Information) : 로그 가능도의 곡률

Likelihood의 “Sharpness”로 얼마나 정확하게 unknown parameter를 estimation할 수 있는지를 말해준다. 즉, log-likelihood function의 곡률(curvature)를 구해주면 이를 알 수 있다.
이를 수학적으로 나타내면, log-likelihood function를 unknown parameter에 대해 두번 미분한 것에 음수를 취해주면 peak지점에서의 likelihood function의 sharpness를 얻을 수 있다.
정확하게는 데이터가 random variable이기 때문에 average를 구해주기 위해 pdf에 대해 expectation을 취해준 값이 curvature이다. 이를 Fisher Information이라고도 부른다.

\[-E\left[\frac{\partial^2 \ln p(\textbf{x};\theta)}{\partial \theta^2} \right]\]

Curvature는 variance $\sigma^2$가 감소할수록 증가한다.

[3] Cramer-Rao Lower Bound for scalar parameter

CRLB Theorem

(1) PDF $p(\textbf{x};\theta)$는 아래 식과 같은 “Regularity” 조건을 만족한다고 가정한다.
\[E\left[\frac{\partial \ln p(\textbf{x};\theta)}{\partial \theta} \right] = 0 \quad \textbf{for all }\theta\] \[E[\hat\theta]=\theta\]
※ PDF가 이 조건을 만족하지 못하면 CRLB를 적용할 수 없다.
(2) 어떤 unbiased estimator $\hat{\theta}$ 이든 다음 조건을 만족한다. (즉, unbiased estimator라고 깔고 들어간다.)
\[\text{var}(\hat{\theta}) \ge \frac{1}{-E\left[\frac{\partial^2 \ln p(\textbf{x};\theta)}{\partial \theta^2} \right]} = \frac{1}{I(\theta)}\]
※ 일반적으로, second derivative는 random variable $\text{x}$에 dependent하며, 이의 음의 역수에 expectation을 취해준 값인 CRLB는 $\text{x}$가 사라지기 때문에 $\theta$에 dependent하다.
CRLB와 같은 variance를 가지는 unbiased estimator $\hat{\theta}$은 다음 식과 필요충분조건의 관계이다.
\[\frac{\partial \ln p(\textbf{x};\theta)}{\partial \theta}=I(\theta)(g(\text{x})-\theta)\]
이 estimator는 MVUE이며, $\hat{\theta}=g(\text{x})$와 minimum variance 로 $1/I(\theta)$를 가진다.

Efficient Estimator

Estimator가 unbiased하고 variance가 CRLB를 따른다면, “efficient”하다고 한다.
Data를 efficient하게 사용했기 때문이다.
MVUE는 efficient할 수도 있고 아닐 수도 있다.

그림. 2: Efficiency vs MVUE

(b) 그림의 $\hat{\theta_1}$의 분산은 다른 unbiased estiamtor들보다 모든 $\theta$에 대해서 작아 MVUE지만 CRLB를 따르지 않으므로 efficient 하지는 않다.

Fisher information properties

더 많은 정보(Fisher Information이 클수록)가 있을수록, 더 낮은 bound를 갖는다.
Non-negative 하다.

\[E\left[ \left( \frac{\partial \ln p(\textbf{x};\theta)}{\partial \theta} \right)^2 \right]=-E\left[\frac{\partial^2 \ln p(\textbf{x};\theta)}{\partial \theta^2} \right]\]

Independent한 observations에 대해서 Additive하다.

\[-E\left[\frac{\partial^2 \ln p(\textbf{x};\theta)}{\partial \theta^2} \right]=-\sum_{n=0}^{N-1}E\left[\frac{\partial^2 \ln p(x[n];\theta)}{\partial \theta^2} \right]\] \[I(\theta)=Ni(\theta)\quad \text{where }\quad i(\theta)=-E\left[\frac{\partial^2 \ln p(x[n];\theta)}{\partial \theta^2} \right]\]

Completely dependent 한 샘플들 ( $\text{i.e. }x[0]=x[1] \cdots = x[N-1]$ ) 에 대해서 아무리 observation이 많아도 information이 없기 때문에 CRLB는 줄어들지 않는다. 즉, $I(\theta)=i(\theta)$

[4] General CRLB for signals in White Gaussian Noise

A deterministic signal with an unknown parameter $\theta$ observed in WGN

\[x[n] = s[n;\theta]+w[n] \quad \quad n=0,1, \ldots ,N-1\]

Likelihood function is
\[p(\text{x};\theta)=\frac{1}{(2\pi \sigma^2)^{\frac{N}{2}}} \text{exp} \left( -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1}(x[n]-s[n;\theta])^2 \right)\]
First derivative w.r.t. $\theta$
\[\frac{\partial \ln p(\text{x};\theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-s[n;\theta])\frac{\partial s[n;\theta]}{\partial\theta}\]
Second derivative w.r.t. $\theta$
\[\frac{\partial^2 \ln p(\text{x};\theta)}{\partial \theta^2} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} \left[(x[n] - s[n;\theta])\frac{\partial^2 s[n;\theta]}{\partial \theta^2} -\left(\frac{\partial s[n;\theta]}{\partial \theta}\right)^2\right]\]
Expectation of the second derivative

\[E \left[ \frac{\partial^2 \ln p(\text{x};\theta)}{\partial \theta^2} \right] = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} \left( \frac{\partial s[n;\theta]}{\partial \theta}\right)^2\]

CRLB

\[\text{var}(\hat{\theta}) \ge \frac{\sigma^2}{\sum_{n=0}^{N-1} \left( \frac{\partial s[n;\theta]}{\partial \theta}\right)^2 }\]

Importance of the signal dependence on $\theta$

Signals change rapidly as the unknown parameter changes → Accurate Estimator

[5] Transformation of parameters

$\theta^2$ 를 추정하거나 신호의 power를 추정할 때 쓴다.
Let $\alpha =g(\theta)$, then the CRLB is

\[\text{var}(\hat{\alpha}) \ge \frac{\left( \frac{\partial g}{\partial\theta}\right)^2}{-E \left[ \frac{\partial^2 \ln p(\text{x};\theta)}{\partial \theta^2} \right]}\]

Estimator에 linear (실제로는 Affine) 변환을 했을 때에는 efficiency가 유지된다.
Nonlinear transformation을 하면 efficiency가 깨질 수 있다. 하지만 nonlinear transformation에서도 데이터가 매우 크다면, efficiency를 approximately maintained할 수 있다.

[6] Cramer-Rao Lower Bound for vector parameter

Vector parameter 추정
\[\bf{\theta} = [\theta_1\ \theta_2 \ldots \theta_p ]^T\]
Estimator $\bf{\hat\theta}$는 covariance matrix $\bf{C_{\hat\theta}}$에 CRLB를 가진다.
For an unbiased estimator has the CRLB with the $p\times p\ Fisher\ information\ matrix \ \ \bf{I(\theta)}$

\[[I(\theta)]_{ij}=-E\left[\frac{\partial^2 \ln p(\text{x};\theta)}{\partial \theta_i\partial \theta_j} \right] \quad \quad i \ =\ 1,2,\ldots,p \ \quad j\ =\ 1,2,,\ldots,p\]

CRLB theorem

(1) “Regularity” conditions, $\mathbf{\hat\theta}$ is unbiased

\[E\left[\frac{\partial \ln p(\text{x};\theta)}{\partial \theta} \right]=\bf{0} \quad \text{for all }\ \theta\] \[E[\mathbf{(g(x)}]=E[\mathbf{\hat\theta}]=\mathbf{\theta}\]

어떠한 unbiased estimator $\hat\theta$ 의 The covariance matrix는 다음을 만족한다.

\[\mathbf{C}_{\mathbf{\hat\theta}}-\mathbf{I}^{-1}(\mathbf{\theta})\ge \mathbf{\theta}\] \[\text{var}(\hat\theta_i) = [\mathbf{C}_{\hat\theta}]_{ii}\ge[\mathbf{I}^{-1}(\mathbf)]_{ii}\]

CRLB와 같은 covariance matrix를 가지는 unbiased estimator $\mathbf{\hat{\theta}}$ ($s.t.\ \mathbf{C}_{\mathbf{\hat\theta}}=\mathbf{I}^{-1}(\mathbf{\theta})$) 은 다음 식과 필요충분조건의 관계이다.

\[\frac{\partial \ln p(\mathbf{x};\mathbf{\theta})}{\partial \mathbf{\theta}}=\mathbf{I}(\mathbf{\theta})(\mathbf{g(x)-\theta})\]

CRLB는 estimation 하고자 하는 parameter 수가 늘어남에 따라 항상 증가한다.

Reference

[1] S. Kay. Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall International Editions, Englewood Cliffs, NJ, 1993.
[2] GIST EC7204 Detection and Estimation Lecture from Prof. 황의석

Share on

Twitter Facebook LinkedIn

deeesp