Lecture 27 - ML Delay Estimation, Non-Coherent Detection

We worked through a few simple examples of maximum likelihood detection in the last lecture, and today we’ll continue that process. Recall that we have several impairments in our receiver, such as timing offset, frequency offset, phase offset, etc. Our goal is to estimate and try to correct these impairments so that we can accurately detect our data.

1 Examples

1.1 Unknown Delay of a Known Signal in Noise

Suppose \(r(t) = s(t-\theta) + w(t)\) where \(\theta\) is the unknown delay and \(w\) is AWGN. Assuming \(s(t)\) is bandlimited, we can apply signal space concepts with the sampling basis (i.e. a unit energy sinc pulse in the time domain).

We can think about the projections of \(r(t)\) onto the basis to form our signal space representation:

\[ r_k = \langle r(t), \varphi_k(t) \rangle \] \[ s_k(\theta) = \langle s(t-\theta), \varphi_k(t) \rangle \] \[ w_k = \langle w(t), \varphi_k(t) \rangle \]

Each has a corresponding vector form:

\[ \vec{r} = [r_1 \; r_2 \; ... \; r_n]^T \] \[ \vec{s}(\theta) = [s_1(\theta) \; s_2(\theta) \; ... \; s_n(\theta)]^T \] \[ \vec{w} = [w_1 \; w_2 \; ... \; w_n]^T, \quad w_k \text{ iid Gaussian } N(0, \sigma^2) \]

We are using signal space here to convert our continuous time model into a discrete time one, so we can look at the joint distribution to find the likelihood function.

As we saw last time, after some manipulations of the likelihood function, we convert the maximization problem into a minimization one. We’ve skipped some steps here (they are spelled out in the previous lecture), but we ultimately arrive at the following form:

\[ \hat{\theta}_{ML}(\vec{r}) = \operatorname*{arg min}_\theta \sum_{k=1}^n \frac{(r_k - s_k(\theta))^2}{2 \sigma^2} \]

Last time we took this as a function of \(\theta\) and differentiated to find a minimum. That worked because we had a known closed-form expression for the dependence on \(\theta\). Because in this problem \(s\) is some general signal whose form we don’t know, we cannot use the derivative method. We would instead use some kind of grid search to find the optimum value of \(\theta\).

We can continue simplifying to make this problem easier:

\[ \begin{aligned} \hat{\theta}_{ML} &= \operatorname*{arg min}_\theta || \vec{r} - \vec{s} (\theta)||^2 \\ &= \operatorname*{arg min}_\theta || \vec{r} ||^2 - 2 \operatorname{Re}\{ \langle \vec{r}, \vec{s}(\theta) \rangle \} + || \vec{s} (\theta)||^2 \\ \end{aligned} \]

Now, notice a few things. First, the first term does not depend on \(\theta\), so we can remove it from the minimization. Second, the third term is really a computation of the signal’s energy. Because delaying the signal does not change its energy over an infinite time period (i.e. \(n \rightarrow \infty\)), this term also does not depend on \(\theta\).
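As a quick numerical sanity check (a sketch with an assumed sample rate and pulse shape), we can verify that a delayed copy of a signal carries the same energy as the original, which is why \(||\vec{s}(\theta)||^2\) drops out of the minimization:

```python
import numpy as np

# Assumed illustration parameters: 100 samples/sec, a Gaussian pulse.
fs = 100
t = np.arange(0, 1, 1 / fs)
s = np.exp(-((t - 0.2) ** 2) / (2 * 0.01))

# A circular shift stands in for a delay (no energy leaves the window).
energies = []
for delay in [0, 10, 25]:          # candidate delays, in samples
    s_delayed = np.roll(s, delay)
    energies.append(np.sum(s_delayed ** 2))

# All energies agree to numerical precision.
print(np.allclose(energies, energies[0]))  # True
```

This is only a finite-window stand-in for the \(n \rightarrow \infty\) argument in the text, but it makes the invariance concrete.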

Thus, the problem has become a minimization over \(\theta\) on the middle term, which (because of the negative sign) is actually a maximization.

\[ \hat{\theta}_{ML}(\vec{r}) = \operatorname*{arg max}_\theta \operatorname{Re} \{ \langle \vec{r}, \vec{s}(\theta) \rangle \} \]

Because we’d eventually like to do this operation in hardware, we can go back to continuous time and solve

\[ \hat{\theta}_{ML}(r) = \operatorname*{arg max}_\theta \operatorname{Re} \{ \langle r(t), s(t - \theta) \rangle \} \]

In integral form, this equation becomes

\[ \operatorname*{arg max}_\theta \operatorname{Re} \left\{ \int_{-\infty}^\infty r(t) s^*(t - \theta) \, dt \right\} \]

which is a correlation. We would like to find the value of \(\theta\) that makes the correlation between our received signal and a delayed version of the known signal as large as possible.

In principle, \(\theta\) could take on any one of an infinite number of values. In practice, we can implement this by searching for a \(\theta\) over a range of predetermined values (i.e grid search).
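The grid-search correlator can be sketched in a few lines. This is an illustration with assumed values (sample rate, pulse shape, delay range, noise level), not a prescribed implementation; for real signals the `Re{}` is trivially the dot product itself:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 100
t = np.arange(0, 1, 1 / fs)
s = np.exp(-((t - 0.1) ** 2) / (2 * 0.001))   # known pulse (assumed shape)

true_delay = 30                                # delay in samples (assumed)
r = np.roll(s, true_delay) + 0.01 * rng.standard_normal(t.size)

# Grid search: correlate r with each candidate delayed copy of s and
# keep the delay that gives the largest correlation.
candidates = np.arange(0, 60)
corr = [np.dot(r, np.roll(s, d)) for d in candidates]
theta_hat = candidates[int(np.argmax(corr))]
print(theta_hat)   # recovers 30 at this low noise level
```

The resolution of the estimate is limited by the grid spacing; a finer grid (or interpolation around the peak) trades computation for accuracy.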

Block diagram for correlator detector.

We know that any correlation can also be implemented with a matched filter:

Block diagram for matched-filter detector.
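The correlator/matched-filter equivalence can be checked numerically. The sketch below (with arbitrary assumed signals) computes the correlation directly and then via convolution with the time-reversed pulse, which is exactly what a matched filter does; for complex signals the reversed pulse would also be conjugated:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(32)          # known pulse (arbitrary, assumed)
r = rng.standard_normal(128)         # received samples (arbitrary, assumed)

# Correlator: inner product of r with s shifted by each candidate delay d.
corr = np.array([np.dot(r[d:d + s.size], s)
                 for d in range(r.size - s.size + 1)])

# Matched filter: convolve r with the time-reversed pulse and read off
# the fully-overlapped outputs.
mf = np.convolve(r, s[::-1], mode="valid")

print(np.allclose(corr, mf))  # True: the two implementations agree
```

In hardware, the matched-filter form is often preferred because one filter produces the correlation at every delay as time advances, rather than requiring a separate correlator per candidate delay.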
Note

We keep seeing that when we have Gaussian noise, minimum distance is often the design criterion for our receiver, and in these cases we keep coming back to the matched filter. This is a property inherent to Gaussian problems, so we will continue to see the matched filter as a solution to them.

1.2 Detection of a Discrete-Valued Parameter

Suppose \(\theta\) takes on one of \(M\) values \(\{ \theta_1, \theta_2, ..., \theta_M \}\), then \(\hat{\theta}_{ML}(\vec{r}) = \operatorname*{arg max}_{\{\theta_1, ..., \theta_M\}} f_{\vec{r} | \theta} (\vec{r} | \theta)\) is a search over the \(M\) values.

If, for example, \(M=2\) (the binary case) we can do the following:

\[ f_{\vec{r} | \theta} (\vec{r} | \theta_2) \]

\[ f_{\vec{r} | \theta} (\vec{r} | \theta_1) \]

We compute the likelihood function for each value of theta and then compare them when we plug in \(r\). Whichever function returns a larger likelihood gives the correct \(\theta\). For instance, if \(f_{\vec{r} | \theta} (\vec{r} | \theta_2) > f_{\vec{r} | \theta} (\vec{r} | \theta_1)\), then \(\hat{\theta} = \theta_2\). Note that if the likelihoods are the same, then either \(\theta\) will work.

We can get the same result by computing the ratio of the two functions and comparing that ratio to 1. This ratio is called the likelihood ratio and the rule that generates it is called a likelihood ratio test. For the above example, the likelihood ratio would look like

\[ \frac{f_{\vec{r} | \theta} (\vec{r} | \theta_2)}{f_{\vec{r} | \theta} (\vec{r} | \theta_1)} \gtreqless 1 \]

This can be used to simplify algorithms, because factors common to both likelihood functions cancel in the ratio. In essence, we compute some function of our data, compare it to a threshold, and then decide which parameter is correct.
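As a minimal sketch of that cancellation, assume a scalar Gaussian observation \(r = \theta + w\) with \(w \sim N(0, \sigma^2)\) and \(\theta \in \{\theta_1, \theta_2\}\) (values chosen for illustration). Taking the log of the ratio and cancelling common factors reduces the test to comparing \(r\) against the midpoint of the two candidates:

```python
import numpy as np

# Assumed illustration values.
sigma, theta1, theta2 = 1.0, 0.0, 2.0

def likelihood(r, theta):
    """Gaussian likelihood f(r | theta)."""
    return np.exp(-(r - theta) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def lrt(r):
    """Full likelihood ratio test against the threshold 1."""
    return theta2 if likelihood(r, theta2) / likelihood(r, theta1) > 1 else theta1

def midpoint_rule(r):
    """Simplified rule after cancelling common factors and taking logs."""
    return theta2 if r > (theta1 + theta2) / 2 else theta1

# The full ratio and the simplified threshold rule always agree.
samples = np.linspace(-3, 5, 101)
print(all(lrt(r) == midpoint_rule(r) for r in samples))  # True
```

The normalizing constants and the quadratic term in \(r\) are identical under both hypotheses, which is exactly what makes the simplified threshold rule equivalent.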

This system can be generalized to cases where \(M > 2\). For \(M > 2\) we can implement the search as \(M-1\) binary tests. Let \(\hat{\theta}_m\) denote the most likely value after the first \(m\) candidates have been considered, starting from \(\hat{\theta}_1 = \theta_1\). At each step we perform a likelihood ratio test, \(\frac{f_{\vec{r} | \theta} (\vec{r} | \theta_{m+1})}{f_{\vec{r} | \theta} (\vec{r} | \hat{\theta}_{m})} \gtreqless 1\), keeping whichever value is more likely as \(\hat{\theta}_{m+1}\). After evaluating the whole sequence, \(\hat{\theta}_M\) is the value with the largest likelihood.
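The running-winner search can be sketched directly (Gaussian likelihoods and candidate values are assumed for illustration):

```python
import numpy as np

# Assumed illustration values: M = 4 candidates, one scalar observation.
sigma = 1.0
thetas = [0.0, 1.0, 2.5, 4.0]
r = 2.3

def likelihood(r, theta):
    # Gaussian likelihood up to constants that cancel in every ratio.
    return np.exp(-(r - theta) ** 2 / (2 * sigma ** 2))

# M-1 pairwise likelihood ratio tests, keeping the running winner.
best = thetas[0]
for theta in thetas[1:]:
    if likelihood(r, theta) / likelihood(r, best) > 1:
        best = theta

print(best)  # 2.5, the candidate closest to r under Gaussian noise
```

Each comparison discards one candidate, so after \(M-1\) tests only the maximum-likelihood value survives.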

1.3 Non-Coherent OOK Detection

Suppose our received signal is modeled as \(r_k = \theta e^{j \varphi} + w_k\) where \(\theta \in \{0, 1\}\) and \(\varphi\) is some unknown phase. \(w_k\) is complex Gaussian iid, which means that the real and imaginary parts are Gaussian and independent of each other. \(\varphi\) is uniformly distributed between \(-\pi\) and \(\pi\) (in lab we won’t try to estimate it, just detect around it).

After a lengthy derivation, the received energy \(||r||^2\) is a so-called sufficient statistic. In other words, we can rewrite the likelihood ratio test in the form

\[ ||r||^2 \gtreqless \gamma \]

where \(\gamma\) is some threshold. We then assign ON or OFF to values above or below that threshold, respectively. We want to pick \(\gamma\) such that the probability of error under either hypothesis is the same, i.e.

\[ \mathbb{P}[||r||^2 > \gamma \mid \theta = 0] = \mathbb{P}[||r||^2<\gamma \mid \theta=1] \]

Often, we simply set \(\gamma\) halfway between the expected values of \(||r||^2\) under the two hypotheses.

For small noise (high SNR), the noise term goes away, and thus \(|r_k|^2 \simeq |\theta e^{j \varphi}|^2 = \theta^2\). This says that when noise is small relative to signal, the phase difference should not matter enough to affect our decision, and thus we can just detect as if it is not there.
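A small simulation makes the energy-detection rule concrete. This is a sketch with assumed values (noise level, trial count, and a threshold placed midway between the noise-only and signal-plus-noise energies); it draws a random phase each symbol and never estimates it, yet still detects reliably at reasonable SNR:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.2          # noise std per real dimension (assumed)
n_trials = 2000

errors = 0
for _ in range(n_trials):
    theta = rng.integers(0, 2)                  # ON/OFF bit
    phi = rng.uniform(-np.pi, np.pi)            # unknown phase, never estimated
    w = sigma * (rng.standard_normal() + 1j * rng.standard_normal())
    r = theta * np.exp(1j * phi) + w

    # Energy detector: compare |r|^2 to a threshold midway between the
    # expected energies under theta = 0 and theta = 1 (assumed gamma).
    gamma = 0.5
    theta_hat = 1 if np.abs(r) ** 2 > gamma else 0
    errors += (theta_hat != theta)

print(errors / n_trials)   # small error rate at this SNR
```

Note that \(|r|^2\) is unaffected by \(\varphi\) when noise is small, which is precisely why the detector can ignore the phase.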