Grid-Free Harmonic Retrieval and Model Order Selection Using Deep Convolutional Neural Networks
Disclaimer This blog article provides additional information and insights to the paper published at EUCAP 2024. The content in this post did not undergo a peer review process. Please read it accordingly and kindly let me know when you find errors. All views presented here are my own.
Introduction
The estimation of wireless channel parameters is an omnipresent task in many application of wireless propagation, for example communication, sensing, and channel sounding. Due to the variety of applications, different solutions have been presented, usually engineered for the application-specific constraints. Lets look at the three example applications in more detail:
- RADAR sensing applications require real-time estimates of the observed objects in the propagation channel to be useful, e.g. in automotive or airspace surveillance. Hence, algorithm runtime complexity is often favored over estimation accuracy.
- Wireless communication also requires real time estimate of the channel (usually performed via pilot symbols) to properly demodulate the received symbols. Note, that this is different from estimating the propagation parameters of individual waves. The estimate is required to equalize the received symbols and find the transmitted symbols. But the same can be achieved by estimating the parameters (as we do in our work), which can then also be used for transmit signal predistortion to improve the SNR at the receiver.
- Channel sounding doesn’t require real-time constraints, and hence, the processing delay not critical. The processing is is usually carried out offline. Here, a more accurate estimation is preferred to properly model the distinct channel characteristics, such as the distribution and clustering of multipath components.
In summary, there is a trade-off between the high-resolution algorithms, such as iterative ML, which achieve a high accuracy but also come at a high computational cost, and real-time capable methods, such as Periodogram-based estimation, that can be run very fast, but at worse performance. Our work attempts to provide a new approach based on machine-learning, which can be combined with model-based estimation to reduce algorithm runtime to near real-time domain.
Our approach is a combination of well-known approaches from signal processing and wireless propagation, as well as recent algorithms developed in the computer vision community. Before delving into further details, here is a comprehensive summary.
We introduce a deep Convolutional Neural Network (CNN) approach, that
- estimates the propagation parameters and number of paths (model order) directly from the complex-valued basedband data
- uses a well-known signal model to generate synthetic measurement data
- encodes the parameters with a grid-relative mapping that allows continuous estimates
Noteworthy features and properties of the approach are that it does not require any specific prior knowledge of the data, such as SNR or number of paths (model-order). Also, compared to other works (see the paper), it can estimate continuous values for the parameters, while the number of training parameters still scales feasibly for multiple dimension. We show 2D signals in this work, but have examples for 4D as well.
Propagation Model
Here is a neat plot of our propagation model, illustrating the relationship between the propagation time (delay) and velocity (bistatic Doppler-shift) of the sensing target position.
Note, that the velocity of the target causes a Doppler-shift in the corresponding path. The magnitude of the Doppler-shift depends on the velocity (a vector), the bistatic angle, and the carrier frequency (strictly speaking, additional factors, such as the refractive index of the medium also play a role).
Signal Model
Now that we have clarified the physics, we need a mathematical description for the received signal under this propagation model.
We use the well-known narrowband channel model, which is a fancy way to say, that our signal bandwidth is much larger compared to the carrier frequency. With it, we can model the delay received discrete signal from a single target in frequency domain as $$ \boldsymbol S(\tau_p) = e^{-j2\pi \boldsymbol f \tau_p}. $$ Where $\boldsymbol f$ is the support (sample positions) of the sampling process in frequency.
Lets further assume the channel is underspread by assuming, the product of delay spread $\tau_{\text{rms}}$ and the Doppler bandwidth $B_{\alpha}$ satisfies $$ \tau_{\text{rms}} \cdot B_{\alpha} « 1. $$ Then we can write $$ \boldsymbol S(\alpha_p) = \exp{j2\pi \boldsymbol t \alpha_p}, $$ where $\boldsymbol t$ denotes the supports of the sampling process in time domain.
With the prior assumptions (narrowband + underspread), we can write the received signal model in frequency- and time-domain using the Kronecker-product (denoted by $\otimes$) as $$ \boldsymbol S(\boldsymbol\gamma, \boldsymbol\tau, \boldsymbol\alpha) = \sum_{p=1}^{P} \gamma_p \cdot \boldsymbol S(\tau_p) \otimes \boldsymbol S(\alpha_p) \quad \in \mathbb{C}^{N_f \times N_t} $$.
By writing the summation, we account for all signal received from the $P$ different targets. In other words: The signal observed at the receiver is the superposition of signals from the individual targets.
Finally, we add AWGN to the signal to complete our model of the received signal $\boldsymbol Y$ as
$$ \boldsymbol Y = \boldsymbol S(\boldsymbol\gamma, \boldsymbol\tau, \boldsymbol\alpha) + \boldsymbol N \quad \in \mathbb{C}^{N_f \times N_t}. $$
We use the signal model to generate training, validation, and test data for our approach. This is achieved by drawing random values for the number of paths, parameters, and SNR in each snapshot. Specifically we use the random distributions outlined in the table below.
Parameter Encoding (Labels)
Now that we have clarified how we can generate data, we need to design the labels for the supervised training. Unfortunately, simply using the parameters directly won’t do, mainly due to one reason: ordering.
Any snapshot can contain 1 to 20 paths, which can be arranged in any order. A loss must account for a different ordering. Consider the set of predicted parameters ${\hat{\tau}, \hat{\alpha}}$ and the set of groundtruth parameters ${\hat{\tau}, \hat{\alpha}}$, and assume that the first is just a permutation of the latter. If we compute the MSE of the two, we would get different results depending for every permutation, even though we only care about the parameters of each path individually. The order does not matter. We could come up with an ordering, but that must then be learned by the CNN as well.
With this in mind, we decided to use a different approach well known from similar problems in Computer Vision, namely YOLO. We divide the normalized space of results into smaller sub-cells, and then match the parameters to their closest cell-center. At this stage, we could formulate the task as a classification issue and just try to light up those cells that contain parameters. However, that would mean we could not retrieve continuous estimates, as we are bound to the grid resolution. To circumvent this issue, we encode the distance from the cell center into two additional values, $\Delta\tau_p$ and $\Delta\alpha_p$. To allow for multiple paths per cell, the encoding for each cell is a vector with space for $C$ parameters. E.g., if we wanted to allow up to 3 paths per cell, the vector $$ \boldsymbol\eta_{i,j} = \left[ \mu_{1}^{[i,j]}, \Delta\tau_{1}^{[i,j]}, \Delta\alpha_{1}^{[i,j]}, …, \mu_{C}^{[i,j]}, \Delta\tau_{C}^{[i,j]}, \Delta\alpha_{C}^{[i,j]} \right]^T $$
would be of size 9, specifically $\boldsymbol\eta_{i,j} \in \mathbb{R}^{3C}$. In each cell the parameters are ordered by magnitude (descending).
Network Architecture
Here is a nice plot of our network architecture. The paper contains an in depth description for the individual parts. For now, I am just leaving you with the figure.
Note, that the demapping that converts the encoded parameters into the actual ones is only required for inference. During the training, the loss is computed based on the encoded representations.
Dataset and Training Hyperparameters
Name | Value |
---|---|
Datasets | |
Distribution $\tau_p, \alpha_p$ | $\mathfrak{U}_{[0,1]}$ |
Min. separation $\tau_p, \alpha_p$ | 0.003125 |
Magnitudes | $\mathfrak{U}_{[0.001, 1]}$ |
Phases | $\mathfrak{U}_{[0,2\pi]}$ |
SNR | 0…50 dB |
Number of Paths | $\mathfrak{U}_{[1,20]}$ |
Trainingset Size | 400000 |
Validationset Size | 1000 |
Testset Size | 1000 |
Training | |
Optimizer | Adam, $\gamma=0.0003, \beta_1=0.9, \beta_2=0.999$ |
Mini-Batchsize | 32 |
Epochs | 20 |
Trainable Parameters | ~ 25.000.000 for $N_f=N_t=64$ |
Demo
If you want to check out more results, e.g., to see how the results depend on the SNR, I recommend to check out the online demo on Huggingface.
Note Huggingface shuts down the demo after two days of inactivity. Starting it back up usually takes around 5 minutes.