# EFFICIENT FPGA IMPLEMENTATION OF AN ADAPTIVE *IQ*-IMBALANCE CORRECTOR FOR COMMUNICATION RECEIVERS USING REDUCED RANGE MULTIPLIERS

Ediz Cetin, Suleyman S. Demirsoy, Izzet Kale and Richard C. S. Morling

Applied DSP and VLSI Research Group, Department of Electronic Systems, University of Westminster, 115 New Cavendish Street, London, W1W 6UW, United Kingdom phone: + (44)2079115083, fax: + (44)2079115089, e-mail: {e.cetin, demirss, kalei, morling}@wmin.ac.uk web: www.advrg.wmin.ac.uk

## ABSTRACT

Digital signal processing techniques for compensating the IQ-imbalances in quadrature receivers are paving the path towards software-configurable radio receivers. Unsupervised signal processing algorithms operating at baseband have been developed to deal with these impairments. This paper deals with an efficient FPGA implementation of an adaptive IQ-imbalance corrector using reduced-range multipliers. The use of reduced-range multipliers results in a 40% reduction in area and power consumption without a compromise in performance when compared with an efficiently designed general-purpose multiplier approach.

## 1. INTRODUCTION

Receivers utilising IQ-signal processing are vulnerable to mismatches between the in-phase (I) and quadrature (Q) channels. IQ-imbalances can cause a large degradation in the performance of a communications receiver. Furthermore, with large M-QAM/PSK signal constellations, even modest IQ-imbalances result in a detrimental performance degradation. Both analog and digital methods for dealing with IQ-imbalances have been reported in the literature [1] – [4]. All of the reported digital approaches are software based and thus not suitable for direct hardware implementation. This paper deals with an efficient low-complexity FPGA implementation of the software-based IQ-compensation algorithms developed and analysed in [5] and [6], utilising the Reduced-Range-Multipliers (RRM) developed in [7].

The paper is organized as follows: Section 2 gives a brief description of the adaptive *IQ*-imbalance compensation algorithm. Section 3 details the architectural design of the algorithm along with performance analysis and comparison, while concluding remarks are given in Section 4.

## 2. BACKGROUND OF ADAPTIVE *IQ*-IMBALANCE CORRECTION ALGORITHM

This section is a brief summary of [5] and [6], which introduce the *IQ*-imbalances and the *Blind-Source-Separation* (BSS) based adaptive compensation scheme.

## 2.1 Influence of IQ-Imbalances

The *IQ*-imbalances in the receiver have several sources. The RF splitter used to divide the incoming RF signal equally between the *I* and *Q* paths may introduce phase and gain differences, and differences in the length of the two RF paths can result in phase imbalance. The quadrature 90° phase-splitter used to generate the *I* and *Q* *Local-Oscillator* (LO) signals that drive the *I* and *Q* channel mixers may not provide exactly 90°. Furthermore, there might be differences in conversion losses between the output ports of the *I* and *Q* channel mixers. In addition, the filters and ADCs in the *I* and *Q* paths are not perfectly matched. The receiver model of Fig. 1 incorporates *IQ*-imbalances as impaired LO signals.



Figure 1 Receiver model incorporating IQ-imbalances

The *IQ*-imbalances can be characterized by two parameters: the amplitude mismatch, $\alpha_{\varepsilon}$, and the phase orthogonality mismatch, $\varphi_{\varepsilon}$, between the *I* and *Q* branches. The complex baseband received signal $r_{IQ}(k)$ in the presence of *IQ*-imbalances is given as:

$$\begin{aligned} r_{IQ}(k) &= g_1[u_I(k)\cos(\varphi_{\varepsilon}/2) + u_Q(k)\sin(\varphi_{\varepsilon}/2)] + jg_2[u_I(k)\sin(\varphi_{\varepsilon}/2) + u_Q(k)\cos(\varphi_{\varepsilon}/2)] \\ &= \frac{1}{2}\left[\left(2\cos\frac{\varphi_{\varepsilon}}{2} - j\alpha_{\varepsilon}\sin\frac{\varphi_{\varepsilon}}{2}\right)u(k) + \left(\alpha_{\varepsilon}\cos\frac{\varphi_{\varepsilon}}{2} + j2\sin\frac{\varphi_{\varepsilon}}{2}\right)u^*(k)\right] \\ &= h_1u(k) + h_2u^*(k) \end{aligned}$$

where $g_1=(1+0.5\alpha_{\varepsilon})$, $g_2=(1-0.5\alpha_{\varepsilon})$, $u(k)=u_I(k)+ju_Q(k)$ is the ideal baseband signal and $(\bullet)^*$ denotes the complex conjugate. As can be seen, there is cross-talk between the *I* and *Q* channels. The amplitude imbalance, $\beta$, in decibels is obtained from the amplitude mismatch, $\alpha_{\varepsilon}$, as:

$$\beta = 20 \log_{10} \left[ \frac{1 + 0.5 \alpha_{\varepsilon}}{1 - 0.5 \alpha_{\varepsilon}} \right]$$
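The imbalance model and the relation between $\beta$ and $\alpha_{\varepsilon}$ can be checked numerically. The following is a minimal Python sketch, not part of the original implementation; the function names are illustrative. It inverts the expression for $\beta$, applies the gain and phase mismatch to one complex baseband sample, and evaluates the coefficients of $u$ and $u^*$ so that both forms of the model can be compared.

```python
import math


def alpha_from_beta_db(beta_db):
    """Invert beta = 20*log10((1 + 0.5*alpha) / (1 - 0.5*alpha)) for alpha."""
    ratio = 10.0 ** (beta_db / 20.0)
    return 2.0 * (ratio - 1.0) / (ratio + 1.0)


def apply_iq_imbalance(u, alpha, phi):
    """Apply the amplitude mismatch alpha and phase mismatch phi (radians) to sample u."""
    g1, g2 = 1.0 + 0.5 * alpha, 1.0 - 0.5 * alpha
    c, s = math.cos(phi / 2.0), math.sin(phi / 2.0)
    r_i = g1 * (u.real * c + u.imag * s)   # impaired in-phase branch
    r_q = g2 * (u.real * s + u.imag * c)   # impaired quadrature branch
    return complex(r_i, r_q)


def mixing_coefficients(alpha, phi):
    """Return (h1, h2), the coefficients of u and conj(u) in the impaired signal."""
    c, s = math.cos(phi / 2.0), math.sin(phi / 2.0)
    h1 = 0.5 * complex(2.0 * c, -alpha * s)
    h2 = 0.5 * complex(alpha * c, 2.0 * s)
    return h1, h2


if __name__ == "__main__":
    alpha = alpha_from_beta_db(3.0)          # 3 dB amplitude imbalance
    phi = math.radians(15.0)                 # 15 degree phase mismatch
    u = complex(1.0, -3.0)                   # one example constellation point
    h1, h2 = mixing_coefficients(alpha, phi)
    print(apply_iq_imbalance(u, alpha, phi), h1 * u + h2 * u.conjugate())
```

Both printed values agree up to rounding, which confirms that the branch-wise form and the $h_1u + h_2u^*$ form describe the same signal.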

Fig. 2 demonstrates the effects of varying the IQ phase and gain mismatches on the raw *Bit-Error-Rate* (BER) performances of the systems using (a) 32-PSK and (b) 256-QAM modulation formats. As can be observed the IQ-imbalances degrade the system's BER performance greatly.



Figure 2 The effects of *IQ*-imbalances on BER of (a) 32-PSK and (b) 256-QAM modulated signals.

This degradation in performance is undesirable and must be compensated. Section 2.2 outlines an adaptive algorithm developed for compensating for these impairments.

## 2.2 Blind-Source-Separation-Based Adaptive Solution

Our approach to the problem is to develop an adaptive BSS-based system that can operate without pilot/test tones, by simply processing the received signals. The only assumption we make is that the *I* and *Q* components of the received signal, $r_{I}(k)$ and $r_{Q}(k)$, are orthogonal and uncorrelated with each other in the absence of impairments. This assumption implies that:

$$E[r_I(k) \times r_Q(k-n)] = 0, \qquad \forall n,$$

where  $E[\bullet]$  denotes expectation. The overall structure of the proposed approach is depicted in Fig. 3, with *IQ*-imbalances modeled as the unknown scalar mixing matrix with elements  $h_1$  and  $h_2$ .
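The decorrelation assumption can be illustrated at lag $n=0$ by averaging $r_I r_Q$ over an equiprobable constellation. The sketch below is an illustration only; the 16-QAM grid and the function names are assumptions made for the example and do not come from the original work. For the ideal constellation the average is zero, while the imbalance model of Section 2.1 makes it non-zero, which is the quantity an adaptive corrector can drive back towards zero.

```python
import math
from itertools import product


def qam16_points():
    """Equiprobable 16-QAM points on the usual +/-1, +/-3 grid (example constellation)."""
    levels = (-3.0, -1.0, 1.0, 3.0)
    return [complex(i, q) for i, q in product(levels, levels)]


def iq_cross_correlation(points):
    """Average of I*Q over the points, i.e. E[r_I(k) * r_Q(k)] for equiprobable symbols."""
    return sum(p.real * p.imag for p in points) / len(points)


def impair(u, alpha, phi):
    """Gain/phase imbalance model of Section 2.1 applied to one sample."""
    c, s = math.cos(phi / 2.0), math.sin(phi / 2.0)
    return complex((1 + 0.5 * alpha) * (u.real * c + u.imag * s),
                   (1 - 0.5 * alpha) * (u.real * s + u.imag * c))


if __name__ == "__main__":
    ideal = qam16_points()
    impaired = [impair(u, 0.342, math.radians(15.0)) for u in ideal]
    print("ideal   :", iq_cross_correlation(ideal))
    print("impaired:", iq_cross_correlation(impaired))
```

For this model the impaired cross-correlation equals $g_1 g_2 \sin(\varphi_{\varepsilon})\,E[u_I^2]$, so it vanishes only when the phase mismatch does.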



Figure 3 Overall structure for BSS based Adaptive Corrector

In the proposed approach the filter block consists of two taps, $w_1$ and $w_2$. The output signals $c_I$ and $c_Q$ can be expressed as a function of the transmitted signals as:

$$c_{I}(k) = (1 - w_{1}h_{2})s_{I}(k) + (h_{1} - w_{1})s_{Q}(k)$$

$$c_{Q}(k) = (h_{2} - w_{2})s_{I}(k) + (1 - w_{2}h_{1})s_{Q}(k)$$

When the filters converge, i.e. $w_1=h_1$ and $w_2=h_2$, the source estimates become:

$$c_{I}(k) = (1 - h_{1}h_{2})s_{I}(k)$$

$$c_{Q}(k) = (1 - h_{2}h_{1})s_{Q}(k)$$

As can be observed, the influence of the *IQ*-imbalances has been removed. The remaining scaling factor satisfies $(1-h_1h_2)\approx 1$ and can be safely ignored. The coefficient update can be done with any adaptive algorithm, depending on the desired performance. The *Least-Mean-Square* (LMS) and *Recursive-Least-Squares* (RLS) algorithms are the most obvious choices, resulting in different convergence speeds and computational complexities. The LMS [8] algorithm is used in this paper because its low complexity makes it suitable for real-time systems and practical for integration into the receiver signal processing chain.
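The output equations above correspond to a feed-forward cross-coupled structure, $c_I = r_I - w_1 r_Q$ and $c_Q = r_Q - w_2 r_I$, acting on a mix $r_I = s_I + h_1 s_Q$, $r_Q = s_Q + h_2 s_I$. The following Python sketch illustrates this under stated assumptions and is not the authors' implementation. The update rule `w += mu * c_i * c_q` is an assumed LMS-style output-decorrelation update, since the exact update of [5] and [6] is not reproduced in this paper, and the demonstration is restricted to a symmetric mix ($h_1 = h_2 = h$) of independent $\pm 1$ sources, for which this update settles near $w_1 = w_2 = h$.

```python
import random


def corrector_outputs(r_i, r_q, w1, w2):
    """Feed-forward cross-coupled corrector: remove the estimated cross-talk."""
    return r_i - w1 * r_q, r_q - w2 * r_i


def run_decorrelation_lms(h, mu=2.0 ** -13, iterations=40000, seed=1):
    """Adapt w1 and w2 on a symmetric mix of independent +/-1 sources.

    Assumptions (not taken from the paper): unit-diagonal mixing with
    h1 = h2 = h, and the update w <- w + mu * c_I * c_Q for both taps.
    """
    rng = random.Random(seed)
    w1 = w2 = 0.0
    for _ in range(iterations):
        s_i = rng.choice((-1.0, 1.0))
        s_q = rng.choice((-1.0, 1.0))
        r_i = s_i + h * s_q                 # received I with cross-talk from Q
        r_q = s_q + h * s_i                 # received Q with cross-talk from I
        c_i, c_q = corrector_outputs(r_i, r_q, w1, w2)
        step = mu * c_i * c_q               # residual correlation drives the update
        w1 += step
        w2 += step
    return w1, w2


if __name__ == "__main__":
    print(run_decorrelation_lms(h=0.1))
```

With $\mu = 2^{-13}$ the time constant of this sketch is a few thousand iterations, so the loop is run for several time constants; the taps then fluctuate around $h$ with a small residual jitter set by the step size.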

## 3. ARCHITECTURAL DESIGN

It is desirable to keep the size and power consumption of a portable device as low as possible. In Section 2.2 we proposed an approach that improves the performance of the receiver at the cost of some hardware overhead. It is our aim to reduce the hardware complexity and power consumption as much as possible. As the power consumption and area of the multiplier are key factors, the hardware design strategy proposed here for achieving this reduction in area and power is the use of reduced-complexity multiplication through RRM [7].

Fig. 4 depicts the basic processing structure for the BSS-based adaptive algorithm. As the adaptive algorithm is symmetric only the basic processing element is shown.



Figure 4 Basic processing element structure for IQ-Corrector

For our application the data ($r_I$, $r_Q$) are represented with wdDP = 16 bits in two's complement, and the coefficients ($w_1$, $w_2$) with wdCF = 8 bits as fractions. The value used for the LMS step size is $\mu=2^{-13}$. This value was specifically chosen to be a power of two because it can be implemented in hardware as a hardwired right shift by 13 bits rather than an actual multiplication, eliminating the need for an extra multiplier.
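The word lengths and the shift-based step size can be mimicked with integer arithmetic. The sketch below is an illustration under assumptions: the coefficient format (one sign bit and seven fractional bits), the saturation on overflow, and the helper names are choices made for this example and are not specified in the text.

```python
DATA_BITS = 16   # wdDP: data word length, two's complement
COEF_BITS = 8    # wdCF: coefficient word length
MU_SHIFT = 13    # mu = 2**-13, realised as a right shift


def saturate(value, bits):
    """Clamp an integer to the two's-complement range of the given word length."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def quantize_coefficient(w):
    """Round a coefficient in [-1, 1) to an 8-bit code with 7 fractional bits (assumed format)."""
    return saturate(round(w * (1 << (COEF_BITS - 1))), COEF_BITS)


def scale_by_mu(product):
    """Multiply by mu = 2**-13 with an arithmetic right shift (rounds towards minus infinity)."""
    return product >> MU_SHIFT


if __name__ == "__main__":
    print(quantize_coefficient(0.5))      # 64
    print(scale_by_mu(3 * 8192 + 5))      # 3
```

Note that an arithmetic right shift of a negative value rounds towards minus infinity, which differs slightly from truncation towards zero.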

## 3.1 Reduced Range Multipliers

The RRM was developed in [7] to exploit the fixed-resource environment of Xilinx FPGAs. It is implemented by making use of the reconfigurable arithmetic structures proposed in [9]. These structures were used in [9] for the efficient implementation of multiple constant multiplications in time-multiplexed filters. By utilizing them to their full extent, it is possible to obtain reconfigurable multipliers that can replace General Purpose Multipliers (GPM) in adaptive filters [7].

RRMs are particularly useful for adaptive filter implementations where not all parts of the dynamic range are needed for coefficient multiplications. They can also be designed to prioritise certain parts of the dynamic range for more accurate multiplications, at the cost of higher quantization errors in the other parts. Fig. 5 shows the dynamic ranges of several RRMs for 8-bit coefficient values along with the dynamic range of a GPM. Areas represented by " $\blacksquare$ " correspond to the range coverage of the RRM. With the RRM the other input of the multiplier, which generally is the data, can be of any word length. As can be observed from A to D in Fig. 5, the RRM can provide various alternative coverages of the coefficient range, which can be chosen according to the application requirements. For the uncovered coefficients, the nearest coefficient in the covered range is used. This can be thought of as a non-linear quantization operation.
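The nearest-covered-coefficient rule can be sketched as a simple lookup. The coverage set used below is hypothetical (dense near zero and coarse elsewhere); it is chosen only to illustrate the non-linear quantization and does not reproduce any of the ranges A to D in Fig. 5. The tie-breaking rule is also an assumption.

```python
# Hypothetical coverage of 8-bit coefficient codes: every code near zero,
# and only a coarse grid of codes in the rest of the range.
COVERED = sorted(set(range(-16, 17)) | {-128, -96, -64, -32, 32, 64, 96, 127})


def nearest_covered(coef, covered=COVERED):
    """Map a coefficient code to the closest covered code.

    Ties are resolved towards the code with the smaller magnitude
    (an arbitrary choice made for this example).
    """
    return min(covered, key=lambda c: (abs(c - coef), abs(c)))


if __name__ == "__main__":
    for code in (5, 24, 40, -100, 120):
        print(code, "->", nearest_covered(code))
```

Codes inside the dense region are reproduced exactly, whereas codes in the sparse region incur a quantization error of up to half the local grid spacing.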



Figure 5 Dynamic ranges for various 8-bit RRM designs

It is worth noting that the hardware complexity of the RRM structures that generate the different ranges given in Fig. 5 (A-D) is the same in all cases. Fig. 6 shows the RRM structure that was employed in our design, with the dynamic range displayed in Fig. 5(A). There are four reconfigurable basic structure stages. The possible products out of each intermediate structure are shown in set brackets. The numbers '2', '8' and '16' next to the signals in the RRM diagram show that those signals are left-shifted by 2-bits, 3-bits, and 4-bits respectively before they are connected to the next stage. S1 and S0 represent the select lines used to choose one of the operations shown on the structures. The same RRM topology would produce the other dynamic ranges given in Fig. 5 if the shift values and/or the operations inside the basic structures were changed.
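The principle of forming products from hardwired shifts and selectable add/subtract operations can be illustrated in software. The stage below and its operation set are hypothetical; they are not the structure of Fig. 6 or the basic structures of [9], and serve only to show how select values turn a small chain of stages into different constant multipliers.

```python
def stage(a, b, shift, select):
    """One hypothetical reconfigurable stage controlled by a 2-bit select value."""
    shifted = b << shift          # a hardwired left shift costs no logic in hardware
    if select == 0:
        return a + shifted
    if select == 1:
        return a - shifted
    if select == 2:
        return shifted - a
    return a                      # select == 3: pass the input through


def times_45(x):
    """Two cascaded stages: 5x = x + 4x, then 45x = 5x + 40x."""
    five_x = stage(x, x, 2, 0)
    return stage(five_x, five_x, 3, 0)


def times_7(x):
    """A single stage with a different select value: 7x = 8x - x."""
    return stage(x, x, 3, 2)


if __name__ == "__main__":
    print(times_45(3), times_7(-4))
```

Changing the shift amounts or the operations available in each stage changes the set of reachable multiplication factors, which is the software analogue of obtaining different coverage ranges from one topology.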



Figure 6 RRM structure used in our design

## 3.2 Performance and Area Comparison

This section contains detailed simulation results to compare the effects on the performance and area of using RRM instead of GPM. The performance measures used are *Image-Rejection-Ratio* (IRR), *Modelling-Error* (ME) and BER. Furthermore, convergence times and area in terms of *Look-Up-Tables* (LUT) are given. 256-QAM and 32-PSK modulated signals were used along with varying phase and gain mismatches. The communication channel was assumed to be AWGN.

Mean IRR was used as a performance measure. It indicates how well the hardware implementation eliminates the IQ-imbalances: the higher the IRR, the better the performance. In decibels, it can be expressed as [6]:

$$IRR(\alpha_{\varepsilon}, \varphi_{\varepsilon}) = 10 \log_{10} \left( \frac{2 + 2 \cos \varphi_{\varepsilon} + 0.5 \alpha_{\varepsilon}^{2} (1 - \cos \varphi_{\varepsilon})}{2 - 2 \cos \varphi_{\varepsilon} + 0.5 \alpha_{\varepsilon}^{2} (1 + \cos \varphi_{\varepsilon})} \right)$$

The ME [5] gives a global figure for the quality of the identification of the unknown mixing coefficients  $h_1$  and  $h_2$  by  $w_1$ and  $w_2$ . Furthermore, it provides useful information about the convergence rate of the proposed adaptive algorithm. ME is defined as the squared norm of the difference of the values between the original coefficients used in the scalar mixture and the estimated coefficients, relative to the squared norm of the mixture coefficients.
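Both measures are straightforward to evaluate. The sketch below is illustrative, with function names chosen for this example. The IRR is computed as the power of the desired term relative to the power of the image term of the model in Section 2.1, i.e. $|h_1|^2/|h_2|^2$, so that a larger value means better rejection, and the ME follows the verbal definition given above.

```python
import math


def irr_db(alpha, phi):
    """Image-rejection ratio in dB: desired-term power over image-term power."""
    cos_phi = math.cos(phi)
    desired = 2.0 + 2.0 * cos_phi + 0.5 * alpha ** 2 * (1.0 - cos_phi)
    image = 2.0 - 2.0 * cos_phi + 0.5 * alpha ** 2 * (1.0 + cos_phi)
    return 10.0 * math.log10(desired / image)


def modelling_error(h, w):
    """Squared norm of (h - w) relative to the squared norm of h."""
    num = sum(abs(hi - wi) ** 2 for hi, wi in zip(h, w))
    den = sum(abs(hi) ** 2 for hi in h)
    return num / den


if __name__ == "__main__":
    # Uncompensated receiver with a 15 degree phase error and alpha = 0.342 (about 3 dB)
    print(round(irr_db(0.342, math.radians(15.0)), 1))
    # Taps that have moved part of the way towards the mixing coefficients
    print(modelling_error([0.10, 0.20], [0.09, 0.18]))
```

For a 15° phase error and a 3 dB amplitude imbalance this gives an uncompensated IRR of roughly 13 dB, which is of the same order as the "Before" values reported in the tables below.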

Table I depicts the simulation results using GPM with phase and gain errors randomly distributed between  $0 - 30^{\circ}$  and 1 - 3 dB respectively. Results are averaged over 100 experiments. Table II on the other hand depicts the results using several RRMs (A - D) with varying coverage-ranges as given in Fig. 5, instead of GPM for the same conditions.

| Modulation format | Mean IRR before (dB) | Mean IRR after (dB) | Mean number of iterations, <i>w</i><sub>1</sub> | Mean number of iterations, <i>w</i><sub>2</sub> |
|-------------------|----------------------|---------------------|------------------|------------------|
| <b>32-PSK</b> [SNR=26.1 dB] | 14.4 | 77.6 | 12387 | 9240 |
| <b>256-QAM</b> [SNR=30 dB]  | 13.9 | 77.9 | 14944 | 8473 |

Table I Performance results utilising GPM

| RRM | Modulation format | Mean IRR before (dB) | Mean IRR after (dB) | Mean number of iterations, <i>w</i><sub>1</sub> | Mean number of iterations, <i>w</i><sub>2</sub> |
|-----|-------------------|----------------------|---------------------|------------------|------------------|
| A   | 32-PSK            | 14.5                 | 75.8                | 14206            | 8709             |
| A   | 256-QAM           | 15.3                 | 74.1                | 15339            | 6819             |
| B   | 32-PSK            | 14.8                 | 75.9                | 14686            | 8873             |
| B   | 256-QAM           | 14.8                 | 74.2                | 14944            | 8477             |
| C   | 32-PSK            | 14.9                 | 75.8                | 13313            | 8688             |
| C   | 256-QAM           | 15.3                 | 73.4                | 14967            | 8754             |
| D   | 32-PSK            | 15.3                 | 75.2                | 14753            | 8925             |
| D   | 256-QAM           | 14.6                 | 75.1                | 14967            | 7749             |

Table II Performance results utilising various RRM designs

As can be observed from Tables I and II, replacing the GPM with RRM results in a small reduction in mean IRR of roughly 2 to 4.5 dB (about 3 dB on average). The resulting mean IRR is still more than acceptable in practical applications. Furthermore, the convergence of $w_2$ is somewhat faster with most of the RRMs, while $w_1$ generally needs a similar or slightly larger number of iterations than with the GPM. The faster convergence is attributed to the coarser quantization of the coefficients that are not covered by the RRM's dynamic range, which are mapped to the nearest covered value; in the overall algorithm this acts in effect as a variable step size.

In terms of hardware real estate, replacing the GPM in the filter by the RRM saves around 40% of the multiplier area in terms of LUT count, as shown in Table III. Our design is implemented on a Xilinx Virtex FPGA. The synthesis was carried out using LeonardoSpectrum for the Virtex FPGA XV300BG432-5. The GPM was designed with the Core Generator from Xilinx. The critical path delay values are provided by the synthesizer and do not include the I/O buffer delays, which are the same for both designs. Moreover, because there are fewer stages of combinational logic in the multiplier (three stages of LUTs for the RRM, compared with seven for the GPM), the critical path delay of the system is reduced, which results in reduced power consumption.

|                           | GPM (8x16) | RRM [A - D] |
|---------------------------|------------|-------------|
| Filter Multiplier<br>Area | 140 LUT    | 85 LUT      |
| Delay                     | 12.11 ns   | 10.91 ns    |

Table III Area and delay comparison.
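As a quick check of the figures in Table III, the relative savings follow from simple arithmetic on the tabulated values:

$$\frac{140 - 85}{140} \approx 39.3\%, \qquad \frac{12.11 - 10.91}{12.11} \approx 9.9\%$$

That is, the area saving of around 40% is accompanied by a reduction of roughly 10% in the critical path delay.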

Furthermore, we have performed more experiments using the RRM (A) given in Fig. 6. The resulting constellation diagrams for ideal, corrupted and compensated cases using 32-PSK and 256-QAM modulation formats with phase and gain errors of 15° and 3 dB are given in Fig. 7.



Figure 7 Constellation Diagrams for 32-PSK (a) – (c) and 256-QAM (d) – (f) -  $[\varphi_{\varepsilon}=15^{\circ}, \beta=3 \text{ dB}]$

As can be observed, the compensator has correctly compensated for *IQ*-imbalances. Fig. 8 depicts the BER before and after compensation for (a) 32-PSK and (b) 256-QAM with various phase and gain errors. After compensation the BER closely matches the ideal case.



Figure 8 BER Curves for (a) 32-PSK and (b) 256-QAM

ME plots are given in Fig. 9. As can be observed, the demixing coefficients $w_1$ and $w_2$ match the mixing coefficients $h_1$ and $h_2$ as the ME approaches zero. Furthermore, we have zoomed in on certain parts of the ME plots to show how closely the RRM follows the GPM for both $w_1$ and $w_2$. Depending on the RRM dynamic range, the GPM and RRM curves may differ in certain parts of the modelling error graphs. Missing coefficients in the RRM dynamic range may lead to an increase in the overall error and reduced performance. In such cases, different RRM designs may be utilized depending on the application environment. The performance measures given in this section show that the use of RRM instead of GPM does not degrade the performance in any significant way.



Figure 9 Modelling Errors (a) 32-PSK and (b) 256-QAM

## 4. CONCLUDING REMARKS

In this paper we have presented an efficient FPGA implementation of an adaptive *IQ*-imbalance corrector utilising reduced-range multipliers. The use of RRM instead of GPM results in a 40% reduction in hardware complexity and a consequent reduction in power consumption. Through extensive fixed-point simulations we have investigated the effects of replacing the GPM with RRM. Our results show only a minor reduction in the mean IRR, so utilising RRM still provides excellent performance.

## REFERENCES

[1] Churchill F.E., G.W. Ogar and B.J. Thompson, "The Correction of I and Q Errors in a Coherent Processor", *IEEE Trans. on Aerospace and Electronic Systems*, vol. AES-17, no.1, pp. 131-137, January 1981

[2] Lohtia, A., Goud, P., Englefield, C., "An adaptive digital technique for compensating for analog quadrature modulator/demodulator impairments", *IEEE Pacific Rim Conference on Comms., Computers and Sig. Proc.*, vol. 2, pp. 447-450, 1993

[3] Li Yu; Snelgrove W.M., "A novel adaptive mismatch cancellation system for quadrature IF radio receivers", *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 46, issue 6, pp. 789–801, June 1999

[4] M. Valkama, M. Renfors, and V. Koivunen, "Advanced methods for I/Q imbalance compensation in communication receivers," *IEEE Trans. Signal Processing*, vol. 49, pp. 2335–2344, Oct. 2001

[5] Cetin, E.; Kale, I.; Morling, R.C.S., "Adaptive digital receivers for analog front-end mismatch correction", *IEEE VTS 54<sup>th</sup> Vehicular Technology Conference (VTC 2001 Fall)*, vol: 4, pp. 2519–2522, 2001

[6] Cetin, E., I. Kale and R. C. S. Morling, "On The Structure, Convergence and Performance of an Adaptive I/Q Mismatch Corrector", *IEEE VTS 56<sup>th</sup> Vehicular Technology Conference (VTC 2002 Fall)*, vol: 4, pp. 2288–2292, 2002

[7] Demirsoy S.S., E. Cetin and I. Kale, "Reduced range reconfigurable multipliers for efficient implementation of adaptive filters in FPGAs" *to be published in IEE Electronic Letters*.

[8] Widrow B. and S.D. Stearns, "Adaptive Signal Processing", Prentice Hall, 1985 ISBN: 0-13-004029-0.

[9] Turner R. H., R. F. Woods, "Highly efficient, limited range multipliers for LUT-based FPGA architectures", *IEEE Trans. on VLSI Systems*, vol. 12, no. 10, pp. 1113-1117, Oct. 2004