
A general purpose high speed equalizer

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 26, NO. 3, MARCH 1991, p. 209

A General-Purpose High-Speed Equalizer

Serge Maginot, Freddy Balestro, Christophe Joanblanq, Patrice Senn, and Jacques Palicot

Abstract: The circuit presented in this paper is a high-speed self-adaptive filter achieving equalization over a wide range of signals, with a frequency of up to 40.5 MHz, as for the European D2-MAC and High-Definition Multiplexed Analog Components (HD-MAC) transmission standards. It is composed of a 16-tap transversal filter and a separate operative part computing the gradient algorithm and periodically updating the filter coefficients. This 105 000-transistor chip has been designed in a CMOS 1.0-μm technology and is at this time being used in a D2-MAC reception environment.

I. SYSTEM OVERVIEW

In the framework of the European High-Definition Television System EUREKA EU95 Project, a new European transmission standard has been developed: the High-Definition Multiplexed Analog Components (HD-MAC) standard. It is designed to be compatible with the present European transmission standards (D-MAC and D2-MAC), which already include duobinary data and MAC components [1].

Indeed, both D2-MAC and HD-MAC signals have approximately the same structure, consisting of a time-division multiplex between the analog components (color difference and luminance) and the digital data (including sound) (Fig. 1). These data are coded in a duobinary form, i.e., each bit to be transmitted is added to its predecessor, and the resulting number, which is the one actually transmitted, thus has three possible values. Transmission data rates are equal to 10.125 Mb/s for D2-MAC and HD-MAC duobinary data, and 20.25 Mb/s for HD-MAC binary data, with bandwidths of 5 and 10 MHz, respectively. With an oversampling factor of 2, the sampling frequency ranges from 20.25 MHz for D2-MAC to 40.5 MHz for HD-MAC.
Because of their digital part, decoding of these signals can be strongly affected by the distortions induced by the transmission channel. Fig. 2 shows the eye diagram of duobinary data from a D2-MAC signal without perturbation and with an echo (100-ns delay, 10-dB attenuation). As these perturbations depend on each consumer's reception chain (for example, D2-MAC delivered by cable to different buildings or city districts), they can only be corrected by self-adaptive filtering, i.e., channel equalization.

The circuit being presented is a self-adaptive 16-tap transversal filter achieving equalization on any 8-b coded signal with a frequency of up to 40.5 MHz and periodically containing a window of binary or duobinary data samples, such as D, D2, and HD-MAC signals. This chip includes a delay line of 240 8-b data samples which are used for the internal gradient computations. Only linear distortions (echoes) can be corrected by this chip.

[Fig. 1. Structure of a D2-MAC line: signal level versus time, showing the duobinary sound/data burst followed by the color-difference (17.5 μs) and luminance (34.5 μs) segments.]

Manuscript received July 30, 1990; revised October 26, 1990.
S. Maginot, F. Balestro, C. Joanblanq, and P. Senn are with France Telecom, Centre National d'Etudes des Télécommunications (CNET), 38243 Meylan Cedex, France.
J. Palicot is with Centre Commun d'Etudes des Télécommunications et de Télédiffusion (CCETT), 35512 Cesson Sévigné Cedex, France.
IEEE Log Number 9041399.

II. ALGORITHM

The algorithm used to compute the filter coefficients is the well-known gradient algorithm (least mean squares, or LMS, algorithm) [2]-[4]. This algorithm is computed in deferred time on a window of 240 consecutive damaged data samples $(x_{239}, x_{238}, \ldots, x_1, x_0)$ included in the digital part of the signal.

The same 240 samples are used during the entire computation of one set of coefficients (they are stored in a looped shift register). At each step of the algorithm, 16 of them are taken to compute the error.
If sampling is achieved, without any degradation, using a synchronized clock, each of these data should take one of two possible values $V_{low}$ and $V_{high}$ (if binary data) or one of three possible values $V_{low}$, $V_{med}$, and $V_{high}$ (if duobinary data). This restricted number of data values is the only property the signal must have in order to be equalized by the following computations.

0018-9200/91/0300-0209 $01.00 © 1991 IEEE

[Fig. 2. Eye diagram of D2-MAC data (a) with and (b) without distortion (echo with 100-ns delay and 10-dB attenuation).]

The process executed at step number $n$ of the algorithm is as follows. Let $(x_n, x_{n-1}, \ldots, x_{n-15})$ be the set of perturbed data, and $(h_0^{(n)}, h_1^{(n)}, \ldots, h_{15}^{(n)})$ be the set of filter coefficients. The filter output is

$$y_n = \sum_{i=0}^{15} h_i^{(n)} x_{n-i}. \qquad (1)$$

Let $\hat{y}_n$ be the estimated true datum corresponding to $y_n$. Then the error is given by $e_n = y_n - \hat{y}_n$ and the new set of filter coefficients for step $n + 1$ is

$$\forall i \in [0 \ldots 15], \quad h_i^{(n+1)} = h_i^{(n)} - \mu e_n x_{n-i} \qquad (2)$$

where $\mu = 2^{-7} \approx 0.008$ is the algorithm step.

The value of $\hat{y}_n$ is computed according to the following. In the case of binary data, if

$$S = \frac{V_{low} + V_{high}}{2}$$

then

$$y_n \le S \Rightarrow \hat{y}_n = V_{low}, \qquad y_n > S \Rightarrow \hat{y}_n = V_{high}.$$

In the case of duobinary data, if

$$S_{D1} = \frac{V_{low} + V_{med}}{2} \quad \text{and} \quad S_{D2} = \frac{V_{med} + V_{high}}{2}$$

then

$$y_n \le S_{D1} \Rightarrow \hat{y}_n = V_{low}, \qquad S_{D1} < y_n \le S_{D2} \Rightarrow \hat{y}_n = V_{med}, \qquad y_n > S_{D2} \Rightarrow \hat{y}_n = V_{high}.$$

Of course, in the case of oversampled digital data (such as the duobinary part of the D2-MAC signal), the error $e_n$ must not be computed on interpolated data, which carry no information. To handle this, an additional input signal is required, equal to ONE for information data and ZERO for interpolated data: the filter output is computed for both, whereas the error is only evaluated for information data (for an interpolated datum, the error computed for the previous information datum is reused).
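The step defined by (1), the duobinary decision rule, and (2) can be sketched in a few lines of Python. This is a minimal floating-point illustration, not the chip's fixed-point data path. The normalized levels (-1, 0, +1), the symbol-spaced test signal with a single two-sample echo, the circular reading of the window, the number of passes, and all function names are assumptions made for the example.

```python
import random

MU = 2 ** -7                 # algorithm step given in the text
N_TAPS = 16                  # filter length given in the text
WINDOW = 240                 # stored data window given in the text
LEVELS = (-1.0, 0.0, 1.0)    # assumed normalized V_low, V_med, V_high


def decide(y, levels=LEVELS):
    """Duobinary decision: thresholds halfway between adjacent levels."""
    v_low, v_med, v_high = levels
    if y <= (v_low + v_med) / 2:
        return v_low
    if y <= (v_med + v_high) / 2:
        return v_med
    return v_high


def taps_at(x, n):
    """x_n, x_{n-1}, ..., x_{n-15}, read circularly from the looped window."""
    return [x[(n - i) % len(x)] for i in range(N_TAPS)]


def lms_step(h, x_taps, mu=MU):
    """Filter output (1), decision, error, then coefficient update (2)."""
    y = sum(hi * xi for hi, xi in zip(h, x_taps))
    e = y - decide(y)
    return [hi - mu * e * xi for hi, xi in zip(h, x_taps)], e


def mean_squared_error(h, x):
    """Mean of (y_n - decision)^2 over the whole window."""
    total = 0.0
    for n in range(len(x)):
        y = sum(hi * xi for hi, xi in zip(h, taps_at(x, n)))
        total += (y - decide(y)) ** 2
    return total / len(x)


def make_window(seed=0, echo_gain=10 ** (-10 / 20), echo_delay=2):
    """Hypothetical test signal: duobinary data plus one echo (circular)."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(WINDOW)]
    data = [bits[n] + bits[n - 1] - 1.0 for n in range(WINDOW)]  # -1, 0, +1
    return [data[n] + echo_gain * data[n - echo_delay] for n in range(WINDOW)]


def transparent_filter(center=7):
    """One coefficient at 1, the others at 0."""
    h = [0.0] * N_TAPS
    h[center] = 1.0
    return h


def equalize(x, passes=20):
    """Run the LMS step repeatedly over the same stored window."""
    h = transparent_filter()
    for _ in range(passes):
        for n in range(len(x)):
            h, _err = lms_step(h, taps_at(x, n))
    return h


x = make_window()
before = mean_squared_error(transparent_filter(), x)
after = mean_squared_error(equalize(x), x)
print(f"MSE before: {before:.4f}, after: {after:.6f}")
```

Reusing the same 240 samples on every pass mirrors the looped window described above. The handling of interpolated samples in oversampled data is deliberately left out of this sketch.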
Equalization on oversampled data reveals another important restriction of this algorithm. Such a gradient algorithm could be run indefinitely, except for fractional tap-spacing equalization (i.e., equalization on oversampled data), where random coefficient overflows can occur [5], [6]. In such a case, the equalizer has many sets of tap values resulting in nearly equal values of mean-squared error, some of these tap settings being large enough to cause overflows.

In order to avoid this phenomenon, it is possible to introduce a bias ($\beta$) to balance the tap wandering. Thus, (2) could be replaced by the following:

$$\forall i \in [0 \ldots 15], \quad h_i^{(n+1)} = (1 - \beta) h_i^{(n)} - \mu e_n x_{n-i} \qquad (0 < \beta < \mu). \qquad (3)$$

Another solution (easier to implement) is given by

$$\forall i \in [0 \ldots 15], \quad h_i^{(n+1)} = h_i^{(n)} - \beta \, \mathrm{sgn}\, h_i^{(n)} - \mu e_n x_{n-i} \qquad (0 < \beta < \mu) \qquad (4)$$

where $\mathrm{sgn}\, H$ is the sign of $H$: if $H$ is positive (resp. negative), $\mathrm{sgn}\, H$ is equal to 1 (resp. $-1$).

According to [6], if $\mu = 2^{-7}$, $\beta$ should be nearly equal to $2^{-14}$, thus leading to a large increase in internal data width. In order to avoid this, we will assume that, in the case of fractionally spaced equalization (i.e., oversampling of input data), it is possible to find a criterion of transmission quality that allows the algorithm computations to be stopped. By doing this, we only need to implement (2).

This is possible for our main application, the D2-MAC receiver. In this system, data are oversampled by a factor of 2 but are also duobinary coded. Consequently, the data sample $\mathrm{duobit}_n$ (= 0, 1, or 2) received at time $n$ is given by

$$\mathrm{duobit}_n = \mathrm{bit}_{n-1} + \mathrm{bit}_n \qquad (5)$$

where $\mathrm{bit}_n$ (= 0 or 1) is the decoded information bit transmitted at time $n$.
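Equation (5) can be inverted as bit_n = duobit_n - bit_{n-1}, and any result outside {0, 1} cannot satisfy (5). The short Python sketch below illustrates this: it encodes a bit sequence, decodes it again, and counts the impossible cases. The function names, the initial value of the previous bit, and the recovery rule applied after an impossible case are assumptions made for the example, not details taken from the circuit.

```python
def duobinary_encode(bits, previous_bit=0):
    """Apply (5): duobit_n = bit_{n-1} + bit_n."""
    duobits = []
    for bit in bits:
        duobits.append(previous_bit + bit)
        previous_bit = bit
    return duobits


def duobinary_decode(duobits, previous_bit=0):
    """Invert (5) and count the samples for which no valid bit exists."""
    bits = []
    violations = 0
    for duobit in duobits:
        bit = duobit - previous_bit
        if bit not in (0, 1):
            violations += 1
            # Assumed recovery: a received 2 can only end in bit 1,
            # and a received 0 can only end in bit 0.
            bit = 1 if duobit == 2 else 0
        bits.append(bit)
        previous_bit = bit
    return bits, violations


message = [1, 0, 0, 1, 1, 0, 1]
coded = duobinary_encode(message)            # [1, 1, 0, 1, 2, 1, 1]
decoded, errors = duobinary_decode(coded)
print(coded, decoded, errors)

# Two 1's between a 0 and a 2 cannot come from any bit sequence.
print(duobinary_decode([0, 1, 1, 2]))        # ([0, 1, 0, 1], 1)
```

The violation count returned by the decoder is the kind of quantity that can serve as a transmission-quality criterion.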
On reception, it is thus possible to decode duobinary data and to find errors: if a decoded bit at time $n - 1$ ($\mathrm{bit}_{n-1}$) is equal to 0 and the received duobinary datum at time $n$ ($\mathrm{duobit}_n$) is equal to 2, it is impossible to decode a $\mathrm{bit}_n$ satisfying (5): there is a violation. Such a violation appears each time the system receives an even number of 1's between a 0 and a 2 or between a 2 and a 0, or an odd number of 1's between two 0's or two 2's.

The number of such errors provides us with a quality criterion which can be used to stop or restart the equalization process.

[Fig. 3. Equalizer architecture.]

III. CIRCUIT SPECIFICATIONS

As the implementation of such signal processing is very demanding in terms of computing speed and hardware cost, a specific integrated circuit has been designed for this purpose. This circuit, the EQUALIZER, processes a wide range of signals, as mentioned above. The signals to be treated have to be 8-b coded and contain a periodic subset of binary or duobinary data. The frequency is also limited to 40.5 MHz.

The gradient algorithm is performed according to the following scheme: 240 damaged binary or duobinary data samples are stored in a looped delay line and used in an algorithmic part of the circuit to compute a new set of coefficients, which is then transferred to the filter. Typically, this process makes the algorithm converge after 1.5 ms at 40.5 MHz (depending on circuit options). Computations can be performed continuously with a periodic coefficient update (a new window of 240 data is used for each coefficient update). For fractional tap-spacing equalization, computations need to be stopped before coefficients reach the limit values authorized by the implementation.

Many circuit parameters are programmable by the user.
Among these:
a) filter coefficient initialization: either the transparent filter (one (central) coefficient at 1, the others at 0) or the half-Nyquist filter (low-pass filter) with a 20% roll-off;
b) position of the central coefficient (in the case of a transparent filter): it is sometimes useful to shift the central coefficient in order to be able to correct echoes which have a longer delay;
c) type of data used for the equalization processing: binary data (0 or 1), for example, reference data inserted in line 312 of the vertical blanking interval in the HD-MAC signal; or duobinary data (0, 1, or 2), for example, sound and synchronization data inserted in each line of the D2-MAC, D-MAC, and HD-MAC signals;
d) values of minimum and maximum data levels: for example, digital values corresponding to -0.4 and +0.4 V for the D2-MAC signal (duobinary data levels), or 0 and 0.4 V for the HD-MAC signal (binary data levels);
e) number of computing steps for the gradient algorithm between two consecutive filter coefficient updates.

The circuit also has two operating modes:
1) the equalizer mode (the main goal), and
2) the stand-alone programmable 16-tap filter mode (the equalization algorithm is inhibited and the filter coefficients are loaded by the user).

IV. ARCHITECTURE

The circuit architecture, as shown in Fig. 3, is composed of five main parts: a 16-tap transversal FIR filter, a dynamic delay line of 240 eight-bit data samples, an operative part performing the gradient algorithm, a set of four programmable registers, and, finally, a control part supervising the whole circuit.

A. Filter

The transversal filter performs a 16-tap filtering with 8-b data and 9-b two's complement coefficients. It has a classical transposed architecture, with parallel broadcasting of data to all multipliers and serial accumulation of partial products (Fig. 4). The multiplier output is rounded to 15 b; the accumulation word length is 18 b and the output is saturated and rounded to 8 b.
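The transposed structure can be illustrated with a short integer-arithmetic sketch in Python. It shows the data sample being broadcast to every multiplier and the partial sums moving one register toward the output at each step. Several points are assumptions rather than details from the paper: signed 8-b data, a coefficient scale of 7 fractional bits (so that 1.0 is coded as 128), round-half-up at the output, and the function names. The intermediate word lengths (15-b products, 18-b accumulation) are not modeled.

```python
COEF_FRAC_BITS = 7   # assumption: 1.0 is coded as 128 in a 9-b coefficient


def saturate(value, bits):
    """Clamp an integer to the two's complement range of the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def transposed_fir(samples, coefs):
    """Transposed-form FIR: y_n = sum_i coefs[i] * x_{n-i}, then scale to 8 b."""
    state = [0] * (len(coefs) - 1)      # partial sums held between taps
    outputs = []
    for x in samples:
        # The same input sample is broadcast to every multiplier.
        products = [c * x for c in coefs]
        accumulator = products[0] + state[0]
        # Each register takes its own product plus the next register's sum.
        new_state = []
        for k in range(len(state)):
            carried = state[k + 1] if k + 1 < len(state) else 0
            new_state.append(products[k + 1] + carried)
        state = new_state
        # Remove the coefficient scaling with rounding, then saturate.
        rounded = (accumulator + (1 << (COEF_FRAC_BITS - 1))) >> COEF_FRAC_BITS
        outputs.append(saturate(rounded, 8))
    return outputs


# Transparent filter with the central coefficient at index 7 (a pure delay).
coefs = [0] * 16
coefs[7] = 1 << COEF_FRAC_BITS
samples = [10, -20, 30, 127, -128, 0, 5] + [0] * 9
print(transposed_fir(samples, coefs))
```

With the transparent coefficient set, the output is simply the input delayed by seven samples, which is a convenient sanity check for the structure.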
The basic macrocells, such as an 8 × 9 Booth carry-save array multiplier and an 18-b ripple carry adder, have been reused from a previous design [7]. Synchronous updating of the coefficients is performed by serial downloading of the coefficient register bus to the multipliers.

The filter can operate either in the stand-alone mode, with external coefficient loading on a dedicated ten-pin I/O bus, or in the equalization mode, with internal updating of the coefficients. In the latter case, the above-mentioned bus outputs the new coefficient values after each algorithm iteration for test purposes.

[Fig. 4. Filter architecture.]

B. Delay Line

The delay line stores 240 consecutive data samples (padded with zeros if the window signal does not include 240 data), which are used in the gradient algorithm computation. At each step, that is, every 41 clock cycles, a new datum is sent from the delay line to the operative part.

Several options were available to implement this block: a RAM with FIFO addressing, a shift register, or the chosen structure, which relies on three-transistor-cell bit planes with pointer-controlled addressing. Our approach has the following advantages over the first two:
a) transistor count is reduced to a minimum with a standard CMOS technology (excluding one-transistor-cell dynamic RAM technology, which cannot be integrated with standard logic blocks). The overall number of transistors per memorized bit is below 3.5;
b) the structure is fast enough for our application: the allowable frequencies range between 500 kHz and more than 72 MHz;
c) furthermore, we developed a generator for such delay lines of arbitrary length and word width [9].

Fig. 5 shows the internal structure of the delay line and the way data are transferred from one cell to the next.
Each memory cell has two independent bit lines, allowing a read and a write operation to be performed in one half clock cycle, the first half being the precharge phase of the read bit line to VDD. Read/write word lines are shared between two adjacent cells, thus simplifying the pointer circuitry. At each cycle (Fig. 6), any datum in an addressed column is shifted up to the right by one position. The column just read is ready to be written at the next cycle: this assumes that there is always an empty "buffer" column in the array. The delay line is organized into eight bit planes of six rows and 41 columns each, and has a capacity of 6 × 40 = 240 eight-bit samples. The pointer circuitry is shared between all the bit planes.

[Fig. 5. Delay-line internal structure.]

However, this structure had to be slightly modified because of our need to access consecutive data at every nth clock cycle (here n = 41) and not at every cycle. This could not have been done by simply working at F/n, where F is the main clock frequency. In fact, we would have been very close to the minimum frequency required for the delay line itself (500 kHz) because of its dynamic structure; this would have constrained the circuit to a main frequency over 20 MHz (41 × 500 kHz ≈ 20.5 MHz) and consequently would have limited its potential range of applications. To avoid this constraint, we forced the delay line to loop after the 240-data acquisition phase and to operate at the main circuit clock frequency.

Another problem encountered was the selection of the right datum after n cycles, while all data are shifting continuously. Indeed, consecutive input data appear at the output every clock cycle and not every n clock cycles.
This problem was solved by using not only the last line output, but all the read line outputs, and selecting the right datum through a multiplexer.

Let $l$ and $c + 1$ denote the number of lines and columns, respectively (the global delay achieved is equal to $lc$). Supposing the line is looped, at time $i$, if $D(i \bmod lc)$ is the output of the $l$th line, then data $D(i - c \bmod lc), D(i - 2c \bmod lc), \ldots, D(i - (l-1)c \bmod lc)$ are the outputs of lines $1, 2, \ldots, l - 1$ (Fig. 6). If datum $D(i \bmod lc)$ is read at time $i$, we want datum $D(i + j \bmod lc)$ to be read at time $i + nj$. Thus, for all $j > 0$:

$$i + j \bmod lc \ \text{must belong to} \ \{\, i + nj - c \bmod lc, \ \ldots, \ i + nj - (l-1)c \bmod lc, \ i + nj \bmod lc \,\}. \qquad (6)$$

We can verify that (6) has solutions if and only if $c$ divides $n - 1$. Indeed, (6) requires $j(n - 1) \equiv kc \pmod{lc}$ for some line index $k$, which can hold for every $j$ only if $c$ divides $n - 1$. The simplest solution is obtained for $c = n - 1$. (Note that this constrains the global delay to be a multiple of $n - 1$, which was not a problem in our case.) Thus, datum $D(i + j \bmod lc)$ can be taken at line $j \bmod l$. Here, $l = 6$, $n = c + 1 = 41$, and $lc = 240$. Consecutive input data can then be taken at consecutive lines of the structure every 41 clock cycles.

[Fig. 6. Delay-line architecture.]

Compared with the original structure, the multiplexer overhead is very small and the select block (looped register) can be shared between all eight bit planes.

[Fig. 7. Algorithmic part architecture.]

C. Algorithmic Operative Part

As seen above, the algorithm (including filter output computation) is processed in deferred time in the operative part (see Fig. 7). This part therefore needs to memorize 16 data $x_i$ and 16 filter coefficients $h_i^{(n)}$ at each step of the algorithm.
Thus, this part includes four different static shift registers: the first of sixteen 8-b data, the second of sixteen 17-b coefficients, and the two others of sixteen 9-b minimum and maximum coefficient values.

The choice of fully static shift registers, instead of a dynamic delay line (as in Section IV-B), was dictated by the following consideration: coefficient values may have to be stored in the operative part for a long and undetermined period. This can depend on the arrival time of the next window signal, but also on the system configuration: in the D2-MAC application, equalization can be suspended as long as the bit error rate is satisfactory, and then be restarted with the last computed coefficient set. It is thus impossible to use a dynamic delay line, which has a minimum shifting frequency (unless complicated logic is used). Furthermore, it was easy to implement, in such a shift register, two different kinds of initialization (transparent filter or half-Nyquist filter).

This part also contains one 8-b × 9-b multiplier (identical to the filter multipliers), a 19-b carry lookahead adder to perform the accumulation (equation (1)), and a 17-b carry lookahead adder to compute the coefficient incrementation (equation (2)). These optimized adder architectures are included in our data-path generator. Because of the accuracy required by (2), coefficients are 17-b two's complement coded during the whole algorithm computation, except during the multiplication and, of course, before the transfer to the transversal filter, where they are rounded to 9 b (Fig. 7).

The values of computed coefficients are continuously checked in the operative part. They are first constrained to a certain range around their initial values: two 9-b comparators are used with the two sixteen 9-b "minimum and maximum value" shift registers.
Second, their sum has to be greater than half the value of the central coefficient: a 13-b ripple carry adder performing the coefficient accumulation and a 13-b comparator are implemented. These checks are performed in order to detect any possible algorithm divergence. In such a case, computations are reset to the initial coefficient values.

D. Programmable Registers

Four programmable configuration registers can be loaded in an asynchronous bus-like mode (independent of the system clock). With a chip select (CS, active high), two address bits (REGAD0, REGAD1), and a clock signal (WRITE, active on the rising edge), a master chip can load configuration values into the registers using the 8-b input data bus.