A Survey on LDPC Codes and Decoders for OFDM-based UWB Systems
Torben Brack, Matthias Alles, Timo Lehnigk-Emden, Frank Kienle, Norbert Wehn
Microelectronic System Design Research Group, University of Kaiserslautern, 67663 Kaiserslautern, Germany
{brack, alles, lehnigk, kienle, wehn}@eit.unikl.de
Friedbert Berens, Andreas Rüegg
Computer Systems Division, UWB BU, STMicroelectronics, 1228 Plan-les-Ouates/Geneva, Switzerland
friedbert.berens@st.com
Abstract
Current UWB systems apply convolutional codes as their channel coding scheme. For next generation systems LDPC codes are in discussion due to their outstanding communications performance. LDPC codes are already utilized in the new WiMax and WiFi standards. Thus it is reasonable to investigate these codes as candidate LDPC codes for UWB. In this paper the authors present an implementation complexity and performance comparison of LDPC decoders. We will show that it is of great advantage to design new LDPC codes which are tailored to the special latency and throughput constraints of upcoming UWB systems. This new class of LDPC codes is named Ultra-Sparse (U-S) LDPC codes. Synthesis results of WiMax, WiFi, and U-S LDPC decoders are presented based on an enhanced 65 nm CMOS process. We show that the implementation complexity of the new U-S LDPC decoders is 55% smaller, utilizing only 0.2 mm^2 instead of over 0.4 mm^2, while the communications performance of all observed LDPC codes is almost identical under all the considered UWB simulation conditions.
Index Terms
LDPC, WIMEDIA, UWB, 802.16e, 802.11n, channel coding, implementation, 65 nm

I. INTRODUCTION
The existing WIMEDIA UWB standard [1] for short range devices specifies a data rate of up to 480 Mbit/s within a range of around 2 m. The deployed channel coding scheme is based on a traditional convolutional code (CC) with a constraint length $K = 7$ and code rates between $1/3$ and $3/4$. Especially for the high code rate $R = 3/4$, the diversity gain is limited and thus the communications performance is poor. To be able to support larger ranges for the high data rate modes of up to 480 Mbit/s and above, a more sophisticated channel coding scheme providing an increased coding gain is mandatory, as shown in [2]. To support streaming applications with very stringent latency and delay jitter requirements, a channel coding scheme with a low packet error ratio (PER) below $10^{-3}$ is needed.

LDPC codes are promising candidates for very high throughput and low PER channel coding systems. LDPC codes were invented by Gallager in 1963 [3]. They were almost forgotten for nearly 30 years, rediscovered by MacKay in the mid-90s, and enhanced to irregular LDPC codes by Richardson et al. in 2001 [4]. Now they are to be used for forward error correction in a vast number of upcoming standards like DVB-S2 [5], WiMax (IEEE 802.16e) [6], and Wireless LAN (IEEE 802.11n) [7].
Fig. 1. WIMEDIA UWB simulation chain
In this paper the superior performance of LDPC codes will be demonstrated through simulations. Furthermore, the implementation complexity of the chosen codes is shown by synthesis results and complexity analysis using a 65 nm ASIC process technology. We will evaluate well-known LDPC codes from the WiMax and the WLAN standard as well as a new class of low-complexity LDPC codes named Ultra-Sparse LDPC codes.

The paper is structured as follows: In Section II a short overview of the WIMEDIA UWB system will be given, followed by an introduction to LDPC codes in Section III and the decoding algorithm in Section IV. The code design and selection for the UWB system is presented in Section V. After the presentation of the used LDPC decoder architectures in Section VI, the communications performance and synthesis results are depicted in Section VII.

II. WIMEDIA UWB SYSTEM MODEL
The WIMEDIA UWB standard is based on a multiband OFDM air interface with and without frequency hopping. The overall US UWB band ranging from 3.1 GHz to 10.6 GHz is split into 14 subbands, each using 528 MHz of bandwidth with 128 OFDM subcarriers. The subbands are grouped to form five band groups. Four of them contain three subbands and one consists of two subbands. In this paper we will focus on the first band group ranging from 3.1 GHz to 4.8 GHz. In order to evaluate the communications performance of the WIMEDIA UWB standard, a SystemC-based simulation chain has been implemented. The basic structure of the simulation chain is depicted in Figure 1.

The current WIMEDIA system uses standard convolutional coding. For the purpose of this paper an LDPC encoder has been added to the standard chain. After the channel coding, the coded data stream is interleaved and then mapped onto QPSK symbols for the lower data rates and DCM symbols for the higher data rates ranging from 300 Mbit/s to 480 Mbit/s. These symbols are then used in the OFDM modulator to generate the OFDM symbols to be transmitted over the channel. The channel models used correspond to the IEEE 802.15.3a channels CM1 to CM4 [8]. The main characteristics of these models are listed in Table I. In the receiver the signal is demodulated and equalized deploying ideal channel estimation. The resulting demapped and deinterleaved soft information is then processed by the channel decoder, which is either a Viterbi decoder or an LDPC decoder. The WIMEDIA physical layer parameters are listed in Table II.

TABLE I
IEEE CHANNEL MODELS FOR UWB

Channel Model   Range     RMS Delay   Average No. of Paths   Transmission Condition
CM1             0-4 m     5 ns        21.4                   Line-of-Sight (LoS)
CM2             0-4 m     8 ns        37.2                   Non-LoS
CM3             4-10 m    14 ns       62.7                   Non-LoS
CM4             4-10 m    26 ns       122.8                  Extreme Non-LoS

TABLE II
WIMEDIA PHYSICAL LAYER PARAMETERS

Parameter            Value
Data rate            53 Mbit/s to 480 Mbit/s
Data carriers        100
FFT size             128 points
Symbol duration      312.5 ns (incl. guard)
Channel coding       CC with $K = 7$ (here: LDPC code)
Carrier modulation   QPSK, DCM

III. LDPC CODES
LDPC codes are linear block codes defined by a sparse binary matrix $H$, called the parity check matrix. The set of valid codewords $C$ satisfies

$$H x^T = 0, \quad \forall x \in C. \qquad (1)$$

A column in $H$ is associated with a codeword bit, and each row corresponds to a parity check. A nonzero element in a row means that the corresponding bit contributes to this parity check. The complete code can best be described by a Tanner graph [4], a graphical representation of the associations between code bits and parity checks. Code bits are shown as so-called variable nodes (VN) drawn as circles, parity checks as check nodes (CN) represented by squares, with edges connecting them according to the parity check matrix. Figure 2 shows a Tanner graph for a generic irregular LDPC code with $N$ variable and $M$ check nodes with a resulting code rate of $R = (N - M)/N$.

Fig. 2. Tanner graph for an irregular LDPC code

The number of edges connected to each node is called the node degree. If the node degree is constant for all CNs and VNs, the corresponding LDPC code is called regular, otherwise it is called irregular. Note that the communications performance of an irregular LDPC code is known to be generally superior to that of regular LDPC codes. The degree distribution of the VNs $f[d_v^{\max}, \ldots, 3, 2]$ gives the fraction of VNs with a certain degree, with $d_v^{\max}$ the maximum variable node degree. The degree distribution of the CNs can be expressed as $g[d_c^{\max}, d_c^{\max} - 1]$ with $d_c^{\max}$ the maximum CN degree, meaning that only CNs with two different degrees occur [9].

To obtain a good communications performance of an LDPC code the degree distribution should be optimized with respect to the codeword size $N$. The degree distribution can be optimized by density evolution as shown in [9]. Furthermore, the resulting Tanner graph should have cycles as long as possible to ensure that the iterative decoding algorithm works properly. A cycle in the Tanner graph is defined as the shortest path from a VN back to its origin without traveling an edge twice. Especially cycles of length four have to be avoided [4].

IV. DECODING ALGORITHM
LDPC codes can be decoded using the message passing algorithm [3]. It exchanges soft information iteratively between variable and check nodes. Updating the nodes can be done with a canonical, two-phase scheduling: in the first phase all variable nodes are updated, in the second phase all check nodes. The processing of individual nodes within one phase is independent and can thus be parallelized. The exchanged messages are assumed to be log-likelihood ratios (LLR). Each variable node of degree $d_v$ calculates an update of message $k$ according to:

$$\lambda_k = \lambda_{ch} + \sum_{l=0,\, l \neq k}^{d_v - 1} \lambda_l, \qquad (2)$$

with $\lambda_{ch}$ the corresponding channel LLR of the VN and $\lambda_l$ the LLRs of the incident edges. The check node LLR update can be done in either an optimal or a suboptimal way, trading off implementation complexity against communications performance.
A. Suboptimal Decoding
The simplest suboptimal check node algorithm is the well-known Min-Sum algorithm [10], where the incident message with the smallest magnitude determines the output of all other messages:

$$\lambda_k = \prod_{l=0,\, l \neq k}^{d_c - 1} \operatorname{sign}(\lambda_l) \cdot \min_{l=0,\, l \neq k} \left( |\lambda_l| \right). \qquad (3)$$

The resulting performance comes close to the optimal Sum-Product algorithm only for high rate LDPC codes ($R \geq 3/4$) with relatively large CN degree. It can be further optimized by multiplying each outgoing message with a message scaling factor (MSF) of 0.75. For lower code rates the communications performance strongly degrades.
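As a hedged illustration of Equation (3) combined with the message scaling factor, the following Python sketch updates a single check node. The function name `min_sum_check_node`, the floating-point arithmetic, and the convention that an LLR of exactly zero counts as positive are assumptions of this sketch and are not taken from the decoder implementation.

```python
def min_sum_check_node(incoming, msf=0.75):
    """Scaled Min-Sum update for one check node of degree d_c >= 2.

    incoming -- list of d_c LLRs arriving from the connected variable nodes
    msf      -- message scaling factor applied to every outgoing message
    """
    outgoing = []
    for k in range(len(incoming)):
        # All incident messages except the one on edge k (l != k).
        others = incoming[:k] + incoming[k + 1:]
        # Product of the signs of the other messages.
        sign = 1.0
        for lam in others:
            if lam < 0:
                sign = -sign
        # Smallest magnitude among the other messages, then scaling.
        outgoing.append(msf * sign * min(abs(lam) for lam in others))
    return outgoing


if __name__ == "__main__":
    print(min_sum_check_node([2.0, -1.0, 4.0, -0.5]))
    # prints [0.375, -0.375, 0.375, -0.75]
```

In the example, the message with magnitude 0.5 determines the magnitude on all other edges, while its own outgoing message uses the second-smallest magnitude 1.0.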
B. Layered Decoding
Layered decoding applies a different message schedule than the classical two-phase decoding. It was originally proposed by Mansour [11] and denoted as turbo decoding message passing (TDMP); it was later referred to as layered decoding by Hocevar [12]. The basic idea is to process a subset of CNs and to pass the newly calculated messages immediately to the corresponding VNs. The VNs update their outgoing messages in the same iteration. The next CN subset will thus receive newly updated messages, which improves the convergence speed and therefore increases communications performance for a given number of iterations. In Section VI we present a partly parallel architecture for layered decoding, processing each of these subsets in parallel.

V. LDPC CODE DESIGN AND SELECTION
To be compatible with the framing already used for the convolutional code, the codeword size has to be (around) 1200 bit with a code rate of $3/4$. To offer a reasonable comparison for our proposed Ultra-Sparse LDPC code, we also analyze different LDPC codes from current communication standards, namely WiMax and WiFi. All selected LDPC codes have to allow for high throughput decoding of at least 480 Mbit/s by providing inherent parallelism in the code structure, and they have to be encodable with linear time complexity. Details about all LDPC codes presented in this paper are summarized in Table III.
A. Ultra-Sparse LDPC Code Design

The aim was to design a rate $3/4$ code which is capable of layered decoding to enable enhanced throughput while minimizing memory and logic area compared to non-layered implementations (see Section VII). Thus the density of the graph has to be very low, and $d_v^{\max}$ has to be reduced to avoid access conflicts. At the same time, communications performance should still be competitive with normal density codes. A further benefit of such a very sparse graph is the small number of edges which have to be processed in each iteration, allowing for even more throughput or reduced parallelism. This in turn relaxes the constraint on $d_v^{\max}$, giving more degrees of freedom to the actual code design.

To fulfill these requirements, we designed an Ultra-Sparse LDPC code for 1200 bit codeword size with $f[2, 3] = \{1/4, 3/4\}$, consisting of only 3300 edges. Figure 3 presents the resulting parity check matrix, obtained by the 2VPEG algorithm presented in [13]. In the following, we use the term Ultra-Sparse LDPC code for LDPC codes with $d_v^{\max} \leq 3$ and overall code density below 1%.
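The edge count and the density bound quoted above can be checked with a short calculation. The sketch below is only an illustration: the helper name `code_parameters` is hypothetical, and the density is assumed to mean the number of edges divided by the $M \times N$ entries of the parity check matrix, with $M$ derived from $R = (N - M)/N$.

```python
from fractions import Fraction


def code_parameters(n, rate, vn_degree_fractions):
    """Derive check node count, edge count, and matrix density.

    n                   -- codeword size N (number of variable nodes)
    rate                -- code rate R as a Fraction
    vn_degree_fractions -- mapping {VN degree: fraction of VNs with that degree}
    """
    # Number of check nodes from R = (N - M) / N.
    m = int(n * (1 - rate))
    # Every variable node of degree d contributes d edges.
    edges = sum(int(n * frac) * deg for deg, frac in vn_degree_fractions.items())
    # Assumed definition: fraction of nonzero entries in the M x N matrix.
    density = Fraction(edges, m * n)
    return m, edges, density


if __name__ == "__main__":
    m, edges, density = code_parameters(
        1200, Fraction(3, 4), {2: Fraction(1, 4), 3: Fraction(3, 4)}
    )
    print(m, edges, float(density))
```

For the 1200 bit code this yields 300 check nodes and $300 \cdot 2 + 900 \cdot 3 = 3300$ edges, which matches the stated edge count, and a density of $3300 / (300 \cdot 1200) \approx 0.92\%$, which is below the 1% bound under the assumed definition.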
B. Standardized LDPC Codes
The new WiMax standard 802.16e [6] provides two different LDPC codes for the code rate of $3/4$ which differ in their VN degree distribution, resulting in slightly different communications performance. Although the WiMax standard supports layered decoding for the lower code rates $1/2$ and $2/3$, both rate $3/4$ codes do not allow for layered decoding on an architecture with serial check nodes due to their parity check matrix structure and relatively high $d_v^{\max}$. The LDPC codes are specified for 19 different codeword sizes ranging from 576 to 2304 bit with a granularity of 96 bit; therefore the codeword size of 1248 bit was chosen for fair comparison.

Fig. 3. The Ultra-Sparse UWB LDPC Code (structured parity check matrix for UWB; horizontal axis: variable nodes, 0 to 1200; vertical axis: check nodes, 0 to 300)

The upcoming WLAN standard 802.11n [7] supports the code rate of $3/4$ with three different codeword sizes of 648, 1296, and 1944 bit. For our purposes, we selected the codeword size of 1296 bit.

VI. DECODER ARCHITECTURES
To obtain the optimal implementation and throughput results for both the standardized LDPC codes and the proposed Ultra-Sparse LDPC code, two different architectures are used, see Figure 4 and Figure 5. The hardware realization of both architectures is partly parallel, thus only a subset of nodes in the Tanner graph is instantiated as variable node and check node functional units (VFU and CFU). The FUs work in a serial manner, which gives the needed flexibility regarding the variable node and check node degrees. However, this serial architecture prevents the standardized codes from being decoded with a layered scheduling. The reason is that the CFU introduces a latency of $d_c$ clock cycles. Because of the maximal variable node degree of four and six, respectively, we could not guarantee in a layered architecture that the updated message is computed before it is needed for another check node. For the proposed Ultra-Sparse LDPC code with the low $d_v^{\max}$ of three this constraint for layered decoding is satisfied.

There are some fundamental differences between the two-phase decoder and the layered decoder architecture. First of all, the two-phase decoder contains two sum RAMs that are used to accumulate all incoming messages of a variable node. During one iteration one sum RAM is used to compute $\lambda_k$ as shown in Equation 2 by subtracting the corresponding message from the message RAM and adding the channel value of the channel RAM. The second sum RAM is needed to build new sums for the next iteration, hence both RAMs are swapped after each iteration. In contrast, the layered decoder stores the 9 bit wide a posteriori information in the channel RAM. This RAM contains the sum of the channel value and all incoming messages of a variable node, thus only the corresponding message has to be subtracted in the check node block (CNB) to obtain $\lambda_k$. When the CFU has computed new messages, these are stored in the message RAM and added to the bypassed $\lambda_k$.
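The data flow of the layered decoder described above (subtract the old message, run the check node update, add the new message to the bypassed $\lambda_k$) can be sketched for a single check node as follows. This is a behavioural illustration under stated assumptions, not the hardware design: the name `layered_check_node_update`, the Python lists standing in for the channel RAM and message RAM, the floating-point values instead of 9 bit quantities, and the scaled Min-Sum kernel with a factor of 0.75 are all choices of this sketch.

```python
def layered_check_node_update(app, msg, vn_indices, msf=0.75):
    """Process one check node in a layered schedule (updates in place).

    app        -- a posteriori LLR per variable node (stand-in for the channel RAM)
    msg        -- previous messages of this check node (stand-in for the message RAM)
    vn_indices -- distinct indices of the variable nodes connected to this check node
    """
    # Subtract the old check node message to recover lambda_k for each edge.
    lam = [app[v] - msg[i] for i, v in enumerate(vn_indices)]

    # Scaled Min-Sum check node update on the lambda_k values.
    new_msg = []
    for k in range(len(lam)):
        others = lam[:k] + lam[k + 1:]
        negatives = sum(1 for x in others if x < 0)
        sign = -1.0 if negatives % 2 else 1.0
        new_msg.append(msf * sign * min(abs(x) for x in others))

    # Store the new messages and add them to the bypassed lambda_k.
    for i, v in enumerate(vn_indices):
        msg[i] = new_msg[i]
        app[v] = lam[i] + new_msg[i]


if __name__ == "__main__":
    app = [1.0, -2.0, 0.5, 3.0]   # initialised with channel LLRs
    msg = [0.0, 0.0, 0.0]         # no messages yet in the first iteration
    layered_check_node_update(app, msg, [0, 1, 2])
    print(app)   # [0.625, -1.625, -0.25, 3.0]
    print(msg)   # [-0.375, 0.375, -0.75]
```

Because `app` is updated immediately, a check node processed afterwards already sees the refreshed values, which is the property that distinguishes the layered schedule from the two-phase schedule with its separate accumulation memories.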
Fig. 4. Two-Phase Decoder Architecture

Fig. 5. Layered Decoder Architecture
For the layered architecture it is possible to save a permutation network since newly computed information can be stored in a shifted way. However, we have to store an offset for each address of the channel RAM for the next read access. In both architectures the permutation networks are realized with logarithmic barrel shifters. This is possible since all investigated codes are designed using permuted identity matrices.

VII. RESULTS
A. Synthesis Results
Table III shows synthesis results for decoder implementations for the LDPC codes introduced in Section V. All results are obtained using the current 65 nm technology from STMicroelectronics with the clock frequency constrained to 528 MHz as specified by the overall system design. The first column describes the implementation of our Ultra-Sparse LDPC code based on the layered architecture template from Figure 5; all other implementations have to rely on the two-phase architecture from Figure 4 due to their denser code structure. The throughput specification of at least 480 Mbit/s and a decoding latency not exceeding $2\,\mu\mathrm{s}$ have to be met by all implementations. The check node is realized as a Min-Sum processor employing a message scaling factor (see Section IV-A) with only very small decoding loss because of the relatively high code rate used.

Fig. 6. Communications performance in the CM1 channel (MBOA chain, CM1, $R = 3/4$; PER versus $E_S/N_0$ [dB]; curves: U-S LDPC, 802.11n, 802.16e (A), 802.16e (B), CC)

Fig. 7. Communications performance in the CM2 channel (MBOA chain, CM2, $R = 3/4$; PER versus $E_S/N_0$ [dB]; curves: U-S LDPC, 802.11n, 802.16e (A), 802.16e (B), CC)

Altogether, the synthesis results show large savings of more than 55% in overall area consumption between our proposal and the implementations of the standardized codes, which can be explained by four key factors:
• Only 30 check nodes have to be instantiated to reach the target throughput due to the smaller number of edges which have to be processed in each iteration, the inherently higher throughput of the layered decoding architecture, and the improved convergence speed of the layered decoding schedule.
• No accumulator memories are needed, unlike in the two-phase architecture (Sum RAM 1+2).
• Message memory size is decreased because of the smaller number of edges.
• Only a single shifting network is needed with 30 ports of 9 bit each, compared to two 6 bit networks with more than 50 ports.
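To give a feeling for how the edge count and the number of check node units interact with the latency constraint, the following back-of-the-envelope sketch derives a cycle budget. It rests on assumptions that are not stated in the text: each serial CFU is taken to process one edge per clock cycle, and pipeline fill, I/O, and control overhead are ignored. The helper name `iteration_budget` is hypothetical, and the resulting iteration bound is an estimate under these assumptions, not a figure reported by the authors.

```python
def iteration_budget(clock_mhz, latency_us, edges, num_cfus):
    """Estimate cycles per iteration and an upper bound on iterations.

    Assumes one edge per CFU per clock cycle and no overhead cycles.
    """
    # MHz times microseconds gives the number of clock cycles available.
    cycles_available = clock_mhz * latency_us
    # Edges are spread over the parallel CFUs (ceiling division).
    cycles_per_iteration = -(-edges // num_cfus)
    max_iterations = cycles_available // cycles_per_iteration
    return cycles_available, cycles_per_iteration, max_iterations


if __name__ == "__main__":
    # 528 MHz clock, 2 us latency, 3300 edges, 30 check node units.
    print(iteration_budget(528, 2, 3300, 30))
    # prints (1056, 110, 9)
```

Under these simplifying assumptions, 3300 edges on 30 units take about 110 cycles per iteration, so roughly nine iterations would fit into the 1056 cycles available within $2\,\mu\mathrm{s}$ at 528 MHz.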