XI Reunión de Trabajo en Procesamiento de la Información y Control, 16 al 18 de octubre de 2007

Maximum Likelihood Decoding on a Communication Channel

Cesar F. Caiafa†, Nestor R. Barraza‡, Araceli N. Proto†,⋆

† Laboratorio de Sistemas Complejos, Facultad de Ingeniería, Universidad de Buenos Aires - UBA, ccaiafa@.uba.ar
‡ Instituto de Ingeniería Biomédica, Facultad de Ingeniería, Universidad de Buenos Aires - UBA, nbarraza@.uba.ar
⋆ Comisión de Investigaciones Científicas de la Prov. de Buenos Aires - CIC, aproto@.uba.ar

Keywords — Digital Communication Channel, Maximum Likelihood Decoder, Markov Chain, Logarithmic Distribution, Ising Model.

Abstract — A binary additive communication channel with different noise processes is analyzed. Several noise processes are generated according to Bernoulli, Markov, Polya, and Logarithmic distributions. A noise process based on the two-dimensional Ising model (Markov 2D) is also studied. In all cases, a maximum likelihood decoding algorithm is derived. We obtain interesting results: in many cases, the most probable code-word is either the closest to the received word or the farthest away from it, depending on the model parameters.

I INTRODUCTION

Maximum likelihood (ML) decoding has been applied to different kinds of channels: Additive White Gaussian Noise - AWGN (Chi-Chao et al. (1992), Haykin (2001)), Binary Symmetric Channel - BSC (Haykin (2001)), Binary Erasure Channel - BEC (Khandekar and McEliece (2001)) and others. ML decoding has also been studied when a code is transmitted over the channel, such as Turbo Codes (Hui Jin and McEliece (2002), Moreira and Farrell (2006)), Linear Predictive Code - LPC (Haykin (2001), Moreira and Farrell (2006)) or Cyclic Redundancy Codes - CRC (Haykin (2001), Moreira and Farrell (2006)). In some cases, maximum likelihood decoding is considered equivalent to minimum Hamming distance decoding. However, this is not true for all kinds of noise processes (crossover probabilities). In this paper, we show cases where the most probably transmitted code-word is the one farthest away from the received word. Cases which are equivalent to minimum Hamming distance decoding, as well as intermediate possibilities, are also presented. The type of channel we analyze is the BSC, where the output is produced by adding a noise process to the input code-word. The noise distributions we analyze are Bernoulli, Polya contagion, Markov chain and Logarithmic. In addition, a two-dimensional Ising noise process is also studied. New and interesting results, depending on the parameters of the noise process, are shown.

II THE BINARY ADDITIVE COMMUNICATION CHANNEL

We study a discrete communication channel with binary additive noise, as depicted in Fig. 1. The i-th output $Y_i \in \{0,1\}$ is the modulo-two sum of the i-th input $X_i \in \{0,1\}$ and the i-th noise symbol $Z_i \in \{0,1\}$, i.e. $Y_i = X_i \oplus Z_i$, $i = 1, 2, \ldots$ We assume independence between the input and noise processes, and the input is a finite code-word chosen from a finite code-book.
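The channel operation itself is just a bit-wise exclusive OR of the code-word with a noise realization. The following is a minimal sketch of that operation (our illustration, assuming NumPy; the helper name binary_additive_channel is ours, not from the paper), using the example input and noise vectors given later in this section:

```python
import numpy as np

def binary_additive_channel(x, z):
    """Binary additive channel of Fig. 1: y_i = x_i XOR z_i (modulo-two sum)."""
    x = np.asarray(x, dtype=np.uint8)
    z = np.asarray(z, dtype=np.uint8)
    return np.bitwise_xor(x, z)

# Example realizations (the same ones listed below in the text):
x = [1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0]
z = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]
y = binary_additive_channel(x, z)  # -> [1 0 1 0 1 0 1 1 0 1 1 0 1]
```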
This type of channel was analyzed in Alajaji and Fuja (1994), where the process $Z_i$ follows the Polya contagion model.

Figure 1: The binary additive communication channel model.

Following these assumptions, for an output vector $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_n]$, a random input code-word $\mathbf{X} = [X_1, X_2, \ldots, X_n]$ and a random noise vector $\mathbf{Z} = [Z_1, Z_2, \ldots, Z_n]$, the channel transition probabilities are given by¹:

$P(\mathbf{Y} = \mathbf{y} / \mathbf{X} = \mathbf{x}) = P(\mathbf{Z} = \mathbf{x} \oplus \mathbf{y})$   (1)

where $\mathbf{x} \oplus \mathbf{y} = [x_1 \oplus y_1, x_2 \oplus y_2, \ldots]$.

To clarify concepts, a given input, noise and output outcome could be:

$\mathbf{x} = [1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0]$
$\mathbf{z} = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1]$
$\mathbf{y} = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]$

¹ Throughout this paper, we use capital letters for random variable names and lower case letters for denoting their realizations. Additionally, bold letters are used for vectors.

Therefore, the "1's" in the noise process determine which input symbols are changed. The Hamming distance between the input and received code-words is given by:

$d = \sum_{i=1}^{n} z_i$   (2)

In order to simplify the notation throughout the paper, we will avoid the usage of random variable names when the probability of a specific realization is written; for example, instead of writing $P(\mathbf{X} = \mathbf{x} / \mathbf{Y} = \mathbf{y})$ we will write $P(\mathbf{x} / \mathbf{y})$.

III MAXIMUM LIKELIHOOD DECODING

For a code-book $C$ composed of a set of $m$ code-words, i.e. $C = \{\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^m\}$, the maximum likelihood decoder chooses, as the estimated input, the most probable code-word $\mathbf{x}^k$ given a received output $\mathbf{y}$, i.e. it maximizes $P(\mathbf{x}^k / \mathbf{y})$. Following the Bayes rule we get:

$P(\mathbf{x}^k / \mathbf{y}) = \dfrac{P(\mathbf{y} / \mathbf{x}^k)\, P(\mathbf{x}^k)}{P(\mathbf{y})}$   (3)

Since $P(\mathbf{y})$ is independent of the decoding rule, and considering that all code-words are equally likely, the ML algorithm results in:

$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}^k \in C} P(\mathbf{y} / \mathbf{x}^k)$   (4)

Following (1) and (4), the estimated code-word is obtained by choosing the $\hat{\mathbf{x}} = \mathbf{x}^k$ which makes $P(\mathbf{z})$ maximum, i.e.:

$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}^k \in C} P(\mathbf{z}^k)$, with $\mathbf{z}^k = \mathbf{y} \oplus \mathbf{x}^k$   (5)

Then, the estimated input is fully determined by the noise (crossover) characteristics and the code-book used. Following the chain rule of probability, for code-words of length $n$ the noise process can be expressed as:

$P(\mathbf{z}) = P(z_1) \prod_{i=2}^{n} P(z_i / z_{i-1}, z_{i-2}, \ldots, z_1)$   (6)
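As a sketch of how the decision rule (5) could be implemented (our illustration, not the authors' code; ml_decode and noise_logprob are hypothetical names), the decoder simply scores every code-word in the code-book by the probability of the noise vector it implies, under whichever noise model of the following subsections is assumed:

```python
import numpy as np

def ml_decode(y, codebook, noise_logprob):
    """ML decoding, eq. (5): return the code-word x^k in the code-book whose
    implied noise vector z^k = y XOR x^k has the largest (log-)probability."""
    y = np.asarray(y, dtype=np.uint8)
    best_x, best_score = None, -np.inf
    for x in codebook:
        z = np.bitwise_xor(y, np.asarray(x, dtype=np.uint8))
        score = noise_logprob(z)  # log P(z) under the assumed noise model
        if score > best_score:
            best_x, best_score = x, score
    return best_x
```

Working with log-probabilities keeps the comparison numerically stable for long code-words; the noise-model sketches in the following subsections are written in the same log domain.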
A ML Decoder error probability

If $k_{max}$ denotes the index for which the probability $P(\mathbf{y} / \mathbf{x}^k)$ is maximized, i.e. $\hat{\mathbf{x}} = \mathbf{x}^{k_{max}}$, then the conditional error probability of the ML decoder is defined as (Barbero et al. (2006)):

$P(error / \mathbf{y}) = P(\mathbf{x}^{k_{max}} \neq \mathbf{x}^k / \mathbf{y})$   (7)

and the error probability of the ML decoder is

$P(error) = \sum_{\mathbf{y}} P(error / \mathbf{y})\, P(\mathbf{y})$   (8)

We now obtain an expression for the error probability in terms of the code-book $C$ and the received vector $\mathbf{y}$. Equation (7) can be rewritten as follows:

$P(error / \mathbf{y}) = \sum_{i \neq k_{max}} P(\mathbf{x}^i / \mathbf{y})$

and by using the Bayes rule, it is easy to see that the conditional error probability can be written in the following form:

$P(error / \mathbf{y}) = \dfrac{\sum_{i \neq k_{max}} P(\mathbf{y} / \mathbf{x}^i)\, P(\mathbf{x}^i)}{\sum_{i=1}^{m} P(\mathbf{y} / \mathbf{x}^i)\, P(\mathbf{x}^i)}$   (9)

$= 1 - \dfrac{P(\mathbf{y} / \mathbf{x}^{k_{max}})\, P(\mathbf{x}^{k_{max}})}{\sum_{i=1}^{m} P(\mathbf{y} / \mathbf{x}^i)\, P(\mathbf{x}^i)}$   (10)

From equation (10) it is clear that the flatter the function $P(\mathbf{y} / \mathbf{x}^k)$ is in terms of $\mathbf{x}^k$, the bigger the error is. In the following subsections, we analyze the decoder behavior for some specific noise distributions.

B Bernoulli noise model

For this noise distribution, all the $Z_i$'s are independent and have a common parameter $p$ (probability of change in one bit, or crossover probability), so (6) results in:

$P(z_i / z_{i-1}, z_{i-2}, \ldots, z_1) = P(z_i) = p^{z_i} (1-p)^{1-z_i}$   (11)

According to (1), (6) and (11), the probability $P(\mathbf{y} / \mathbf{x}^k)$ that a given code-word $\mathbf{x}^k$ was the input when the code-word $\mathbf{y}$ is received is given by:

$g_B(d) = P(\mathbf{z}^k) = \left(\dfrac{p}{1-p}\right)^{d} (1-p)^{n}$   (12)

where $d = d(\mathbf{x}^k, \mathbf{y})$ is the Hamming distance between $\mathbf{x}^k$ and $\mathbf{y}$, as already defined in (2).

As can be seen from (12), when $p$ is less than $1-p$, the most probable input code-word (ML decoding), i.e. the one which maximizes $g_B(d)$, is the one closest to the received word (minimum $d$). Conversely, when $p$ is greater than $1-p$, the ML decision is the code-word having the greatest $d$, i.e. the one most different from the received word. This simple case shows the two possibilities for ML decoding: when $p < 1/2$, the noise is not strong enough to produce considerable changes; when $p > 1/2$, the noise is strong enough to consider that the input was changed as much as possible.

C Polya contagion noise model

As analyzed in Alajaji and Fuja (1994), when the noise process is given by the Polya contagion model (see Polya and Eggenberger (1923), Feller (1950)), the conditional probabilities are given by:

$P(z_i / z_{i-1}, z_{i-2}, \ldots, z_1) = P(z_i / s_{i-1})$   (13)

where $s_{i-1} = \sum_{l=1}^{i-1} z_l$. The channel transition probabilities result in:

$g_P(d) = P(\mathbf{z}^k) = \dfrac{\Gamma(1/\delta)\, \Gamma(\rho/\delta + d)\, \Gamma(\sigma/\delta + n - d)}{\Gamma(\rho/\delta)\, \Gamma(\sigma/\delta)\, \Gamma(1/\delta + n)}$   (14)

where $d$ is the Hamming distance as defined before; $\rho$, $\sigma$ and $\delta$ are the model parameters and $\Gamma(t) = \int_0^{\infty} u^{t-1} \exp(-u)\, du$ is the gamma function. Since $g_P(d)$ is strictly convex, has a unique minimum $d_0$, and is symmetric about $d_0$, the most probable code-word will be the one having either the minimum or the maximum Hamming distance from the received code-word (Alajaji and Fuja (1994)). That is, the best estimate corresponds to the $d$ farthest away from $d_0$. This property of the Polya contagion model holds independently of the parameters: the estimated input can be the closest or the farthest code-word depending on the received code-word. This is due to the convexity of $g_P(d)$.
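As an illustration (our sketch, not code from the paper), the Bernoulli score (12) and the Polya score (14) can be evaluated in the log domain so they can be plugged into the ml_decode sketch above; log-gamma is used for (14) to avoid overflow, and the parameter names p, rho, sigma, delta simply mirror the notation of this section:

```python
import math

def log_g_bernoulli(z, p):
    """Log of eq. (12): d*log(p/(1-p)) + n*log(1-p), with d the Hamming weight of z."""
    n, d = len(z), int(sum(z))
    return d * math.log(p / (1.0 - p)) + n * math.log(1.0 - p)

def log_g_polya(z, rho, sigma, delta):
    """Log of eq. (14), written with log-gamma functions."""
    n, d = len(z), int(sum(z))
    return (math.lgamma(1.0 / delta)
            + math.lgamma(rho / delta + d)
            + math.lgamma(sigma / delta + n - d)
            - math.lgamma(rho / delta)
            - math.lgamma(sigma / delta)
            - math.lgamma(1.0 / delta + n))
```

For p > 1/2 the Bernoulli score grows with d, so the decoder returns the code-word farthest from the received word, in agreement with the discussion of (12); the convexity of (14) in d produces the closest-or-farthest behavior described for the Polya model.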
D Markov noise model

We consider here that the noise process can be modeled by a first order Markov chain (Feller (1950)), i.e. $P(z_i / z_{i-1}, z_{i-2}, \ldots, z_1) = P(z_i / z_{i-1})$. This model depends on three parameters: the crossover probability $p = P(z_i = 1)$ and the noise transition probabilities $\alpha = P(z_i = 1 / z_{i-1} = 0)$ (probability of a bit "1" given that the previous noise outcome is a "0") and $\beta = P(z_i = 0 / z_{i-1} = 1)$ (probability of a bit "0" given that the previous noise outcome is a "1"). Using the chain rule (6) we obtain the channel transition probabilities as follows:

$P(\mathbf{z}^k) = p^{z_1} (1-p)^{1-z_1}\, \beta^{n_{10}} (1-\beta)^{n_{11}}\, \alpha^{n_{01}} (1-\alpha)^{n_{00}}$   (15)

where the parameters $n_{st}$ ($s, t = 0, 1$) are the numbers of bits with the value "$s$" followed by a bit with the value "$t$", verifying the constraint $n_{10} + n_{11} + n_{01} + n_{00} = n - 1$.

A very simple expression for the ML decoder is obtained from (15) in the particular case where the noise transitions are symmetric, i.e. $\alpha = \beta$. In that case, the function to be maximized (ML decoder) is:

$g_M(z_1, q) = \left(\dfrac{p}{1-p}\right)^{z_1} \left(\dfrac{\alpha}{1-\alpha}\right)^{q}$   (16)

where $q = n_{01} + n_{10}$ is the number of transitions ("0" to "1" and "1" to "0") in the noise vector $\mathbf{z} = \mathbf{y} \oplus \mathbf{x}^k$. We conclude from (16) that the ML decoder is a non-decreasing (decreasing) function of $q$ when the noise transition probability is $\alpha > 0.5$ ($\alpha < 0.5$). In other words, when $\alpha > 0.5$, the most probable input code-word is the one corresponding to a noise vector with the highest possible number of transitions (maximum $q$).

E Logarithmic noise model

In this model, we consider that the noise is composed of alternating chains of "1's" and "0's", and the length of each chain follows a logarithmic distribution (Douglas (1980)). If we denote by $U$ the length of a given "1's" chain and by $V$ the length of a given "0's" chain, then:

$P(U = u) = \dfrac{-\theta_1^{\,u}}{u \ln(1 - \theta_1)}$   (17)

$P(V = v) = \dfrac{-\theta_0^{\,v}}{v \ln(1 - \theta_0)}$   (18)

where $\theta_1$ and $\theta_0$ are the parameters of the logarithmic distributions corresponding to "1's" and "0's" respectively, and $0 < \theta_1, \theta_0 < 1$.

In order to clarify this model, a noise output example is shown below:

$\mathbf{z} = [\underbrace{0, 0}_{v_1 = 2},\; \underbrace{1, 1}_{u_1 = 2},\; \underbrace{0, 0, 0, 0}_{v_2 = 4},\; \underbrace{1, 1}_{u_2 = 2},\; \underbrace{0, 0}_{v_3 = 2},\; \underbrace{1}_{u_3 = 1}]$

where $u_i$ and $v_j$ are the lengths of the $i$-th "1's" chain and the $j$-th "0's" chain. Notice that high values of $\theta_1$ and $\theta_0$ produce noise configurations with long chains; on the other hand, for $\theta_1, \theta_0 \to 0$ we get configurations with alternating single "1's" and "0's".

The interest in this model comes from the property that the probability of getting a "1" in a given bit, following a group of $r$ "1's", tends to a constant value $\theta_1$ as $r \to \infty$, as is shown by the conditional probability:

$P(Z_i = 1 / Z_{i-1} = 1, \ldots, Z_{i-r} = 1) = \dfrac{S_{r+1} + S_{r+2} + \cdots}{S_r + S_{r+1} + \cdots}$   (19)

where $S_r = P(U = r)$.

This property shows a difference with the Polya contagion model, where the conditional probability (19) tends to 1 as $r$ tends to infinity.
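A sketch of how the Markov score (15) could be computed (again our illustration; the helper name and argument order are assumptions): the first noise bit is scored with the crossover probability p and every subsequent bit with the appropriate transition probability, which amounts to counting the n_st.

```python
import math

def log_g_markov(z, p, alpha, beta):
    """Log of eq. (15): first bit uses the crossover probability p,
    later bits use alpha = P(1|0) and beta = P(0|1)."""
    logp = math.log(p) if z[0] == 1 else math.log(1.0 - p)
    log_trans = {(1, 0): math.log(beta),         # a "1" followed by a "0": n_10
                 (1, 1): math.log(1.0 - beta),   # n_11
                 (0, 1): math.log(alpha),        # n_01
                 (0, 0): math.log(1.0 - alpha)}  # n_00
    for prev, cur in zip(z[:-1], z[1:]):
        logp += log_trans[(int(prev), int(cur))]
    return logp
```

In the symmetric case alpha = beta this collapses to eq. (16): the score depends only on z_1 and on the number of transitions q.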
The property (19) of the logarithmic distribution was remarked in Siromoney (1964). Assuming independence among the $u_i$ and $v_j$ for all $i, j$, we obtain the channel transition probabilities as follows:

$P(\mathbf{z}^k) = \left(\prod_{i=1}^{k_1} P(u_i)\right) \left(\prod_{j=1}^{k_0} P(v_j)\right)$   (20)

where $k_1$ is the number of "1's" chains and $k_0$ is the number of "0's" chains. In order to obtain the ML decoder, we apply the natural logarithm to (20) and obtain the $g_L(\cdot)$ function to be maximized:

$g_L(n_1, k_1, k_0, \{u_i\}, \{v_j\}) = n_1 \ln\dfrac{\theta_1}{\theta_0} - k_1 \ln[\lambda_1] - k_0 \ln[\lambda_0] - \sum_{i=1}^{k_1} \ln[u_i] - \sum_{j=1}^{k_0} \ln[v_j]$   (21)

where $n_1 = \sum_{i=1}^{k_1} u_i$ is the total number of "1's" and the parameters $\lambda_1$ and $\lambda_0$ are defined by $\lambda_s = -\ln(1 - \theta_s)$ (for $s = 0, 1$).

From the observation of equation (21) we conclude that there are too many variables to measure ($n_1$, $k_1$, $k_0$, $\{u_i\}$ and $\{v_j\}$) for the implementation of the ML decoder, which could be a problem from the point of view of its complexity. For this reason, in this paper we propose an approximation of (21) in order to reduce its complexity. The idea is that the last two terms in (21) can be approximated by using the linear approximation of the logarithm ($\ln(t) \approx t - 1$ for $t$ close to 1) as follows:

$\sum_{i=1}^{k_1} \ln u_i \approx \dfrac{n_1}{\mu_1} - k_1 + k_1 \ln \mu_1$   (22)

$\sum_{j=1}^{k_0} \ln v_j \approx \dfrac{n - n_1}{\mu_0} - k_0 + k_0 \ln \mu_0$   (23)

where $\mu_1 = E[U] = -\theta_1 / [(1-\theta_1)\ln(1-\theta_1)]$ and $\mu_0 = E[V] = -\theta_0 / [(1-\theta_0)\ln(1-\theta_0)]$ are the mean values of the logarithmic random variables $U$ and $V$ respectively. Note that, in the approximations (22) and (23), we have used the linear approximation for $u_i / \mu_1$ and $v_j / \mu_0$ close to 1, which indicates that these approximations will be valid when the chain lengths are not far away from their mean values. Finally, by putting (22) and (23) into (21), we obtain, up to an additive constant that does not depend on the code-word, the approximated $g_L(\cdot)$ function:

$\hat{g}_L(n_1, k_1, k_0) = n_1 \left[\ln\dfrac{\theta_1}{\theta_0} + \dfrac{1}{\mu_0} - \dfrac{1}{\mu_1}\right] + k_1 \{1 - \ln[\lambda_1 \mu_1]\} + k_0 \{1 - \ln[\lambda_0 \mu_0]\}$   (24)
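The exact score (21) only needs the run-length decomposition of the noise vector. The sketch below is ours: it evaluates the logarithm of (20) directly, which differs from (21) only by an additive constant and therefore does not change the argmax; the approximated decoder (24) would instead need only the counts n_1, k_1 and k_0.

```python
import math
from itertools import groupby

def log_g_logarithmic(z, theta1, theta0):
    """Log of eq. (20): split z into runs of "1"s and "0"s and score each
    run length with the logarithmic distribution (17) or (18)."""
    lam1 = -math.log(1.0 - theta1)   # lambda_1 = -ln(1 - theta_1)
    lam0 = -math.log(1.0 - theta0)   # lambda_0 = -ln(1 - theta_0)
    score = 0.0
    for bit, run in groupby(int(b) for b in z):
        length = len(list(run))
        theta, lam = (theta1, lam1) if bit == 1 else (theta0, lam0)
        # log P(run of this length) = length*ln(theta) - ln(length) - ln(lambda)
        score += length * math.log(theta) - math.log(length) - math.log(lam)
    return score
```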
Let us now consider a simple case where the "1's" chains and the "0's" chains are identically distributed, i.e. $\theta = \theta_1 = \theta_0$. In this case, $\mu_1 = \mu_0$ and $\lambda_1 = \lambda_0$, and therefore the approximated ML decoder is even simpler:

$\hat{g}_L(q) = (q + 1)\left\{1 - \ln\left[\dfrac{\theta}{1-\theta}\right]\right\}$   (25)

where $q = k_1 + k_0 - 1$ is the number of transitions ("0" to "1" and "1" to "0") in the noise vector $\mathbf{z} = \mathbf{y} \oplus \mathbf{x}^k$. Looking at equation (25) we see that, in this particular case, $\hat{g}_L(q)$ depends linearly on the number of transitions; so, we need to determine whether the factor $h(\theta) = 1 - \ln[\theta/(1-\theta)]$ is positive or negative in order to assign the most probable transmitted code-word to the maximum or to the minimum number of transitions $q$. From Fig. 2 we see that $\theta = 0.73$ is a threshold: the most probable code-word corresponds to the one having the maximum or the minimum number of transitions $q$, depending on whether $\theta < 0.73$ or $\theta > 0.73$.

Figure 2: Plot of $h(\theta) = 1 - \ln[\theta/(1-\theta)]$. The ML decoder chooses the maximum or the minimum number of transitions $q$ as $\theta < 0.73$ or $\theta > 0.73$.

In order to test the effectiveness of our approximations (22) and (23), we have conducted a large number of simulations in which noise vectors were generated according to their logarithmic distributions for the case $\theta = \theta_1 = \theta_0$, covering the complete range of the parameter $\theta$. A random code-book with $m = 16$ code-words was generated for different code-word lengths $n$ ($n = 7$, $14$, $21$ and $28$), and a minimum Hamming distance among code-words of $d(\mathbf{x}^i, \mathbf{x}^j) = 2$ was guaranteed. For each value of $\theta$, a total of 500 simulations were conducted in order to average the obtained decoder error probability and reach an estimation of (8). In Fig. 3 the decoder error probabilities obtained by using the exact ML decoder (equation (21)) and the approximated ML decoder (equation (25)) are shown. Notice that the exact ML decoder always gives a lower error probability than the approximated version, as expected. The maximum error probability is reached at $\theta \approx 0.75$, as shown. We remark that this value of $\theta$ also gives the maximum variance of $q$, in agreement with the maximum probability of error of the ML decoder (Fig. 3) and with the transition threshold (Fig. 2). These results will be studied further in a future work.

Figure 3: Exact ML decoder (optimum) versus approximated ML decoder for $\theta = \theta_1 = \theta_0$, $m = 16$ and $n = 7, 14, 21$ and $28$.

F 2D Ising noise model

In this subsection, we extend the ML decoder to 2D binary signals transmitted over a channel with the same characteristics as shown in Fig. 1. 2D signals are useful for representing digital images. A very well known model for binary images is the Ising model, which has its roots in statistical mechanics as a model for ferromagnetic materials (Huang (1987)). The Ising model has been widely applied to model interactions between pixels in images (Geman and Geman (1984)), and motivated the development of the theory of Markov Random Fields (Greaffeath (1976)). In this paper we propose to use the