Universal Quadratic Lower Bounds on Source Coding Error Exponents

Cheng Chang and Anant Sahai

C. Chang and A. Sahai are with the Department of Electrical Engineering and Computer Science, University of California at Berkeley. Email: cchang@eecs.berkeley.edu, sahai@eecs.berkeley.edu

Abstract—We consider the problem of block-size selection to achieve a desired probability of error for universal source coding. While Baron et al. [1], [9] studied this question for rates in the vicinity of entropy for known distributions using central-limit-theorem techniques, we are interested in all rates for unknown distributions and use error-exponent techniques. By adapting a technique of Gallager from the exercises of [7], we derive a universal lower bound on the source-coding error exponent that depends only on the alphabet size and is quadratic in the gap to entropy.

I. INTRODUCTION

In [10], the lossless source coding with decoder side-information problem, shown in Figure 1, is introduced. The source and decoder side-information sequences $(x_1^n, y_1^n)$ are drawn iid from a joint distribution $p_{xy}$ on a finite alphabet $\mathcal{X} \times \mathcal{Y}$. If the decoder knows $y_1^n$, the error probability $\Pr(\hat{x}_1^n \neq x_1^n)$ goes to $0$ as the code length $n$ goes to infinity, for any rate $R > H(p_{x|y})$, where $H(p_{x|y})$ is the conditional entropy of $x$ given $y$.

[Figure 1: Lossless source coding with decoder side-information. The encoder observes the source $x_1^n$ and sends encoded bits $b(x_1^n)$; the decoder combines these bits with the side-information $y_1^n$, where $(x_i, y_i) \sim p_{xy}$, to produce the "lossless" reconstruction $\hat{x}_1^n$.]

The performance of the coding system, i.e. how fast the error probability converges to $0$ with block length $n$ when the coding rate is above the minimum required rate, is studied in [5], [6], [7]. We summarize the relevant error-exponent results from the literature in the following.

Theorem 1: [6] Assume a decoder with access to the side-information, where the memoryless source and side-information are generated from a distribution $p_{xy}$. A random-binning encoder and joint ML decoding system, shown in Figure 1, has error probability $\Pr(\hat{x}_1^n \neq x_1^n) \le e^{-n E_r(R)}$, where

$$E_r(R) = \max_{0 \le \rho \le 1} \; \rho R - \bar{E}_0(\rho) \qquad (1)$$

and

$$\bar{E}_0(\rho) = \ln \sum_{y} \Big( \sum_{x} p_{xy}(x,y)^{\frac{1}{1+\rho}} \Big)^{1+\rho}.$$

Without decoder side-information, the Gallager function $\bar{E}_0$ simplifies¹ to

$$E_0(\rho) = (1+\rho) \ln \sum_{x \in \mathcal{X}} p_x(x)^{\frac{1}{1+\rho}}.$$

¹ This is the source-coding counterpart of the channel coding result in Theorem 5.6.4 of [7].

In Theorem 1, the random binning scheme at the encoder is uniform, and thus universal in nature [6]. However, for the ML decoding rule the decoder needs to know the statistics of the source. In [5], a universal system based on minimum-entropy decoding is shown to achieve the same error exponent asymptotically. For the universal decoder,

$$\Pr(\hat{x}_1^n \neq x_1^n) \le e^{-n (E_r(R) - \phi(n))} \qquad (2)$$

where $\phi(n)$ is the vanishing term $\frac{|\mathcal{X}| \ln n}{n}$ for the case without side-information, and $\phi(n) = \frac{|\mathcal{X}||\mathcal{Y}| \ln n}{n}$ for the case with decoder side-information.
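To make Theorem 1 concrete, here is a minimal numerical sketch of (1): it evaluates the Gallager function $\bar{E}_0(\rho)$ and finds $E_r(R)$ by brute-force grid search over $\rho \in [0,1]$. The function names and the joint distribution `p_xy` are hypothetical choices for illustration; the paper specifies no implementation.

```python
import numpy as np

def E0_bar(p_xy, rho):
    """Gallager function with side-information:
    ln sum_y ( sum_x p(x,y)^(1/(1+rho)) )^(1+rho)."""
    inner = np.sum(p_xy ** (1.0 / (1.0 + rho)), axis=0)  # sum over x, one entry per y
    return np.log(np.sum(inner ** (1.0 + rho)))

def Er(p_xy, R, grid=10001):
    """Random-coding error exponent (1) by grid search over rho in [0, 1]."""
    rhos = np.linspace(0.0, 1.0, grid)
    return max(rho * R - E0_bar(p_xy, rho) for rho in rhos)

# Hypothetical joint distribution with |X| = 2, |Y| = 2 (rows: x, columns: y).
p_xy = np.array([[0.40, 0.15],
                 [0.05, 0.40]])
p_y = p_xy.sum(axis=0)
H_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))  # conditional entropy, in nats

for R in [H_x_given_y + 0.1, H_x_given_y + 0.3]:
    print(f"R = {R:.3f} nats, E_r(R) = {Er(p_xy, R):.4f}")
```

Since $\bar{E}_0(0) = 0$, the choice $\rho = 0$ guarantees the computed exponent is nonnegative, and the exponent grows as the rate moves away from the conditional entropy.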
A. Motivation and related work

For fixed-block source coding systems, block length is an important parameter as it is related to both system delay and complexity. Suppose there is a system-level requirement that the block error probability $\Pr(x_1^n \neq \hat{x}_1^n)$ be below some constant $P_e > 0$. If the distributions are known, the minimum block length can be calculated from Theorem 1. However, the exact distribution need not be available to the encoder, since that knowledge is not needed to do uniform binning. Thus, a universal estimate of the error exponent is desirable.

A related problem is studied in [1], [9]. They turn the question around and ask: for non-asymptotic-length source coding with side-information, what is the minimum rate required to achieve block error $P_e$, assuming that the distribution is in fact known? A more quantitative discussion of the relation between the problem in [1] and our work here is deferred to Section II-A.

B. A universal bound on channel coding exponents

In Exercise 5.23 of [7], Gallager gives a quadratic lower bound on the random channel-coding error exponent for a discrete memoryless channel $P(\cdot|\cdot)$ with output alphabet size $J$. If $Q$ is the distribution that achieves the channel capacity $C$, then the random-coding error exponent $E_{cr}(R,Q)$, defined in Theorem 5.6.4 of [7], is lower bounded by the following quadratic function of the gap to capacity $(C - R)$ for all $R < C$:

$$E_{cr}(R,Q) \ge \frac{(C - R)^2}{8/e^2 + 4[\ln J]^2}. \qquad (3)$$

This bound can be further tightened, and we give the new result as a corollary to Lemma 1. The bound is universal in the sense that it depends only on the size of the output alphabet and the gap to capacity, and not on the detailed channel statistics.

Following Gallager's techniques, we derive universal quadratic bounds on the random source-coding error exponent with and without decoder side-information. For both cases, the quadratic bounds are determined only by the gap to entropy and the size of the source alphabet $|\mathcal{X}|$. The results are summarized in Theorem 2. The proof details are in Section III.

II. MAIN RESULTS AND DISCUSSION

Theorem 2: For a memoryless source $x$ and decoder side-information $y$, jointly generated iid from $p_{xy}$ with conditional entropy $H(p_{x|y}) = h$ on a finite alphabet $\mathcal{X} \times \mathcal{Y}$, the random-coding error exponent $E_r(R)$, defined in (1), is lower bounded by a quadratic function, $\forall R \in [H(p_{x|y}), \ln |\mathcal{X}|)$: $E_r(R) \ge G_h(R)$, where

$$G_h(R) = \begin{cases} \dfrac{(R - h)^2}{2 (\ln |\mathcal{X}|)^2} & \text{if } |\mathcal{X}| \ge 3 \\[2mm] \dfrac{(R - h)^2}{2 \ln 2} & \text{if } |\mathcal{X}| = 2 \end{cases} \qquad (4)$$

Because the bound depends only on the gap to entropy, we can also write it as $G(R - h)$. Furthermore, if there is no side-information and the source is drawn from $p_x$ with $H(p_x) = h$, the same bound applies.

It is interesting to note that this quadratic bound on the error exponent $E_r(R)$ has no dependence on the size of the side-information alphabet $|\mathcal{Y}|$.

A. Discussion and Examples

For sources with $H(p_x) = h$, it is easy to see that $R - h$ is always an upper bound on $E_r(R)$, and hence Theorem 2 implies

$$G_h(R) \le \min_{p_x : H(p_x) = h} E_r(R) \le \max_{p_x : H(p_x) = h} E_r(R) \le R - h.$$

To illustrate the looseness of our bounds, consider $|\mathcal{X}| = 3$ and distributions $p_x$ with $H(p_x) = h = 0.394$. Since the alphabet size is so small, we can use brute-force optimization to obtain the upper and lower contours of the possible $E_r(R)$. These are plotted along with the universal quadratic lower bound $G(R - h) = \frac{(R - h)^2}{2(\ln 3)^2}$ and the linear upper bound $R - h$ in Figure 2.

[Figure 2: Plot of the error-exponent bounds for a three-letter alphabet, for rates $R$ between $H(p_x)$ and $\ln |\mathcal{X}|$. In order from top to bottom: the universal linear bound $R - H(p_x)$, the upper contour $\max_{p_x : H(p_x) = h} E_r(R)$, the lower contour $\min_{p_x : H(p_x) = h} E_r(R)$, and the universal quadratic bound $G_h(R)$.]
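As a sanity check on Theorem 2 and the chain of inequalities above, the following sketch compares the universal bound $G_h(R)$ of (4), the exact exponent $E_r(R)$ (grid search, as in the earlier sketch), and the linear upper bound $R - h$ for one hypothetical three-letter distribution. It does not reproduce the min/max contours of Figure 2, which require optimizing over all distributions of entropy $h$.

```python
import numpy as np

def E0(p, rho):
    """Gallager source function without side-information."""
    return (1.0 + rho) * np.log(np.sum(p ** (1.0 / (1.0 + rho))))

def Er(p, R, grid=10001):
    """Random-coding exponent (1), no side-information, by grid search."""
    rhos = np.linspace(0.0, 1.0, grid)
    return max(rho * R - E0(p, rho) for rho in rhos)

def G(gap, X):
    """Universal quadratic lower bound (4); gap = R - h >= 0."""
    denom = 2.0 * np.log(2.0) if X == 2 else 2.0 * np.log(X) ** 2
    return gap ** 2 / denom

# Hypothetical three-letter source; any p with the same entropy would do.
p = np.array([0.92, 0.05, 0.03])
h = -np.sum(p * np.log(p))  # approx 0.33 nats

for R in np.linspace(h + 0.05, np.log(3) - 0.05, 5):
    lower, exact, upper = G(R - h, 3), Er(p, R), R - h
    assert lower - 1e-9 <= exact <= upper + 1e-9  # G_h(R) <= E_r(R) <= R - h
    print(f"R={R:.3f}: G={lower:.4f} <= E_r={exact:.4f} <= R-h={upper:.4f}")
```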
For ML decoding, where the decoder knows the distribution, the encoder can pick a block length sufficient to achieve block error probability $\Pr(x_1^n \neq \hat{x}_1^n) \le P_e$, knowing only the gap to entropy $R - h$ and the source alphabet size $|\mathcal{X}|$, by choosing

$$n \ge \frac{-\ln P_e}{G(R - h)}.$$

The size of the side-information alphabet $|\mathcal{Y}|$ is not needed at all! If the decoder is also ignorant of the joint distribution, then $\phi(n)$ in (2) must be taken into consideration, and $n$ can be chosen by solving

$$n\,G(R - h) - n\,\phi(n) \ge -\ln P_e.$$

Since $n\,\phi(n) = |\mathcal{X}||\mathcal{Y}| \ln n$, this implies that determining $n$ from our bound requires the encoder to know the side-information alphabet size. At low probabilities of error, however, the dependence is relatively weak, since the $\ln n$ term is dominated by the $n\,G(R - h)$ term.
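The block-length selection just described is straightforward to mechanize. Below is a minimal sketch, assuming hypothetical target values: the ML-decoder length follows in closed form, and the universal-decoder condition $n\,G(R-h) - |\mathcal{X}||\mathcal{Y}|\ln n \ge -\ln P_e$ is solved by naive linear search (the paper does not prescribe a solution method).

```python
import math

def G(gap, X):
    """Universal quadratic bound (4); requires gap = R - h > 0."""
    denom = 2.0 * math.log(2.0) if X == 2 else 2.0 * math.log(X) ** 2
    return gap ** 2 / denom

def n_ml(Pe, gap, X):
    """Block length for ML decoding: n >= -ln(Pe) / G(R - h)."""
    return math.ceil(-math.log(Pe) / G(gap, X))

def n_universal(Pe, gap, X, Y):
    """Smallest n with n*G(R-h) - |X||Y|*ln(n) >= -ln(Pe)."""
    target = -math.log(Pe)
    n = n_ml(Pe, gap, X)  # the universal decoder never needs fewer symbols
    while n * G(gap, X) - X * Y * math.log(n) < target:
        n += 1
    return n

# Hypothetical requirement: Pe = 1e-6, rate gap R - h = 0.1 nats, |X| = 4, |Y| = 2.
print(n_ml(1e-6, 0.1, 4))            # ML decoder
print(n_universal(1e-6, 0.1, 4, 2))  # universal decoder needs somewhat more
```

As the text notes, the $|\mathcal{X}||\mathcal{Y}| \ln n$ correction inflates the required block length noticeably at moderate error targets, but its relative effect shrinks as $P_e \to 0$.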
In [1], it is shown that, for source coding with side-information, the required rate to achieve block error probability $P_e$ with fixed block length $n$ is

$$H(p_{x|y}) + K(P_e) \sqrt{-\ln P_e / n} + o\big(\sqrt{-\ln P_e / n}\big),$$

where the $O(\sqrt{-\ln P_e / n})$ term is called the redundancy rate. The exact constant $K(P_e)$ is also computed, and it clearly depends on the probability distribution of the source. The converse, proved in [1] for binary symmetric sources, is not universal and, moreover, cannot be made so. A simple counterexample: suppose that, given $y = y$, $x$ is uniform on some subset $S_y \subset \mathcal{X}$ with $|S_y| = K < |\mathcal{X}|$ for all $y \in \mathcal{Y}$. In this case, the random-coding error exponent is the straight line $E_r(R) = R - \ln K$. For this example, the redundancy rate needs to be only $-\ln P_e / n$ when using random coding, and could potentially be zero with some other scheme.

Theorem 2 tells us that for block length $n$ and rate

$$R = H(p_{x|y}) + K'(|\mathcal{X}|) \sqrt{-\ln P_e / n},$$

the block error is smaller than $P_e$, no matter what distribution is encountered. While not tight, it does show that a redundancy of $O(\sqrt{-\ln P_e / n})$ suffices for universality.

III. PROOF OF THEOREM 2

In this section we prove Theorem 2. First, we need a technical lemma and the definition of tilted distributions [6].

A. Lemmas and Definitions

In this paper we use the following lemma to upper bound a non-concave function $f_E(\cdot)$.

Lemma 1: For a constant $E \ge 0$, write

$$f_E(\vec{\omega}) = \sum_{j=1}^{J} \omega_j (\ln \omega_j - E)^2, \qquad (5)$$

for $\vec{\omega} \in \mathcal{S}_J$, where $\mathcal{S}_J = \{\vec{\omega} \in \mathbb{R}^J \mid \sum_k \omega_k = 1,\ \omega_j \ge 0\ \forall j\}$ is the probability simplex of dimension $J$. Then for any distribution $\vec{\omega} \in \mathcal{S}_J$,

$$f_E(\vec{\omega}) \le \begin{cases} E^2 + 2E \ln J + (\ln J)^2 & \text{if } J \ge 3 \\ E^2 + 2E \ln 2 + T & \text{if } J = 2 \end{cases} \qquad (6)$$

where

$$T = t_1 (\ln t_1)^2 + t_2 (\ln t_2)^2 \qquad (7)$$

and

$$t_1 = \frac{1 + \sqrt{1 - 4e^{-2}}}{2}, \qquad t_2 = \frac{1 - \sqrt{1 - 4e^{-2}}}{2},$$

with $T \approx 0.563$, so that $(\ln 2)^2 < T < \ln 2$.

The proof is in the appendix. The challenge in the proof lies in the non-concavity of $f_E(\vec{\omega})$. Figure 3 plots $f_0((x, 1-x))$ for $J = 2$, so that $\vec{\omega} = (x, 1-x)$ and $E = 0$. The maximum occurs at $x = t_1$ or $t_2$, as defined above.

[Figure 3: Non-concavity of $f_E(\vec{\omega})$ defined in (5), for $E = 0$ and $J = 2$: the plot of $f_0((x, 1-x)) = x(\ln x)^2 + (1-x)(\ln(1-x))^2$ attains its maxima at $x = t_1$ and $x = t_2$.]
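Lemma 1 can be spot-checked numerically. The sketch below confirms the value $T \approx 0.563$ from $t_1$ and $t_2$, and tests the bound (6) on randomly sampled points of the simplex; this is only a sanity check, not a substitute for the proof in the appendix.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(omega, E):
    """f_E(omega) from (5); terms with omega_j = 0 contribute 0."""
    w = omega[omega > 0]
    return np.sum(w * (np.log(w) - E) ** 2)

def bound(J, E):
    """Right-hand side of (6)."""
    if J >= 3:
        return E**2 + 2 * E * np.log(J) + np.log(J) ** 2
    t1 = (1 + np.sqrt(1 - 4 * np.exp(-2))) / 2
    t2 = (1 - np.sqrt(1 - 4 * np.exp(-2))) / 2
    T = t1 * np.log(t1) ** 2 + t2 * np.log(t2) ** 2
    return E**2 + 2 * E * np.log(2) + T

print(bound(2, 0.0))  # T ~ 0.563; note (ln 2)^2 ~ 0.480 < T < ln 2 ~ 0.693

for J in (2, 3, 5):
    for E in (0.0, 0.7, 2.0):
        b = bound(J, E)
        for _ in range(20000):
            omega = rng.dirichlet(np.ones(J))  # random point on the simplex
            assert f(omega, E) <= b + 1e-12
```

For $J \ge 3$ the bound $(E + \ln J)^2$ is met with equality at the uniform distribution, while for $J = 2$ the maximum of $f_0$ sits at the interior points $t_1$ and $t_2$ rather than at $(1/2, 1/2)$, which is what makes the binary case special.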
Definition 1: Tilted distributions. For a distribution $p_x$ on a finite alphabet $\mathcal{X}$ and $\rho \in (-1, \infty)$, we denote the $\rho$-tilted distribution by $p^\rho_x$, where

$$p^\rho_x(x) = \frac{p_x(x)^{\frac{1}{1+\rho}}}{\sum_{s \in \mathcal{X}} p_x(s)^{\frac{1}{1+\rho}}}.$$

For a distribution $p_{xy}$ on a finite alphabet $\mathcal{X} \times \mathcal{Y}$, we denote the $x$-$y$ tilted distribution of $p_{xy}$ by $\bar{p}^\rho_{xy}$:

$$\bar{p}^\rho_{xy}(x,y) = \frac{\big[\sum_{s \in \mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}\big]^{1+\rho}}{\sum_{t \in \mathcal{Y}} \big[\sum_{s \in \mathcal{X}} p_{xy}(s,t)^{\frac{1}{1+\rho}}\big]^{1+\rho}} \times \frac{p_{xy}(x,y)^{\frac{1}{1+\rho}}}{\sum_{s \in \mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}}.$$

Obviously $p^0_x = p_x$ and $\bar{p}^0_{xy} = p_{xy}$. Write the marginal distribution of $y$ under $\bar{p}^\rho_{xy}$ as $\bar{p}^\rho_y$ and the conditional distribution of $x$ given $y$ under $\bar{p}^\rho_{xy}$ as $\bar{p}^\rho_{x|y}$; then, from the definition, $\bar{p}^\rho_{xy}(x,y) = \bar{p}^\rho_y(y)\,\bar{p}^\rho_{x|y}(x|y)$, with

$$\bar{p}^\rho_y(y) = \frac{\big[\sum_{s \in \mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}\big]^{1+\rho}}{\sum_{t \in \mathcal{Y}} \big[\sum_{s \in \mathcal{X}} p_{xy}(s,t)^{\frac{1}{1+\rho}}\big]^{1+\rho}}, \qquad \bar{p}^\rho_{x|y}(x|y) = \frac{p_{xy}(x,y)^{\frac{1}{1+\rho}}}{\sum_{s \in \mathcal{X}} p_{xy}(s,y)^{\frac{1}{1+\rho}}}.$$

Denote the entropy of $p^\rho_x$ by $H(p^\rho_x)$ and the conditional entropy of $x$ given $y$ under $\bar{p}^\rho_{xy}$ by $H(\bar{p}^\rho_{x|y})$; then $H(\bar{p}^0_{x|y}) = H(p_{x|y})$. Writing $H(\bar{p}^\rho_{x|y=y})$ for the conditional entropy of $x$ given $y = y$:

$$H(\bar{p}^\rho_{x|y=y}) = -\sum_x \bar{p}^\rho_{x|y}(x|y) \ln \bar{p}^\rho_{x|y}(x|y).$$

B. Proof of the case without side-information

As in the solution to Gallager's problem 5.23 in [7], we use a Taylor expansion of $E_0(\rho)$ to find a quadratic bound on $E_r(R)$.

Proof: From the mean value theorem, expanding $E_0(\rho)$ at $0$ with $\rho \le 1$, there exists $\rho_1 \in [0, \rho]$ such that

$$E_0(\rho) = E_0(0) + \rho E_0'(0) + \frac{\rho^2}{2} E_0''(\rho_1).$$

From basic calculus, as shown in the appendix of [3], it can be shown that

$$E_0'(\rho) = \frac{d E_0(\rho)}{d\rho} = H(p^\rho_x), \qquad (8)$$

and hence $E_0'(0) = H(p_x)$. Note that $E_0(0) = 0$, and so

$$E_0(\rho) \le \rho H(p_x) + \frac{\rho^2}{2}\alpha, \qquad (9)$$

where $\alpha > 0$ is any upper bound on $E_0''(\rho_1)$ that holds $\forall \rho_1 \in [0, 1]$. Substituting (9) into the definition of $E_r(R)$ gives

$$E_r(R) = \max_{0 \le \rho \le 1} \rho R - E_0(\rho) \ge \max_{0 \le \rho \le 1} \rho R - \rho H(p_x) - \frac{\rho^2}{2}\alpha = \max_{0 \le \rho \le 1} -\frac{\alpha}{2}\Big(\rho - \frac{R - H(p_x)}{\alpha}\Big)^2 + \frac{(R - H(p_x))^2}{2\alpha} = \frac{(R - H(p_x))^2}{2\alpha} \qquad (10)$$

for $R - H(p_x) \le \alpha$. In the last step, we note that the maximizer $\rho = \frac{R - H(p_x)}{\alpha}$ lies within $[0, 1]$.

To find $\alpha$, an upper bound on $E_0''(\rho)$ for all $\rho \in [0, 1]$, we expand:

$$E_0''(\rho) = \frac{d H(p^\rho_x)}{d\rho} = -\sum_x \big(1 + \ln p^\rho_x(x)\big) \frac{d p^\rho_x(x)}{d\rho} = \sum_x \big(1 + \ln p^\rho_x(x)\big) \frac{p^\rho_x(x)}{1+\rho} \big(\ln p^\rho_x(x) + H(p^\rho_x)\big)$$
$$= \frac{1}{1+\rho} \sum_x \Big[ p^\rho_x(x)(\ln p^\rho_x(x))^2 + p^\rho_x(x)\ln p^\rho_x(x) + p^\rho_x(x) H(p^\rho_x) + p^\rho_x(x)\ln p^\rho_x(x)\, H(p^\rho_x) \Big]$$
$$= \frac{1}{1+\rho} \sum_x p^\rho_x(x)(\ln p^\rho_x(x))^2 - \frac{H(p^\rho_x)^2}{1+\rho}. \qquad (11)$$

Since the last term in (11) is negative² and $\rho \ge 0$, by the definition of $f_E(\cdot)$ in (5),

$$E_0''(\rho) \le \sum_x p^\rho_x(x)(\ln p^\rho_x(x))^2 = f_0(p^\rho_x).$$

Lemma 1 then tells us that $E_0''(\rho) \le \alpha$, where

$$\alpha = \begin{cases} (\ln |\mathcal{X}|)^2 & \text{if } |\mathcal{X}| \ge 3 \\ \ln 2 & \text{if } |\mathcal{X}| = 2 \end{cases} \qquad (12)$$

Here we have replaced the $T$ from Lemma 1 by the looser upper bound $\ln 2$. Since $(\ln |\mathcal{X}|)^2 > \ln |\mathcal{X}|$ for $|\mathcal{X}| \ge 3$, we have $R - H(p_x) \le \alpha$ for all $R \in [H(p_x), \ln |\mathcal{X}|)$. Combining (10) and (12), the theorem is proved for the case without side-information.³

² This is a loose analysis. For $|\mathcal{X}| \ge 3$, the upper bound on the first term is achieved when $p^\rho_x$ is uniform on $\mathcal{X}$, giving the maximum $(\ln |\mathcal{X}|)^2$ shown in (12). The actual value of (11) is $0$ for the uniform distribution.

³ Although the upper bound on $E_0''(\rho)$ is not tight, since we drop the negative term in (11), it has the right order in $|\mathcal{X}|$. For the distribution $p = \{\frac{1}{2}, \frac{1}{2(|\mathcal{X}|-1)}, \ldots, \frac{1}{2(|\mathcal{X}|-1)}\}$, the evaluation of (11) is $\sim \frac{1}{4}(\ln |\mathcal{X}|)^2$ for large $|\mathcal{X}|$; thus the upper bound $(\ln |\mathcal{X}|)^2$ in (12) has the right order.

C. Proof in General

The general proof is parallel.

Proof: Once again we expand $\bar{E}_0(\rho)$; basic calculus, as shown in the appendix of [3], reveals that

$$\bar{E}_0'(\rho) = \frac{d \bar{E}_0(\rho)}{d\rho} = H(\bar{p}^\rho_{x|y}), \qquad (13)$$

and hence $\bar{E}_0'(0) = H(p_{x|y})$. Thus

$$\bar{E}_0(\rho) \le \rho H(p_{x|y}) + \frac{\rho^2}{2}\alpha, \qquad (14)$$

where $\alpha > 0$ is any upper bound on $\bar{E}_0''(\rho_1)$ that holds $\forall \rho_1 \in [0, 1]$. Substituting as before shows that

$$E_r(R) \ge \max_{0 \le \rho \le 1} \rho R - \rho H(p_{x|y}) - \frac{\rho^2}{2}\alpha = \frac{(R - H(p_{x|y}))^2}{2\alpha} \qquad (15)$$

for $R - H(p_{x|y}) \le \alpha$. To find $\alpha$, an upper bound on $\bar{E}_0''(\rho)$ for all $\rho \in [0, 1]$, we expand:

$$\bar{E}_0''(\rho) = \frac{d H(\bar{p}^\rho_{x|y})}{d\rho} = \frac{d}{d\rho} \sum_{y \in \mathcal{Y}} \bar{p}^\rho_y(y) H(\bar{p}^\rho_{x|y=y}) = \sum_{y \in \mathcal{Y}} \bar{p}^\rho_y(y) \frac{d H(\bar{p}^\rho_{x|y=y})}{d\rho} + \sum_{y \in \mathcal{Y}} \frac{d \bar{p}^\rho_y(y)}{d\rho} H(\bar{p}^\rho_{x|y=y}). \qquad (16)$$

By basic calculus⁴, we have

$$\frac{d H(\bar{p}^\rho_{x|y=y})}{d\rho} = \frac{1}{1+\rho} \sum_x \bar{p}^\rho_{x|y}(x|y)\big(\ln \bar{p}^\rho_{x|y}(x|y)\big)^2 - \frac{1}{1+\rho}\big(H(\bar{p}^\rho_{x|y=y})\big)^2 \qquad (17)$$

and

$$\sum_{y \in \mathcal{Y}} \frac{d \bar{p}^\rho_y(y)}{d\rho} H(\bar{p}^\rho_{x|y=y}) = \sum_y \bar{p}^\rho_y(y) H(\bar{p}^\rho_{x|y=y})^2 - H(\bar{p}^\rho_{x|y})^2. \qquad (18)$$

⁴ The tedious details of the derivation are in the proofs of Lemma 10 and Lemma 11 in the appendix of [3].

Substituting (17) and (18) into (16), we have

$$\bar{E}_0''(\rho) = \frac{1}{1+\rho} \sum_y \bar{p}^\rho_y(y)\Big[\sum_x \bar{p}^\rho_{x|y}(x|y)\big(\ln \bar{p}^\rho_{x|y}(x|y)\big)^2\Big] - \frac{1}{1+\rho} \sum_y \bar{p}^\rho_y(y) H(\bar{p}^\rho_{x|y=y})^2 + \sum_y \bar{p}^\rho_y(y) H(\bar{p}^\rho_{x|y=y})^2 - H(\bar{p}^\rho_{x|y})^2$$
$$= \frac{1}{1+\rho} \sum_y \bar{p}^\rho_y(y)\Big[\sum_x \bar{p}^\rho_{x|y}(x|y)\big(\ln \bar{p}^\rho_{x|y}(x|y)\big)^2\Big] + \frac{\rho}{1+\rho} \sum_y \bar{p}^\rho_y(y) H(\bar{p}^\rho_{x|y=y})^2 - H(\bar{p}^\rho_{x|y})^2. \qquad (19)$$

Since $\sum_x \bar{p}^\rho_{x|y}(x|y) = 1$ for every $y \in \mathcal{Y}$, Lemma 1 tells us that

$$\sum_x \bar{p}^\rho_{x|y}(x|y)\big(\ln \bar{p}^\rho_{x|y}(x|y)\big)^2 \le \alpha \qquad (20)$$

where

$$\alpha = \begin{cases} (\ln |\mathcal{X}|)^2 & \text{if } |\mathcal{X}| \ge 3 \\ \ln 2 & \text{if } |\mathcal{X}| = 2 \end{cases}$$

It is clear that

$$H(\bar{p}^\rho_{x|y=y})^2 \le (\ln |\mathcal{X}|)^2 \le \alpha, \quad \forall y. \qquad (21)$$

Substituting (20) and (21) into (19) and dropping the last term in (19), which is negative, we have

$$\bar{E}_0''(\rho) \le \frac{1}{1+\rho} \sum_y \bar{p}^\rho_y(y)\,\alpha + \frac{\rho}{1+\rho} \sum_y \bar{p}^\rho_y(y)\,\alpha = \alpha. \qquad (22)$$

Since $(\ln |\mathcal{X}|)^2 > \ln |\mathcal{X}|$ for $|\mathcal{X}| \ge 3$, we have $R - H(p_{x|y}) \le \alpha$ for all $R \in [H(p_{x|y}), \ln |\mathcal{X}|)$. Combining (15) and (22), the general theorem is proved.
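The derivative identity (13), with (8) as its special case, is quoted from the appendix of [3] and is the crux of the Taylor-expansion argument. It can be checked numerically by comparing a central finite difference of $\bar{E}_0$ against the conditional entropy of the $x$-$y$ tilted distribution of Definition 1; the joint distribution below is a hypothetical example.

```python
import numpy as np

def E0_bar(p_xy, rho):
    """ln sum_y ( sum_x p(x,y)^(1/(1+rho)) )^(1+rho)."""
    inner = np.sum(p_xy ** (1.0 / (1.0 + rho)), axis=0) ** (1.0 + rho)
    return np.log(np.sum(inner))

def tilted_cond_entropy(p_xy, rho):
    """H of the conditional x|y under the x-y tilted distribution."""
    a = p_xy ** (1.0 / (1.0 + rho))
    col = np.sum(a, axis=0)                                # sum over x, per y
    p_y = col ** (1.0 + rho) / np.sum(col ** (1.0 + rho))  # tilted marginal of y
    p_x_given_y = a / col                                  # tilted conditional
    H_y = -np.sum(p_x_given_y * np.log(p_x_given_y), axis=0)
    return np.sum(p_y * H_y)

# Hypothetical joint distribution with |X| = 3, |Y| = 2 (rows: x, columns: y).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.05],
                 [0.05, 0.35]])

rho, eps = 0.4, 1e-6
fd = (E0_bar(p_xy, rho + eps) - E0_bar(p_xy, rho - eps)) / (2 * eps)
print(fd, tilted_cond_entropy(p_xy, rho))  # the two values should agree closely
```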
IV. CONCLUSIONS AND FUTURE WORK

In this paper we have derived a universal lower bound on random source-coding error exponents. The bound has the quadratic form $a(R - h)^2$, where $a$, which determines the shape of the quadratic, depends only on the size of the source alphabet, and $R - h$ is the excess rate beyond the relevant entropy. It quantifies the intuitive idea that driving the probability of error to zero comes at the cost of either greater rate or longer block lengths. These results are the source-coding counterparts of the quadratic bounds on channel-coding error exponents in Exercise 5.23 of [7], which can themselves be tightened slightly by using Lemma 1, as shown in [4]. Interestingly, the side-information alphabet size plays no role in the bound.

Numerical investigation reveals that this bound is loose, and so it remains an open problem to see whether it can be tightened while still maintaining an easy closed-form expression. This would involve solving the non-concave maximization problem in (11) exactly, instead of dropping the negative term. We also suspect that similar universal bounds exist for all sorts of error exponents. It would be interesting to find a unified treatment that could also give a universal bound on the error exponent for the lossy source coding investigated in [8].

APPENDIX

A. Proof of Lemma 1

Proof: We prove Lemma 1 by solving the following maximization problem for $f_E(\vec{\omega})$ with the constraint $\vec{\omega} \in \mathcal{S}_J$:

$$\max_{\vec{\omega} \in \mathcal{S}_J} f_E(\vec{\omega}) = \max_{\vec{\omega} \in \mathcal{S}_J} \sum_{j=1}^{J} \omega_j (\ln \omega_j - E)^2.$$

We have one equality constraint, $\sum_{j=1}^{J} \omega_j = 1$, and $J$ inequality constraints, $\omega_j \ge 0$ for $j = 1, 2, \ldots, J$. Note that $f_E(\vec{\omega})$ is a bounded differentiable function and $\mathcal{S}_J$ is a compact set in $\mathbb{R}^J$; thus there exists a point in $\mathcal{S}_J$ that maximizes it. We examine the necessary conditions for a point $\vec{\omega}^* \in \mathcal{S}_J$ to maximize $f_E(\vec{\omega})$. By the Karush-Kuhn-Tucker necessary conditions [2], there exist $\gamma_j \ge 0$, $j = 1, 2, \ldots, J$, and $\lambda \ge 0$ such that

$$\nabla f_E(\vec{\omega}^*) + \sum_{j=1}^{J} \gamma_j \nabla \omega^*_j + \lambda \nabla \Big(\sum_{j=1}^{J} \omega^*_j\Big) = 0,$$
$$\gamma_j \omega^*_j = 0, \ \forall j = 1, 2, \ldots, J; \quad \text{and} \quad \sum_{j=1}^{J} \omega^*_j = 1.$$

That is,

$$(\ln \omega^*_j)^2 + 2(1 - E)\ln \omega^*_j + \gamma_j + \lambda - 2E = 0,$$
$$\gamma_j \omega^*_j = 0, \ \forall j = 1, 2, \ldots, J; \quad \text{and} \quad \sum_{j=1}^{J} \omega^*_j = 1.$$

Note that

$$\frac{\partial f_E(\vec{\omega})}{\partial \omega_j}\Big|_{\omega_j = 0} = \big[(\ln \omega_j)^2 + 2(1 - E)\ln \omega_j\big]\Big|_{\omega_j = 0} = \infty, \qquad \frac{\partial f_E(\vec{\omega})}{\partial \omega_j}\Big|_{\omega_j \neq 0} = \big[(\ln \omega_j)^2 + 2(1 - E)\ln \omega_j\big]\Big|_{\omega_j \neq 0} < \infty,$$

and thus, to maximize $f_E(\vec{\omega})$, the $\omega^*_j$ must be strictly positive. Hence $\gamma_j = 0$, so

$$(\ln \omega^*_j)^2 + 2(1 - E)\ln \omega^*_j + \lambda - 2E = 0, \ \forall j, \qquad \text{and} \qquad \sum_{j=1}^{J} \omega^*_j = 1.$$

Since $\ln \omega^*_j$ is a root of the quadratic equation $x^2 + 2(1 - E)x + \lambda - 2E = 0$, this implies that $\omega^*_j$ can take on at most two distinct values.