Binarising Camera Images for OCR

Mauritius Seeger and Christopher Dance
Xerox Research Centre Europe
61 Regent Street, Cambridge CB2 1AB, United Kingdom

Abstract

In this paper we describe a new binarisation method designed specifically for OCR of low quality camera images: Background Surface Thresholding or BST. This method is robust to lighting variations and produces images with very little noise and consistent stroke width. BST computes a surface of background intensities at every point in the image and performs adaptive thresholding based on this result. The surface is estimated by identifying regions of low-resolution text and interpolating neighbouring background intensities into these regions. The final threshold is a combination of this surface and a global offset. According to our evaluation BST produces considerably fewer OCR errors than Niblack's local average method while also being more runtime efficient.

1. Introduction

Our research has been motivated by the convenience of using digital video cameras as opposed to conventional scanning devices. Cameras occupy little space on a user's desk, provide excellent feedback for alignment, capture instantly and allow documents to be scanned face up. However, since cameras acquire images under less constrained conditions than devices specifically designed for high-quality document capture, they can introduce severe image variations and degradations. This makes it especially hard to obtain a reliable OCR result from these images.

Hence our aim was to design a binarisation algorithm specifically for OCR of camera images. In order to yield acceptable error rates in conjunction with off-the-shelf OCR software, this method must perform well in the presence of degradations such as low resolution, lighting variations, blur, noise and compression artefacts. Specifically, it should work robustly with images at a resolution of [...] dpi in any lighting condition and with minimal computational overhead. In engineering our method we therefore decided to measure and design for two criteria which are directly related to the usability of our camera scanning system: OCR error rates and runtime efficiency.
We have found that global thresholding methods typically designed for images acquired on flatbed scanners are unsuitable for camera images [3, 4, 8], mainly due to the presence of lighting variations and blur. Although local adaptive algorithms yield considerably better results [6], we have found that local average methods such as Niblack's method [2], which is often quoted as one of the best adaptive algorithms, tend to break down in the presence of large homogeneous areas and hence require post-processing [7]. Yanowitz and Bruckstein's method [9], which has been shown [6] to perform almost as well as Niblack's method, derives a threshold surface by extracting and interpolating from areas identified as character boundaries. However, it also requires a post-processing step and is not particularly runtime efficient due to its iterative interpolation scheme. Furthermore, this method results in noisy threshold values since pixels lying on character boundaries have particularly variable grey values [5].

This paper presents a simple but novel adaptive threshold algorithm that achieves considerably better OCR performance than Niblack's method, while being more runtime efficient. This method is called background surface thresholding or BST. As the name suggests, this algorithm determines the background intensity at every pixel in order to derive a suitable threshold surface.

In the following section we present an overview of BST. We then give a more detailed description of the algorithm, and finally present the results of a comparison between BST and Niblack's method.

2. BST Method

BST can conceptually be divided into the following parts:

- Labelling of text areas at low resolution
- Estimation of background intensity in text areas by interpolation
- Performing thresholding at the background plus some offset.

Figure 1. Outline of the BST algorithm. Box labels: greyscale image, block variance, block average, remove text blocks from average by thresholding variance, interpolate background, binary image, threshold, find offset, upsample.

The initial segmentation between fore- and background relies on the assumption that page illumination is slowly varying.
More specifically, that the scale of background variations is larger than that of foreground variations, i.e. transitions such as character edges. This is mostly observed in practice: the variance of grey levels in a small neighbourhood of pixels is larger in areas containing text than in background regions. Hence, using a measure of variance and a suitable threshold for this, our algorithm is able to distinguish between fore- and background.

In the next stage, the background in areas containing text is estimated by linear interpolation of surrounding background intensities. To obtain good results, it is important to avoid filling foreground regions with incorrectly labelled background. Hence our priority was to conservatively label blocks as foreground even if they contain little or no text, since given the nature of the slowly varying illumination, background estimation is much more robust to mislabelling of this kind.

The image is binarised in a third stage. Pixels are labelled as fore- or background given a threshold which is the sum of the background surface and a global offset. This offset is proportional to the average distance between the fore- and background surface.

Figure 1 outlines the main computational steps of BST. Examples of intermediate results are shown in figure 2.

2.1 Text Labelling at Low Resolution

Based on the assumption that pixels in a window containing text have higher variance than background regions, we have designed a variance test to label pixels as foreground. We compute the block mean and block variance of the image, as shown in figure 2, using adjacent [...] by [...] pixel blocks (where [...]). An assumption here is that the block width is larger than the character stroke width, since we cannot expect higher variance in homogeneous foreground regions than in background areas. On the other hand, if the block width is too large, spaces between text lines may not be detected correctly or background regions with rapid lighting variations may be classified as foreground.

Figure 2. Intermediate results: (a) block average; (b) block variance (high variance shown by bright areas); (c) block average with high variance areas removed; (d) missing areas interpolated to yield a continuous background estimate. Original input: [...]; size after pre-processing: [...]; size of intermediate results: [...].
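The block statistics described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `block_stats`, the list-of-lists image representation and the choice to ignore trailing rows and columns that do not fill a whole block are all assumptions made for the example.

```python
def block_stats(image, n):
    """Return (means, variances) over adjacent n-by-n blocks of a 2-D list.

    Assumption: rows/columns left over after the last full block are ignored.
    """
    rows, cols = len(image) // n, len(image[0]) // n
    means = [[0.0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for br in range(rows):
        for bc in range(cols):
            # Gather the grey values of one non-overlapping block.
            pixels = [image[r][c]
                      for r in range(br * n, (br + 1) * n)
                      for c in range(bc * n, (bc + 1) * n)]
            mean = sum(pixels) / len(pixels)
            means[br][bc] = mean
            # Population variance of the block around its own mean.
            variances[br][bc] = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return means, variances


# A flat background block has zero variance; a block crossed by dark
# "stroke" pixels has a large one.
page = [[200, 200, 200, 50],
        [200, 200, 50, 200],
        [200, 200, 200, 200],
        [200, 200, 200, 200]]
means, variances = block_stats(page, 2)
```

In this toy input only the top-right block contains dark pixels, so it is the only block whose variance is non-zero, which is the property the variance test relies on.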
For 12 pt text at 120 dpi, good results are obtained for block sizes of between 7 and 19, and results are given here for [...].

Areas of text are initially identified by thresholding the variance image at the local average variance, computed using an [...] by [...] window of block variances, where [...]. We have found this method to be more robust than attempting to detect text regions from average intensity information.

At this stage we also estimate a global measure of the variance due to noise in background regions. This step is crucial in making the method robust to images acquired under different conditions of lighting, contrast, camera noise and blur. Using an initial guess of [...], we refine the background variance in a first pass using the following variance threshold surface:

    [...]    (1)

where [...] yields the best OCR results according to our experiments, and the method remains robust in the range [...]. By averaging the variance of all pixels whose variance lies below this surface, we obtain a refined estimate of the background variance. In a second pass using equation 1 and the updated background variance, we remove foreground pixels from the block-average image, as shown in figure 2 c.

Figure 3. Idealised histogram of a camera image of text (axes: grey value against frequency; annotations: optimal OCR threshold, split characters, merged characters and noise).

2.2 Interpolation

The background intensity for the removed pixels is estimated by interpolating from neighbouring pixels. This yields a continuous background surface, D, as shown in figure 2 d. For efficiency we perform 1-D linear interpolation between the closest two neighbouring points along rows and columns separately and combine the results later. In cases where missing points are not situated between two known background values, the interpolate is set equal to the nearest background value.

Row and column results are combined by selecting values from the most accurate interpolate. This accuracy is measured using the distance in pixels between the closest known background value and the position of the interpolated pixel. For each pixel we select the interpolate that minimises this distance.
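The row and column interpolation just described can be sketched as follows. This is a hedged illustration rather than the authors' code: the names `interpolate_line` and `interpolate_background`, the use of `None` to mark removed (foreground) blocks, and the tie-break in favour of the row result are assumptions. A cell with no known value anywhere in its row or column is left as `None`.

```python
def interpolate_line(values):
    """Fill None entries of a 1-D list by linear interpolation.

    Returns (filled, dist), where dist[i] is the distance in pixels to the
    closest known value (0 for known entries). Gaps that are not bracketed
    by two known values copy the nearest known value.
    """
    n = len(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values), [float("inf")] * n
    filled, dist = list(values), [0] * n
    for i in range(n):
        if values[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i], dist[i] = values[right], right - i
        elif right is None:
            filled[i], dist[i] = values[left], i - left
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
            dist[i] = min(i - left, right - i)
    return filled, dist


def interpolate_background(grid):
    """Interpolate along rows and columns separately, then keep for each
    cell the interpolate whose closest known value is nearer."""
    rows, cols = len(grid), len(grid[0])
    by_row = [interpolate_line(row) for row in grid]
    by_col = [interpolate_line([grid[r][c] for r in range(rows)])
              for c in range(cols)]
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            row_val, row_dist = by_row[r][0][c], by_row[r][1][c]
            col_val, col_dist = by_col[c][0][r], by_col[c][1][r]
            # Assumption: ties go to the row interpolate.
            out[r][c] = row_val if row_dist <= col_dist else col_val
    return out


# None marks block averages removed as foreground.
grid = [[10, None, 30],
        [20, None, None],
        [50, None, 70]]
surface = interpolate_background(grid)
```

In the example, the right-hand cell of the middle row is two pixels from its nearest known value along the row but only one pixel along its column, so the column interpolate is the one that is kept.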
The combined results are then smoothed using a box blur of size 5 to reduce artefacts created during interpolation and then bilinearly upsampled by a factor equal to the block width to match the resolution of the original image.

2.3 Thresholding

In a final step we threshold the original grey level image, F. Figure 3 illustrates an idealised histogram typical of a camera image of text in the absence of lighting variations. Usually, it is not bimodal for resolutions below 200 dpi, even though bimodality is assumed by many global thresholding methods [4]. The best choice of threshold is a tradeoff of the following risks:

- Split characters if the threshold is too low
- Merged characters and noise if the threshold is too high.

Typically we find that OCR engines such as ScanSoft TextBridge are most sensitive to split characters and background noise. Hence, as shown by figure 3, the best choice of threshold is somewhere just below the background peak.

In typical camera images of documents, most of the lighting variation experienced is of a diffuse nature, and hence observed pixel values G might be described as a product between an incident illuminant H and a suitably normalised underlying image I, with additive sensor noise J:

    G = H I + J    (2)

This formula suggests that the threshold be selected as some multiple of the background surface. However, our experience is that thresholding at an offset from the background gives better performance in practice. This might be because the threshold is largely influenced by the noise J, whose variance does not change much with the illuminant. Use of an offset also appears more robust in situations where an unknown gamma correction has been applied to the image by a camera.

The threshold surface is a weighted sum of the previously determined background, D, and a global offset, which is the average distance between the foreground and background:

    [...]    (3)

The threshold surface is then given by:

    [...]    (4)

The factor, d, determines the thickness of character strokes and thus also the amount by which characters are either merged or split. We present results for d = [...], and find that these results remain stable in the range [...].

3. Pre-processing

We have been able to dramatically improve OCR results by pre-processing images before binarisation [5]. Images are deblurred using a simple sharpening or high frequency boost filter [1] and upsampled bicubically by a factor of three. This enhances the high frequencies and allows what we call binary super-resolution [5]: trading of grey scale intensity resolution for spatial resolution. Figure 4 illustrates the advantage gained from this type of pre-processing.

Figure 4. Quality improvement of text images gained by pre-processing: (a) original image; (b) BST without pre-processing; (c) BST after pre-processing.

4. Results

We have compared the performance of BST with Niblack's local average thresholding method [2]. The Niblack method operates on the following threshold surface:

    T = \mu + k \sigma    (5)

where \mu and \sigma^2 are the local mean and variance respectively, computed using a moving window of size w. A previous comparison of binarisation methods by Trier and Jain [6] concluded that Niblack is the best performer when the goal is character recognition. However, as shown in figure 5, we found that Niblack produces noisy results when used with the recommended window size of [...] (scaled for our 3x up-sampled images) and inconsistent stroke width when used with larger windows, causing some characters to merge and others to split. More importantly, an evaluation of OCR performance shows that Niblack binarised images produce significantly higher character error rates.

Figure 5. BST vs. Niblack: (a) original grey level image of 12 pt Times New Roman at 120 dpi; (b) Niblack w=45, k=-0.4; (c) w=200, k=-1; (d) BST.

We have also found that the runtime of an efficient implementation of Niblack is roughly twice that of BST. The runtime excluding pre-processing on a 700MHz PC is 0.5 seconds for BST vs. [...] seconds for Niblack, when binarising a [...] image. Pre-processing (3x upsampling and sharpening of a [...] image) can be completed in 0.5 seconds.

We compared the OCR performance of BST and Niblack using 17 images of A6 portions of magazine articles, newspaper articles and office documents, with [...] pt text acquired with a Philips Vesta Pro video-conferencing camera at [...] dpi.
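To make the comparison concrete, the sketch below contrasts the Niblack surface of equation 5 with thresholding at a fixed offset below a background surface. It is an illustration under stated assumptions, not either method as evaluated here: the function names are hypothetical, windows are simply clipped at image borders, text is assumed dark on a light background, and `offset_threshold` (background minus a constant) is a simplified stand-in for the BST threshold rather than the paper's equations 3 and 4.

```python
import math


def niblack_threshold(image, w, k):
    """Niblack surface T = mean + k * std over a w-by-w window.

    Assumption: windows are clipped at the image borders.
    """
    rows, cols, half = len(image), len(image[0]), w // 2
    surface = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(window) / len(window)
            var = sum((p - mean) ** 2 for p in window) / len(window)
            surface[r][c] = mean + k * math.sqrt(var)
    return surface


def offset_threshold(background, offset):
    """Illustrative stand-in for a BST-style surface: background - offset."""
    return [[b - offset for b in row] for row in background]


def binarise(image, surface):
    """Label pixels darker than the threshold surface as foreground (1)."""
    return [[1 if p < t else 0 for p, t in zip(img_row, thr_row)]
            for img_row, thr_row in zip(image, surface)]


# A nearly flat background patch with one slightly darker (noisy) pixel.
patch = [[200, 200, 200],
         [200, 199, 200],
         [200, 200, 200]]
background = [[200] * 3 for _ in range(3)]

noisy = binarise(patch, niblack_threshold(patch, 3, -0.2))
clean = binarise(patch, offset_threshold(background, 50))
```

On this homogeneous patch the local statistics are dominated by noise, so the Niblack surface sits just under the local mean and the single darker pixel is labelled as foreground, while the offset surface stays well below the background and labels nothing. This mirrors, on a toy scale, the noise in background regions reported for small Niblack windows.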
ScanSoft TextBridge was employed for OCR and the character error rates were computed as the Levenshtein (string edit) distance between the output and the manually derived ground truth. To achieve a fair comparison, we fine-tuned the performance of Niblack's method by choosing parameters w and k and the parameters of our pre-processing step to minimise the average OCR error rate. As shown in figure 6, Niblack performs significantly better with a large window and in the best case achieves an average character error rate of 3.1% compared to 2.3% for BST.

The error rates observed were highly variable from image to image, hence the significance of the average error rate is questionable here. Indeed, in two cases the Niblack method produced more than 10% errors. Nonetheless, for all but one of the images, the error rate for BST was less than that for the Niblack method.

Error rates with Niblack were especially high for small windows ([...]) due to the large amount of noise in background regions. For larger windows ([...]) the noise is reduced, but Niblack becomes less robust to lighting variations, since it then effectively behaves like a global threshold, and also less robust to large changes in the amount of foreground in the window.

Figure 6. Character error rates achieved with Niblack's method for different window sizes w and variance gains k (axes: window size in pixels against OCR error rate in %; curves for k = -1.5, k = -1 and k = [...]). Best performance: 3.1% for w=800, k=-1.

5. Conclusion

Our comparison suggests that BST performs better binarisation of camera images for OCR than Niblack's method. In addition, the BST implementation can be more runtime efficient. We feel that the poor OCR performance of Niblack is in particular due to noisy background regions produced when using small windows and inconsistent stroke width when using larger windows. Since BST uses a threshold that is globally offset from the background, it is much less susceptible to misclassification of large homogeneous regions. Also, since threshold levels in BST are not locally related to neighbouring features, it produces characters with more consistent stroke width.

References

[1] A. K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, Englewood Cliffs.
[2] W. Niblack. An Introduction to Digital Image Processing. Prentice Hall, Englewood Cliffs.
[3] L. O'Gorman. Binarization and multithresholding of document images using connectivity. CVGIP: Graphical Models and Image Processing, 56(6).
[4] P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. C. Chen. A survey of thresholding techniques. Computer Vision, Graphics, and Image Processing, 41.
[5] M. J. Taylor and C. R. Dance. Enhancement of document images from cameras. In SPIE Conference on Document Recognition V, volume 3305. SPIE.
[6] Ø. D. Trier and A. K. Jain. Goal-directed evaluation of binarization methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(12).
[7] J. S. Valverde and R. Grigat. Optimal binarisation of technical document images. In ICIP 2000 Proceedings. IEEE.
[8] H. Yan. Unified formulation of a class of image thresholding techniques. Pattern Recognition, 29(12).
[9] S. D. Yanowitz and A. M. Bruckstein. A new method for image segmentation. Computer Vision, Graphics and Image Processing, 46(1):82-95.