A general framework for the evaluation of symbol recognition methods

IJDAR (2007) 9:59–74
DOI 10.1007/s10032-006-0033-x

ORIGINAL PAPER

A general framework for the evaluation of symbol recognition methods

E. Valveny · P. Dosch · Adam Winstanley · Yu Zhou · Su Yang · Luo Yan · Liu Wenyin · Dave Elliman · Mathieu Delalandre · Eric Trupin · Sébastien Adam · Jean-Marc Ogier

Received: 1 April 2005 / Accepted: 22 September 2006 / Published online: 18 November 2006
© Springer-Verlag 2006

Abstract  Performance evaluation is receiving increasing interest in graphics recognition. In this paper, we discuss some questions regarding the definition of a general framework for the evaluation of symbol recognition methods. The discussion is centered on three key elements of performance evaluation: test data, evaluation metrics and protocols of evaluation. As a result of this discussion we state some general principles to be taken into account in the definition of such a framework. Finally, we describe the application of this framework to the organization of the first contest on symbol recognition at GREC'03, along with the results obtained by the participants.

Keywords  Performance evaluation · Symbol recognition

E. Valveny (B): Centre de Visió per Computador, Edifici O, Campus UAB, Bellaterra (Cerdanyola), 08193 Barcelona, Spain. e-mail: ernest@cvc.uab.es
P. Dosch: LORIA, 615, rue du jardin botanique, B.P. 101, 54602 Villers-lès-Nancy Cedex, France. e-mail: Philippe.Dosch@loria.fr
A. Winstanley · Y. Zhou: National University of Ireland, Maynooth, County Kildare, Ireland. e-mail: adam.winstanley@nuim.ie, yuzhou@cs.nuim.ie
S. Yang: Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China. e-mail: suyang@fudan.edu.cn

1 Introduction

Performance evaluation has become an important research interest in pattern recognition during the last years. As the number of methods increases, there is a need for standard protocols to compare and evaluate all these methods.
The goal of evaluation should be to establish solid knowledge of the state of the art in a given research problem, i.e., to determine the weaknesses and strengths of the proposed methods on a common and general set of input data. Performance evaluation should allow the selection of the method best suited to a given application of the methodology under evaluation.

L. Yan · L. Wenyin: Department of Computer Science, City University of Hong Kong, Hong Kong, China. e-mail: luoyan@cs.cityu.edu.hk, csliuwy@cityu.edu.hk
D. Elliman: University of Nottingham, Nottingham, UK. e-mail: dge@cs.nott.ac.uk
E. Trupin · S. Adam: LITIS Laboratory, Rouen University, Rouen, France. e-mail: Sebastien.Adam@univ-rouen.fr
M. Delalandre · J.-M. Ogier: L3i Laboratory, La Rochelle University, La Rochelle, France. e-mail: mathieu.delalandre@univ-lr.fr, jean-marc.ogier@univ-lr.fr

Following these criteria, image databases have been collected and performance metrics have been proposed for several domains and applications [6,12,18,21,29]. Several of these works deal with the evaluation of processes involved in document analysis systems, such as thinning [13], page segmentation [2], OCR [28], vectorization [22,26,27] or symbol recognition [1], among others. In fact, the general performance evaluation framework proposed in this paper is based on the work carried out for the contest on symbol recognition organized during GREC'03 [25].

Although in any domain there are always some specific constraints, we can identify three main issues that must be taken into account in the definition of any framework for performance evaluation: a common dataset, standard evaluation metrics and a protocol to handle the evaluation process. The common dataset should be as general as possible, including all kinds of variability that could be found in real data. It must contain a large number of images, each of them annotated with its corresponding ground-truth.
Metrics must be objective, quantitative and accepted by the research community as a good estimate of real performance. They must help to determine the weaknesses and strengths of each method. In many cases it is not possible to define a single metric; several metrics have to be defined according to different evaluation goals. The protocol must define the set of rules and formats required to run the evaluation process.

In this paper, we propose a general framework for performance evaluation of symbol recognition. For each of these issues (data, metrics and protocol), we describe the main problems and difficulties that we must face, and we state the general guidelines that we have followed for the development of such a framework. Finally, we show how we have applied this framework to the organization of the GREC'03 contest on symbol recognition.

Symbol recognition is one of the main tasks in many graphics recognition systems. Symbols are key elements in all kinds of graphic documents, as they usually convey a particular meaning in the context of the application domain. Therefore, identifying and recognizing the symbols in a drawing is essential for its analysis and interpretation, and a great variety of methods and approaches have been developed (see some of the surveys on symbol recognition [5,8,17] for an overview of the current state of the art).

In fact, symbol recognition could be regarded as a particular case of shape recognition. However, there are some specific issues that should be taken into account in the definition of an evaluation framework. First, symbol recognition is not a stand-alone process. Usually, it is embedded in a whole graphics recognition system where the final goal is not only to recognize perfectly segmented images of symbols, but to recognize and localize the symbols in the whole document. Sometimes segmentation and recognition are completely independent processes, but sometimes they are related and performed in a single step.
For evaluation, this means that we must consider two different sub-problems: recognition of segmented images of symbols, and localization and recognition of symbols in a non-segmented image of a document. These two sub-problems will be referred to as symbol recognition and symbol localization, respectively, throughout the paper.

Second, symbol recognition sometimes depends on other tasks in the graphics recognition chain (for example, binarization or vectorization). The performance of these processes can also influence the performance of symbol recognition. We should try to make the evaluation of symbol recognition independent of these other tasks; at the least, the analysis of the results should take their influence into account.

Third, symbol recognition is applied to a wide variety of domains (architecture, electronics, engineering, flowcharts, geographic maps, music, etc.). Some methods have been designed to work only in some of these domains and have only been tested using very specific data.

Finally, if the goal of performance evaluation is to help determine the current state of the art of research, then any proposal should respond to the needs of the whole research community and should be accepted by it. Therefore, a key point of our proposal is the idea of a collaborative framework. The initial proposal must be validated by the users and must be easily extended as research advances and new needs or requirements appear. Thus, our proposal relies on four desirable properties:

• public availability of data, ground-truth and metrics
• adaptability to user needs: each person must be able to select a subset of the framework to work with
• extensibility: the framework must allow new kinds of images or metrics to be easily added
• collaborative validation of data, metrics and ground-truth.

The paper is organized as follows: Sects. 2 and 3 are devoted to discussing the two main aspects of performance evaluation, data and evaluation metrics, respectively. In Sect.
4 we describe the protocol andimplementation issues of the framework. In Sect. 5 weshow the application of this framework to the GREC’03contest. Finally, in Sect. 6 we state the main conclusionsand discuss the future work.  A general framework for the evaluation of symbol recognition methods 61 2 Data One of the key issues in any performance evaluationscheme is the definition of a common set of test data.Running all methods on this common set will permitto obtain comparable results. This set should be generic,large, and should contain all kinds of variability of real data.In symbol recognition, generality means including alldifferent kindsof symbols,i.e.,symbolsfromall applica-tions (architecture, electronics, engineering, flowcharts,geographicmaps,music,etc.)andsymbolscontainingalltypes of features or primitives (lines, arcs, dashed-lines,solid regions, compound symbols, etc.). In this way, wewill be able to evaluate the ability of recognition meth-ods to work properly in any application.On the other hand, variability can be srcinated bymultiple sources: acquisition, degradation or manipu-lation of the document, handwriting, etc. All of themshouldbetakenintoaccount,whencollectingtestdatainordertoevaluatetherobustnessofrecognitionmethods.However, in symbol recognition many methods arespecifically designed for a particular application or aparticular kind of symbols under specific constraints.Therefore,itisnotpossibletodefineasingledatasetcon-taining all kinds of images. Then, following the generalprinciple of adaptability, stated in the previous section,we propose to define several datasets, instead of a singleone. Each dataset will be labeled according to the kindof images contained in it. In this way, users can selectthedatasetstheywanttouseaccordingtothepropertiesof their method. 
In addition, we can generate as many datasets as required, combining all kinds of symbols and criteria of variability.

Therefore, we first need to establish some criteria to classify and organize all kinds of symbols (Sect. 2.1). Then, we must also identify and categorize all kinds of variability of real images (Sect. 2.2). Finally, we will be able to discuss how to collect and generate a large amount of data and organize it according to these classification criteria (Sect. 2.3).

2.1 Classification of symbols

In general, there are two points of view for classifying evaluation tests and their associated data [9]: technological and application. The technological point of view refers to the evaluation of methods as stand-alone processes, trying to measure their response to varying methodological properties of input data and execution parameters. Datasets must be independent of the application and must differ in the kind of image features. For symbol recognition, this point of view corresponds to the generic evaluation of performance independently of the application domain. Image features will be the different shape primitives that can be found in the symbols. According to the data used in the contest, we have identified three shape primitives: straight lines, arcs and solid regions. However, new primitives (for example, dashed lines, text, textured areas) could be added to the dataset if required.

On the other hand, the application point of view refers to the evaluation of methods in a particular application scenario. Different datasets will correspond to different application domains of a given method, and each dataset will only include specific data for the given application. In symbol recognition, categories refer to the different domains of application: architecture, electronics, geographic maps, engineering drawings or whatever domain we should consider.

We have used these two criteria to classify symbols in our framework.
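As an illustration of this double classification, a label-based dataset index might look like the following sketch. The dataset names, label keys and values are hypothetical, chosen only to show how a method's capabilities (application view: domain; technological view: primitives it can handle) drive dataset selection; they are not the framework's actual formats.

```python
# Hypothetical index: each dataset carries labels from both points of view.
datasets = {
    "set-01": {"domain": "architecture", "primitives": {"lines", "arcs"}},
    "set-02": {"domain": "electronics", "primitives": {"lines"}},
    "set-03": {"domain": "electronics", "primitives": {"lines", "solid-regions"}},
}

def select(datasets, domain=None, primitives=None):
    """Return the datasets matching an application domain and/or containing
    only primitives the method under evaluation can handle."""
    chosen = []
    for name, labels in sorted(datasets.items()):
        if domain is not None and labels["domain"] != domain:
            continue
        # Technological view: the dataset's primitives must be a subset
        # of what the method supports.
        if primitives is not None and not labels["primitives"] <= primitives:
            continue
        chosen.append(name)
    return chosen

# A method that handles straight lines and arcs, for electronics only:
print(select(datasets, domain="electronics", primitives={"lines", "arcs"}))
# → ['set-02']
```

New label categories (origin, degradation level, …) would simply become new keys in the per-dataset dictionaries, which is the extensibility property described above.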
The justification is that algorithms are usually designed from these two points of view too. Some methods are intended to be as general as possible and to work well with symbols from a wide range of applications. Other methods are intended to be part of the complete chain of a graphics recognition system in a particular application domain; they are specifically designed to recognize the symbols of that application.

These are the two main criteria for classifying test data. From a more general viewpoint, however, we can use labels corresponding to property/value pairs. The property can refer to the application domain, the primitives, the origin, etc., while the values are occurrences of these properties (respectively, architecture/electronics/…; segments/arcs-and-segments/…; CAD design/sketch/…). This provides a general labeling system which can easily be extended, allowing as much data as needed to be defined.

Therefore, we will assign at least two categories of labels to each symbol: one with the domain of the symbol and the other with the set of primitives composing it. Each dataset is also labeled in the same way, according to the symbols included in it. With this organization, each user can select the datasets that fit the features of the method under evaluation. In addition, new categories of data can easily be added or modified, and therefore the framework can evolve according to research needs. In Fig. 1 we can see several examples of images classified according to both points of view. Note that each symbol can be included in several categories.

Fig. 1 Classification of the same images according to the two points of view: a technological, b application

2.2 Variability of symbol images

Robustness to image degradation is essential for the development of generic algorithms. Therefore, a framework for performance evaluation must include all kinds of degradation in the test data.
Besides, images should be ranked according to the degree of degradation, in order to be able to determine whether performance decreases as the difficulty of the images increases.

In general, we can distinguish four sources of variability in symbol recognition:

• acquisition parameters: acquisition device (scanner, camera or online device) and acquisition resolution
• global transformations: global skew of the document, rotation and scaling of symbols
• binary noise: degradation of old documents, photocopies, faxes and binarization errors
• shape transformations: missing or extra primitives (due to segmentation errors) and shape deformations due to hand-drawing.

We need to guarantee that all these types of degradation are included in the common dataset. We will generate different datasets corresponding to each kind and degree of transformation, and to selected combinations of them. Each dataset will be labeled accordingly too.

2.3 Generation of test data

According to the principles stated in the previous sections, we need to collect a large number of images. These images will be organized into several datasets, including all kinds of symbols described in Sect. 2.1 and all types of variability identified in Sect. 2.2. In addition, images must be labeled with the ground-truth, i.e., the expected result. We have to collect segmented images of isolated symbols, but also non-segmented images of documents, in order to evaluate both symbol recognition and symbol localization, as stated in Sect. 1.

There are basically two possibilities for collecting test data: to use real data or to generate synthetic data. In the rest of this section we first discuss the advantages and drawbacks of each approach and how we use them in our framework. Then, we consider some other specific issues related to the generation of data for the evaluation of symbol recognition.
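To give a concrete idea of how degraded test images can be generated from a clean model, the sketch below applies global transformations (rotation and scaling about the centroid) plus a small random vertex jitter, standing in for hand-drawn shape deformation, to a symbol represented as a list of vertices. This representation and the function are assumptions for illustration only; binary noise would instead be applied at the raster level, with a method such as Kanungo's.

```python
import math
import random

def transform(symbol, angle_deg=0.0, scale=1.0, jitter=0.0, seed=None):
    """Rotate a symbol (list of (x, y) vertices) about its centroid,
    scale it, and optionally jitter each vertex to mimic hand-drawing."""
    rng = random.Random(seed)
    cx = sum(x for x, _ in symbol) / len(symbol)
    cy = sum(y for _, y in symbol) / len(symbol)
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for x, y in symbol:
        dx, dy = x - cx, y - cy
        # Standard 2-D rotation followed by uniform scaling.
        rx = scale * (dx * cos_a - dy * sin_a)
        ry = scale * (dx * sin_a + dy * cos_a)
        out.append((cx + rx + rng.uniform(-jitter, jitter),
                    cy + ry + rng.uniform(-jitter, jitter)))
    return out

# A unit test symbol: an axis-aligned square, rotated by 90 degrees.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(transform(square, angle_deg=90))
```

Sweeping `angle_deg`, `scale` and `jitter` over chosen ranges, and recording the parameter values used, yields datasets labeled with their kind and degree of transformation, as required above.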
2.3.1 Real data

Clearly, the main advantage of using real data is that it permits the algorithms to be evaluated on the same kind of images as in real applications. Evaluation will then be a very good estimate of performance in real situations. However, manually collecting a large number of real images is a great effort, unaffordable in many cases. The task of annotating images with their corresponding ground-truth is also time-consuming, and errors can easily be introduced. Another disadvantage is the difficulty of collecting images with all kinds of transformations and noise. Besides, it is not easy to quantify the degree of noise in a real image, so it is not possible to rank image difficulty according to the degree of noise.

2.3.2 Synthetic data

As an alternative, we can develop automatic methods to generate synthetic data. Clearly, the main advantage is that this allows as many images as necessary to be generated, and the annotation of images with the ground-truth is also automatic, so manual effort is reduced. However, we need to devote research effort to the development of models and methods able to generate images resembling real ones, with all possibilities of noise and transformations. This is not a straightforward task in many cases, although several works exist in related fields of document analysis [3,11,15,16]. Images generated using these methods can easily be classified according to the type and degree of noise or degradation applied, permitting the reduction in performance with increasing degrees of image degradation to be assessed.

We argue that both types of images are useful in a general framework for performance evaluation of symbol recognition. We believe that real images are the best test for assessing performance in symbol localization. It is really difficult to develop automatic methods to generate non-segmented images of complete graphic documents. Besides, as we can find many symbols in a single graphic document, not many images are required.
The problem can be the annotation of images with the ground-truth; we discuss it in Sect. 3.3.

On the other hand, synthetic images are the only way to perform evaluation tests with large sets of segmented images taking into account all degrees of degradation and variation. In this case, many images are required, and it is easier to develop methods for their generation. In our framework we have developed methods for the generation of global transformations, binary noise (based on Kanungo's method [15]) and shape transformations (based on active shape models [25]).

Fig. 2 Generation of data: a synthetic images, b real images

Figure 2 shows both synthetic and real images for symbol recognition.

2.3.3 Specific issues

In addition, we have to take into account two other specific issues of symbol recognition when generating test data.

• Relation to vectorization: As explained in Sect. 1, symbol recognition is simply one task in the graphics recognition chain. Vectorization is usually performed as a step prior to recognition, and many symbol recognition methods therefore work directly on the vectorial representation of the image. The problem is that, since there is no single optimal vectorization method, the result of vectorization can influence the performance of recognition. Thus, apart from a raster representation of images, we must also provide images in a common vectorial format, so that all methods can use the same vectorial data and recognition results are not influenced by the selected vectorization method. For images that can be automatically generated in vectorial format, we can provide the ideal vectorial representation, without the need to apply any vectorization method. When this is not possible (for example, for real images or for synthetic images with binary degradations), we should apply different standard vectorization methods to the raster image.
• The problem of scalability: One of the problems in symbol recognition [17] concerns scalability: many methods work well with a limited number of symbol models, but their performance decreases when the number of symbols is very large (hundreds or thousands of symbols). One of the goals of the evaluation of symbol recognition must be to assess the robustness of methods with a large number of symbols. Therefore, for each kind of test, several datasets with an increasing number of symbols will be generated.

3 Performance evaluation

3.1 Objectives

In some pattern recognition fields, the main goal of evaluation is the definition of a global measure that permits the "best" method on a standard and common dataset to be determined. However, it seems difficult to follow the same approach for symbol recognition. As we have stated in previous sections, the performance of symbol recognition depends on many factors, and it is not realistic to try to define a single measure and dataset taking all of them into account. Then, as symbol recognition remains an active research domain, it seems more interesting to focus on analyzing and understanding the strengths and weaknesses of the existing methods. This will be the main goal of the proposed evaluation framework.

In this context, evaluation relies on three issues: first, the definition of a number of standard datasets covering the full range of variability, as discussed in Sect. 2. Second, the definition of a set of measures, each of them aiming at evaluating a specific aspect of performance.
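As a minimal sketch of one such per-aspect measure, the function below computes a plain recognition rate separately for each dataset, so that performance can be tracked as the degradation level (or the number of model symbols) grows. The dataset names and label values are illustrative; the framework's actual metrics are defined by its protocol, not by this sketch.

```python
def recognition_rate(ground_truth, predictions):
    """Fraction of test images whose predicted class matches the ground truth."""
    assert len(ground_truth) == len(predictions)
    hits = sum(1 for g, p in zip(ground_truth, predictions) if g == p)
    return hits / len(ground_truth)

# Computing the rate per dataset exposes weaknesses at specific
# degradation levels instead of hiding them in one global number.
results = {
    "rotation-level-1": recognition_rate(["a", "b", "c", "a"], ["a", "b", "c", "a"]),
    "rotation-level-2": recognition_rate(["a", "b", "c", "a"], ["a", "b", "a", "a"]),
}
print(results)  # {'rotation-level-1': 1.0, 'rotation-level-2': 0.75}
```

Reporting one value per labeled dataset, rather than a single aggregate, is exactly what makes the degradation-ranked datasets of Sect. 2.2 useful for analysis.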

