
A Robust Model for Intelligent Text Classification

Roberto Basili and Alessandro Moschitti
University of Rome Tor Vergata
Department of Computer Science, Systems and Production
00133 Roma (Italy)
{basili, moschitti}@info.uniroma2.it

Abstract

Methods for taking linguistic content into account in text retrieval are receiving growing attention [16], [14]. Text categorization is an interesting area for evaluating and quantifying the impact of linguistic information. Work on text retrieval over the Internet suggests that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinctions for query expansion, as in [14]) is the crucial issue for carrying experimental results over to operational settings. This kind of representational problem is also studied in this paper, where traditional methods for statistical text categorization are augmented via a systematic use of linguistic information. Again, as in [14], the addition of NLP capabilities also suggested a different application of existing methods in revised forms. This paper presents an extension of the Rocchio formula [11] as a feature weighting and selection model, used as a basis for multilingual Information Extraction. It allows an effective exploitation of the available linguistic information, with significant gains in both data compression and accuracy. The result is an original statistical classifier fed with linguistic (i.e. more complex) features and characterized by the novel feature selection and weighting model. It outperforms existing systems while keeping most of their attractive properties (easy implementation, low complexity and high scalability). Extensive tests suggest the model is a viable and robust tool for large scale text classification and filtering, as well as a basic module for more complex scenarios.

1 Introduction

Methods for taking linguistic content into account in text retrieval are receiving growing attention [16], [14]. Text categorization is an interesting area for studying the impact of NLP information in retrieval processes. Work on text retrieval over the Internet suggests that embedding linguistic information at a suitable level within traditional quantitative approaches (e.g. sense distinctions for query expansion, as in [14]) is the crucial issue for carrying experimental results over to operational settings. This kind of representational problem is also studied in this paper, where traditional methods for statistical text categorization are augmented via a systematic use of linguistic information.

The study of text classification (TC) is very useful for validating and measuring the quality of lexical methods (e.g. inductive methods over corpora or treebanks). Although text classification cannot objectively measure the relevance of linguistic information for every task, its benefits in TC also suggest a positive impact on other IR tasks. In TC a systematic experimental framework is possible: tasks and performance factors, influenced by the availability of induced lexical information, can be assessed and measured over well-established benchmarking data sets.

In [3] an original extension of the well-known Rocchio model for feature weighting was proposed. The aim was to better assess the contribution of richer forms of feature representation on benchmarking data.
Weighting was seen as a suitable method for measuring the effects of more informative features on the performance of the target classifier. Large scale experiments confirmed the need to tune the Rocchio formula parameters to the training data. The sensitivity of the formula to different parameter values is also discussed in [7], where warnings on the estimation methodology are raised as well.

The technique proposed in [3] is based on empirical parameter estimation aiming to optimize performance over an established document set. In [3], estimation over the test set itself was used to avoid noise in the model setting. The introduced bias (and its high performance) was thus adopted as an experimental framework to systematically measure the contribution of NLP. The result was that such a feature selection method was effective in emphasizing linguistic features like POS-tagged lemmas, complex proper nouns and noun phrases, showing a significant improvement over poorer features (i.e. simple stems). The adopted estimation procedure was not generally assessed, however, as different test sets may lead to different parameter settings. In order to define a valid generalized Rocchio model, we have to show that the parameters do not depend on the document sets chosen for estimation but can be tuned via generally valid procedures.

In this paper, a parameter estimation procedure for the extended Rocchio classifier is proposed and evaluated. If an improvement similar to that reported in [3] can be obtained, this would establish the methodology as a novel approach to profile-based classification. The improvement would depend both on the availability of linguistic (i.e. more complex) information and on the better weighting and selection guaranteed by the proposed generalized formula. The resulting hybrid model would thus qualify as a viable intelligent approach to TC, combining the symbolic modeling used in language processing and disambiguation with a rather simple quantitative technique largely employed in operational systems.

In Section 2, the basic concepts behind the problem addressed in this paper are introduced. The novel feature selection model, with its weighting capabilities, is presented in Section 3, where the suggested estimation procedure is also defined. In Section 4, experiments are reported, aiming to show the effectiveness of the proposed estimation technique as well as to quantify the contribution of linguistic information.

2 Language-driven Text Classification

The classification problem is the derivation of a decision function f_C that maps documents (d ∈ D) into one or more classes, i.e. f_C : D → 2^C, where the set of classes C = {C_1, ..., C_n} represents topics and subtopics (e.g. "Politics"/"Foreign Politics") and an extensive collection of examples classified under them, often called the training set, is available for deriving f_C.

Profile-based (or linear) classifiers are characterized by a function f_C based on a similarity measure between the representation of the incoming document d and each class C_i. Both representations are vectors, and similarity is traditionally estimated as the cosine of the angle between the two vectors. The description of each target class C_i is usually called its profile, that is, the vector summarizing the content of all the training documents pre-categorized under C_i. The vector components are called features and refer to independent dimensions of the space in which similarity is estimated. The i-th component of a vector representing a given document d is a numerical weight associated with the i-th feature f of the dictionary that occurs in d. Similarly, profiles are derived from the grouping of the positive instances d of a class C_i, i.e. d ∈ C_i. Traditional techniques (e.g. [15]) use single words f as basic features.
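The paper gives no implementation, but the decision scheme just described is easy to make concrete. The following is a minimal sketch of a profile-based classifier under the above definitions, assuming sparse {feature: weight} dictionaries for both documents and profiles; the names (cosine, classify) and the per-class acceptance thresholds are illustrative assumptions, not part of the authors' system.

```python
import math

def cosine(doc_vec, profile_vec):
    """Cosine of the angle between a document vector and a class
    profile, both represented as sparse {feature: weight} dicts."""
    dot = sum(w * profile_vec[f]
              for f, w in doc_vec.items() if f in profile_vec)
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_p = math.sqrt(sum(w * w for w in profile_vec.values()))
    return dot / (norm_d * norm_p) if norm_d and norm_p else 0.0

def classify(doc_vec, profiles, thresholds):
    """f_C maps a document into one or more classes: every class
    whose profile similarity exceeds its threshold is assigned."""
    return [c for c, prof in profiles.items()
            if cosine(doc_vec, prof) >= thresholds[c]]
```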
The next section describes the kind of linguistic information that extends the class profiles and the processes used to obtain it.

2.1 Linguistic features in text categorization

Linguistic content in TC can be represented by suitable features able to supply the needed evidence to the f_C function, i.e. selective information about training and test documents. Basic language processing capabilities traditionally allow extending the knowledge about the words occurring in documents, for example with their canonical forms (i.e. the morphological derivation of a lemma) and their syntactic roles (i.e. the part-of-speech (POS) in the input context). Previous work on NLP-driven text classification (e.g. [3]) also suggests that the availability of significant (or domain specific) multiwords improves performance. The recognition of proper nouns and terminological expressions provides effective information, able to focus on more selective feature sets.

The next subsection describes the nature of the linguistic information available from the training data set and the processes used to derive it.

2.1.1 The extraction of linguistic features

The TC model proposed in this paper has been used within TREVI (Text Retrieval and Enrichment for Vital Information¹), a system for Intelligent Text Retrieval and Enrichment. TREVI components are servers cooperating in the processing, extraction, classification, enrichment and delivery of news. Basically, two TREVI components contribute to the TC task:

- the Parser, i.e. a full linguistic preprocessor that takes a normalized version of the news and produces a set of grammatical and semantic information for each text;
- the Topic Identifier which, according to the Parser output and to the derived class profiles, assigns one or more topics to each news item. This is the TC (sub)system proper.

The Parser in TREVI is a complex (sub)system combining tokenization, lemmatization (via an independent lexical server), part-of-speech tagging [5] and robust parsing [4]². The information produced by the parser and used by the Topic Identifier component is the following:

- Lemmas or multiword expressions. Simple words as well as complex terminological expressions (e.g. noun phrases like "bond issue" or functional expressions such as "in order to") are detected and properly used during the later phases. Details on the extraction of the relevant complex nominals acting as terminological expressions for the target categories are given in Section 2.1.2.
- Proper Nouns (PNs). In line with systems for Information Extraction, named entities are recognized through extensive catalogs as well as through the application of NE grammars. A typed set of proper nouns is derived from each news item and processed independently of the other lemmas.
- Syntactic categories of lemmas. Units of text (i.e. simple or complex terms) are tagged with a single part-of-speech (POS) label (e.g. N for nouns, V for verbs). Document descriptions include lemmas with their own POS, so that verbal and nominal occurrences are independent (e.g. race/V vs. race/N).
- Major grammatical relations (i.e. Subj/Obj relations among words). News items are thus annotated with basic syntactic structures emphasizing the roles of the significant constituents (verbs and their modifiers).

The classification model that we propose is a profile-based classifier using as features the document's lemmas associated with their part-of-speech (POS) labels, plus the terminological expressions. Only nouns, verbs and adjectives are considered candidate features, and the resulting indexes are pairs <lemma, POStag>. Proper nouns (PNs) are also part of the profile³. Moreover, no stop list is used in TREVI, as POS tagging supplies the corresponding, linguistically principled, filtering ability.

¹ TREVI is a distributed object-oriented system, designed and developed within a European consortium under the TREVI ESPRIT project EP23311.
² Details on the linguistic methods and algorithms for each phase can be found in [4].
³ Future work will also include in the profile the available syntagmatic information, as its treatment requires a more complex description language and statistical modeling. No grammatical relation is thus considered in the feature set, although terminological structures bring information by hiding inner modifiers and relations.
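As a rough sketch of how this indexing scheme could be realized, assume a tagger that already returns (lemma, POS) pairs; the TREVI Parser is far richer (proper nouns, terminological units, Subj/Obj relations), and the tag set {N, V, A} and the helper name extract_features are assumptions made here for illustration.

```python
def extract_features(tagged_tokens):
    """Build <lemma, POStag> index features from tagger output,
    keeping only nouns, verbs and adjectives. POS tagging doubles
    as a linguistically principled stop-word filter: closed-class
    words never carry one of the open-class tags."""
    open_classes = {"N", "V", "A"}
    return [f"{lemma}/{pos}" for lemma, pos in tagged_tokens
            if pos in open_classes]

# extract_features([("the", "D"), ("race", "N"), ("race", "V")])
# -> ["race/N", "race/V"]   (verbal and nominal uses stay distinct)
```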
2.1.2 Corpus-driven terminology extraction

Noun phrase detection is supported by an inductive method for (off-line) terminology extraction first introduced in [2]. It is based on an integration of symbolic and statistical modeling. First, relevant atomic terms (i.e. singleton words) are identified by traditional techniques, e.g. the tf × idf score suggested early on in [15]. Linguistically principled grammars⁴ are then applied to identify the linguistic structures (headed by those atomic terms) that are admissible candidates for terminological expressions. Finally, the extracted candidates are validated and selected through statistical filters. The statistical properties imposed on the occurrences of multiword sequences aim to restrict the semantic relations expressed by terms.

In terminology, terms are surface canonical forms of structured expressions referring to entities with complex properties in a domain. They are nouns or noun phrases generally denoting specific concepts in a given corpus, i.e. in a given domain.

Usually term candidates are couples (h, \vec{m}), where \vec{m} represents the sequence of (left and/or right) modifiers, e.g. (disk, ((-1, hard))) and (system, ((-2, cable), (-1, television))) for "hard disk" and "cable television system", respectively. Mutual information (MI) [10] has often been used to capture linguistic relations between words (e.g. [6, 8]):

MI(h, m) = \log \frac{P(h, m)}{P(h)\,P(m)}

The stronger the captured relation between h and m, the larger the joint probability is with respect to the marginal probabilities⁵. The basic problem is that MI (and its estimation) is concerned with only two events, and is thus better suited to bigrams, e.g. "hard disk". Longer expressions usually require an iterative estimation, as in [2, 9]. In [1] a different approach is proposed, based on an extension of MI to collections of events (i.e. vectors of words):

MI(h, \vec{m}) = \log \frac{P(h, \vec{m})}{P(h)\,P(\vec{m})}

where the conceptual link is considered between the word h and the vector \vec{m} = (m_1, m_2, ..., m_n). The MI estimation MI(h, \vec{m}) is obtained by first estimating each i-th component MI(h, m_i) and then comparing graphically the obtained values. The resulting points define a histogram corresponding to a complex noun phrase, and studying its envelope through a shape factor allows the MIs of "multiple" modifiers to be analysed. If a semantic relation holds between the modifiers \vec{m} and the head h, then the obtained plot should be flat, i.e. no significant difference between the MI(h, m_i) values should be observed. In this way each candidate term (h, \vec{m}) is analysed by looking "in parallel" at all of its different MIs (i.e. MI(h, m_i) for all i). Thresholding on the differences provides a straightforward and efficient decision criterion applied without iterating.

We processed the full training set available for each class C_i and derived class-specific terminological datasets TermSet_i. During preprocessing (i.e. parsing), items in ∪_i TermSet_i are then matched and represented as document features. Such complex noun phrases have been employed within the TC experiments described in Section 4.

⁴ A linguistic preprocessing step supports tokenization, part-of-speech tagging and lemmatization for the grammatical recognition.
⁵ A variety of estimations and extensions of MI have been proposed [6], like the following: MI_d(x, y) = \log \frac{f_d(x, y)}{f(x)\,f(y)} (1), where f_d(x, y) is the frequency of cooccurrence of the words x and y at distance d.
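This acceptance test is easy to express in code. The sketch below computes the per-modifier MI values from probabilities the caller supplies; the paper only states that a shape factor over the MI histogram is thresholded, so the max-min spread used here is one plausible choice, not necessarily the measure of [1], and the function names are assumptions.

```python
import math

def pointwise_mi(p_joint, p_head, p_mod):
    """MI(h, m) = log(P(h, m) / (P(h) * P(m))); probabilities are
    assumed to be relative frequencies estimated from the corpus."""
    return math.log(p_joint / (p_head * p_mod))

def accept_term(head_mod_probs, flatness_threshold=0.5):
    """head_mod_probs holds one (P(h, m_i), P(h), P(m_i)) triple per
    modifier m_i of a candidate (h, m_1 ... m_n). The candidate is
    kept when its MI histogram is roughly flat, i.e. the MI(h, m_i)
    values show no significant spread; no iteration over longer
    n-grams is needed."""
    mis = [pointwise_mi(pj, ph, pm) for pj, ph, pm in head_mod_probs]
    return max(mis) - min(mis) <= flatness_threshold
```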
3 Extending Rocchio's formula for optimal feature selection and weighting

The poor improvements observed in NLP-driven IR tasks (e.g. [16]) usually depend on the noise introduced by linguistic recognition errors or ambiguities (e.g. sense ambiguity in query expansion), whose drawbacks offset the potential advantages. When more complex features (e.g. words with their POS tags, or terminological units) are captured, it can be even more difficult to select the relevant ones from the set of all features. Data sparseness effects (e.g. the lower frequency of n-grams with respect to simple words) interact with wrong recognitions (e.g. errors in POS assignment), and the overall information has a lower selectivity for the function f_C.

The traditional solution is feature selection, discussed for example in [18]. By applying statistical methods (information gain, χ², mutual information, ...), the non-relevant features are removed. A major drawback is that features irrelevant for one class may be removed even if they are important for another. Important but rare or specific features may be cut in this way, as also noted in [13]. The crucial issue here is how to give the right weight to a given feature in different classes. This is even more important when NLP (and, especially, terminology recognition) is applied: some technical terms can be perfectly valid features for one class and, at the same time, totally irrelevant or misleading for others.

Rocchio's formula has traditionally been used to build the profiles associated with categories in profile-based text classifiers. It is defined as follows. Given:

- the set R_i of training documents classified under the topic C_i (positive examples),
- the set \bar{R}_i of documents not belonging to C_i (negative examples), and
- ω_fd, the weight⁶ of feature f in document d,

the weight Ω_fi of a given feature f in the profile (Ω_1i, Ω_2i, ...) of the class C_i is:

\Omega_{fi} = \max\left(0, \frac{\beta}{|R_i|} \sum_{d \in R_i} \omega_{fd} - \frac{\gamma}{|\bar{R}_i|} \sum_{d \in \bar{R}_i} \omega_{fd}\right)    (2)

In Eq. 2 the parameters β and γ control the relative impact of the positive and negative examples and determine the weight of f in the i-th profile. In [11], Eq. (2) was used with the values β = 16 and γ = 4, as the task was the categorization of low quality images.

⁶ Several methods can be used to assign the weight of a feature in a document, as widely discussed in [15].
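Eq. 2 translates almost directly into code. The sketch below assumes documents are the same sparse {feature: weight} dictionaries used earlier (the ω_fd of the definition); build_profile keeps only non-zero weights, which is exactly the smooth selection discussed below.

```python
def rocchio_weight(f, positives, negatives, beta, gamma):
    """Omega_fi of Eq. 2: max(0, beta * average weight of f over the
    positive documents - gamma * average weight over the negatives)."""
    pos = sum(d.get(f, 0.0) for d in positives) / len(positives)
    neg = sum(d.get(f, 0.0) for d in negatives) / len(negatives)
    return max(0.0, beta * pos - gamma * neg)

def build_profile(features, positives, negatives, beta=16.0, gamma=4.0):
    """Class profile as a sparse vector; beta = 16, gamma = 4 are the
    settings of [11]. Features driven to 0 drop out of the profile and
    so contribute nothing to the cosine similarity."""
    profile = {}
    for f in features:
        w = rocchio_weight(f, positives, negatives, beta, gamma)
        if w > 0.0:
            profile[f] = w
    return profile
```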
The relevance of a feature deeply depends on the corpus characteristics and, in particular, on the differences among the training material of the different classes, e.g. its size, the structure of the topics, or the style of the documents. These factors change appreciably across text collections and classes. Equation 2 takes this into account by setting to 0 the features whose difference between positive and negative relevance is negative. This aspect is crucial, since 0-valued features are irrelevant in the similarity estimation (i.e. they give a null contribution to the scalar product). This form of selection is rather smooth and allows retaining features that are selective only for some of the target classes. As a result, features are used optimally, as they influence the similarity estimation for all and only the classes for which they are selective. The γ and β setting that optimizes the classification performance thus drastically reduces noise without direct feature elimination.

At the same time, Eq. 2 provides scores Ω_fi that can be used directly as weights in the associated feature space. Each category has in this way its own set of relevant and irrelevant features. It was thus proposed in [3] that the optimal values of the two parameters be estimated independently for each class i. This results in a vector of (γ_i, β_i) pairs, each one optimizing the performance of the classifier over the i-th class. From now on we will refer to this model as the generalized Rocchio classifier.

Notice that the combined estimation of the two parameters is not required. For each class, one parameter is fixed (β_i = 1) and γ_i is tuned until the optimal performance is reached. The weighting, ranking and selection scheme used for the generalized Rocchio classifier is thus the following:

\Omega_{fi} = \max\left(0, \frac{1}{|R_i|} \sum_{d \in R_i} \omega_{fd} - \frac{\gamma_i}{|\bar{R}_i|} \sum_{d \in \bar{R}_i} \omega_{fd}\right)    (3)

Equation 3 is applied with the parameters γ_i that, for each class C_i, lead to the maximum breakeven point⁷ of C_i.

⁷ The breakeven point is the threshold value at which precision and recall coincide (see [17] for more details).
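The per-class tuning behind Eq. 3 reduces to a one-dimensional search. A minimal sketch follows, reusing build_profile from the previous fragment; the grid of candidate γ values and the evaluate_bep callback, which must compute the breakeven point of the resulting profile on evaluation documents, are assumptions, since the paper does not fix either.

```python
def tune_gamma(features, positives, negatives, evaluate_bep,
               gamma_grid=(0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0)):
    """Fix beta_i = 1 and sweep gamma_i, keeping the value whose
    profile reaches the highest breakeven point for the class."""
    best_gamma, best_bep = gamma_grid[0], float("-inf")
    for gamma in gamma_grid:
        profile = build_profile(features, positives, negatives,
                                beta=1.0, gamma=gamma)
        bep = evaluate_bep(profile)
        if bep > best_bep:
            best_gamma, best_bep = gamma, bep
    return best_gamma
```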
3.1 Estimating parameters in a generalized Rocchio model

The idea of parameter adjustment in the Rocchio formula is not completely new. In [7] it was pointed out that these parameters greatly depend on the training corpus and that different settings of their values produce significant variations in performance. However, their estimation was not clarified there. The major problem was that the simple parameter estimation procedure providing the lowest training set error produced only a small improvement in the error rate over the reference test set. The reason is that the parameters optimizing the classification of the training documents are very different from those optimizing the test-set classification. This led to the erroneous conclusion that the parameters are a property of the document set used for their derivation, so that their use cannot increase general classification performance.

We agree with those results but, as is usually suggested, parameter estimation should never be carried out on the very set also used for training. The consequence would be a parameterization that depends heavily on the evidence extracted from the training texts, i.e. one too biased by that information. Notice that an approach that takes one set of training documents for profile building and a second, different subset, called the estimation set, for parameter estimation is more reasonable. First, the estimation is still carried out over data independent of the test set. Moreover, the obvious bias due to the training material is avoided. If the estimated parameters converge to settings that yield comparable (i.e. optimal) performance also on the target test set, we can conclude that:

- the γ_i values do not depend on document sets but are tightly related to the categories C_i, and
- the procedure is general enough to be widely applied in operational scenarios of real AI applications.

More technically, the following parameter estimation procedure has been used. A benchmarking collection is usually made of a set of controlled, i.e. already categorized, documents. This set is split into a first subset of training documents, called the learning set LS, and a second subset of documents used to evaluate performance, called the test set TS. This split can be fixed (as in the Reuters 3 collection [17]) or generated randomly from the collection. In statistical text categorization the learning set is traditionally used to extract features and build profiles. As elsewhere in statistical NLP, the parameter estimation for Eq. 3 can be carried out according to a held-out estimation procedure (a code sketch of the full loop is given below):

1. First, a subset of LS, called the estimation set ES, is defined.
2. The set LS − ES is then used for profile building.
3. Estimation of the γ_i parameters is finally carried out over ES.

The performance of the resulting model can then be measured over the TS documents. Notice that this procedure can be applied iteratively if steps 2-3 are carried out according to different, randomly generated splits ES_k and LS − ES_k. Several parameter vectors are thus derived, one per iteration k, denoted by γ_i^(k). A resulting estimate can then be obtained via a point-wise estimator φ applied to the γ_i^(k) distribution, i.e.

\bar{\gamma}_i = \phi\left(\gamma_i^{(1)}, ..., \gamma_i^{(K)}\right)    (4)

The performance of the model parameterized by \bar{\gamma}_i can then be measured over the TS documents.

The above procedure is easily applicable whenever the number of documents in the learning set LS is large enough for ES (or ES_k) to be representative of all the classes. If the number of training documents available in ES for a class C_i is too low, the parameter estimation procedure that optimizes the BEP is not stable, possibly producing biased results. Unfortunately, a number of benchmarking collections are characterized by a poor balance in the amount of training material available for the target categories. This prevents the choice of smaller ES sets, as they would not provide enough information for reliable parameter estimation, while larger ES sets can penalize the accuracy of the profile building phase. However, real operational scenarios (e.g. news agency repositories, like the Reuters one used within the TREVI project) are less affected by these problems, as larger data sets can be made available.

It should be noticed that the use of just one parameter (i.e. γ) makes the estimation procedure easy to implement. Other models, as in [7], select β and γ from a small, empirically defined set of values.
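The held-out loop of steps 1-3, together with the point-wise estimator of Eq. 4, might look as follows. The choice of the median for φ and the fit_and_tune callback (which should build profiles on LS − ES_k and run the γ_i tuning on ES_k, e.g. via tune_gamma above) are assumptions, as the paper leaves φ unspecified.

```python
import random
from statistics import median

def estimate_gamma(learning_set, fit_and_tune, k_splits=10,
                   held_out_fraction=0.2):
    """Repeat steps 1-3 over K random splits of the learning set LS:
    carve out an estimation set ES_k, build profiles on LS - ES_k and
    tune gamma_i on ES_k, then combine the K estimates with a
    point-wise estimator (Eq. 4), here taken to be the median."""
    gammas = []
    for _ in range(k_splits):
        docs = list(learning_set)
        random.shuffle(docs)
        cut = int(len(docs) * held_out_fraction)
        estimation_set, profile_set = docs[:cut], docs[cut:]
        gammas.append(fit_and_tune(profile_set, estimation_set))
    return median(gammas)
```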
The procedure presented above keeps β_i fixed and tunes the negative contribution of the other categories, expressed by γ_i. In this way the γ_i estimation implements a pruning of the features that are too frequent in other categories (singletons, or n-grams, assigned a 0 weight). This naturally shrinks the range of parameter values (i.e. of γ_i) to be tried.

If the suggested procedure provides an increase in performance with respect to the previous Rocchio-based models, several implications can be drawn:

- First, a systematic feature selection method is available, and it can be used to emphasize the linguistic features in TC.
- Second, the overall performance is in line with traditional benchmarking in the TC area and can thus serve as a comparative result with respect to other models.
- Finally, the relatively low complexity of the overall model generalizes to real (i.e. operational) tasks in the Information Filtering and Knowledge Management areas.

3.2 Related works

A probabilistic analysis of the Rocchio classifier algorithm has been carried out in [12], which discusses a version of the Rocchio formula using TF-IDF weights (the product between term frequency and inverse document frequency).