A Semi-Supervised Multi-View Genetic Algorithm

Gergana Lazarova
Software Technologies, Sofia University "St. Kliment Ohridski", Sofia, Bulgaria
gerganal@fmi.uni-sofia.bg

Ivan Koychev
Software Technologies, Sofia University "St. Kliment Ohridski", Sofia, Bulgaria
koychev@fmi.uni-sofia.bg

Abstract: Semi-supervised learning combines labeled and unlabeled examples in order to produce better future predictions. Usually, this area of research deals with massive amounts of unlabeled instances and few labeled ones. In this paper each instance has attributes from multiple sources of information (views), and a genetic algorithm is applied to learn a regression function. Based on the few labeled examples and the agreement among the views on the unlabeled examples, the error of the algorithm is optimized, striving for minimal regularized risk. The performance of the algorithm (measured by RMSE, the root-mean-square error) is compared to that of its supervised equivalent and shows very good results.

Keywords: semi-supervised learning; multi-view learning; genetic algorithms

I. INTRODUCTION

Recently, there has been significant interest in semi-supervised learning. Insufficient information is a burning problem for newly released systems: how to model the preferences of users from only a few labeled examples, how to recommend articles based on only a few rated items, how to forecast future performance and results from a small amount of labeled instances. Furthermore, there are areas of research where new labeled examples are hard to collect, and in some cases collecting them is not possible at all.

When the information defining the objects is scarce, new sources of information (views) are sought. Each view consists of the characteristics of the object projected onto its data source. To improve the performance of the forecast, unlabeled examples can also be explored and used in the learning process. Semi-supervised learning requires less human effort and achieves higher accuracy, which makes it of great interest both in theory and in practice.

Blum and Mitchell [1] use a two-view co-training algorithm for faculty web-page classification. The first view contains the words on the web pages and the second contains the links that point to them. They use only 12 labeled examples out of 1051 web pages and achieve an error rate of 5%. Multi-view semi-supervised learning has also been applied to image segmentation [3], object and scene recognition [9], and statistical parsing [10]. Sindhwani, Niyogi and Belkin [2] propose a co-regularization approach to semi-supervised learning. The approach presented in this paper is similar in that it also uses a co-regularization framework, but the loss function is optimized via a genetic algorithm.

Biology has been the impetus for the development of a highly efficient method for computer optimization: genetic algorithms. This method successively improves a generation of candidate solutions to a given problem, using as a criterion how fit the candidates are at solving the problem. Genetic algorithms have been applied to many optimization problems [20]: a flight booking system that optimizes airline revenue [18], fuzzy logic controller optimization [17], problems in bioinformatics [21], etc.
II. SEMI-SUPERVISED LEARNING

Semi-supervised learning uses both labeled and unlabeled examples. It falls between unsupervised learning and supervised learning.

A teacher has already labeled a small number of instances ($D_1$); the regression function is already defined for these examples. In semi-supervised learning, unlabeled instances ($D_2$) are also used and added to the pool of training examples. The final training data contains the examples of both $D_1$ and $D_2$ ($D = D_1 \cup D_2$). Let the number of labeled examples be $l$ and the number of unlabeled examples be $u$:

$D_1 = \{(x_i, y_i)\}_{i=1}^{l}$, $D_2 = \{x_j\}_{j=1}^{u}$

Unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy.

III. MULTI-VIEW LEARNING

Let each instance $X$ consist of multiple views: $X = (X^1, \ldots, X^k)$. $X^1, \ldots, X^k$ represent feature sets; each feature set defines the object based on its data source. Let $f_1, \ldots, f_k$ be the regression functions corresponding to the views $X^1, \ldots, X^k$. The goal is to find a combination of learners $f_1^*, \ldots, f_k^*$ (for views $X^1, \ldots, X^k$) such that they agree with one another and a final combined loss function is minimal. Multiple hypotheses are trained from the same labeled data set, and they are required to make similar predictions on future unlabeled instances.

A. Loss function

A loss function $c(x, y, f(x)) \in [0, \infty)$ measures the amount of loss of the prediction. It quantifies the amount by which the prediction deviates from the actual value. In regression we can use the squared loss function:

$c(x, y, f(x)) = (y - f(x))^2$

B. Risk

The risk associated with $f$ is defined as the expectation of the loss function:

$R(f) = E[c(x, y, f(x))]$

C. Empirical Risk

In general, the risk $R(f)$ cannot be computed because the distribution $P(x, y)$ is unknown to the learning algorithm. However, we can compute an approximation, called the empirical risk. The empirical risk of a function is the average loss incurred by $f$ on a labeled training sample:

$R_{emp}(f) = \frac{1}{l} \sum_{i=1}^{l} c(x_i, y_i, f(x_i))$

D. Regularized Risk

A regularizer $\Omega(f)$ is a function with a non-negative real-valued output. If $f$ is "smooth", $\Omega(f)$ will be close to zero. If $f$ is too zigzagged and overfits the training data, $\Omega(f)$ will be large. The regularized risk is

$Err(f) = R_{emp}(f) + \lambda \Omega(f)$

In supervised linear regression:

$f_{SL}(x) = w^T x$, $\Omega_{SL}(f) = \sum_{s=1}^{d} w_s^2 = \|w\|^2$

In semi-supervised learning we can define $\Omega(f)$ as the sum of the supervised regularizer and the disagreement between the learners on the unlabeled instances, multiplied by $\lambda_2$:

$\Omega(f) = \Omega_{SL}(f) + \lambda_2 \Omega_{SSL}(f)$

$\Omega_{SSL}(f) = \sum_{u,v=1}^{k} \sum_{i=l+1}^{l+u} c(x_i, f_u(x_i^u), f_v(x_i^v))$
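To make these definitions concrete, the following is a minimal sketch (not from the paper) of the squared loss, the empirical risk and the two regularizers for linear models over two views. The NumPy array shapes and all function names are assumptions introduced only for illustration.

    import numpy as np

    def squared_loss(y_true, y_pred):
        # c(x, y, f(x)) = (y - f(x))^2
        return (y_true - y_pred) ** 2

    def empirical_risk(w, X_lab, y_lab):
        # R_emp(f) = (1/l) * sum_i c(x_i, y_i, f(x_i)) for a linear model f(x) = w^T x
        preds = X_lab @ w
        return np.mean(squared_loss(y_lab, preds))

    def supervised_regularizer(w):
        # Omega_SL(f) = ||w||^2
        return np.sum(w ** 2)

    def disagreement_regularizer(w, v, X1_unl, X2_unl):
        # Omega_SSL(f): squared disagreement of two linear views on the unlabeled examples
        return np.sum((X1_unl @ w - X2_unl @ v) ** 2)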
E. Minimizing the regularized risk

For the multi-view learning framework with $k$ regression functions, we can define $Err(f_1, \ldots, f_k)$ as the sum of the regularized risks of each $f_v$ on view $v$, based on the labeled examples of $D_1$, plus the disagreement between the pairs $f_u, f_v$, based on the unlabeled examples of $D_2$:

$Err(f_1, \ldots, f_k) = \sum_{v=1}^{k} \left( \frac{1}{l} \sum_{i=1}^{l} c(x_i, y_i, f_v(x_i^v)) + \lambda \Omega_{SL}(f_v) \right) + \lambda_2 \sum_{u,v=1}^{k} \sum_{i=l+1}^{l+u} c(x_i, f_u(x_i^u), f_v(x_i^v))$

Let $f_1^*, \ldots, f_k^*$ be the learners for which $Err(f_1, \ldots, f_k)$ is minimal:

$(f_1^*, \ldots, f_k^*) = \arg\min_{f_1, \ldots, f_k} Err(f_1, \ldots, f_k)$

Example: two-view ($x = (x^1, x^2)$) semi-supervised learning with ridge regression [16]:

$f_1(x^1) = w^T x^1$, $f_2(x^2) = v^T x^2$

view_1: $\Omega_{SL}(f_1) = \sum_{s=1}^{d} w_s^2 = \|w\|^2$, view_2: $\Omega_{SL}(f_2) = \sum_{s=1}^{b} v_s^2 = \|v\|^2$

$(f_1^*, f_2^*) = \arg\min_{f_1, f_2} \frac{1}{l} \sum_{i=1}^{l} \left( (y_i - w^T x_i^1)^2 + (y_i - v^T x_i^2)^2 \right) + \lambda \left( \|w\|^2 + \|v\|^2 \right) + \lambda_2 \sum_{i=l+1}^{l+u} (w^T x_i^1 - v^T x_i^2)^2$   (1)

IV. GENETIC ALGORITHMS (GA)

Darwin's theory of natural selection [22] revolutionized nineteenth-century natural science by revealing that all plants and animals had slowly evolved from earlier forms [23]. Darwin's concept of evolution has been carried over into genetic algorithms for solving some of the most demanding computer optimization problems. Genetic algorithms are based on natural evolution and use heuristic search techniques to find good solutions. In a genetic algorithm, a population of candidate solutions (individuals) is evolved towards better solutions. Each candidate solution has a set of features, which can be mutated and altered. The population evolves: new individuals arise from crossover of the best-fitted individuals so far, and the new children replace the worst members of the population [4].

In order to minimize the regularized risk, we apply a genetic algorithm. To find the parameters of (1), we could also solve a system of linear equations, but such a solution does not always exist. In such cases genetic algorithms are preferred, at the cost of an iterative procedure.

A. Individuals

All living organisms (individuals) consist of cells, and each cell contains the same set of one or more chromosomes, strings of DNA. These chromosomes are the features which define the characteristics of the object. The individuals used in the semi-supervised multi-view genetic algorithm contain chromosomes (features) from the multiple views. For the example in (1) we can define individual_j as:

individual_j: $w_{j1} \ldots w_{js} \; v_{j1} \ldots v_{jp}$

B. Fitness Function

The fitness of an organism is defined as the probability that the organism will live to reproduce, or as a function of the number of offspring the organism has (fertility). A fitness function measures how close a given candidate solution is to achieving the set aims. A better-fitted individual has a higher fitness value and will contribute the most offspring to the next generation. As we want to minimize the regularized risk, a more fitted individual has a smaller value of $Err(f_1, \ldots, f_k)$:

$fitness(p) = -Err(f_1, \ldots, f_k)$
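As a sketch of how an individual from IV-A could be scored against equation (1): the chromosome is split into the two weight vectors w and v, the regularized risk is computed, and the fitness is its negation so that a smaller risk means a fitter individual. This is a minimal illustration under assumed array shapes; the names decode, err_two_view and the exact decoding are hypothetical, not the authors' code.

    import numpy as np

    def decode(individual, d):
        # first d genes are w (view 1), the remaining genes are v (view 2)
        return individual[:d], individual[d:]

    def err_two_view(individual, X1_lab, X2_lab, y, X1_unl, X2_unl, lam, lam2, d):
        # regularized risk of equation (1) for two linear views
        w, v = decode(individual, d)
        labeled_loss = np.mean((y - X1_lab @ w) ** 2 + (y - X2_lab @ v) ** 2)
        regularizer = lam * (np.sum(w ** 2) + np.sum(v ** 2))
        disagreement = lam2 * np.sum((X1_unl @ w - X2_unl @ v) ** 2)
        return labeled_loss + regularizer + disagreement

    def fitness(individual, *args):
        # a smaller regularized risk means a fitter individual
        return -err_two_view(individual, *args)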
C. Selection

This operator selects chromosomes from the population for reproduction. The fitter a chromosome is (according to the fitness function), the more likely it is to be selected to reproduce. Therefore, individuals with smaller regularized risk survive, evolve and produce offspring.

D. Crossover

Crossover of two individuals is the process of reproduction. We strive for population perfection, that is, individuals which are as fitted to their world as possible; consequently, spreading genes with high fitness is important. Only individuals chosen by the selection process are used for crossover. There are various types of crossover. We used uniform crossover: with probability 0.5 a newborn child receives 50% of the genes of each parent, with randomly chosen crossover points.

Figure 1. Uniform crossover

E. Mutation

In genetics, a mutation is a change in the genome of an organism. With a small probability, a chromosome of the individual is mutated. As the individuals consist of weights, with a small probability one of the weights is replaced with a new one, generated as a small real-valued number in [-0.5, 0.5].

F. The Algorithm

1. Init: generate a population P of N individuals (described in IV-A). Let each chromosome of the individuals be a weight generated as a small number in [-0.5, 0.5].
2. Evolve:

    for (int i = 0; i < MAX_ITER; i++) {
        for (int j = 1; j < t; j++) {
            (parent_1, parent_2) = SELECTION(P);
            (child_1, child_2) = CROSSOVER(parent_1, parent_2);
            children.add(child_1, child_2);
        }
        P' = Replace_the_worst_individuals(children);
        MUTATION(P');
        P = P';
    }

MAX_ITER defines the stopping criterion of the algorithm. Other convergence criteria can also be applied, for example: evolve the population until there is no change in the fitness of the best S% of the population. The parameter t is the number of pairs selected for crossover; it reflects the percentage of individuals that are replaced at each iteration of the algorithm.
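The following is a minimal runnable sketch of the loop above, assuming tournament selection, the uniform crossover and mutation described earlier, and a placeholder fitness function (in practice the negated regularized risk of equation (1) would be used). Population size, the number of crossover pairs and the random seed are illustrative choices, not the authors' settings.

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(ind):
        # placeholder objective: higher is better (replace with -Err from equation (1))
        return -np.sum(ind ** 2)

    def selection(pop):
        # tournament selection of two parents
        def pick():
            i, j = rng.choice(len(pop), size=2, replace=False)
            return pop[i] if fitness(pop[i]) > fitness(pop[j]) else pop[j]
        return pick(), pick()

    def crossover(p1, p2):
        # uniform crossover: each gene comes from either parent with probability 0.5
        mask = rng.random(p1.shape) < 0.5
        return np.where(mask, p1, p2), np.where(mask, p2, p1)

    def mutate(pop, rate=0.05):
        # with small probability replace one weight with a new value in [-0.5, 0.5]
        for ind in pop:
            if rng.random() < rate:
                ind[rng.integers(len(ind))] = rng.uniform(-0.5, 0.5)

    def evolve(n_genes=8, N=100, t=10, max_iter=1000):
        pop = [rng.uniform(-0.5, 0.5, n_genes) for _ in range(N)]
        for _ in range(max_iter):
            children = []
            for _ in range(t):
                p1, p2 = selection(pop)
                children.extend(crossover(p1, p2))
            # replace the worst individuals with the new children
            pop.sort(key=fitness, reverse=True)
            pop = pop[:N - len(children)] + children
            mutate(pop)
        return max(pop, key=fitness)

    best = evolve()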
V. EXPERIMENTAL RESULTS

Construction of the training and test sets:

- The training set consists of a fraction of the labeled examples from the original dataset. A small number of examples are chosen at random and added to D_1. The remaining instances have the values of the regression function removed (hidden); these unlabeled examples are added to D_2. The final training set is $D = D_1 \cup D_2$.
- The test set contains all the examples in D_2. We evaluate the algorithms on these examples, using the original known labels.

A. Root Mean Squared Error (RMSE)

RMSE measures the differences between the predicted values and the observed ones, averaged over all examples in the test set:

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} c(x_i, y_i, f(x_i))} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2}$

B. Monte Carlo Cross-Validation

For error estimation, Monte Carlo cross-validation [12] was used: the RMSEs of multiple random samples were averaged. It randomly splits the dataset into training and test sets. For each split, we run the genetic algorithm on the training set, and the error is assessed on the test set. The results depend on the initial labeled examples and their representativeness.

C. Comparison of Genetic Algorithms

In order to compare two genetic algorithms, the following scheme was used (a sketch of this procedure is given at the end of this section):

- For each cross-validation run, both algorithms use the same training and test sets.
- For each cross-validation run, both algorithms are executed 100 times. Only the best-performing genetic algorithm (out of the 100 executions) is used further as a result. This best genetic algorithm (BGA) is chosen according to the fitness function. As we have a small amount of labeled examples, we cannot afford to further divide them into training and validation sets.
- For each cross-validation run, the RMSE of the BGA on the test set (TRMSE, test RMSE) is calculated.
- The TRMSEs of all cross-validation runs are averaged.

D. Experimental dataset

The dataset diabetes, which was used for the algorithm's evaluation, was obtained from the UCI Machine Learning Repository [15]. All attributes are real-valued and there are no missing values. It consists of 768 examples, 8 attributes and 2 classes. The two classes were treated as real values: 0.0 (tested negative) and 1.0 (tested positive).

Table I shows the comparison between a supervised genetic algorithm and the described semi-supervised multi-view genetic algorithm. The two algorithms are compared in terms of RMSE. The semi-supervised multi-view genetic algorithm outperforms its supervised equivalent: its RMSE is 0.63, whereas the RMSE of its supervised equivalent is 0.81. The parameters of the semi-supervised multi-view algorithm are as follows: MAX_ITER = 20000, $\lambda = 0.5$, $\lambda_2 = 0.05$, N = 100, mutation rate 5%. The number of labeled examples is 20; the remaining 748 examples are used as unlabeled. Two views were used:

- view_1 = {preg, plas, pres, skin};
- view_2 = {insu, mas, pedi, age};

TABLE I. COMPARISON OF THE TWO ALGORITHMS

    Algorithm                        RMSE
    Supervised GA                    0.81
    Semi-supervised multi-view GA    0.63

For the implementation of the semi-supervised multi-view genetic algorithm, OpenCV [13] was used; the cross-validation steps would be hard to perform without the optimizations of the library. Little research has been performed in this area so far, due to the tedious evaluation process: the small amount of labeled examples and the large number of cross-validation steps.
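For completeness, here is a rough sketch of the evaluation protocol of Sections V-A to V-C: Monte Carlo splits, a best-of-many genetic-algorithm model per split, and averaged test RMSE. The trainer below is a trivial placeholder standing in for the genetic algorithm, so the numbers it produces are meaningless; only the structure of the loop is meant to be illustrative, and all names are hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)

    def rmse(y_true, y_pred):
        # RMSE = square root of the mean squared error over the test examples
        return np.sqrt(np.mean((y_true - y_pred) ** 2))

    def train_ga(X_lab, y_lab, X_unl):
        # placeholder: would run the genetic algorithm of Section IV and return the fitted model;
        # a trivial constant predictor is returned here so the sketch is runnable
        mean = np.mean(y_lab)
        return lambda X: np.full(len(X), mean)

    def monte_carlo_cv(X, y, n_labeled=20, n_splits=10):
        scores = []
        for _ in range(n_splits):
            idx = rng.permutation(len(X))
            lab, unl = idx[:n_labeled], idx[n_labeled:]
            # in the paper the GA is run 100 times per split and the fittest run is kept;
            # the placeholder trainer is deterministic, so a single run stands in for that step
            best = train_ga(X[lab], y[lab], X[unl])
            # TRMSE: error of the best GA on the held-out examples, using their original labels
            scores.append(rmse(y[unl], best(X[unl])))
        # average the TRMSE over all cross-validation splits
        return np.mean(scores)

    # usage with synthetic data of the same shape as the diabetes dataset (768 examples, 8 attributes)
    X = rng.random((768, 8))
    y = rng.integers(0, 2, 768).astype(float)
    print(monte_carlo_cv(X, y))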
VI. CONCLUSIONS

Given the results from the previous section, it can be concluded that good results can be achieved even with a small number of labeled examples. Based on multiple views, a common error is optimized. This optimization is done with the help of a genetic algorithm, at the cost of an iterative procedure.

ACKNOWLEDGMENT

This work was supported by the European Social Fund through the Human Resource Development Operational Programme under contract BG051PO001-3.3.06-0052 (2012/2014).

REFERENCES

[1] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT '98), New York, NY, USA, 1998, pp. 92-100. ACM.
[2] V. Sindhwani, P. Niyogi, and M. Belkin, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, 7(Nov):2399-2434, 2006.
[3] G. Lazarova, "Semi-supervised image segmentation," in The 16th International Conference on Artificial Intelligence: Methodology, Systems, Applications, 2014.
[4] M. Mitchell, "An Introduction to Genetic Algorithms," Cambridge, MA: MIT Press, 1996.
[5] X. Zhu and A. Goldberg, "Introduction to Semi-Supervised Learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, 2009.
[6] O. Chapelle, B. Scholkopf, and A. Zien, "Semi-Supervised Learning," MIT Press, 2006.
[7] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in The 20th International Conference on Machine Learning (ICML), 2003.
[8] M. Balcan, A. Blum, and K. Yang, "Co-training and expansion: Towards bridging theory and practice," in L. K. Saul, Y. Weiss, and L. Bottou (Eds.), Advances in Neural Information Processing Systems 17, Cambridge, MA, 2005.
[9] X. Han, Y. Chen, and X. Ruan, "Multi-class co-training learning for object and scene recognition," MVA 2011, pp. 67-70.
[10] A. Sarkar, "Applying co-training methods to statistical parsing," in Proceedings of the 2nd Meeting of the North American Association for Computational Linguistics (NAACL 2001), pp. 175-182, Pittsburgh, PA, June 2-7, 2001.
[11] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, 56:209-239, 2004.
[12] Monte Carlo cross-validation, http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
[13] OpenCV, http://opencv.org/
[14] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, "Text classification from labeled and unlabeled data," Machine Learning, 39:2/3, 2000.
[15] K. Bache and M. Lichman, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, CA: University of California, School of Information and Computer Science, 2013.
[16] A. Tikhonov, "Solution of incorrectly formulated problems and the regularization method," translated in Soviet Mathematics 4:1035-1038, 1963.
[17] D. Pelusi, "Optimization of a fuzzy logic controller using genetic algorithms," in Proceedings of the 2011 3rd International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC 2011), vol. 2, art. no. 6038235, pp. 143-146.
[18] D. Whitley, "Applying genetic algorithms to neural network problems," International Neural Network Society, p. 230, 1988.
[19] A. George, B. R. Rajakumar, and D. Binu, "Genetic algorithm based airlines booking terminal open/close decision system," ICACCI 2012, pp. 174-179.
[20] D. Goldberg, "Genetic Algorithms in Search, Optimization, and Machine Learning," 1989.
[21] K. Wong, C. Peng, M. Wong, and K. Leung, "Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm," Soft Computing, 15:1631-1642, 2011.
[22] C. Darwin, "On the Origin of Species," 1859.
[23] N. Forbes, "Imitation of Life," 2004.