A Semi-Supervised Multi-View Genetic Algorithm

Gergana Lazarova
Software Technologies
Sofia University, "St. Kliment Ohridski"
Sofia, Bulgaria
gerganal@fmi.uni-sofia.bg

Ivan Koychev
Software Technologies
Sofia University, "St. Kliment Ohridski"
Sofia, Bulgaria
koychev@fmi.uni-sofia.bg
Abstract—Semi-supervised learning combines labeled and unlabeled examples in order to make better future predictions. Usually, in this area of research there are massive amounts of unlabeled instances and only a few labeled ones. In this paper, each instance has attributes from multiple sources of information (views), and a genetic algorithm is applied for regression function learning. Based on the few labeled examples and the agreement among the views on the unlabeled examples, the error of the algorithm is optimized, striving for minimal regularized risk. The performance of the algorithm, measured by RMSE (root-mean-square error), is compared to that of its supervised equivalent and shows very good results.
Keywords—semi-supervised learning; multi-view learning; genetic algorithms
I. INTRODUCTION
Recently, there has been significant interest in semi-supervised learning. Insufficient information is a burning problem for newly released systems. The question is how to model the preferences of our users having only a few labeled examples, how to recommend articles based on only a few rated items, and how to forecast future performance and results from a small amount of labeled instances. Furthermore, there are areas of research where new labeled examples are hard to collect, and in some cases it is not possible at all. When the information defining the objects is scarce, new sources of information (views) are sought. Each view consists of the characteristics of the object, projected onto its data source. To improve the performance of the forecast, unlabeled examples can also be explored and used for the learning process. Semi-supervised learning requires less human effort and achieves higher accuracy, which makes it of great interest both in theory and in practice. Blum and Mitchell [1] use a two-view co-training algorithm for faculty web-page classification. The first view contains the words on the web pages and the second the links that point to the web pages. They use only 12 labeled examples out of the 1051 web pages and achieve an error rate of 5%. Multi-view semi-supervised learning has also been applied to image segmentation [3], object and scene recognition [9], and statistical parsing [10]. Vikas Sindhwani, Partha Niyogi, and Mikhail Belkin [2] propose a co-regularization approach to semi-supervised learning. The approach presented in this paper is similar in that it also uses a co-regularization framework, but the loss function is optimized via a genetic algorithm. Biology has been the impetus for the development of a highly efficient method for computer optimization: genetic algorithms. This method successively improves a generation of candidate solutions to a given problem, using as a criterion how fit or adept they are at solving the problem. Genetic algorithms have been applied to many optimization problems [20]: a flight booking system which optimizes the revenue of an airline [19], fuzzy logic controller optimization [17], problems in bioinformatics [21], etc.
II. SEMI-SUPERVISED LEARNING
Semi-supervised learning uses both labeled and unlabeled examples. It falls between unsupervised learning and supervised learning. A teacher has already labeled a small number of instances (D_1); the regression function has already been defined for these examples. In semi-supervised learning, unlabeled instances (D_2) are also used and added to the pool of training examples. The final training data contains both the examples of D_1 and D_2 (D = D_1 ∪ D_2). Let the number of labeled examples be l and the number of unlabeled examples be u:

$$D_1 = \{(x_i, y_i)\}_{i=1}^{l}, \qquad D_2 = \{x_j\}_{j=1}^{u}$$

Unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy.
III. MULTI-VIEW LEARNING
Let each instance X consist of multiple views: X = (X_1, …, X_k). X_1, …, X_k represent feature sets. Each feature set defines the object based on its data source. Let f_1, …, f_k be the regression functions corresponding to the views X_1, …, X_k. The goal is to find such a combination of learners f_1^*, …, f_k^* (for views X_1, …, X_k) that they agree with one another and the final combined loss function is minimal. Multiple hypotheses are trained from the same labeled data set, and they are required to make similar predictions on future unlabeled instances.
A. Loss Function

A loss function $c(x, y, f(x)) \in [0, \infty)$ measures the amount of loss of the prediction. It quantifies the amount by which the prediction deviates from the actual values. In regression, we can define the squared loss function as:

$$c(x, y, f(x)) = (y - f(x))^2$$
B. Risk

The risk associated with f is defined as the expectation of the loss function:

$$R(f) = E[c(x, y, f(x))]$$
C. Empirical Risk

In general, the risk R(f) cannot be computed because the distribution P(x, y) is unknown to the learning algorithm. However, we can compute an approximation, called the empirical risk. The empirical risk of a function is the average loss incurred by f on a labeled training sample:

$$R_{emp}(f) = \frac{1}{l} \sum_{i=1}^{l} c(x_i, y_i, f(x_i))$$
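As a concrete illustration, the empirical risk under the squared loss can be computed as follows (a minimal sketch; the function names are ours, not from the paper):

```python
def squared_loss(y, y_pred):
    """Squared loss c(x, y, f(x)) = (y - f(x))^2."""
    return (y - y_pred) ** 2

def empirical_risk(f, labeled):
    """R_emp(f): average loss of f over the labeled sample D_1."""
    l = len(labeled)
    return sum(squared_loss(y, f(x)) for x, y in labeled) / l

# Example: evaluate f(x) = 2x on three labeled points.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.0)]
risk = empirical_risk(lambda x: 2.0 * x, data)  # (0 + 0.25 + 1) / 3
```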
D. Regularized Risk

A regularizer $\Omega(f)$ is a non-negative function, i.e., it has a non-negative real-valued output. If f is "smooth", $\Omega(f)$ will be close to zero. If f is too zigzagged and overfits the training data, $\Omega(f)$ will be large.

$$Err(f) = R_{emp}(f) + \Omega(f)$$

In supervised linear regression:

$$f(x) = w^T x, \qquad \Omega_{SL}(f) = \lambda_1 \sum_{s=1}^{d} w_s^2$$

In semi-supervised learning we can define $\Omega(f)$ as the sum of the supervised regularizer and the disagreement between the learners on the unlabeled instances, multiplied by $\lambda_2$:

$$\Omega_{SSL}(f) = \Omega_{SL}(f) + \lambda_2 \sum_{u,v=1}^{k} \sum_{i=l+1}^{l+u} c(x_i, f_u(x_i), f_v(x_i))$$
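For linear learners, this semi-supervised regularizer can be sketched in Python as follows (our own reconstruction; the names `omega_ssl` and `weights`, and the use of the squared disagreement, are assumptions consistent with the squared loss used throughout the paper):

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def omega_ssl(weights, unlabeled, lam1, lam2):
    """Omega_SSL = Omega_SL + lambda_2 * pairwise view disagreement.
    weights[v] is the weight vector of view v; each unlabeled
    instance is a tuple of per-view feature vectors."""
    # Supervised part: lambda_1 * sum of squared weights over all views.
    omega_sl = lam1 * sum(w_s ** 2 for w in weights for w_s in w)
    # Disagreement of every pair of views on the unlabeled instances
    # (the double sum over u, v counts each unordered pair twice).
    k = len(weights)
    disagreement = sum(
        (dot(weights[u], x[u]) - dot(weights[v], x[v])) ** 2
        for u in range(k) for v in range(k) for x in unlabeled
    )
    return omega_sl + lam2 * disagreement
```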
E. Minimizing the Regularized Risk

For the multi-view learning framework, where we have k regression functions, we can define $Err(f_1, \ldots, f_k)$ as the sum of the regularized risk of each $f_j$ on view j, based on the labeled examples of D_1, plus the disagreement between the pairs $f_u, f_v$, based on the unlabeled examples of D_2:

$$Err(f_1,\ldots,f_k) = \sum_{v=1}^{k}\left(\frac{1}{l}\sum_{i=1}^{l} c(x_i, y_i, f_v(x_i)) + \Omega_{SL}(f_v)\right) + \lambda_2 \sum_{u,v=1}^{k}\sum_{i=l+1}^{l+u} c(x_i, f_u(x_i), f_v(x_i))$$

Let $f_1^*, \ldots, f_k^*$ be the learners for which $Err(f_1, \ldots, f_k)$ is minimal:

$$(f_1^*,\ldots,f_k^*) = \underset{f_1,\ldots,f_k}{\arg\min}\; Err(f_1,\ldots,f_k)$$
Example: two-view (x = (x_1, x_2)) semi-supervised learning with ridge regression [16]:

$$f_1(x) = w^T x_1, \qquad f_2(x) = v^T x_2$$

$$\text{view\_1: } \Omega_{SL}(f_1) = \lambda_1 \sum_{s=1}^{d} w_s^2, \qquad \text{view\_2: } \Omega_{SL}(f_2) = \lambda_1 \sum_{s=1}^{b} v_s^2$$

$$(f_1^*, f_2^*) = \underset{f_1, f_2}{\arg\min}\; \frac{1}{l}\sum_{i=1}^{l}\left[(y_i - w^T x_{i,1})^2 + (y_i - v^T x_{i,2})^2\right] + \lambda_1\left(\sum_{s=1}^{d} w_s^2 + \sum_{s=1}^{b} v_s^2\right) + \lambda_2 \sum_{i=l+1}^{l+u}\left(w^T x_{i,1} - v^T x_{i,2}\right)^2 \quad (1)$$
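The two-view objective (1) can be transcribed directly into code. The sketch below is our own; `labeled` holds ((x_1, x_2), y) pairs and `unlabeled` holds (x_1, x_2) pairs, with x_1 and x_2 the two views of an instance:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def err_two_view(w, v, labeled, unlabeled, lam1, lam2):
    """Two-view ridge objective (1): labeled squared errors of both
    views, a ridge penalty on both weight vectors, and the squared
    disagreement of the views on the unlabeled instances."""
    l = len(labeled)
    fit = sum((y - dot(w, x1)) ** 2 + (y - dot(v, x2)) ** 2
              for (x1, x2), y in labeled) / l
    ridge = lam1 * (sum(ws ** 2 for ws in w) + sum(vs ** 2 for vs in v))
    agree = lam2 * sum((dot(w, x1) - dot(v, x2)) ** 2 for x1, x2 in unlabeled)
    return fit + ridge + agree
```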
IV. GENETIC ALGORITHMS (GA)
Darwin's theory of natural selection [22] revolutionized nineteenth-century natural science by revealing that all plants and animals had slowly evolved from earlier forms [23]. Darwin's concept of evolution has been carried over into genetic algorithms for solving some of the most demanding computer-optimization problems. Genetic algorithms are based on natural evolution and use heuristic search techniques to find good solutions. In a genetic algorithm, a population of candidate solutions (individuals) is evolved towards better solutions. Each candidate solution has a set of features, which can be mutated and altered. The population of individuals evolves: new individuals arise through crossover of the best-fitted ones, and the newborn children replace the worst members of the population [4]. In order to minimize the regularized risk, we apply a genetic algorithm. To find the parameters of (1), we could also solve a system of linear equations; however, such a solution does not always exist. In such cases genetic algorithms are preferred, at the cost of an iterative procedure.
A. Individuals

All living organisms (individuals) consist of cells, and each cell contains the same set of one or more chromosomes: strings of DNA. These chromosomes are the features which define the characteristics of the object. The individuals used by the semi-supervised multi-view genetic algorithm contain chromosomes (features) from the multiple views. For the example in (1), we can define individual j as:

Individual_j: w_{j1} … w_{js} v_{j1} … v_{jp}
B. Fitness Function

The fitness of an organism is defined as the probability that the organism will live to reproduce, or as a function of the number of offspring the organism has (fertility). A fitness function measures how close a given candidate solution is to achieving the set aims. A better-fitted individual has a higher fitness value and will contribute the most offspring to the next generation. As we want to optimize the regularized risk and find its minimum, a better-fitted individual p will have a smaller value of $Err(f_1, \ldots, f_k)$:

$$fitness(p) = -Err(f_1, \ldots, f_k)$$
C. Selection

This operator selects chromosomes from the population for reproduction. The fitter the chromosome (based on the fitness function), the more likely it is to be selected for reproduction. Therefore, individuals with a smaller regularized risk will survive, evolve, and produce offspring.
D. Crossover

Crossover of two individuals is the process of reproduction. We strive for a perfected population: individuals which are as fitted to their world as possible. Consequently, spreading genes with high fitness is important. Only individuals chosen in the selection process are used for crossover. There are various existing types of crossover. We used uniform crossover: with probability 0.5, newborn children receive 50% of the genes of each parent, with randomly chosen crossover points.

Figure 1. Uniform Crossover
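A minimal implementation of uniform crossover might look like this (a sketch with our own naming; the 0.5 gene-swap probability follows the text):

```python
import random

def uniform_crossover(parent_1, parent_2, rng=random):
    """Each gene position is swapped between the parents with
    probability 0.5; the two children receive complementary genes."""
    child_1, child_2 = [], []
    for g1, g2 in zip(parent_1, parent_2):
        if rng.random() < 0.5:
            child_1.append(g1)
            child_2.append(g2)
        else:
            child_1.append(g2)
            child_2.append(g1)
    return child_1, child_2
```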
E. Mutation

In genetics, a mutation is a change in the genome of an organism. With small probability, a chromosome of the individual is mutated. As the individuals consist of weights, with small probability one of the weights is replaced with a new one, generated as a small real-valued number ∈ [-0.5, 0.5].
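The mutation operator described above can be sketched as follows (our own naming; the replacement range [-0.5, 0.5] follows the text):

```python
import random

def mutate(individual, rate=0.05, rng=random):
    """With small probability, replace a weight with a new value
    drawn uniformly from [-0.5, 0.5]."""
    return [rng.uniform(-0.5, 0.5) if rng.random() < rate else w
            for w in individual]
```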
F. The Algorithm

1. Init: generate a population P of N individuals (described in IV-A). Let each chromosome of the individuals be a weight generated as a small number ∈ [-0.5, 0.5].
2. Evolve:

    for(int i = 0; i < MAX_ITER; i++){
        for(int j = 1; j < t; j++){
            (parent_1, parent_2) = SELECTION(P);
            (child_1, child_2) = CROSSOVER(parent_1, parent_2);
            children.add(child_1, child_2);
        }
        P' = Replace_the_worst_individuals(children);
        MUTATION(P');
        P = P';
    }

MAX_ITER defines the stopping criterion of the algorithm. Other convergence techniques can also be applied, for example: evolve the population until there is no change in the fitness of the best S% of the population. The parameter t is the number of pairs that are selected for crossover, and it reflects the percentage of individuals that are replaced at each iteration of the algorithm.
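The steps above can be assembled into a runnable Python sketch (our reconstruction, not the paper's OpenCV implementation; tournament selection is used here as one concrete choice of the SELECTION operator, and `err` is the regularized risk to be minimized):

```python
import random

def genetic_minimize(err, n_genes, n=30, t=10, max_iter=200, rate=0.05, seed=0):
    rng = random.Random(seed)

    def select(pop):
        # Tournament selection: the fitter (smaller err) of two
        # randomly drawn individuals wins.
        def pick():
            a, b = rng.choice(pop), rng.choice(pop)
            return a if err(a) < err(b) else b
        return pick(), pick()

    def crossover(p1, p2):
        # Uniform crossover with gene-swap probability 0.5.
        pairs = [(g1, g2) if rng.random() < 0.5 else (g2, g1)
                 for g1, g2 in zip(p1, p2)]
        return [a for a, _ in pairs], [b for _, b in pairs]

    def mutate(ind):
        return [rng.uniform(-0.5, 0.5) if rng.random() < rate else g
                for g in ind]

    # Init: N individuals with weights in [-0.5, 0.5].
    pop = [[rng.uniform(-0.5, 0.5) for _ in range(n_genes)] for _ in range(n)]
    for _ in range(max_iter):
        children = []
        for _ in range(t):                 # t pairs per generation
            c1, c2 = crossover(*select(pop))
            children += [c1, c2]
        pop.sort(key=err)                  # replace the worst individuals
        pop = [mutate(ind) for ind in pop[:n - len(children)] + children]
    return min(pop, key=err)

# Toy usage: minimize the squared norm of a 3-dimensional weight vector.
best = genetic_minimize(lambda w: sum(g * g for g in w), n_genes=3)
```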
V. EXPERIMENTAL RESULTS
Construction of the training and test sets: the training set consists of a fraction of labeled examples from the original dataset. A small number of examples are chosen at random and added to D_1. The rest of the instances have the values of the regression function removed (hidden); these unlabeled examples are added to D_2. The final training set is constructed as D = D_1 ∪ D_2.

The test set contains all the examples in D_2. We evaluate the algorithms on these examples, using the original known labels.
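This construction can be sketched as follows (our own helper names; `dataset` is a list of (features, label) pairs):

```python
import random

def split_semi_supervised(dataset, n_labeled, seed=0):
    """Randomly pick n_labeled examples for D_1; hide the labels of
    the rest to form D_2. The test set is D_2 with its original
    labels kept aside for evaluation."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    d1 = shuffled[:n_labeled]          # labeled examples
    test = shuffled[n_labeled:]        # original labels, for evaluation
    d2 = [x for x, _ in test]          # unlabeled (labels hidden)
    return d1, d2, test
```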
A. Root Mean Squared Error (RMSE)

RMSE measures the differences between the predicted values and the observed ones, with respect to all examples in the test set, averaged:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} c(x_i, y_i, f(x_i))}{n}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - f(x_i))^2}{n}}$$
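In code, RMSE over the test set is simply (a sketch with our own naming):

```python
import math

def rmse(f, test):
    """Root mean squared error of predictor f over (x, y) test pairs."""
    n = len(test)
    return math.sqrt(sum((y - f(x)) ** 2 for x, y in test) / n)
```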
B. Monte Carlo Cross-Validation

For error estimation, Monte Carlo cross-validation [12] was used: the RMSEs of multiple random samples were averaged. It randomly splits the dataset into training and test sets. For each split, we run the genetic algorithm on the training set, and the error is assessed using the test set. The results depend on the initial labeled examples and their representativeness.
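Monte Carlo cross-validation can be sketched as follows (our own helper; `train_fn` builds a model from the labeled and unlabeled parts, and `eval_fn` scores it on the test set):

```python
import random

def monte_carlo_cv(dataset, train_fn, eval_fn, n_labeled, n_runs=10, seed=0):
    """Average the test error over repeated random splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_runs):
        data = dataset[:]
        rng.shuffle(data)
        labeled = data[:n_labeled]
        test = data[n_labeled:]
        unlabeled = [x for x, _ in test]
        model = train_fn(labeled, unlabeled)
        errors.append(eval_fn(model, test))
    return sum(errors) / n_runs
```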
C. Comparison of Genetic Algorithms

In order to compare two genetic algorithms, the following schema was used:

- For each cross-validation run, both algorithms use the same training and test sets.
- For each cross-validation run, both algorithms are executed 100 times. Only the best-performing genetic algorithm (out of the 100 executions) is used further as a result. This best genetic algorithm (BGA) is chosen according to the fitness function. As we have a small amount of labeled examples, we cannot afford to further divide them into a training and a validation set.
- For each cross-validation run, the RMSE of the BGA on the test set (TRMSE: test RMSE) is calculated.
- The TRMSEs of all cross-validation runs are averaged.
D. Experimental Dataset

The dataset diabetes, which was used for the algorithm's evaluation, was obtained from the UCI Machine Learning Repository [15]. All attributes are real-valued and there are no missing values. It consists of 768 examples, 8 attributes, and 2 classes. The two classes were treated as real values: 0.0 (tested negative) and 1.0 (tested positive). Table I shows the comparison between a supervised genetic algorithm and the described semi-supervised multi-view genetic algorithm. The two algorithms are compared according to RMSE. The semi-supervised multi-view genetic algorithm outperforms its supervised equivalent: its RMSE is 0.63, whereas the RMSE of its supervised equivalent is 0.81. The parameters of the semi-supervised multi-view algorithm are as follows: MAX_ITER = 20000, λ_1 = 0.5, λ_2 = 0.05, N = 100, mutation rate 5%. The number of labeled examples is 20; the remaining 748 examples are used as unlabeled. Two views were used:

view_1 = {preg, plas, pres, skin};
view_2 = {insu, mas, pedi, age};
TABLE I. COMPARISON OF THE TWO ALGORITHMS

Algorithm                          RMSE
Supervised GA                      0.81
Semi-supervised multi-view GA      0.63

For the implementation of the semi-supervised multi-view genetic algorithm, OpenCV [13] was used. The cross-validation steps would be hard to perform without the optimizations of the library. Little research has been performed in this area so far, due to the tedious evaluation process, concerning the small amount of labeled examples and the large number of cross-validation steps.

VI. CONCLUSIONS
Given the results from the previous section, it can be concluded that good results can be achieved even with a small number of labeled examples. Based on multiple views, a common error is optimized. This optimization is done with the help of a genetic algorithm, but at the cost of an iterative procedure.

ACKNOWLEDGMENT

This work was supported by the European Social Fund through the Human Resource Development Operational Programme under contract BG051PO001-3.3.06-0052 (2012/2014).

REFERENCES
[1] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT '98, New York, NY, USA, 1998, pp. 92–100. ACM.
[2] V. Sindhwani, P. Niyogi, and M. Belkin, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," Journal of Machine Learning Research, 7(Nov):2399–2434, 2006.
[3] G. Lazarova, "Semi-supervised image segmentation," in The 16th International Conference on Artificial Intelligence: Methodology, Systems, Applications, 2014.
[4] M. Mitchell, An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press, 1996.
[5] X. Zhu and A. Goldberg, Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2009.
[6] O. Chapelle, B. Schölkopf, and A. Zien, Semi-Supervised Learning. MIT Press, 2006.
[7] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in The 20th International Conference on Machine Learning (ICML), 2003.
[8] M. Balcan, A. Blum, and K. Yang, "Co-training and expansion: Towards bridging theory and practice," in L. K. Saul, Y. Weiss, and L. Bottou (Eds.), Advances in Neural Information Processing Systems 17, Cambridge, MA, 2005.
[9] X. Han, Y. Chen, and X. Ruan, "Multi-class co-training learning for object and scene recognition," MVA 2011, pp. 67–70.
[10] A. Sarkar, "Applying co-training methods to statistical parsing," in Proceedings of the 2nd Meeting of the North American Association for Computational Linguistics: NAACL 2001, pp. 175–182, Pittsburgh, PA, June 2001.
[11] M. Belkin and P. Niyogi, "Semi-supervised learning on Riemannian manifolds," Machine Learning, 56:209–239, 2004.
[12] Monte Carlo cross-validation – http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
[13] OpenCV – http://opencv.org/
[14] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, "Text classification from labeled and unlabeled data," Machine Learning, 39:2/3, 2000.
[15] K. Bache and M. Lichman, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science, 2013.
[16] A. Tikhonov, "Solution of incorrectly formulated problems and the regularization method," translated in Soviet Mathematics, 4:1035–1038, 1963.
[17] D. Pelusi, "Optimization of a fuzzy logic controller using genetic algorithms," in Proceedings of the 2011 3rd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2011, vol. 2, art. no. 6038235, pp. 143–146.
[18] D. Whitley, "Applying genetic algorithms to neural network problems," International Neural Network Society, p. 230, 1988.
[19] A. George, B. R. Rajakumar, and D. Binu, "Genetic algorithm based airlines booking terminal open/close decision system," ICACCI 2012, pp. 174–179.
[20] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, 1989.
[21] K. Wong, C. Peng, M. Wong, and K. Leung, "Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm," Soft Computing, 15:1631–1642, 2011.
[22] C. Darwin, On the Origin of Species, 1859.
[23] N. Forbes, Imitation of Life, 2004.