A Robust Elastic and Partial Matching Metric for Face Recognition

Gang Hua    Amir Akbarzadeh
Microsoft Corporation, One Microsoft Way, Redmond, WA 98052

Abstract

We present a robust elastic and partial matching metric for face recognition. To handle challenges such as pose, facial expression and partial occlusion, we enable both elastic and partial matching by computing a part-based face representation, in which N local image descriptors are extracted from densely sampled, overlapping image patches. We then define a distance metric in which each descriptor in one face is matched against its spatial neighborhood in the other face and the minimal distance is recorded. For implicit partial matching, the list of all minimal distances is sorted in ascending order and the distance at the αN-th position is picked as the final distance. The parameter α ≤ 1 controls how much occlusion, facial expression change, or pixel degradation we allow. The optimal parameter values of this new distance metric are extensively studied and identified on real-life photo collections. We also reveal that filtering the face image with a simple difference of Gaussians brings significant robustness to lighting variations and beats the more widely used self-quotient image. Extensive evaluations on face recognition benchmarks show that our method is leading or competitive in performance when compared to the state-of-the-art.

1. Introduction

Face recognition has been extensively studied in the community for several decades [1, 2, 8, 14, 17, 24, 29, 3]. It has been shown by the Face Recognition Grand Challenge [19] that under controlled settings the recognition rate can be higher than 99% with a false acceptance rate as low as 0.1%. Nevertheless, this is not the case when performing face recognition on uncontrolled real-life photos [11]. Such photos include considerable visual variations caused by, for example, lighting [7, 21], differences in pose [21], facial expression [2] and partial occlusion [2]. It has been demonstrated that state-of-the-art photometric rectification techniques such as the self-quotient image [26] can largely mitigate lighting variations except in extreme cases.

[Figure 1. Examples of matched faces in our experiments. Notice the significant variations in lighting, pose, facial expression as well as partial occlusion. Each row shows two pairs of matched faces.]

In our experiments, we reveal that a simple difference of Gaussians (DoG) filter outperforms the more widely used self-quotient image method in handling normal lighting variations. Although there has been great research progress on face alignment [12], significant pose variations may still exist after alignment. Therefore, besides extreme lighting variation [7], pose, facial expression and partial occlusion remain great challenges. Intuitively, these challenges can largely be alleviated by designing robust face distance metrics that leverage both elastic and partial matching.

By design principle, there are two types of face distance metrics: learning-based metrics [24, 2, 8, 14, 3, 29] and hand-crafted metrics [1, 18, 15]. Inspired by the seminal work of Turk and Pentland on Eigenfaces [24], learning-based distance metrics have been a very active topic in face recognition. The predominant research effort consists of identifying a discriminative embedding of faces in order to define a distance metric. Fisherfaces [2], Laplacianfaces [8] and their regularized variants [5, 4, 3] are all along this line. Other learned metrics include those based on SVMs [17] and Bayesian methods [16].
Nevertheless, learned metrics based on strongly supervised learning, such as linear discriminative embeddings [2, 8] or SVMs [17], need to be trained on the specific data set they deal with, and most often the face images need to be well aligned to facilitate learning of the meaningful structure of facial appearance. This makes them more suitable for controlled surveillance scenarios with a limited number of subjects, where labeled gallery faces are readily available for training. In addition, the training data set has to be large enough to reduce the risk of over-fitting. For many face recognition tasks on real-life photos, these conditions may not be satisfied. Hence, it is desirable to have a plug-and-play distance metric which does not need to be trained and can conveniently be used in face recognition tasks dealing with real-life photos.

Hand-crafted distance metrics do not suffer from the problems confronting learned metrics. They also provide much more flexibility for incorporating elastic and partial matching schemes. For example, Elastic Bunch Graph Matching (EBGM) [28] represents each face by a graph whose nodes are a set of Gabor jets extracted from facial landmarks; a graph matching algorithm is then designed to calculate the distance between two face representations. Notwithstanding its demonstrated success, graph matching is computationally intensive and the Gabor jets may not be discriminative enough for robust matching. Ahonen et al. [1] proposed a distance metric calculated as a weighted sum of χ² distances, each computed between histograms of local binary patterns (LBP) on non-overlapping image partitions. However, it does not enable elastic matching at all and is hence not robust to pose variations. Vivek and Sudha [18] proposed a partial Hausdorff distance metric where each pixel is represented as a binary vector similar to a local binary pattern. The drawback of this method is that the spatial structure is largely discarded, since a pixel in one face can be matched with any other pixel having the same local binary pattern in the other face. This is undesirable, especially when the faces are roughly aligned with each other.

In search of a robust face distance metric that handles all the challenges in face recognition, we adopt a part-based face representation to enable elastic and partial matching, where a set of N local image descriptors [13, 27, 9, 23] are extracted from overlapping, densely sampled image patches. Then, in the matching process, each local image descriptor in one face image is compared against the descriptors in its spatial neighborhood in the other face image and the minimal distance is recorded. To perform partial matching, the list of all recorded minimal distances is sorted in ascending order and the distance at the αN-th position is picked as the final distance, where α ≤ 1 is a control parameter for how much pixel degradation, facial expression change and partial occlusion we allow in the face images. The optimal parameter settings of our distance metric are extensively studied on real-life photos obtained from several people's family & friends photo collections. With these optimal parameter settings, the proposed distance metric exhibits great robustness to pose variation, partial occlusion, as well as facial expression changes. This is demonstrated in our experiments on various face recognition benchmarks.
We present in Figure 1 some matched faces from our experiments on real-life photo collections to demonstrate how robust our distance metric is to all these visual variations. The design of our distance metric follows the fundamental principle of the generalized Hausdorff distance. However, unlike its previous applications to face recognition, in which it was used to match edge points in the image space [6, 22], our distance metric is defined in the feature space, i.e., the space of the local image descriptors. Furthermore, we enforce constraints that only allow each local image descriptor to be matched with its spatial neighbors in the image space. This spatial constraint is essential for matching two face images, as shown in our experiments. This way, we perform explicit elastic matching and implicit partial matching in an efficient and robust way.

Our main contributions are twofold: 1) we propose a novel robust partial matching metric for face recognition, which performs explicit elastic matching and implicit partial matching and shows leading performance when compared to the state-of-the-art; 2) we empirically show that a simple difference of Gaussians filter outperforms the more widely used self-quotient image and brings significant lighting invariance.

2. Part-based Face Representation

Figure 2 presents the pipeline used to extract the representation of a face. As illustrated in the figure, given an input image containing a face, we first run a variant [31] of the Viola-Jones face detector [25]. Next, the detected face image patch is fed into an eye detector, a convolutional neural network regressor, which localizes the left and right eye locations. Our geometric rectification step is conducted by warping the face patch with a similarity transformation that places the two eyes into canonical positions in a patch of size w × w (w = 128 in our settings). The geometrically rectified face patch I is then passed through a DoG filter to obtain the photometrically rectified face patch Î, i.e.,

\hat{I} = I_{\sigma_1} - I_{\sigma_2},    (1)

where I_σ is produced by smoothing I with a Gaussian kernel G_σ with a standard deviation of σ pixels.

[Figure 2. Our processing pipeline to extract the face representation: (a) face detection, (b) eye detection, (c) geometric rectification, (d) photometric rectification, (e) dense overlapping partitioning, (f) descriptor extraction, (g) face representation.]

Our empirical investigations reveal that the combination of σ₁ = 0, which simply means no smoothing (i.e., I_{σ₁} = I), and σ₂ = 1 is optimal. After the photometric rectification step, we densely partition the face image into N = K × K overlapping patches of size n × n (n = 18 in our settings), with both the horizontal and vertical steps set to s pixels (s = 2 in our settings). To obtain our part-based representation, we compute a local image descriptor for each of the n × n patches. We adopt a variant of the descriptor T2-S2-9 proposed by Winder and Brown [27], which essentially accumulates 4-dimensional histograms of the rectified image gradients {|∇_x| − ∇_x, |∇_x| + ∇_x, |∇_y| − ∇_y, |∇_y| + ∇_y} over 9 spatial pooling regions, as shown in Figure 2(f). This descriptor provides excellent performance when matching image patches subject to different lighting and geometric distortions. After we extract the local image descriptor for each of the local image patches, our final face representation is a matrix of N = K × K local image descriptors, i.e.,

F = [f_{mn}],  1 ≤ m ≤ K, 1 ≤ n ≤ K,    (2)

where f_{mn} corresponds to the descriptor extracted from the patch at location (m·s, n·s) in pixel coordinates.
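To make the pipeline concrete, the following is a minimal Python/NumPy sketch of the photometric rectification (Eq. (1)) and dense-patch stages, assuming face detection, eye detection and geometric rectification have already produced a w × w grayscale face patch. The descriptor here is a simplified stand-in for the T2-S2-9 variant (the four rectified-gradient channels sum-pooled over a 3×3 grid), and all function names are ours rather than the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_rectify(face, sigma1=0.0, sigma2=1.0):
    """Photometric rectification, Eq. (1): difference of Gaussians.
    sigma1 = 0 means no smoothing, the paper's optimal setting."""
    face = face.astype(np.float32)
    low1 = face if sigma1 == 0 else gaussian_filter(face, sigma1)
    return low1 - gaussian_filter(face, sigma2)

def t2s2_like_descriptor(patch):
    """Simplified stand-in for the T2-S2-9 variant [27]: four rectified
    gradient channels {|gx|-gx, |gx|+gx, |gy|-gy, |gy|+gy} sum-pooled
    over a 3x3 grid of spatial regions (36-D), then L2-normalized."""
    gy, gx = np.gradient(patch)
    channels = (np.abs(gx) - gx, np.abs(gx) + gx,
                np.abs(gy) - gy, np.abs(gy) + gy)
    edges = np.linspace(0, patch.shape[0], 4, dtype=int)  # 3x3 pooling grid
    desc = np.array([c[edges[i]:edges[i+1], edges[j]:edges[j+1]].sum()
                     for c in channels
                     for i in range(3) for j in range(3)], np.float32)
    return desc / (np.linalg.norm(desc) + 1e-8)

def extract_representation(face, n=18, s=2):
    """Eq. (2): a K x K grid of local descriptors, one per overlapping
    n x n patch sampled every s pixels from the rectified face."""
    rect = dog_rectify(face)
    K = (rect.shape[0] - n) // s + 1
    F = [t2s2_like_descriptor(rect[i*s:i*s+n, j*s:j*s+n])
         for i in range(K) for j in range(K)]
    return np.stack(F).reshape(K, K, -1)
```

With the paper's settings (w = 128, n = 18, s = 2), this yields K = 56, i.e., N = 3136 descriptors per face.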
Given such face representations, we now proceed to define our distance metric.

3. Robust Elastic and Partial Matching Metric

We would like to utilize both elastic and partial matching to handle the different visual variations in face images. To calculate the distance between two face representations F^{(1)} and F^{(2)}, we first perform elastic matching for each local descriptor f^{(1)}_{ij} in F^{(1)}. This is done by finding that descriptor's best match among its spatially neighboring descriptors in F^{(2)}. More formally, for each 1 ≤ i, j ≤ K, we have

d(f^{(1)}_{ij}) = \min_{k,l:\ |i \cdot s - k \cdot s| \le r,\ |j \cdot s - l \cdot s| \le r} \| f^{(1)}_{ij} - f^{(2)}_{kl} \|_1,    (3)

where ‖·‖₁ stands for the L1 norm, and r is a parameter controlling how much elasticity we allow during matching. We name the neighborhood defined by r the r-neighborhood. Then, let

[d_1, d_2, \ldots, d_{\alpha N}, \ldots, d_N] = \mathrm{Sort}_{\mathrm{ascend}}\left( \{ d(f^{(1)}_{ij}) \}_{i,j=1}^{K} \right)    (4)

be the sorted distances of all d(f^{(1)}_{ij}) in ascending order. We define

d(F^{(1)} \rightarrow F^{(2)}) = d_{\alpha N}    (5)

as the directional distance from F^{(1)} to F^{(2)}, where α ≤ 1 is a control parameter for partial matching. Note that no explicit sorting is needed in the implementation; we can instead use a quick-selection algorithm. The parameter α controls how much pixel degradation, partial occlusion or facial expression change we expect in the face images. Figure 6 visually illustrates how the distance metric is computed.

[Figure 6. Illustration of our robust distance metric: each feature is matched within its region, each minimal distance is recorded, and the α-th percentile is taken as the final distance. Note that the α-th percentile selection can be implemented with a quick-selection algorithm, i.e., no explicit sorting is needed.]

[Figure 3. Histograms of the number of subjects owning a specific number of faces, for (a) FF1 (269 persons, 856 faces) and (b) FF2 (1294 persons, 4933 faces).]

[Figure 4. Comparison of the DoG filter with the self-quotient image, on (a) FF1 and (b) FF2.]

Similarly, we can also define d(F^{(2)} → F^{(1)}), and it is clear that most often d(F^{(1)} → F^{(2)}) ≠ d(F^{(2)} → F^{(1)}). To make our distance symmetric, our final robust distance metric is defined as

D(F^{(1)}, F^{(2)}) = \max\left( d(F^{(1)} \rightarrow F^{(2)}),\ d(F^{(2)} \rightarrow F^{(1)}) \right).    (6)

The following property of the proposed distance metric is trivially realized.

Property 3.1. If D(F^{(1)}, F^{(2)}) ≤ V, then at least an α portion of the local image descriptors in F^{(1)} (F^{(2)}) have a matched local descriptor in their r-neighborhood in F^{(2)} (F^{(1)}) with distance less than V.

Property 3.1 reflects how the proposed distance metric performs partial matching, i.e., the distance represents how well an α portion of the face images is matched.

4. Experiments

We present extensive experiments to validate the quality of our face distance metric. Our somewhat optimized C++ implementation executes the distance metric at a speed of 0.23 ms per pair of faces (on a machine with a single-core 3.0 GHz CPU), excluding the time for extracting the face representations. Below, we first explore the parameter values of our distance metric on real-life photo collections.
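As a concrete reference for the metric being evaluated, here is a minimal sketch of Eqs. (3)-(6), assuming K × K × D representations such as those produced by the sketch above. The double loop is written for clarity rather than the speed of the paper's C++ implementation; np.partition plays the role of the quick-selection algorithm noted for Eq. (5).

```python
import numpy as np

def directional_distance(F1, F2, r=4, s=2, alpha=0.2):
    """Eqs. (3)-(5): for each descriptor of F1, record the minimal L1
    distance within its r-neighborhood in F2, then return the alpha*N-th
    smallest of those minima (quick-select, no full sort)."""
    K, D = F1.shape[0], F1.shape[-1]
    rad = r // s                        # elasticity radius in grid cells
    mins = np.empty(K * K, np.float32)
    for i in range(K):
        for j in range(K):
            k0, k1 = max(0, i - rad), min(K, i + rad + 1)
            l0, l1 = max(0, j - rad), min(K, j + rad + 1)
            neigh = F2[k0:k1, l0:l1].reshape(-1, D)
            # Eq. (3): minimal L1 distance within the r-neighborhood
            mins[i * K + j] = np.abs(neigh - F1[i, j]).sum(axis=1).min()
    idx = int(alpha * mins.size)        # position of the alpha*N-th distance
    return np.partition(mins, idx)[idx]

def face_distance(F1, F2, **kw):
    """Eq. (6): symmetrize the two directional distances with a max."""
    return max(directional_distance(F1, F2, **kw),
               directional_distance(F2, F1, **kw))
```

With the optimal settings identified below (s = 2, r = 4, α = 0.2), rad is 2 grid cells, so each descriptor is compared against at most a 5 × 5 block of neighboring descriptors.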
Then, fixing the optimal parameter values, we perform extensive evaluation on benchmarks such as Labeled Faces in the Wild (LFW) [11], the Olivetti Research Laboratory (ORL) database [2], the Yale face database [2], and the CMU PIE database [21]. To make the presentation more concise, we call them LFW, ORL, Yale, and PIE, respectively.

[Figure 5. From top row to bottom row, the ROC curves for different values of the parameters s, r, and α, respectively, both on FF1 (column (a)) and FF2 (column (b)).]

4.1. Parameter Exploration

There are three important parameters in our distance metric that we need to explore in order to obtain optimal performance. The first is the patch sampling step parameter s in the face representation pipeline. It determines how many densely sampled patch descriptors we generate, and should be carefully chosen to balance speed against recognition quality. The second is the elasticity range parameter r, which defines the r-neighborhood an individual descriptor is matched against. The third is the partial matching control parameter α. As discussed before, α controls how much pixel degradation, partial occlusion, or facial expression change we expect.

To explore the effects of these parameters, we collected two real-life photo collections from several people's family & friends photo albums and manually tagged all the faces detected by our face detector in these photos. We call them family & friends data sets one and two, or FF1 and FF2 for short. FF1 has a total of 856 faces of 269 subjects, while FF2 contains 4933 faces of 1294 subjects. The number of faces per subject is not uniform in these data sets. Figure 3 presents the distribution of how many face images each subject has: the horizontal axis displays the number of faces, and the vertical axis represents the number of subjects having that number of faces. For example, in Figure 3(a) we can see that there are 124 subjects in FF1 who have only one face, and in Figure 3(b) we can see that there are around 250 subjects in FF2 who have two faces.

With each data set, we split the faces half and half per subject into two subsets. One subset is used as the gallery set and the other as the probe set. The recognition rate is evaluated by 1-nearest-neighbor classification. An ROC curve is generated by picking a threshold on the ratio between the distance from the query face to the best matched gallery face and the distance from the query face to the second best matched gallery face with a different identity than the best matched one. If the ratio is below the threshold, we accept the match; otherwise we reject it. The horizontal and vertical axes present the numbers of falsely and correctly recognized faces among the accepted matches, respectively. Each subset serves as the gallery set once, so the final ROC curve is the aggregation of the two tests. We have exhaustively run evaluations with all possible combinations of the two photometric rectification methods, i.e., the DoG filter and the self-quotient image [26], and different settings of the parameters s, r and α.
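As an illustration of the evaluation protocol just described, the sketch below accepts or rejects the 1-nearest-neighbor match based on the ratio of the best to the second-best gallery distance. The gallery layout, the helper name and the default threshold are our own assumptions for illustration, not specifics from the paper.

```python
def ratio_test(probe_F, gallery, threshold=0.8):
    """1-NN classification with a distance-ratio acceptance test.
    `gallery` is a list of (representation, identity) pairs; sweeping
    `threshold` (the 0.8 default is purely illustrative) traces the ROC."""
    dists = sorted(((face_distance(probe_F, F), ident)
                    for F, ident in gallery), key=lambda t: t[0])
    best_d, best_id = dists[0]
    # second-best distance, restricted to a *different* identity
    second = next((d for d, ident in dists[1:] if ident != best_id), None)
    if second is not None and best_d / second < threshold:
        return best_id    # accepted match
    return None           # rejected: treated as having no gallery face
```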
Our conclusion is that using the DoG filter with s = 2 pixels, r = 4 pixels, and α = 0.2 is the optimal setting. To better understand this investigation, Figures 4 and 5 present comparative ROC curves obtained by setting each parameter to a value different from the optimal one while keeping the other settings at their optimal values. We also present in these figures the ground-truth ROC curve (red dotted line) that a perfect face recognizer would achieve. Such a recognizer would refuse to accept a match for faces that have no corresponding gallery face, while correctly matching all the other faces that do have corresponding gallery faces.

More specifically, Figure 4 presents the recognition ROC curves using either the DoG filter (blue curve) or the self-quotient image (green curve) for photometric rectification. We can clearly observe, on both FF1 and FF2, that using DoG achieves a significantly better recognition rate than using the self-quotient image. Similarly, the first, second, and third rows of Figure 5 present the effect of different values of the parameters s, r, and α, respectively. We can clearly observe how the recognition performance degrades when a value other than the optimal one is used. We did note that overall s = 1 (blue curve) obtained slightly better performance than s = 2. However, s = 2 saves almost 4 times the computation time for extracting the local descriptors and also makes the face representation 4 times smaller. Therefore, we choose s = 2 at a slight sacrifice in recognition performance. Another observation is that the optimal value of 0.2 for α implies that the best matched 20% region of the face images largely determines the identity of the face.

Last but not least, we would like to emphasize the importance of using the parameter r to control the amount of elasticity we allow in the distance metric. As clearly observed in the second row of Figure 5, neither allowing no elasticity (r = 0, blue line in the figure) nor allowing maximal elasticity (r spanning the whole image, red line in the figure) is desirable. The former does not handle pose variation well, while the latter ignores the fact that face images are highly structured, especially after they are roughly aligned: the matching of local descriptors should not range over the whole face image. In the following, we exclusively use the identified optimal settings for our distance metric when comparing with the state-of-the-art on various face recognition benchmarks.

4.2. Experiments on Recognition Benchmarks

Face Recognition on LFW. We first present our recognition results on the LFW data set [11], following the test settings specified in [11]. The evaluation of a face recognition algorithm on LFW is to classify a pair of faces as either match or non-match based on the distance between them; an ROC curve is generated from different threshold settings.