A statistical test to detect tampering with lottery results

Konstantinos Drakakis ∗, Ken Taylor ∗, Scott Rickard ∗∗

UCD CASL, University College Dublin, Belfield, Dublin 4, Ireland. The authors are also affiliated with the School of Electronic, Electrical & Mechanical Engineering, University College Dublin, Ireland.

Abstract. We consider the minimal distance between the winning numbers in lottery as a statistical measure to detect tampering with lottery results, and apply it to the past results of eight national lotteries: EU (EuroMillions), Belgium, France, Greece, Ireland, Italy, Spain, and the UK. The results show no evidence of tampering, but they also show France to be on the borderline.

1. Introduction

Lottery is a well known and popular game around the globe. Though the exact rules vary, the main idea is that the player chooses m integers without replacement among 1, ..., n (we will refer to this as an m/n lottery system from now on), the order in which the numbers are chosen being immaterial, and compares them against the uniformly random choice of an “oracle” (namely the lottery organizer). The player “wins” if and only if the two choices match. The player normally has to pay a fee in order to participate in the lottery, and, in return, winning is rewarded by a large amount of money.

Real world lottery games are usually more complicated: for example, they usually make provisions for various sub-winning levels, whereby a player is considered a sub-winner if a subset of the player’s choice matches a subset of the oracle’s choice, even when the entire choices do not match completely. What is more, those sub-winning levels may occasionally depend on (usually one or two) “bonus” numbers, chosen as a complement to the main choice of m integers, and possibly even chosen from within a different range of numbers 1, ..., n′ ≠ n.
Such variations are immaterial for the analysis carried out in this work and will be ignored.

Consider now an adversary who wants to substitute the oracle’s choice by his own choice and pass it as the oracle’s choice. The main motivation for this is obviously the adversary’s personal gain: assuming the adversary is a player himself, the adversary wins by carrying out this attack successfully. Alternatively, a “Denial of Service” attack is conceivably possible, whereby the adversary has prior knowledge of all players’ choices and substitutes a choice that corresponds to none of them, so that no winner is found. The latter attack has actually been rumored to take place systematically by the lottery organizers themselves in some lotteries. Indeed, when no winner is found in a certain lottery round, the prize money is added to the prize of the next round and so on, a situation known as “jackpot”; artificially causing jackpots more frequently than they would naturally occur helps increase the public’s interest, hence participation, in lottery, increasing also the lottery organizers’ profits.

How does one detect, and hence deter, such attacks? The obvious way is to implement security measures (Anderson, 1999), e.g., to ensure the oracle does not get tampered with. In this paper we propose an alternative, statistical method of verifying that the oracle’s results are indeed random. Though it is unable to verify that a specific choice of the oracle is genuine, it can detect significant tampering that is carelessly carried out, as the human perception of a uniformly random choice of m among the first n integers usually does not correspond to reality.

2. Data sets

We retrieved publicly available data sets of lottery results over the past years for various countries from the following webpages:

EU: Belgium: France: Greece: Ireland: Italy: Spain: UK:

The choice of the countries analyzed was essentially forced by two constraining factors.
More specifically, we could only retrieve data about national lotteries whose websites:

• were written in a language intelligible to either the authors or their collaborators and friends, and
• contained the said data in the appropriate format (e.g., in a spreadsheet).

We did not find any lottery results posted on private websites.

We were quite puzzled by the large number of national lottery websites that either did not contain past results at all or displayed them in a format that was unsuitable for batch harvesting and processing; this was the case with the various Russian lotteries, Malta, and Iceland, among others. We can only hope this paper will motivate the various national lotteries to post past results in spreadsheet format for research purposes.

3. Statistics

The statistical measure we are going to use is the minimal distance between integers within the choice. This random variable turns out to have a distribution that can be computed in closed form, which we subsequently compare, using the χ² goodness of fit test, with the empirical histogram computed from the data.

3.1 The minimal distance

Let r_1, ..., r_m be the m integers chosen within the range 1, ..., n; the minimal distance d of the choice is defined as

$$d = \min_{1 \le i < j \le m} |r_j - r_i|.$$

For example, for n = 49, m = 6, and for the choice 5, 15, 18, 23, 30, 44, it follows that d = 18 − 15 = 3. When choices are random, d is clearly a random variable; and when choices are made uniformly from within the sample space of $\binom{n}{m}$ m-tuples, d can be shown (Drakakis, 2007) to follow the probability distribution

$$P(d < k) = 1 - \frac{\binom{n-(k-1)(m-1)}{m}}{\binom{n}{m}}, \quad k = 1, \ldots, \left\lfloor \frac{n-1}{m-1} \right\rfloor.$$

3.2 The χ² goodness of fit test

The χ² goodness of fit test (Cramer, 1999) is one of the standard statistical tests used to verify whether two sets of data actually contain sample points drawn from the same probability distribution.
We used the Matlab implementation of the χ²-test as given by the command chi2gof.

The test is relatively simple to describe. Consider two finite data sets consisting of real numbers and a set of l bins within which all of the data points fall. Let A_i and B_i be the number of points of the first and second data set contained in the i-th bin, respectively, i = 1, ..., l. When the data sets are sufficiently large, the random variable

$$\chi^2 = \sum_{i=1}^{l} \frac{(A_i - B_i)^2}{A_i}$$

can be shown to follow approximately a χ²-distribution with l − 1 − s degrees of freedom, which is, by definition, the probability distribution of the sum of the squares of l − 1 − s independent Gaussian random variables of mean 0 and variance 1. Here s represents the number of free parameters of the distribution from which the two data sets allegedly originate. For example, a Gaussian distribution has two free parameters, the mean and the variance.

The test hinges upon computing the probability, known as the p-value p, that this χ²-distribution produces a value equal to the computed value χ² or larger; this is actually a standard step for many statistical tests. The p-value is subsequently compared to the desired significance level α, which reflects our estimate of the portion of the tail of the distribution we consider statistically impossible to occur: the hypothesis that the two data sets have been collected from the same probability distribution is rejected if p < α, and not rejected otherwise. Note that the p-value does not depend on α.

A rule of thumb in specifying the number and boundaries of the bins used is that each bin must contain at least 5 points from each data set. In our case, binning is straightforward as the data consists of integers, while the requirement of at least 5 points per bin is automatically accommodated by chi2gof by merging adjacent bins whenever necessary.
As we determine the candidate distribution for the data by prior knowledge and there are no parameters to estimate from the data, we set s = 0.

3.3 Popularity of choices

The minimal distance seems to be a good candidate for a statistical test to verify the fairness of the lottery, precisely because it seems counterintuitive: for example, in a 6/49 system the probability that d = 1 is approximately 0.495198 ≈ 50%. Intuition is of paramount importance in actual lottery games, as it skews probability distributions: it is well known (Boland and Pawitan, 1999) that there are popular and unpopular choices, and, even if correlation within a choice is disregarded, popular and unpopular numbers. For example, when an average person is requested to simulate the random choice of m numbers out of 1, ..., n, chances are that the chosen integers will be much further spaced apart than in a typical random choice (Boland and Pawitan, 1999): substituted artificially “random” choices by an adversary will skew the distribution, unless, of course, the adversary takes care to simulate uniformly random choice.

The same line of reasoning reveals that the public opinion skews the probability of winning; this has no effect in our study, but it is an interesting observation nonetheless. For example, under uniformly random choices by the oracle, any choice is as likely to win as any other. The public, however, assumes erroneously that some well structured choices are less likely to be selected by the oracle. For example, people would tend to consider the choice 8, 16, 24, 32, 40, 48 in a 6/49 system extremely unlikely, because the numbers are equispaced by 8, and “the probability of a choice of numbers equispaced by 8 is very small”. The fallacy here is that the probability of the class of choices in which a specific choice belongs is irrelevant; what is relevant is the probability of the specific choice, which is the same for all choices.
When the time comes to play, then, pick a well structured choice such as the one just mentioned: it is as likely to win as any other, but, if it wins, it will be less likely that there will be other winners as well!

Incidentally, in a 6/49 system, if 1 and 49 are considered consecutive as well, the probability that d = 1 becomes 0.503203 > 0.5 (Drakakis, 2007). This suggests a simple almost fair game: players A and B agree to play a game where 6 integers are repeatedly chosen uniformly at random out of 1, ..., 49, and player A gives player B the right to choose a winning position between d = 1 and d > 1, to be kept unchanged through the game. Player A adopts the opposite winning position, and, whenever a player wins, the other pays him €1. If player B chooses d > 1, player A casually mentions that, by the way, 1 and 49 are considered to be consecutive, as if the numbers were ordered on a ring. This way, player A is guaranteed to make money in the long run!

4. Results

We have analyzed the past lottery results from 8 countries: EU, Belgium, France, Greece, Ireland, Italy, Spain, and the UK. The various relevant parameters of these data sets are shown in Table 1. A few remarks are in order:

• EuroMillions is a more complicated lottery scheme than what we have so far described, as it is a 5/50+2/9 system: five numbers are chosen within the range 1, ..., 50, while two more within 1, ..., 9. We only consider the first five numbers here.
• Lottery rules in France changed on 2008/10/06 from a 6/49 to a 5/49 system. As the new system has not currently undergone enough rounds for reliable analysis, we considered exclusively the old system.
• Irish lottery rules changed quite frequently: lottery started on 1988/04/16 with a 6/36 system; on 1992/08/22 it switched to a 6/39 system; on 1994/09/24 to a 6/42 system; and on 2006/11/04 to a 6/45 system.
• Italy has operated the same 5/90 system since 1939/01/07.
  It is a quite peculiar system, as n = 90 is almost double the n used in all other systems considered here.
• Spain operates two 6/49 lotteries, one on Thursday and Saturday and one on Monday, Tuesday, Wednesday, and Friday: we analyzed both data sets, labeled as Spain(TS) and Spain(D), respectively, as well as the combined data set, labeled as Spain(C).

Table 1: Information about the various national lotteries: the parameters n and m, the number of rounds considered in the data set, and the p-value of the χ²-test

Country             n    m    Rounds   p
EU (EuroMillions)   50   5    258      0.2682
Belgium             42   6    2425     0.2604
France              49   6    4858     0.0662
Greece              49   6    1733     0.6370
Ireland(36)         36   6    343      0.7235
Ireland(39)         39   6    219      0.9248
Ireland(42)         42   6    2341     0.9630
Ireland(45)         45   6    677      0.4620
Italy               90   5    45221    0.2010
Spain(TS)           49   6    2096     0.7329
Spain(D)            49   6    4347     0.9103
Spain(C)            49   6    6443     0.8510
UK                  49   6    1374     0.1812

As is clear from Table 1, with a significance level of α = 0.05, our statistical analysis cannot reject for any country the hypothesis that the national lottery results have not been tampered with: the histograms appear to be within the error tolerance we have set for the deviation from the theoretical probability distribution, and actually well within this tolerance... with the single exception of France, whose p = 0.0662 is very close to 0.05. Had we used the (relatively uncommon, but certainly not unheard of) value of α = 0.1, France would have failed the test.

Table 2 compares the empirical versus the theoretically expected histogram of d for each country. The problem with the French data set appears there very clearly: there were significantly more winning choices with d = 6 and 7 than theoretically expected, taking into account that these are rare events to begin with.

5. Summary and conclusion

The minimal distance between any two numbers in the set of m integers chosen uniformly at random within the range of the first n integers 1, ..., n without repetition is a random variable which follows a probability distribution known in closed form. This distribution is quite counterintuitive: it has been shown that the human perception about such a random choice does not correspond to reality. As this random choice arises in the game of lottery, we conclude that the minimal distance just described can be used as a statistical measure to discover evidence of tampering with the lottery results: an adversary carelessly mimicking a “uniformly random” choice, in order to substitute the lottery organizers’ choice by his own, is bound to skew the distribution of the minimal distance in the long run.

We computed the histogram of the minimal distance for the past results of 8 countries, namely the EU (EuroMillions), Belgium, France, Greece, Ireland, Italy, Spain, and the UK, and we compared it to the theoretically expected histogram using the χ²-test. Using a significance level of 0.05 we saw that all countries pass the test: the results do not support the conclusion that extensive careless tampering with the lottery results has occurred. France, however, just barely passes the test, and clearly fails under the significance level of 0.1: the reason was identified as a significantly larger than expected number of choices with big minimal distances, which is consistent with a human’s erroneous perception of a random choice.

References

Anderson R. (1999) How to Cheat at the Lottery (or, Massively Parallel Requirements Engineering). Computer Security Applications Conference (also available online at ).
Boland Ph. and Pawitan Y. (1999) Trying To Be Random in Selecting Numbers for Lotto. Journal of Statistics Education, 7(3) (also available online at ).
Cramer H. (1999) Mathematical Methods of Statistics. Princeton University Press, 19th printing.
Drakakis K. (2007) A note on the appearance of consecutive numbers amongst the set of winning numbers in Lottery. Facta Universitatis: Mathematics and Informatics, 22(1), 1–10.

Table 2: A comparison of the empirical vs the expected histogram of the lottery results in each of the countries considered: (a) EU, (b) Belgium, (c) France, (d) Greece, (e) Ireland(42), (f) Italy, (g) Spain(C), (h) UK

(a) EU
d    Obs.   Exp.
1    96     90.0251
2    60     62.5941
3    50     41.9704
4    26     26.9214
5    8      16.338
6    6      9.2342
7    5      4.7475
8    2      2.1382
9    2      0.79024
10   0      0.21062
11   0      0.029607

(b) Belgium
d    Obs.   Exp.
1    1310   1350.3086
2    691    655.7808
3    274    282.0724
4    118    102.3465
5    25     28.7707
6    7      5.294
7    0      0.42390692

(c) France
d    Obs.   Exp.
1    2423   2405.6741
2    1295   1318.8855
3    653    666.2186
4    306    302.1991
5    112    118.2638
6    55     37.3332
7    14     8.3825
8    0      1.0141

(d) Greece
d    Obs.   Exp.
1    883    858.1789
2    461    470.4876
3    234    237.661
4    96     107.8038
5    46     42.1884
6    8      13.3179
7    5      2.9903
8    0      0.36175

(e) Ireland(42)
d    Obs.   Exp.
1    1307   1303.535
2    643    633.0651
3    265    272.3016
4    96     98.8013
5    24     27.7741
6    5      5.1106
7    1      0.40922

(f) Italy
d    Obs.   Exp.
1    9614   9386.9515
2    7792   7759.1741
3    6298   6352.9062
4    5113   5147.0751
5    4168   4121.6618
6    3102   3257.701
7    2533   2537.2809
8    1932   1943.5435
9    1450   1460.6842
10   1106   1073.9523
11   768    769.6506
12   529    535.1354
13   343    358.8169
14   205    230.1587
15   132    139.6781
16   78     78.9462
17   37     40.5874
18   11     18.2801
19   8      6.756
20   2      1.8006
21   0      0.25311835

(g) Spain(C)
d    Obs.   Exp.
1    3207   3190.5636
2    1748   1749.1929
3    881    883.5831
4    393    400.7964
5    158    156.8493
6    49     49.5137
7    6      11.1174
8    1      1.3449

(h) UK
d    Obs.   Exp.
1    648    680.4027
2    395    373.0236
3    189    188.4282
4    93     85.4717
5    29     33.4489
6    17     10.559
7    3      2.3708
8    0      0.28681
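
As an illustration of Sections 3.1 and 3.2, the following Python sketch computes the minimal distance of a draw, the closed-form distribution of d, and a χ² statistic against it. It is not the code used for the results above (those relied on Matlab's chi2gof): all function names are our own, the statistic is written in the usual Pearson form with the expected counts in the denominator, and the tail-pooling rule is a simplifying assumption that may differ from the bin merging performed by chi2gof, so the p-values of Table 1 need not be reproduced exactly.

```python
from math import comb


def minimal_distance(draw):
    """Smallest gap between any two numbers in a draw (Section 3.1)."""
    ordered = sorted(draw)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def prob_less_than(k, n, m):
    """P(d < k) = 1 - C(n - (k-1)(m-1), m) / C(n, m)."""
    return 1.0 - comb(n - (k - 1) * (m - 1), m) / comb(n, m)


def minimal_distance_pmf(n, m):
    """P(d = k) for k = 1, ..., floor((n-1)/(m-1)), by differencing P(d < k)."""
    k_max = (n - 1) // (m - 1)
    return {k: prob_less_than(k + 1, n, m) - prob_less_than(k, n, m)
            for k in range(1, k_max + 1)}


def pooled_chi_square(observed, expected, min_expected=5.0):
    """Return (Pearson statistic, degrees of freedom) with s = 0.

    Assumption: bins are pooled from the upper tail until the last expected
    count reaches min_expected; chi2gof may pool bins differently.
    """
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < min_expected:
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs
        exp[-1] += last_exp
    statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return statistic, len(exp) - 1


if __name__ == "__main__":
    # The worked example of Section 3.1: d = 18 - 15 = 3.
    print(minimal_distance([5, 15, 18, 23, 30, 44]))

    # Observed counts for France, copied from Table 2(c); d = 9 never occurred.
    france_observed = [2423, 1295, 653, 306, 112, 55, 14, 0, 0]
    rounds = sum(france_observed)
    pmf = minimal_distance_pmf(49, 6)
    france_expected = [rounds * pmf[k] for k in sorted(pmf)]
    print(pooled_chi_square(france_observed, france_expected))
```

Turning the statistic into a p-value requires the upper tail probability of the χ²-distribution with the returned number of degrees of freedom, which the Python standard library does not provide; a statistics package is needed for that last step.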